The cause of the problem comes from multiple sources. Often it's very much a case of Problem Exists Between Keyboard And Chair - people aggressively overclocking, trying to max out settings or force them in driver control panels, and then complaining righteously and indignantly when problems happen. "But it works with everything else!" is the common refrain. The reality is that they're running something outside of its designed tolerance levels, so it's going to break sometime - it's not "if", it's "when", and the most likely time it's going to break is when they try to use something that particularly stresses those tolerance levels. (That's something our engineer OP should already be well aware of.)
Good point I almost forgot: games are the type of software that stresses all components of a PC pretty much to the max.
Unfortunately, it's the gamers that usually overclock their hardware beyond acceptable limits for benchmark boasting.
Bad combination...
When I'm feeling in a nasty mood I describe the typical "power user" as "somebody who has just about enough knowledge for it to be a dangerous thing". It's harsh but sometimes it's true.
+1

It's perfectly clear that PCs are in fact too "tuneable" for the casual Joe. A mass-market machine that needs some decent knowledge to be handled correctly is prone to fail. As far as I know, Apple has way fewer problems with hardware/software failures, but also considerably less potential for tuning.
The driver situation is a messy one. Alex St John, a DirectX evangelist at Microsoft in the late 90s, had a saying that "the drivers are always broken", and I think he was right. This applies to AMD, NVIDIA, Intel, whoever you care to mention. Every 3D hardware manufacturer ships broken drivers, and they know it (look at the release notes PDF for any NV driver and you'll see one example of this).
I'm trying to figure out whether this is a conceptual problem that has been there from the very beginning of abstracting hardware from software and needs a complete reboot, or whether it can be fixed by forcing standards onto the industry at the expense of performance, flexibility, and competition between manufacturers.
Would you agree that it's not really acceptable that nowadays "all drivers are broken"?
Imagine refueling your car at the fuel station with fuel that doesn't guarantee to get you where you want to go.
No car driver (pardon the pun) would accept this.
Looking at PCs, this seems to be an accepted limitation.
In fact, I can't imagine any other product where such a high failure rate and lack of QA would be accepted by the consumer.
In the real world, when you ship software (be it a driver, a game, a business app, whatever) you will always ship with bugs. Every non-trivial program has bugs. Sometimes they're bugs you don't know about because some esoteric configuration or unlikely combination of components wasn't tested. Sometimes they're bugs you do know about but they're not currently causing any problems. Sometimes bugs go into the "can't fix" or "won't fix" bracket because fixing them will cause unwanted side-effects elsewhere. You must give the user a workaround instead.
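
To make the "workaround" part concrete, here's a minimal sketch of the pattern a lot of renderers use: sniff the strings the driver reports and flip a flag for a code path known to be broken on it. glGetString is the standard OpenGL query; the specific "broken PBO uploads" scenario and the version number below are hypothetical, purely for illustration.

// Hedged sketch: detect a known-broken driver and route around it.
// glGetString() is standard OpenGL; the "PBO uploads are broken on
// this vendor/version combo" scenario below is a hypothetical example.
#include <GL/gl.h>
#include <cstring>

struct DriverWorkarounds {
    bool avoid_pbo_uploads = false;  // hypothetical broken feature
};

DriverWorkarounds DetectWorkarounds() {
    DriverWorkarounds w;
    // Both calls need a current GL context, otherwise they return NULL.
    const char* vendor  = reinterpret_cast<const char*>(glGetString(GL_VENDOR));
    const char* version = reinterpret_cast<const char*>(glGetString(GL_VERSION));
    if (vendor && version &&
        std::strstr(vendor, "ATI") && std::strstr(version, "6.14.")) {
        // Hypothetical: this driver series corrupts pixel-buffer-object
        // uploads, so the renderer falls back to plain glTexSubImage2D.
        w.avoid_pbo_uploads = true;
    }
    return w;
}

It's ugly, but it's exactly the kind of workaround you end up shipping when a bug sits in somebody else's binary.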
Which reminds me of Carmack's keynote, where he stated that Rage was heavily tested for bugs and came out as a pretty solid product.
As we know now, most problems related to Rage on PC are driver-based, which leads me to the main reason for asking you about your opinion:
If you're a programmer, people expect you to supply working software.
Software that has to work on all systems, in all configurations, in every situation.
If the software fails, it's the programmer's fault. I've experienced this many times.
It's usually the people who don't know anything about modern software architecture who are the first to criticize us.
That the cause of software not working as expected can lie outside the programmer's responsibility is not understood.
As idiotic as it may sound to the consumer who just wants a working product for their buck:
I consider the situation with Rage and the ATI/OpenGL dilemma, and Carmack's attitude towards it, absolutely refreshing.
If even a big company led by one of the most respected programmers in the business can't be sure of releasing a working product, there must be something wrong with the process.
Imagine Ferrari selling a high-powered car and Pirelli supplying tires that blow out the first time you go beyond 65 miles per hour.
In the case of OpenGL, ATI in the old days had quite a reputation for especially poor drivers, and AMD have inherited that. OpenGL itself doesn't help, with its huge monolithic driver model and an ocean liner full of legacy cruft that it must continue to carry; stuff that mapped well enough to SGI hardware in the 1990s but is so far off the mark with modern hardware that it's not even funny. D3D imposes a layer of sanity between the program and the driver in the form of the Runtime, but that has its own compromises. The old "yee-haw, ride 'em cowboy" days of having to code explicitly to specific hardware died for a reason. There is no perfect solution, aside from an unwanted scenario where one hardware manufacturer has 95% coverage. (That, by the way, is exactly how consoles make things easier for developers - the same hardware is guaranteed for every user, and you can tune your program to the strengths of that hardware without having to worry about breaking on another manufacturer's parts.)
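
A concrete taste of that legacy cruft, as a sketch (plain OpenGL 1.1 calls, nothing assumed beyond a current context): a conformant driver must still accept the SGI-era immediate-mode path below, one function call per vertex, alongside the array-based path that maps far better onto the hardware, and keep both rendering identically forever. Buffer objects go further still, but even this 1.1-level contrast shows the baggage the monolithic driver carries.

#include <GL/gl.h>

// Legacy immediate mode: one driver entry point per vertex. It mapped
// well to 1990s SGI pipelines, terribly to modern GPUs, yet it can
// never be removed from a conformant driver.
void DrawTriangleImmediate() {
    glBegin(GL_TRIANGLES);
    glVertex2f(-0.5f, -0.5f);
    glVertex2f( 0.5f, -0.5f);
    glVertex2f( 0.0f,  0.5f);
    glEnd();
}

// Vertex arrays (OpenGL 1.1): the same triangle submitted in one draw
// call, much closer to what the hardware actually wants.
void DrawTriangleArrays() {
    static const GLfloat verts[] = {
        -0.5f, -0.5f,
         0.5f, -0.5f,
         0.0f,  0.5f,
    };
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, verts);
    glDrawArrays(GL_TRIANGLES, 0, 3);
    glDisableClientState(GL_VERTEX_ARRAY);
}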
Too many cooks spoil the broth. Especially when nobody takes the lead to bring all of them together.
I considered Microsoft's idea of certifying drivers a good one, although I can understand GPU manufacturers' reservations towards it, as they fear loss of intellectual property.
Nevertheless, I have had certified drivers installed on my system that totally failed.

This situation could be the last nail in the coffin for the PC as a gaming machine, when nobody really seems to understand what's going wrong.
Most "power users"

still consider gaming consoles as the main reason for pc games not working or looking properly on their "superduper ultra megahertz machine" when in reality it's the crappy situation with unreliable drivers leading to unacceptable performance overhead that just generates heat.
There are just two GPU manufacturers and two accepted graphics libraries left, and this works worse than 15 years ago, when we had 5-6 GPU manufacturers with their respective libraries, plus software rendering without acceleration.
Maybe the business guys have too much influence on this.
As long as "power users"

will buy overclocked and energy devouring gpus that deliver 10 frames per second enhancements at the expense of the product not being reliable, the business guys will sell it.
Again: this might be the final nail in the coffin for the PC as a gaming machine.
Why should any thinking person develop a game for the PC, with all the related technical problems, not to mention piracy, when consoles do not suffer from any of them?
This goes beyond 3D hardware. I recently solved a problem where all games were hitching and jerking every few seconds by uninstalling a Realtek ethernet controller driver. I've seen Broadcom NIC drivers bring down a server cluster.
Yep, I had similar problems some time ago with software running in the background.
Lots of hardware and software running at the same time have to share resources somehow. That's a problem consoles (or old DOS PCs) don't have either.
And yet many PC gamers still consider their system superior.
I don't know what the answer to the situation is, but all I can do is re-emphasise the questions.
I'd love to hear more opinions on this matter. Especially from people involved in the GPU business.