Weird problem - help needed

KadazanPL
Posts: 144
Joined: Mon Apr 27, 2009 9:58 am
Location: Poland

Weird problem - help needed

Post by KadazanPL » Mon Jul 20, 2009 6:29 am

Hi there,

I'm running low on ideas with my problem-ridden setup and therefore, I'm looking for your help.

Here's my rig:
- CPU: Core2Duo E4300 oc to 3GHz (9x333) with Ninja2 + Slipstream 1000RPM
- Motherboard: GA-EP45-DS3LR
- RAM: 2GB DDR2 OCZ 800MHz dualchannel
- GPU: EVGA 8800GTS 320MB SuperClocked (G80 and stock cooling :( )
- PSU: Corsair 520W modular
- HDD: Samsung 1TB F2
- Soundcard: Asus Xonar DX
- TV tuner: old BT878 PCI card (my oldest piece of hardware, approx 9 years old :) )
- Chassis: Antec SOLO with a 1200RPM Slipstream at 7V
- System: WindowsXP SP3
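
For reference, the overclock in the CPU line is just multiplier times front-side-bus clock. A quick sketch of that arithmetic follows; the 200 MHz stock FSB is an assumption taken from the E4300's 1.8 GHz rating and is not stated in the post:

```python
# Core clock = multiplier x FSB clock (all values in MHz).
MULTIPLIER = 9
STOCK_FSB_MHZ = 200   # assumption: E4300 stock FSB (9 x 200 = 1.8 GHz)
OC_FSB_MHZ = 333      # from the rig list above: 9x333


def core_clock_mhz(multiplier: int, fsb_mhz: int) -> int:
    """Return the CPU core clock in MHz."""
    return multiplier * fsb_mhz


stock = core_clock_mhz(MULTIPLIER, STOCK_FSB_MHZ)
overclocked = core_clock_mhz(MULTIPLIER, OC_FSB_MHZ)
overclock_pct = round(100 * (overclocked - stock) / stock, 1)

print(f"stock: {stock} MHz, overclocked: {overclocked} MHz (+{overclock_pct}%)")
```

So "3GHz" here is really 2997 MHz, roughly two thirds above the assumed stock clock.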

Problem description:
Whenever I run any Direct3D-based program, it crashes after several minutes, showing many artifacts and glitches. It happens in various ways: the app crashes to desktop, the computer hard resets, or I get a BSOD (it usually just freezes and I can use the Alt+F4 finger trick to return to the desktop). When it restarts, Windows informs me that it has recovered from a serious error, and the error report points to a driver malfunction. Curiously enough, there is no problem with OpenGL applications: Furmark can run for hours without any artifacts or crashes.
Temperatures remain within tolerance - CPU reaches 60*C under load (40*C idle) and gives no errors in Orthos, GPU can reach 90*C in Furmark with no errors. During normal gaming it usually stabilises at about 75*-80*C.

What I've tried:
- installing new nVidia drivers with removing the old drivers beforehand. Driver version: 186.18, downloaded from EVGA's website
- installing the most recent DirectX
- underclocking the GPU and running its fan at 100% - more artifacts show up but it seems to take longer to crash - temperatures: 70*C under load
I don't think that it is an issue with temperatures: Furmark "extreme burning" test (OpenGL) can run indefinitely with no artifacts. The same IMO rules out issues with voltages.

Additional information:
My previous motherboard died on me (it was an EVGA 650i Ultra) and I bought this Gigabyte. I removed all drivers related to the previous motherboard in Windows safe mode by using a third party driver removal utility (sorry, I can't remember its name). I installed the drivers that came with Gigabyte mobo and everything runs fine: browsers, office applications, TV tuner software, multimedia players and opengl applications. Still, any direct3d game or benchmark will crash in the manner described above.

Any ideas? I would really like to avoid doing a fresh Windows reinstall, but I've arrived at a point where it seems to be my only option.
Any help will be greatly appreciated.

(I hope this card isn't going to kick the bucket, it still runs all my games just fine, so I don't really see the need to upgrade.)

RoGuE
Posts: 540
Joined: Mon Jul 06, 2009 9:11 am
Location: Massachusetts, USA

Post by RoGuE » Mon Jul 20, 2009 7:07 am

I've never had a problem like this..

but it sounds like a classic software issue...and this is where my expertise goes out the window. The first thing I would try is a ton of "googling" for similar problems. If that turned up nothing, I would start to consider reinstalling the OS fresh, with all latest drivers.

on a side note..have you tried a mem test? I would be curious to see if it fails..

KadazanPL
Posts: 144
Joined: Mon Apr 27, 2009 9:58 am
Location: Poland

Post by KadazanPL » Mon Jul 20, 2009 8:28 am

Thank you for your reply! :)
RoGuE wrote: on a side note..have you tried a mem test? I would be curious to see if it fails..
As a matter of fact, I have. I have four sticks of RAM: 2x1GB Geil 4-4-4-12 800MHz and currently installed 2x1GB OCZ 4-4-4-12 800MHz. I bought the second set because I plan on buying Win7 one day and they might come in handy. Memtest86+ shows no errors with the memory, whether I have 2GB Geil, 2GB OCZ or 4GB combined (I never waited for more than one full run, though).
I don't think the memory is to blame: a fault there would probably show up in Orthos and Furmark and cause other random software errors.

I guess I'll end up formatting the hard drive after all :(

Oh, I tried googling it... I'm still searching...

nutball
*Lifetime Patron*
Posts: 1304
Joined: Thu Apr 10, 2003 7:16 am
Location: en.gb.uk

Post by nutball » Mon Jul 20, 2009 8:34 am

Umm... probably a dumb question but does it do this with the CPU at stock clocks?

RoGuE
Posts: 540
Joined: Mon Jul 06, 2009 9:11 am
Location: Massachusetts, USA

Post by RoGuE » Mon Jul 20, 2009 8:49 am

yeah I was wondering the same thing as nutball..

I'm happy you ran memtest, it soothes my worry that it was something costly.

So I'll revert to my original idea, which is that I think it's purely a software problem. I can't tell you at all how to fix it, because I'm pretty bad at stuff like that to be honest. I'm more of a hardware guy..who happens to know a lot about windows and linux.

I would do a clean install..and while you're at it, maybe flashing your bios to the latest rev wouldn't kill you

Shamgar
Posts: 454
Joined: Wed Oct 22, 2008 8:49 am
Location: Where I Am

Post by Shamgar » Mon Jul 20, 2009 9:16 am

Looks like something might have caused compatibility issues. Whenever I do a major hardware change like a new motherboard, I go through the pain of a reinstall. The driver removal utility you used might not have removed everything tied to the old motherboard: Windows has a habit of hiding things deep in its registry and system folders. A registry cleaning and rebuilding tool might help, but not fix the problem completely. Fresh install of Windows has become a regular part of my computing life :x. Painful, but often necessary :cry:.

If you decide to take the bold step (that every Windows-using computer owner must take: reinstalling from scratch!), make sure to BACK UP all your important data. That is, everything you value! You will regret it if you are careless :(. Reinstalling may even save you time, rather than trying to diagnose what could be wrong.

If and after you do a reinstall, make an IMAGE of your system partition and save it to another partition/disk/external drive. You would normally do this after you achieve an ideal and stable setup: with all drivers, main applications, desktop and explorer config'd to your preference, windows updates, AV and security software updates installed. You can make further images later on so you have a few working setups available to you should things go wrong in the future. You will need an imaging/cloning program to do this however.

Hope this helps your situation.

KadazanPL
Posts: 144
Joined: Mon Apr 27, 2009 9:58 am
Location: Poland

Post by KadazanPL » Mon Jul 20, 2009 12:21 pm

Guys, thanks for your help! :)

I did a fresh Windows install (the first one in 4 years :) ) and the problem persists. I seem to have found the name of the bug, it is the infamous

INFINITE LOOP

or in other words:

STOP 0x000000EA THREAD_STUCK_IN_DEVICE_DRIVER (Q293078)

really nasty stuff, and I am googling through thousands of posts by people who fell victim to this plague. I will let you know if I find a solution.

One thing is certain - this is the last nVidia (and EVGA) card I bought in my life. :evil:

RoGuE
Posts: 540
Joined: Mon Jul 06, 2009 9:11 am
Location: Massachusetts, USA

Post by RoGuE » Mon Jul 20, 2009 1:03 pm

KadazanPL wrote: One thing is certain - this is the last nVidia (and EVGA) card I bought in my life. :evil:

I feel for you man..but I gotta say, I am super happy with my nvidia, evga card. I was able to over and underclock it without breaking a sweat..and the thing cost me 90 bucks and gives me 90+ FPS in call of duty, over 100+ in WoW, and 30-40 in crysis (med. settings). Not bad imo...

All manufacturers suffer from flukes..

but I can understand your frustration.

At least you get a clean install out of it and your system has been streamlined again.

psiu
Posts: 1201
Joined: Tue Aug 23, 2005 1:53 pm
Location: SE MI

Post by psiu » Mon Jul 20, 2009 1:42 pm

Yeah ATi isn't a cureall for weird problems at all. Though I pretty much stick with them as they are the evil I know fairly well.

I get intermittent VPU recover incidents with my 4670 maybe once every 2 weeks at the most frequent. And weird issues like the 4350 not using Powerplay at 720p resolution.

danimal
Posts: 734
Joined: Mon Jun 08, 2009 2:41 pm
Location: the ether

Post by danimal » Mon Jul 20, 2009 2:00 pm

KadazanPL wrote:One thing is certain - this is the last nVidia (and EVGA) card I bought in my life. :evil:
you haven't shown any correlation between that hardware and this lockup problem, so let's not start assigning blame just yet.

you need to go back to the basic troubleshooting steps, which you seem to have bypassed in favor of reloading the o.s.

i'll spell it out for you, based on my 10 years of experience as a pc network admin: get rid of all the extra memory sticks(use 2 not 4), stop overclocking everything, remove the soundcard and the tv tuner card, and try to re-create the problem.

if the problem still exists at that level, try swapping the ram sticks around, if that doesn't fix it, swap out the video card as a troubleshooting step.

imho, mixing ram sticks from two different manufacturers is not usually a good idea.

RoGuE
Posts: 540
Joined: Mon Jul 06, 2009 9:11 am
Location: Massachusetts, USA

Post by RoGuE » Mon Jul 20, 2009 4:59 pm

thats why I asked if he did a memtest, and apparently he did, and it passed.

I've never seen memory that passes memtest be the source of any problem..

KadazanPL
Posts: 144
Joined: Mon Apr 27, 2009 9:58 am
Location: Poland

Post by KadazanPL » Tue Jul 21, 2009 12:47 am

Thank you all for your responses. It is nice to see the support.

Here's what I did last night:
- format C:
- fresh install Windows XP SP3
- install drivers from the websites (directx, GPU, soundcard, chipset - all are up to date)
- uninstall PhysX, which comes as a bundle with the drivers and cannot be omitted during install.
- disable "write combine" in the display troubleshooting tab

At this point I ran 3DMark06 and it passed giving about 11000 points, to which I cheered, because I've never seen this kind of score on my computer. I thought that the problem had been solved by removing PhysX and disabling "write combine" but when I tried to run the test again, this time on loop, it crashed.

I moved on with:

- returning all clocks to stock values

No improvement.

I'll keep trying other options once I get home from work.
danimal wrote: you haven't shown any correlation between that hardware and this lockup problem, so let's not start assigning blame just yet
Well... I haven't. But please do a quick google search for the "infinite loop" error and you'll see that this problem has been endemic in nVidia cards since 2001. In my opinion - if a producer can't solve a problem in eight years and just goes on ignoring those who fell victim to it (relatively small number), he does not deserve their money.
danimal wrote: get rid of all the extra memory sticks(use 2 not 4)
I am using two of the same kind at the moment. Memtest passes with no errors regardless of the memory configuration. I tried two sticks from Geil, two sticks from OCZ and four sticks together - everything went fine. That's why I assume the problem is not with RAM.
danimal wrote: stop overclocking everything
I did. Processor back to stock, memory on AUTO (SPD). Nothing is overclocked (the GPU is factory overclocked - it is the Superclocked edition - do you think I should downclock it to the standard defaults for 8800gts?)
Returning to stock values didn't solve anything. 3DMark06 freezes anywhere between the first and the second test.
danimal wrote: remove the soundcard and the tv tuner card
I haven't tried that yet. Do you think there might be an IRQ problem? Well, I'll check that once I get home from work.

<rant mode: on>
About EVGA: two years ago, when I bought my current computer, I decided to go for an EVGA mobo and EVGA GPU. The technical support for the motherboard was cancelled several months after the purchase even though I had some problems with BIOS. I notified the European division about my problems and the only answer I got was "this product is end of life - buy a new one". And now the GPU starts acting up... Of course, it's not supported anymore, there's no indication that they even used to manufacture those cards.
To make it more aggravating: EVGA for some reason is considered to be higher-end in Poland and bears the price premium. It certainly doesn't show, if you ask me.
I think I'll stick to bigger manufacturers in the future: ASUS and Gigabyte - they never let me down.
<rant mode: off>

Shamgar
Posts: 454
Joined: Wed Oct 22, 2008 8:49 am
Location: Where I Am

Post by Shamgar » Tue Jul 21, 2009 6:13 am

Looks like you've tried most avenues already. Do you have a spare graphics card lying around? If not, try borrowing one off someone (preferably a non-Nvidia card as you feel that might be the culprit), run it and see if you still get the same problem. Check Gigabyte's support site for that motherboard: there might be a BIOS update relating to your problem. As always, don't flash the BIOS unless you really need to, and backup the old BIOS if you do.

You must be having lots of fun :cry:. Keep at it. And nothing wrong with a bit of ranting... sometimes it's justified :).

psiu
Posts: 1201
Joined: Tue Aug 23, 2005 1:53 pm
Location: SE MI

Post by psiu » Tue Jul 21, 2009 7:55 am

Try removing from case, put on non-conductive surface with minimal parts (no tuner card etc--tuner cards are some of the buggiest pieces of junk usually).

Is the power outlet properly grounded? A cheap outlet tester should tell you that.

RoGuE
Posts: 540
Joined: Mon Jul 06, 2009 9:11 am
Location: Massachusetts, USA

Post by RoGuE » Tue Jul 21, 2009 8:39 am

psiu wrote:Try removing from case, put on non-conductive surface with minimal parts (no tuner card etc--tuner cards are some of the buggiest pieces of junk usually).

Is the power outlet properly grounded? A cheap outlet tester should tell you that.
this might be taking it to the extreme...there is no way a shaky power connection could cause a specific error at specific times..it would affect at least several other processes (like torture testing, memtest, etc)

but..i guess if you're desperate.....

danimal
Posts: 734
Joined: Mon Jun 08, 2009 2:41 pm
Location: the ether

Post by danimal » Tue Jul 21, 2009 9:33 am

KadazanPL wrote: Well... I haven't. But please do a quick google search for the "infinite loop" error and you'll see that this problem has been endemic in nVidia cards since 2001. In my opinion - if a producer can't solve a problem in eight years and just goes on ignoring those who fell victim to it (relatively small number), he does not deserve their money.
i don't think that there is much in common with video cards made in 2001, and video cards made in 2009... and the error message "infinite loop" is just about as prevalent for ati as it is for nvidia, 36,900 search results vs. 43,600:
http://www.google.com/search?hl=en&q=in ... f&oq=&aqi=
http://www.google.com/search?hl=en&q=in ... oq=&aqi=g1

as for memtest, this isn't necessarily about bad memory per se, it's about differences in timing and such, that may only show up under overclocking conditions, or with certain motherboards... i have seen some negative feedback from people with certain combinations of high-speed ddr3 and evga motherboards, for instance.
KadazanPL wrote: I did. Processor back to stock, memory on AUTO (SPD). Nothing is overclocked (the GPU is factory overclocked - it is the Superclocked edition - do you think I should downclock it to the standard defaults for 8800gts?)
yes, i think that you should underclock it, back to the standard 8800gts timings, and then keep going down with the frequencies until it stops failing that test... i have seen at least one account of a defective superclocked gtx 260, that failed only when it was used with the superclocked timings.

here is the official evga forum, does your particular video card have the lifetime warranty on it?
http://www.evga.com/forums/default.asp
KadazanPL wrote: I haven't tried that yet. Do you think there might be an IRQ problem? Well, I'll check that once I get home from work.
possibly, but also flakey drivers that conflict, or even a marginal power supply that is pushed over the edge when it's combined with those two cards and superclocked frequencies on the video card... just remember to only use two sticks of ram throughout all of this testing, and leave those two cards out as well.

please keep us posted on how it goes.

KadazanPL
Posts: 144
Joined: Mon Apr 27, 2009 9:58 am
Location: Poland

Post by KadazanPL » Wed Jul 22, 2009 1:08 am

I think I might have solved the problem <knock on wood> :)

The bad news is I don't really know how. :/ Here's what I did:

0. Stock values on all hardware frequencies (in the GPU's case stock=superclocked)
1. removed all expansion cards
2. removed one of the RAM sticks, so I had 1GB
3. loaded fail-safe defaults in BIOS

At this point the computer failed to boot. I thought that's just great - it fails to boot with "fail-safe" defaults. Now isn't that ironic? I must admit I was close to smashing the junk against a wall.

4. I accessed BIOS and disabled ALL onboard devices I didn't need: floppy controller, PATA controller, COM and LPT controllers, audio, LAN (I use a cable modem on USB) and some others I can't recall now.
5. Booted into Windows successfully and... voila! 3DMark06 finished the test with a modest 8000 points. :)
6. I ran the test three times and it passed with ease.

At this point I began resetting the hardware config:

7. I plugged the second stick of RAM to get 2GB - test passed
8. I plugged my Xonar DX audiocard - test passed
9. I plugged in my trusty old TV tuner - test passed

I was beginning to wonder "what if" :)

10. I overclocked and overvolted the CPU to 3GHz and 1.4V and 3DMark06 gave me hitherto unseen 12000 points! :D Not bad for a computer dated 2007.

Now the question is: what was the problem? The only answer that comes to my mind is RAM after all. OCZ claims that their memory should run at 2.1V. When I loaded fail-safe defaults it returned to standard 1.8V and I didn't touch that. After all the tweaks that I did having loaded fail-safe, this is the only value that is different from the previous buggy setup. Interestingly, higher voltage is recommended by the producer (OCZ) and didn't give any errors in Memtest.
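
The reasoning above amounts to diffing the old settings against the new ones and looking at what is left. A minimal sketch of that comparison in Python; the setting names and dictionary layout are made up for illustration, and the old RAM voltage of 2.1V is an assumption based on the OCZ recommendation mentioned above:

```python
def changed_settings(before: dict, after: dict) -> dict:
    """Return {name: (old, new)} for every setting whose value differs."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


# Hypothetical setting names; only values mentioned in this thread are used.
# The 2.1 V figure for the crashing setup is an assumption (OCZ's recommendation).
crashing = {"cpu_mhz": 3000, "ram_gb": 2, "ram_volts": 2.1}
stable = {"cpu_mhz": 3000, "ram_gb": 2, "ram_volts": 1.8}

diff = changed_settings(crashing, stable)
print(diff)
```

A diff like this only covers settings that were actually written down. Anything changed along the way but not recorded (for example the onboard devices disabled in step 4) will not appear in it.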

Oh well... the computer seems to work fine now, so I don't want to mess with those settings any further.
danimal wrote: i don't think that there is much in common with video cards made in 2001, and video cards made in 2009... and the error message "infinite loop" is just about as prevalent for ati as it is for nvidia, 36,900 search results vs. 43,600:
Yep, I was too hasty to throw stones. After some further reading I came to the conclusion that "infinite loop" is a generic error that shows up in hundreds of cases and can be caused by many different config blunders. It MIGHT indicate faulty hardware but is often just a software conflict.
danimal wrote: yes, i think that you should underclock it, back to the standard 8800gts timings, and then keep going down with the frequencies until it stops failing that test... i have seen at least one account of a defective superclocked gtx 260, that failed only when it was used with the superclocked timings.
Luckily, this was not the case :) It runs fine with factory overclock.

EVGA doesn't give a lifetime warranty in Poland :( It's only two years and they have already passed. I did register the product within 30 days of purchase so theoretically the card should be covered by the 10 years EU warranty. We shall see, the time for it has not yet come.

I would like to thank everyone who took their time to reply in this thread. Your help is much appreciated! :)

RoGuE
Posts: 540
Joined: Mon Jul 06, 2009 9:11 am
Location: Massachusetts, USA

Post by RoGuE » Wed Jul 22, 2009 5:10 am

First of all, nice troubleshooting.

Second of all...I KNEW IT WAS THE MEMORY!!! lol

god..this is just one of those incidents that I need to add to my list of reasons not to stray from G.Skill memory. I'll never own OCZ mem again

glad you're back in action man!

Shamgar
Posts: 454
Joined: Wed Oct 22, 2008 8:49 am
Location: Where I Am

Post by Shamgar » Wed Jul 22, 2009 8:19 am

Good work KadazanPL. It looks like it was the RAM after all. At least now you know. BTW this is a good time to make an image/clone of your system for easy restore in case something goes wrong in future. But your issue seems to be BIOS settings based rather than Windows and drivers based. In any case, it doesn't hurt to make an image of a good working setup.

Thanks for keeping us informed!

If I ever run into a similar problem, I will make sure to check RAM first.
1 user's troubleshooting victory = valuable knowledge for many :).

nomoon
Posts: 210
Joined: Thu Apr 14, 2005 8:35 pm
Location: Allen, TX US

Post by nomoon » Wed Jul 22, 2009 8:39 am

I have this motherboard and it seems to be very picky about its memory. The first memory that I used gave me intermittent crashes (Sorry, I don't remember the brand). I then bought some Mushkin and haven't had a single problem.

Take a look at the Gigabyte Forum for their suggestions on the most compatible memory. Do a search. One of the moderators frequently makes comments or FAQs about which memory works best for these boards.

Jason
Last edited by nomoon on Wed Jul 22, 2009 9:01 am, edited 1 time in total.

danimal
Posts: 734
Joined: Mon Jun 08, 2009 2:41 pm
Location: the ether

Post by danimal » Wed Jul 22, 2009 8:49 am

congrats on getting it running! your final troubleshooting approach there is a great blueprint for others to follow.
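
That approach (start from a bare-minimum configuration, then add parts back one at a time with a test run after each) could be sketched roughly like this. The function and part names are hypothetical, and `passes_test` stands in for whatever stress test is used (3DMark06 in this thread):

```python
from typing import Callable, List, Optional


def first_failing_part(
    minimal: List[str],
    extras: List[str],
    passes_test: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add parts back one at a time; return the first one that breaks the test.

    Returns None if every configuration passes. Raises ValueError if even the
    minimal configuration fails, since the fault then lies in the base setup.
    """
    installed = list(minimal)
    if not passes_test(installed):
        raise ValueError("minimal configuration already fails")
    for part in extras:
        installed.append(part)
        if not passes_test(installed):
            return part
    return None


# Hypothetical stand-in for a real stress test: pretend the tuner is at fault.
def fake_test(parts: List[str]) -> bool:
    return "tv_tuner" not in parts


culprit = first_failing_part(
    minimal=["cpu", "gpu", "ram_stick_1"],
    extras=["ram_stick_2", "sound_card", "tv_tuner"],
    passes_test=fake_test,
)
print(culprit)
```

In this thread every add-back step passed, which corresponds to the `None` case: the fault was in the base settings and not in any single part.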

what was the recommended voltage for the geil ram? if it was, say, 1.8v, while the ocz ram was 2.1v, you had an incompatibility that probably never would have worked out right.

ocz ram is guaranteed for life, those sticks could be o.k. if they are run by themselves at the appropriate voltage.

i have had two different types of fast ocz ram overclocked on my ep45t-ud3lr motherboard, and it's been very solid, once i got it stable... currently i'm at over 1800mhz with 2x2 gigs of ocz.

KadazanPL
Posts: 144
Joined: Mon Apr 27, 2009 9:58 am
Location: Poland

Post by KadazanPL » Fri Jul 24, 2009 6:44 am

Two days after the successful 3DMark06 run I am pretty confident that my computer is stable. Thank you again for your support and watching my struggle :)
Maybe I'll even organise a photo session for the gallery section ;)
Shamgar wrote: BTW this is a good time to make an image/clone of your system for easy restore in case something goes wrong in future. But your issue seems to be BIOS settings based rather than Windows and drivers based. In any case, it doesn't hurt to make an image of a good working setup.
I've never done that before, but it seems like a good idea. I always encrypt my drives with TrueCrypt - do you think it may interfere with making an image of the drive?
danimal wrote: what was the recommended voltage for the geil ram? if it was, say, 1.8v, while the ocz ram was 2.1v, you had an incompatibility that probably never would have worked out right.
Both types of RAM are close in specs: 4-4-4-12, 800MHz, both should run 2.1V (the producers' recommendation). It remains to be seen whether they'll work together - I'm planning to switch to Windows7 in Autumn, and will try to use 4GB then.

Shamgar
Posts: 454
Joined: Wed Oct 22, 2008 8:49 am
Location: Where I Am

Post by Shamgar » Fri Jul 24, 2009 8:14 am

KadazanPL wrote:Two days after the successful 3DMark06 run I am pretty confident that my computer is stable. Thank you again for your support and watching my struggle :)
Maybe I'll even organise a photo session for the gallery section ;)
We're all in it together, bro. We're the SPCR army, marching onto victory. :lol:
KadazanPL wrote:
Shamgar wrote: BTW this is a good time to make an image/clone of your system for easy restore in case something goes wrong in future. But your issue seems to be BIOS settings based rather than Windows and drivers based. In any case, it doesn't hurt to make an image of a good working setup.
I've never done that before, but it seems like a good idea. I always encrypt my drives with TrueCrypt - do you think it may interfere with making an image of the drive?
If TrueCrypt is installed when the image is made, I think it should work, why not? However, I cannot say this for sure. Also, to image/clone a partition or drive, you need a special utility like Norton Ghost, PowerQuest Drive Image (has since been bought by Symantec) or Acronis True Image. There are some free/shareware ones available but I don't know which ones are reliable. I use an old version of PQ Drive Image which still works okay, but I'm thinking of getting something newer like Acronis.

Hope I haven't complicated things too much for you. Just something to consider. :wink:
KadazanPL wrote:
danimal wrote: what was the recommended voltage for the geil ram? if it was, say, 1.8v, while the ocz ram was 2.1v, you had an incompatibility that probably never would have worked out right.
Both types of RAM are close in specs: 4-4-4-12, 800MHz, both should run 2.1V (the producers' recommendation). It remains to be seen whether they'll work together - I'm planning to switch to Windows7 in Autumn, and will try to use 4GB then.
Yes, most RAM running at 4-4-4-12 will ask for 2.1V. As for running the two sets of RAM together, you can always give it a go. Personally, I would rather use matched pairs of RAM from the same manufacturer.
