CPU / Mobo stress tests (Prime95) and random errors



quest_for_silence
Posts: 5275
Joined: Wed Jun 13, 2007 10:12 am
Location: ITALY


Post by quest_for_silence » Wed Sep 29, 2010 7:17 am

Hi all!

I'm currently tweaking voltages on an Athlon II/785G combo. On these occasions I usually test stability by running Prime95 and FurMark at the same time in torture/stability mode, while SpeedFan takes care of temps and fans.

I haven't had any fatal errors (BSODs, reboots, programs shutting down, and so on), either while testing or at any other time, so what should I make of the random errors I sometimes get?

To be clearer: sometimes the system runs fine for 4-5 hours; other times (under the very same conditions) one or more Prime95 tasks report an error and stop after an apparently random time (once after 2 min, another time after 22 min, or after 1 h 43 min, and so on).
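If the errors really do arrive at random, one rough way to put a number on the times above is to treat them as a Poisson process and estimate a mean time between errors (MTBE). The sketch below assumes that model, which nobody in the thread has confirmed; `estimate_mtbe_minutes` is a hypothetical helper, and clean 4-5 hour runs can be passed in as censored observations if you kept track of them.

```python
"""Rough estimate of the mean time between Prime95 errors (a sketch).

Assumes errors arrive independently at a constant rate (a Poisson process).
This is a simplifying assumption, not something measured in this thread.
"""


def estimate_mtbe_minutes(failure_times, clean_run_times=()):
    """Maximum-likelihood MTBE for exponential times-to-failure.

    failure_times: minutes until a task reported an error.
    clean_run_times: minutes of runs that ended with no error (right-censored).
    MTBE = total observed minutes / number of observed failures.
    """
    if not failure_times:
        raise ValueError("need at least one observed failure")
    exposure = sum(failure_times) + sum(clean_run_times)
    return exposure / len(failure_times)


# Times-to-failure quoted above: 2 min, 22 min, 1 h 43 min.
failures = [2, 22, 1 * 60 + 43]
print(round(estimate_mtbe_minutes(failures), 1))         # 42.3
# Hypothetical: also count one clean 4.5-hour (270-minute) run.
print(round(estimate_mtbe_minutes(failures, [270]), 1))  # 132.3
```

With only the three quoted failures the estimate is about 42 minutes; adding a single hypothetical clean 4.5-hour run pushes it to about 132 minutes, which shows how much the clean runs weigh in the estimate.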

The failing task isn't always the same, but I don't actually know whether each task is always bound to the same core.

I ran a test under default conditions for more than six hours only once, right before changing any voltage setting from AUTO in the BIOS, and got no errors.

So, should I worry about them? If so, what actions might be useful?

Thanks in advance for any hint.

lm
Friend of SPCR
Posts: 1251
Joined: Wed Dec 17, 2003 6:14 am
Location: Finland


Post by lm » Wed Sep 29, 2010 7:53 am

Any random error in those tests is a sign of an unstable system. You should be able to run 24h+ without any errors, and even then just lower your overclock by a notch to be certain that the system is stable.

To make an analogy to high jumping, when you overclock, you are raising the bar. When it's low, everyone can easily jump over, but when it starts getting near the limit, some jumpers fail sometimes. You are at this point now. If you keep raising the bar, it becomes hard to jump over and you get even more serious signs of instability.
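Under the simplifying assumption that errors arrive at random with a constant mean time between errors (MTBE), a little arithmetic shows why a clean 4-hour run says little and longer runs say more. The helper names and the 6-hour MTBE below are hypothetical, chosen only for illustration.

```python
"""Why longer clean runs give more confidence (a sketch).

Assumes errors follow a Poisson process with a constant mean time between
errors (MTBE); real instability need not behave this way.
"""
import math


def p_clean_run(hours, mtbe_hours):
    """Probability that a run of `hours` shows no error at all."""
    return math.exp(-hours / mtbe_hours)


def min_mtbe_after_clean_run(hours, confidence=0.95):
    """Smallest MTBE still consistent with one clean run of `hours`.

    If the true MTBE were shorter, a clean run this long would occur
    with probability below 1 - confidence.
    """
    return hours / -math.log(1.0 - confidence)


# Hypothetical system that errors once every 6 hours on average:
print(round(p_clean_run(4, 6), 3))   # 0.513
print(round(p_clean_run(24, 6), 3))  # 0.018
# What a single clean run rules out at 95% confidence:
for h in (4, 24, 72):
    print(h, round(min_mtbe_after_clean_run(h), 1))
```

With those assumptions, a system that errors every 6 hours on average still passes a 4-hour run about half the time, but passes a 24-hour run less than 2% of the time. Read the other way, a single clean run only rules out (at 95% confidence) an MTBE shorter than about 1.3 h for a 4 h run, 8 h for a 24 h run, or 24 h for a 72 h run, so there is no magic cutoff: longer runs simply give a stronger guarantee.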

quest_for_silence
Posts: 5275
Joined: Wed Jun 13, 2007 10:12 am
Location: ITALY


Post by quest_for_silence » Wed Sep 29, 2010 9:28 am

lm wrote: Any random error in those tests is a sign of an unstable system.

But, I guess, it doesn't tell us the actual reason for the instability. Over the hours it may be due to mains fluctuations, some OS (Windows) related issue, hardware overheating, and so on.
lm wrote: You should be able to run 24h+ without any errors, and even then just lower your overclock by a notch to be certain that the system is stable.

Apart from the fact that I have no (or vanishingly little) chance to run that system for 24h+, why 24h+? Why not 72h+? If time is what matters, 4 hours are always 4 hours, aren't they? Why does one 4-hour run go fine while the next one doesn't?
lm wrote: To make an analogy to high jumping, when you overclock, you are raising the bar.

Well, I guess it's not that straightforward.

My CPU (AD605EHDK42GI) runs quite low at stock settings: 1.056 V at load, 0.832 V at idle (C'n'Q).

I've seen several reviews on the web where CPU-Z showed 1.120 V at load, while AMD doesn't publish the standard voltages on its website.
I also know that the system may run fine at stock clocks down to 0.975 V at load (1.000 V is the lowest value reviewers have found).

Partly because of these facts, I decided to try out the automatic overclocking feature of my MSI board: in the end, it hangs at every position of the corresponding hardware switch.

So I decided to try to replicate the supposedly working auto-OC positions (+10%, +15%, +20% FSB), capitalizing on the apparent 0.064 V advantage shown in the reviewers' CPU-Z screenshots (1.120 V vs. my 1.056 V).
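For reference, here is the clock arithmetic behind those auto-OC steps. The stock reference clock and both multipliers are assumptions for illustration, not values given in the thread, and `clocks_for_step` is a hypothetical helper; read the real values from CPU-Z before relying on the numbers.

```python
"""Clock arithmetic for the auto-OC steps (+10%, +15%, +20% FSB), a sketch.

Assumed, not taken from the thread: a 200 MHz stock reference clock,
an 11.5x CPU multiplier and a 10x north bridge multiplier.
"""

STOCK_REF_MHZ = 200.0  # assumed stock reference ("FSB") clock
CPU_MULT = 11.5        # assumed CPU multiplier
NB_MULT = 10.0         # assumed north bridge multiplier


def clocks_for_step(percent):
    """Return (reference, CPU, NB) clocks in MHz for a +percent FSB step."""
    ref = STOCK_REF_MHZ * (1 + percent / 100)
    return round(ref), round(ref * CPU_MULT), round(ref * NB_MULT)


for step in (0, 10, 15, 20):
    print(f"+{step}%:", clocks_for_step(step))
```

Under these assumptions the +20% step means a 240 MHz reference clock, about 2760 MHz on the CPU and 2400 MHz on the north bridge. On AM3 platforms the NB, HT link and memory clocks are also derived from the reference clock, so an FSB step raises all of them together, which is one reason NB and HT voltages can come into play.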

Well, at 1.120 V (as read by CPU-Z) it ran fine, but during one test, after the usual 4-5 hours, I had to put the system to sleep. When I woke it up, CPU-Z showed 1.080 V instead of the 1.120 V I had set; I decided to run the tests anyway, and it went fine for another 4-5 hours.

After that, I set the CPU to that 1.080 V in the BIOS (1.117 V in the BIOS gives 1.080 V in CPU-Z) to see what would happen, and I ran into problems, i.e. erratic behavior. I've tried tweaking other voltages (NB, HT, SB, RAM, etc.) without a guide, but so far I haven't managed to get consistent results.
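The voltage figures in this post reduce to simple differences. The sketch below assumes the gap between the BIOS setting and the CPU-Z reading is a fixed offset, which is only an approximation (it varies with load and the board's load-line behavior); the function names are hypothetical.

```python
"""BIOS setting vs. CPU-Z reading (a sketch).

Assumes the BIOS-to-CPU-Z gap is a fixed offset. In practice it varies
with load and load-line behavior, so treat results as a starting point.
"""


def voltage_gap(higher_v, lower_v):
    """Difference between two voltages, rounded to millivolts."""
    return round(higher_v - lower_v, 3)


def bios_for_reading(target_v, gap_v):
    """BIOS value that should give target_v in CPU-Z, if the gap holds."""
    return round(target_v + gap_v, 3)


gap = voltage_gap(1.117, 1.080)        # BIOS 1.117 V -> CPU-Z 1.080 V
print(gap)                             # 0.037
print(bios_for_reading(1.120, gap))    # 1.157
print(voltage_gap(1.120, 1.056))       # reviewers vs. this CPU at stock: 0.064
```

If the 0.037 V gap held at every setting, getting the earlier 1.120 V CPU-Z reading back would take roughly 1.157 V in the BIOS; whether it does hold on this board is something only a re-check in CPU-Z can confirm.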

Tetreb
Posts: 50
Joined: Wed Sep 29, 2010 8:37 am
Location: Central Europe

Post by Tetreb » Wed Sep 29, 2010 10:01 am

It's very hard to find the real culprit. If the CPU is at standard clocks and voltages, then the PSU might not be stable enough, or you do have some hardware failure.
Power-saving features also seem to expose unstable systems more readily.
For example, my HD5750 would hang randomly unless I enabled ATI Overdrive, which disables idle underclocking.

b_rubenstein
Posts: 273
Joined: Tue Aug 04, 2009 7:03 am
Location: Brooklyn, NY

Post by b_rubenstein » Wed Sep 29, 2010 11:09 am

All digital systems and subsystems have a bit error rate (BER). It's pretty low, e.g., 1x10^-18. Run the system long enough and there will be a bit error, and if it's in memory the program will throw an error.
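How long "long enough" is depends on both the BER and how many bits pass through the path per second. The sketch below assumes independent errors, the quoted BER of 1e-18 per bit, and a hypothetical 10 GB/s of traffic; none of these is a measured figure for this system, and `seconds_to_first_error` is a hypothetical helper.

```python
"""Expected time to the first bit error for a given BER (a sketch).

Assumed, not from the thread: errors are independent, the BER is the
quoted 1e-18 per bit, and 10 GB/s of data flows through the path.
"""

SECONDS_PER_DAY = 86_400


def seconds_to_first_error(ber, bytes_per_second):
    """Mean time until the first bit error: 1 / (BER * bits per second)."""
    bits_per_second = bytes_per_second * 8
    return 1.0 / (ber * bits_per_second)


t = seconds_to_first_error(1e-18, 10e9)
print(f"{t:.3g} s = {t / SECONDS_PER_DAY:.0f} days")
```

With these assumed figures that comes to roughly 1.25e7 seconds, or about 145 days of continuous traffic. The estimate scales inversely with both the BER and the data rate, so a higher real error rate or more traffic shortens it proportionally.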

~El~Jefe~
Friend of SPCR
Posts: 2887
Joined: Mon Feb 28, 2005 4:21 pm
Location: New York City zzzz

Post by ~El~Jefe~ » Wed Sep 29, 2010 12:44 pm

Yeah, I have gotten errors as well. In the past, increasing the voltage reduced the errors, so I'm guessing there is something weak or leaky in the component chain that likes to be over-juiced.

I am fairly certain that you should disable Cool'n'Quiet when stress testing components.

danimal
Posts: 734
Joined: Mon Jun 08, 2009 2:41 pm
Location: the ether


Post by danimal » Wed Sep 29, 2010 6:47 pm

quest_for_silence wrote: Hi all!

I'm currently tweaking voltages on an Athlon II/785G combo. On these occasions I usually test stability by running Prime95 and FurMark at the same time in torture/stability mode, while SpeedFan takes care of temps and fans.
Testing all of the subsystems at once serves no logical purpose, in part because there aren't any apps that behave like that.

Try running OCCT by itself to test the video card, and Prime95 by itself to test the CPU/RAM.
