Exquisitely sensitive stability testing - the linux kernel!

graysky
Posts: 147
Joined: Fri Sep 16, 2005 4:14 pm
Location: My desk

Exquisitely sensitive stability testing - the linux kernel!

Post by graysky » Sun Jul 29, 2012 7:50 am

TL;DR Summary
The Linux kernel is a powerful tool for detecting instabilities in your overclock settings, with greater accuracy and sensitivity than either Prime95 or IBT/LinX.

More Details
The Linux kernel gives users a dead simple method for detecting hardware instabilities, such as those caused by an 'unstable' overclock. There is nothing special to install, as this functionality seems to be natively included in the kernel itself. To use it, simply run a standard stress test such as Prime95 or Linpack and watch the output from dmesg. If the system is unstable due to insufficient voltage or excessive heat, the kernel will report:

[Hardware Error]: Machine check events logged

I have seen the kernel throw these errors during a prime95 run before prime95 itself reported an error in the math. Further, I have seen these errors appear when linpack did not detect that the settings were unstable, as evidenced by the residual number not changing during the run in which the error occurred.
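
As a quick check after a stress run, the kernel log can be filtered for that line. This is only a minimal sketch: the function name count_mce is made up for illustration, and it assumes dmesg is readable by your user (some distributions restrict it, in which case prefix dmesg with sudo).

```shell
# count_mce: hypothetical helper. Counts the kernel log lines on stdin
# that carry the "[Hardware Error]" tag.
count_mce() {
    # -c prints the number of matching lines; -F treats the pattern as a
    # fixed string so the brackets are not read as a regex character class.
    # grep exits non-zero when the count is 0, hence the "|| true".
    grep -cF '[Hardware Error]' || true
}

# Usage (commented out so the block can be sourced safely):
#   dmesg | count_mce
```

A count of 0 after a long run is what you are hoping for; anything else means the kernel saw at least one machine check event since boot.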

How to Stress Test Under Linux
Probably the most newb-friendly flavor of Linux is Ubuntu. Users can run it live off a CD or a USB stick without installing it to their systems. Further, it is pre-configured to boot into a GUI with network and hardware autodetected. Download an image from the Ubuntu website. I recommend the 64-bit version, since 32-bit Linux without a PAE kernel suffers from the same <4 GB memory limitation that 32-bit Windows does.

Note: don't feel like Ubuntu is your only option. There are many other Linux distributions out there from which to choose.

Download the ISO, burn it to a CD or write it to a USB stick, and boot. Ubuntu prompts users to either "try ubuntu" or "install ubuntu." Just hit the "try ubuntu" button and you will be dropped into the live Linux environment.

Here are a few suggestions for stress testing:
1) mprime ---> linux version of prime95. Help to download and run mprime.
2) linpack ---> back end to both LinX and IBT. Help to download and run linpack.
3) x264 video encoding.
4) Compiling something large like the linux kernel.
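
Suggestion 4 is easy to script as a repeat-until-failure loop. The sketch below is only an illustration: repeat_run is a made-up name, and the usage line assumes you are sitting in an already configured kernel source tree.

```shell
# repeat_run: hypothetical helper. Runs a command N times and stops at the
# first non-zero exit status, reporting which pass failed.
# The body runs in a subshell, so its variables do not leak into your shell.
repeat_run() (
    passes=$1
    shift
    i=1
    while [ "$i" -le "$passes" ]; do
        if ! "$@"; then
            echo "pass $i of $passes FAILED"
            exit 1
        fi
        echo "pass $i of $passes ok"
        i=$((i + 1))
    done
)

# Usage, from inside a configured kernel source tree (assumption):
#   repeat_run 5 sh -c 'make clean && make -j"$(nproc)"'
```

A pass that completes is not proof of stability on its own, so still check the kernel log after each run.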

My own machine has passed tests #1 and #2 yet could not get more than 10 minutes into an x264 encode, or compile something 4-5 times, without errors. It is important to test using several orthogonal stresses. While stressing, print the output of the kernel ring buffer. You can do this in one of two ways:

1) Open a terminal and type dmesg to see a snapshot.
2) Perhaps more useful is to be informed when something happens rather than typing dmesg over and over again! You can do this with the following command:

sudo cat /proc/kmsg

It looks like nothing is happening, but the command has more or less opened a connection to the ring buffer; it will print new messages as they arrive. To test it, plug in a USB thumb drive.

Example on my box:

<5>[13393.025582] scsi 10:0:0:0: Direct-Access     Kingston DataTraveler 112 1.00 PQ: 0 ANSI: 2
<5>[13393.026103] sd 10:0:0:0: [sdc] 7831552 512-byte logical blocks: (4.00 GB/3.73 GiB)
<5>[13393.026449] sd 10:0:0:0: [sdc] Write Protect is off

Anyway, you will want to watch for the message I posted above:

[Hardware Error]: Machine check events logged

Last edited by graysky on Mon May 20, 2013 7:39 am, edited 1 time in total.
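
To avoid scanning the whole stream by eye, the live output can be piped through a small filter that prints only the hardware error lines, stamped with the wall-clock time. This is a rough sketch: mce_watch is a made-up name, and the first usage line assumes your dmesg is new enough to support --follow.

```shell
# mce_watch: hypothetical helper. Reads a kernel log stream on stdin and
# prints only the "[Hardware Error]" lines, each prefixed with the current
# wall-clock time so it can be matched against what the stress test was doing.
mce_watch() {
    while IFS= read -r line; do
        case $line in
            *'[Hardware Error]'*)
                printf '%s  %s\n' "$(date '+%H:%M:%S')" "$line"
                ;;
        esac
    done
}

# Usage (either source works; both are commented out here):
#   sudo dmesg --follow | mce_watch
#   sudo cat /proc/kmsg | mce_watch
```

Leave it running in a second terminal while the stress test runs; silence is good news.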

Olaf van der Spek
Posts: 434
Joined: Tue Oct 04, 2005 6:10 am

Re: Exquisitely sensitive stability testing - the linux kern

Post by Olaf van der Spek » Mon Aug 06, 2012 12:15 pm

Oops:
[1305600.000054] [Hardware Error]: Machine check events logged
[2403150.000025] [Hardware Error]: Machine check events logged

I'm not even stressing it. Where can the details of the events be logged?

graysky
Posts: 147
Joined: Fri Sep 16, 2005 4:14 pm
Location: My desk

Re: Exquisitely sensitive stability testing - the linux kern

Post by graysky » Tue Aug 07, 2012 7:28 am

Olaf van der Spek wrote:Oops:
[1305600.000054] [Hardware Error]: Machine check events logged
[2403150.000025] [Hardware Error]: Machine check events logged

I'm not even stressing it. Where can the details of the events be logged?

If you're getting errors at idle, that's bad. There aren't many meaningful specifics to be had; at best it will say which core caused the errors. A package called mcelog provides these details. You can install it if you want. It writes to /var/log/mcelog

EDIT: the number in front of the error is the time, in seconds since boot, at which it occurred. I'm guessing these didn't happen at idle.
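
To put those timestamps in perspective, here is a small conversion from seconds since boot to days, hours, minutes and seconds. The helper name uptime_from_stamp is made up for illustration; the two inputs are the timestamps from the quoted log.

```shell
# uptime_from_stamp: hypothetical helper. Converts a dmesg timestamp
# (seconds since boot, e.g. 1305600.000054) into days/hours/minutes/seconds.
uptime_from_stamp() (
    secs=${1%%.*}                  # drop the fractional part
    d=$((secs / 86400))            # whole days
    h=$((secs % 86400 / 3600))     # hours left over after whole days
    m=$((secs % 3600 / 60))        # minutes left over after whole hours
    s=$((secs % 60))               # seconds left over after whole minutes
    printf '%dd %02dh %02dm %02ds\n' "$d" "$h" "$m" "$s"
)

uptime_from_stamp 1305600.000054   # prints: 15d 02h 40m 00s
uptime_from_stamp 2403150.000025   # prints: 27d 19h 32m 30s
```

So the two events were logged roughly 15 and 27 days after boot.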

Olaf van der Spek
Posts: 434
Joined: Tue Oct 04, 2005 6:10 am

Re: Exquisitely sensitive stability testing - the linux kern

Post by Olaf van der Spek » Wed Aug 08, 2012 3:59 am

graysky wrote: If your getting error on idle that's bad. There aren't meaningful specifics to be had... Best it will say is which core caused errors.

It's an AMD Athlon(tm) XP 2500+, so I guess it was core 0. :p

graysky wrote: There is a package called mcelog that does provide these. You can install it if you want. It writes to /var/log/mcelog

EDIT: the number in front of the error is the seconds since boot that it occurred. I'm guessing these didn't happen at idle.

Just installed mcelog, let's see what it says.
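
For checking the result later, something like the following would do. It is only a sketch: show_mcelog is a made-up name, it assumes mcelog is running and writing to the /var/log/mcelog path mentioned above, and reading that file may require sudo.

```shell
# show_mcelog: hypothetical helper. Prints the mcelog log file if it has
# any content, otherwise says that nothing has been recorded.
# The default path is the one mentioned in the thread; pass another path
# as the first argument if your setup differs.
show_mcelog() (
    log=${1:-/var/log/mcelog}
    if [ -s "$log" ]; then         # -s: file exists and is not empty
        cat "$log"
    else
        echo "no machine check details recorded in $log"
    fi
)

# Usage (commented out so the block can be sourced safely):
#   show_mcelog
```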

graysky
Posts: 147
Joined: Fri Sep 16, 2005 4:14 pm
Location: My desk

Re: Exquisitely sensitive stability testing - the linux kern

Post by graysky » Mon May 20, 2013 7:39 am

Bump for updated info in the first post: the recommendation to use x264 and compiling (cc) to further probe stability. On my current i7-3770K, I recently discovered that my long-time stable settings have become unstable, requiring an extra bump to the vcore. I could run #1 and #2 all day, but as soon as I started some x264 encodes, I noticed the errors in my logs.

mkk
Posts: 687
Joined: Sun Sep 05, 2004 1:51 pm
Location: Gefle, Sweden

Re: Exquisitely sensitive stability testing - the linux kern

Post by mkk » Mon May 20, 2013 10:30 am

Would be neat if someone made a Live CD specifically tailored for stability testing a system, or just something that compiled a kernel like ten times in a row. It wouldn't even need graphics drivers for all the latest models, as testing the graphics card is easy enough anyway. Finding out if the core of the system remains stable is more tricky, especially when everything is just fine and dandy under high load tests. So much in a system runs with very different parameters under load these days, compared to low loads or near idle.

graysky
Posts: 147
Joined: Fri Sep 16, 2005 4:14 pm
Location: My desk

Re: Exquisitely sensitive stability testing - the linux kern

Post by graysky » Mon May 20, 2013 1:48 pm

@mkk - True enough. Best you can do now is to:

1) Boot from the Ubuntu Live CD and do it from there.
2) Install Linux to a small partition on the machine to give a more permanent install; such setups can nicely coexist with both Mac OS X and Windows.

Option #2 is actually really nice since you don't have to set up the environment after each reboot.
