Troubles with the new Gromacs core?

A forum just for SPCR's folding team... by request.

Moderators: NeilBlanchard, Ralf Hutter, sthayashi, Lawrence Lee

NeilBlanchard
Moderator
Posts: 7681
Joined: Mon Dec 09, 2002 7:11 pm
Location: Maynard, MA, Eaarth

Troubles with the new Gromacs core?

Post by NeilBlanchard » Tue Jun 03, 2003 6:42 am

Hello:

I want to hear from some others: are you having trouble with the new Gromacs core? My two fast machines, both overclocked, have had several errors (checksum and I/O errors). I suspect the cause is a RAM error rather than the CPU or CPU overheating, so I'll try a slightly slower FSB/RAM speed and see how things go with that. Otherwise, I'll have to get better RAM...

The Win2K client has had two failures (one with over 300 of 400 steps done! :cry: ), and both simply aborted. The Linux client has had about 4 in a row, and in some cases it tries restarting the unit from a known point. The Linux units have all been the same, p539_BBA5_N in water; I forget which WU it was in the Windows client.

But in the end, I have wasted a lot of time with these. The Windows client shows an error message that I don't see in Linux. In a nutshell, it says that if you see more than a few of these, something may be going wrong with your computer, such as overheating (I doubt this is an issue), overclocking, or other problems. I will try dropping my RAM/FSB speed back a bit -- maybe my RAM is kicking out errors that the Gromacs core is more "sensitive" to?
Last edited by NeilBlanchard on Tue Jun 03, 2003 7:44 am, edited 1 time in total.

miker
Patron of SPCR
Posts: 798
Joined: Sun Aug 11, 2002 3:26 pm
Location: Akron, OH (The Rubber Capital)

Post by miker » Tue Jun 03, 2003 7:00 am

I am overclocking 1 PC and it has no problems, and the other 8 are at stock speeds, also no problems.

rpc180
*Lifetime Patron*
Posts: 309
Joined: Sun Feb 09, 2003 8:01 pm
Location: Washington, DC

Post by rpc180 » Tue Jun 03, 2003 7:35 am

I've had only one error: the log said that it could not continue and was shutting down the Gromacs core. But it still attempted to send work to the server, which I think it did, and I believe I did get credit for it. Stock system, etc, etc.

dukla2000
*Lifetime Patron*
Posts: 1465
Joined: Sun Mar 09, 2003 12:27 pm
Location: Reading.England.EU

Post by dukla2000 » Wed Jun 04, 2003 1:40 pm

I've had 1 oddity. Identical to rpc, shut down and send to server. Except I didn't get any credit. This was on a Dell P3 laptop.

My main system (XP2700) is actually an oc XP2100: if I drop the VCore too low or the temps get too high then it just crashes folding and scores nil points.

Wrah
Patron of SPCR
Posts: 316
Joined: Thu Apr 10, 2003 1:56 am

Post by Wrah » Wed Jun 04, 2003 1:51 pm

I had 2 in a row last week, on the same pc. Halfway through it would just stop: 100% cpu time but no frames coming out any more. After I restart it I get a missing_work_files error and it starts downloading a new WU.
That pc is a Tualatin Celeron running on 133 FSB, memory is 133 MHz SDRAM. It's been folding for like 2 months now without problems.

NeilBlanchard
Moderator
Posts: 7681
Joined: Mon Dec 09, 2002 7:11 pm
Location: Maynard, MA, Eaarth

Well, so far...

Post by NeilBlanchard » Wed Jun 04, 2003 7:12 pm

Hello:

I've dropped the frequency slightly from 154MHz X 13 = 2002MHz to 152MHz X 13 = 1976MHz and so far, so good. It was either the RAM (I'm still minimizing the latencies @ 2-3-3) or the CPU, but not overheating. I've never seen anything above 51C; typically it is 47-48C, so my best guess is that the Apacer PC2100 CAS2 is hiccuping at a 15% oc with the tightest timings possible. I'm going to be replacing the SVC GC-68's with Alpha PAL8045's and I may be "forced" :wink: to buy some good PC2700 CAS 2 and see if I can't hit 166MHz X 13 = 2158MHz and still keep the temps in the low to mid 40's... :twisted:
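
For anyone checking the numbers, the clock figures above are just FSB x multiplier. Here is a minimal sketch of that arithmetic, assuming a nominal 133 MHz rating for PC2100 (that figure is not stated in the post, and the function names are made up for illustration):

```python
# Illustrative only: core clock = FSB x multiplier, using the figures quoted in the post.
# ASSUMPTION: 133 MHz is taken as the nominal PC2100 memory clock; the post does not state it.
RATED_FSB_MHZ = 133
MULTIPLIER = 13  # CPU multiplier quoted in the post


def core_clock_mhz(fsb_mhz, multiplier=MULTIPLIER):
    """Return the CPU core clock in MHz for a given FSB speed."""
    return fsb_mhz * multiplier


def overclock_percent(fsb_mhz, rated_mhz=RATED_FSB_MHZ):
    """Return how far the FSB runs above the rated speed, in percent."""
    return (fsb_mhz - rated_mhz) / rated_mhz * 100


# Print the three FSB settings mentioned in the post.
for fsb in (152, 154, 166):
    print(f"{fsb} MHz x {MULTIPLIER} = {core_clock_mhz(fsb)} MHz, "
          f"{overclock_percent(fsb):.1f}% over {RATED_FSB_MHZ} MHz")
```

Under that assumption, 154 MHz works out to a bit under 16% over the RAM's rating, which is roughly the "15% oc" mentioned above.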
