Hit by AMD lockup?

A forum just for SPCR's folding team... by request.

Moderators: NeilBlanchard, Ralf Hutter, sthayashi, Lawrence Lee

Zyzzyx
Friend of SPCR
Posts: 1063
Joined: Mon Dec 23, 2002 12:55 pm
Location: Richland, WA

Post by Zyzzyx » Wed Jan 28, 2004 8:10 am

Looks like my server was, last night. It's an AMD 1.4 Palomino. Can't be having that lock up and hang on me. It's my web server and mail server.

So, instead of taking it out of production completely, I've just removed the SSE flag for now. Need to double check, but I think it was the forced SSE causing the AMD lockups.

Durnit, and it was my best producer too.

mas92264
Patron of SPCR
Posts: 659
Joined: Fri Sep 26, 2003 5:26 pm
Location: Palm Springs, CA, USA

Post by mas92264 » Wed Jan 28, 2004 8:23 am

Yeah, the Palominos aren't supposed to lock up when using SSE. I only have one folding with SSE (for a couple of months now) and I don't think it has ever locked up. Some of my Tbreds and Bartons lock up once or twice a week, others rarely.

Weird deal. AMD is supposed to be working on this specific issue now. No recent news, tho.

M

Post by Zyzzyx » Wed Jan 28, 2004 9:01 am

This is actually about the third or fourth time it's locked up, but it's the first time I've noticed FAH starting on a new WU afterward. When I went to bed it had 14 hours left. Rebooted it this morning, and it downloaded a new WU.

I'll double check, but I'm durn sure that it's a Palomino core.

Post by mas92264 » Wed Jan 28, 2004 9:38 am

Did you lose a partially completed WU? I have never lost a WU due to a lockup.

Palomino is model 6.

M
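
For anyone who wants to confirm the core without pulling the heatsink: on Linux the family and model numbers show up in /proc/cpuinfo. Below is a minimal Python sketch that reads that text and guesses the K7 core. The function names and the model table are purely illustrative and have nothing to do with the FAH client. The table assumes the commonly cited CPUID values for family 6 (4 = Thunderbird, 6 = Palomino, 8 = Thoroughbred, 10 = Barton), so treat an "unknown" result as a cue to look the chip up by hand.

```python
# Hypothetical helper: guess an AMD K7 core from /proc/cpuinfo-style text.
# The model table is an assumption based on commonly cited CPUID values.
K7_CORES = {4: "Thunderbird", 6: "Palomino", 8: "Thoroughbred", 10: "Barton"}


def parse_cpuinfo(text):
    """Return key/value fields for the first processor block in the text."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe_cpu(text):
    """Return (core_name, has_sse) for the first CPU described in the text."""
    fields = parse_cpuinfo(text)
    core = "unknown"
    # Only AMD family 6 parts are covered by the table above.
    if fields.get("vendor_id") == "AuthenticAMD" and fields.get("cpu family") == "6":
        model = fields.get("model", "")
        if model.isdigit():
            core = K7_CORES.get(int(model), "unknown")
    # The kernel lists supported features as space-separated flags.
    has_sse = "sse" in fields.get("flags", "").split()
    return core, has_sse
```

On a Linux box, something like `describe_cpu(open("/proc/cpuinfo").read())` should hand back the core name plus whether the kernel reports the sse flag. It only says what the chip is, not whether SSE is the cause of a lockup.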

Post by Zyzzyx » Wed Jan 28, 2004 10:00 am

Yeah, looks like I did.

The fun part is I could have other problems instead. :( Dunno for sure.

Post by mas92264 » Wed Jan 28, 2004 10:29 am

If it were me, I would look somewhere other than SSE. Probably a good idea to leave it off for now, just to see what the deal is.

Ain't troubleshooting a blast? One of the office file servers was locking up nearly every day, and over a period of a few weeks I replaced every part in the box. Never did figure out exactly what the problem was, although the Cat 5 wall socket had some crossed wires :oops: .

ColdFlame
Posts: 451
Joined: Wed May 21, 2003 9:39 pm
Location: Somewhere in Time

Post by ColdFlame » Wed Jan 28, 2004 7:31 pm

My Palomino locked up consistently on one particular WU, so I don't know who came up with the notion that Palominos don't lock up. It would also seem that they reused the SSE implementation in all CPUs since Palomino, so I'd assume they should all lock up the same way.

Now, when I had the lockup, I'd just remove that particular WU and make sure I got another one from the assignment server. No problem ever since.

Additionally, I hope you know (I know you do :) ) that SSE boosts your PPW by something like 2x.

haysdb
Patron of SPCR
Posts: 2425
Joined: Fri Aug 29, 2003 11:09 pm
Location: Earth

Post by haysdb » Wed Jan 28, 2004 7:34 pm

My Linux server locked up two or three times over a couple of days, so I removed -forcesse and it hasn't locked up since. Whether this was because of SSE directly, or a side effect (perhaps thermal related), I don't know. One thing is certain: when the server goes down, that's A BAD THING.


[I THOUGHT my server had a T'bred in it, so last night I took it apart to put in an 1800+ Palomino, but it turned out the CPU it had in it already WAS a Palomino. I replaced the CPU anyway since, if I have to turn SSE off, it might as well be on my slowest CPU. I also discovered two other things: 1) the "grill" surrounding the heatpipes was somewhat "caked" with dust. 2) I had failed to remove the plastic coating from the copper shim surrounding the CPU die. These two factors certainly weren't helping anything.]

David

Post by Zyzzyx » Wed Jan 28, 2004 10:17 pm

My 1700 TBred just locked up too. It seems to do so about once a month. Since it's mostly just my media server, I don't really mind too much.
