All times are UTC - 8 hours




 Post subject: SMP segmentation faults
Posted: Sun Jul 08, 2007 11:52 am

Joined: Thu Aug 03, 2006 7:06 pm
Posts: 26
My last two work units terminated with similar errors at 67%. Here's what I copied from the terminal:

[13:06:32] Writing local files
[13:06:33] Completed 335000 out of 500000 steps (67 percent)
[0]0:Return code = 0, signaled with Segmentation fault
[0]1:Return code = 0, signaled with Segmentation fault
[0]2:Return code = 0, signaled with Segmentation fault
[0]3:Return code = 0, signaled with Segmentation fault
[13:23:53] CoreStatus = 0 (0)
[13:23:53] Client-core communications error: ERROR 0x0
[13:23:53] Deleting current work unit & continuing...
[0]0:Return code = 0, signaled with Quit
[0]1:Return code = 0, signaled with Quit
[0]2:Return code = 0, signaled with Quit
[0]3:Return code = 18
[13:28:21] - Preparing to get new work unit...
[13:28:21] + Attempting to get work packet
[13:28:21] - Connecting to assignment server
[13:28:21] - Successful: assigned to (171.64.65.56).
[13:28:21] + News From Folding@Home: Welcome to Folding@Home
[13:28:21] Loaded queue successfully.
[13:28:33] + Closed connections
[13:28:38]
[13:28:38] + Processing work unit
[13:28:38] Core required: FahCore_a1.exe
[13:28:38] Core found.
[13:28:38] Working on Unit 05 [July 8 13:28:38]
[13:28:38] + Working ...
[13:28:38]
[13:28:38] *------------------------------*
[13:28:38] Folding@Home Gromacs SMP Core
[13:28:38] Version 1.73 (November 27, 2006)
[13:28:38]
[13:28:38] Preparing to commence simulation
[13:28:38] - Ensuring status. Please wait.
[13:28:55] - Looking at optimizations...
[13:28:55] - Working with standard loops on this execution.
[13:28:55] - Previous termination of core was improper.
[13:28:55] - Going to use standard loops.
[13:28:55] - Files status OK
[13:28:57] (decompressed 537.8 percent)
[13:28:57] - Starting from initial work pa- Starting from initial work packet
[13:28:57]
[13:28:57] Project: 2608 (Run 0, Clone 40, Gen 30)

It appears to be running the same thing over and over, and crashing at the same place each time. I'm going to stop it, delete it, and try to get a new, different one.
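One way to check the "crashing at the same place each time" hunch is to scan the client log for the last reported step count before each burst of segfault lines. This is only a rough sketch: the function names `crash_points` and `repeats_at_same_point` are made up for illustration, and the patterns assume the log wording shown above stays the same between runs.

```python
import re

# Patterns taken from the log wording quoted in this thread.
STEP_RE = re.compile(r"Completed (\d+) out of (\d+) steps")
SEGV_RE = re.compile(r"signaled with Segmentation fault")


def crash_points(log_text):
    """Return the (completed, total) step counts last seen before each crash.

    Consecutive segfault lines (one per SMP process) count as one crash.
    An entry is None if no progress line was seen before that crash.
    """
    points = []
    last_step = None
    in_crash = False
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            last_step = (int(match.group(1)), int(match.group(2)))
            in_crash = False
        elif SEGV_RE.search(line):
            if not in_crash:
                points.append(last_step)
                in_crash = True
        else:
            in_crash = False
    return points


def repeats_at_same_point(points):
    """True if there were at least two crashes, all at the same known step."""
    return len(points) >= 2 and None not in points and len(set(points)) == 1
```

Feeding it the saved client log from both failed attempts and getting the same step count twice would point toward the work unit rather than random hardware instability, though it would not prove it.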

Has this happened to anyone else? Is it my machine or a bad work unit?

Thanks

_________________
An idle processor is the devil's workshop.

Contribute to medical research by putting your idle processor to work on folding@home!


 Post subject:
Posted: Mon Jul 09, 2007 3:59 pm

Joined: Tue Aug 22, 2006 5:14 am
Posts: 105
I can't say that I've seen your particular error but here are some incidents where I've seen SMP bomb out on me:

1. The network connection drops, or a VPN connection is being established
2. Attempting to restart FAH when FAH processes are already running or the MPICH2 service isn't started
3. Group policy or another automated script removes administrative rights from the account under which FAH was installed
4. Unstable processor/memory from excessively high temps or too aggressive an overclock

I think that in your case, a complete removal and reinstallation should work. My usual steps are:
- Stop FAH and ensure no FAH processes are still running (or reboot)
- Stop the "MPICH2 Process Manager, Argonne National Lab" service
- From Add/Remove Programs, uninstall Folding at Home
- Delete the Folding at Home folder from c:\program files\ or your install directory
- Reinstall using the .exe and run install.bat (this should remove and reinstall the MPICH2 service)
- Reboot (optional) and run fah.exe
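The first two steps above boil down to confirming that nothing FAH- or MPICH2-related is still running before you uninstall. Here is a small sketch of that check, assuming you supply the image names yourself (for example, copied from Task Manager). The helper name `leftover_processes` is hypothetical, and the `mpiexec`/`smpd` prefixes are my assumption about what the MPICH2 pieces are called; only `fah.exe` and `FahCore_a1.exe` come from this thread.

```python
# Name prefixes to look for, compared case-insensitively.
# "fah" covers fah.exe and FahCore_a1.exe (both named in this thread);
# "mpiexec" and "smpd" are assumed names for the MPICH2 launcher and service.
BLOCKING_PREFIXES = ("fah", "mpiexec", "smpd")


def leftover_processes(process_names):
    """Return the names that look like FAH or MPICH2 processes still running."""
    found = []
    for name in process_names:
        if name.lower().startswith(BLOCKING_PREFIXES):
            found.append(name)
    return found


if __name__ == "__main__":
    # Illustrative list only; substitute the names from your own machine.
    running = ["explorer.exe", "FahCore_a1.exe", "smpd.exe"]
    for name in leftover_processes(running):
        print("still running:", name)
```

An empty result suggests it is safe to carry on with the uninstall; anything listed should be stopped first (or just reboot).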


 Post subject:
Posted: Mon Jul 09, 2007 6:23 pm
Friend of SPCR

Joined: Tue Dec 19, 2006 6:22 pm
Posts: 235
Location: Seattle
A while back I got a work unit that errored out, and it happened again a couple of days later on the same one (can't remember the # though). As far as I can tell I haven't gotten that WU since, and I've had no problems since then, so I blame it on a bad WU.
The other 2 things that have caused SMP not to run for me are:
- A password change on the PC; after that you need to run install.bat again
- An expired / timed-out FAH.EXE (beta period expired); you need to download it again

_________________
Antec P180B | Corsair HX520 | Gigabyte DS3L | E8400 @ 3.6 GHz | Noctua NH-U12F | 2GB Ballistix DDR2-800 | EVGA 8800GTS | 2x Samsung HD501LJ


 Post subject:
Posted: Mon Jul 09, 2007 10:37 pm
*Lifetime Patron*

Joined: Sun Dec 21, 2003 11:24 am
Posts: 1735
Location: 'Sunny' Cornwall U.K.
Ivoshiee over at the folding forum is having the same problem.
No solution though...


Pete


 Post subject:
Posted: Tue Jul 10, 2007 1:16 pm

Joined: Thu Aug 03, 2006 7:06 pm
Posts: 26
I pressed Ctrl-C to stop the folding processes, renamed the directory holding the folding executable and data, made a new folding directory, unpacked the archive in it, and ran the executable. It got a different work unit, which finished today without problems.

Must have been a bad work unit. I had done about 20 SMP work units before, and this one failed twice, at the same point each time. Lost 4 days of folding, though.

The work unit I got today is the same project, but a different clone and run than the bad one. I'll see what happens.
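For anyone who wants to compare work units without eyeballing the log, the "Project: ... (Run, Clone, Gen)" line can be parsed and compared field by field. A minimal sketch follows; `WorkUnit`, `parse_work_unit` and `differing_fields` are names invented here, and the regex assumes the line format shown in the first post.

```python
import re
from typing import List, NamedTuple, Optional


class WorkUnit(NamedTuple):
    project: int
    run: int
    clone: int
    gen: int


# Matches e.g. "Project: 2608 (Run 0, Clone 40, Gen 30)" anywhere in a log line.
WU_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def parse_work_unit(line: str) -> Optional[WorkUnit]:
    """Extract the work unit identifiers from a log line, or None if absent."""
    match = WU_RE.search(line)
    if match is None:
        return None
    return WorkUnit(*(int(group) for group in match.groups()))


def differing_fields(a: WorkUnit, b: WorkUnit) -> List[str]:
    """List the identifier fields in which two work units differ."""
    return [field for field in WorkUnit._fields if getattr(a, field) != getattr(b, field)]
```

Parsing the line from the failed unit (`[13:28:57] Project: 2608 (Run 0, Clone 40, Gen 30)`) and the matching line from a later one shows right away whether the client picked up the identical unit again (empty list) or only the same project.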


