All times are UTC - 8 hours




 Post subject: Help - Deleted Work Units
PostPosted: Wed May 07, 2003 4:08 am 
*Lifetime Patron*

Joined: Sun Mar 09, 2003 12:27 pm
Posts: 1465
Location: Reading.England.EU
OK, using Console & Electron Microscope. Have handled 12 WU on one of my systems, 4 of which have been deleted ('nil points'!).

Crazy thing is it runs and 'looks' normal, but when it gets to 100% nothing happens. So I took to shutting down & restarting, at which stage the unit was deleted. Now I have been watching the log on that system more closely, and I noticed that the early warning is that the FAHconsole sometimes stops writing to the log, even though EM reports frames clocking up as usual. When I catch the 'no log' early enough (before 100%) and restart, it restarts as per EM reporting.

An example log extract is below (only blank lines edited out): everything is fine up to 90 frames, then nothing in the log (until I manually shut down). In the meantime EM reported frames counting up to 100, at which stage nothing happened, so I shut down (& lost my points :evil: ). Any offers?

...
[10:11:23] Completed 225000 out of 250000 steps (90)
Folding@home Client Shutdown.
--- Opening Log file [May 7 11:42:50]
# Windows Console Edition
Folding@home Client Version 3.24
http://foldingathome.stanford.edu
email:help@foldingathome.stanford.edu
[11:42:50] - Ask before connecting: No
[11:42:50] - Use IE connection settings: Yes
[11:42:50] - User name: Dukla2000 (Team 31574)
[11:42:50] - User ID = 55B302820D53F25D
[11:42:50] - Machine ID: 1
[11:42:50]
[11:42:50] Loaded queue successfully.
[11:42:50] + Benchmarking ...
[11:42:54]
[11:42:54] + Processing work unit
[11:42:54] Core required: FahCore_78.exe
[11:42:54] Core found.
[11:42:54] Working on Unit 01 [May 7 11:42:54]
[11:42:54] + Working ...
[11:42:54]
[11:42:54] *------------------------------*
[11:42:54] Folding@home Gromacs Core
[11:42:54] Version 1.45 (April 21, 2003)
[11:42:54]
[11:42:54] Preparing to commence simulation
[11:42:54] - Looking at optimizations...
[11:42:55] - Created dyn
[11:42:55] - Files status OK
[11:42:55]
[11:42:55] Folding@home Core Shutdown: MISSING_WORK_FILES
[11:42:58] CoreStatus = 74 (116)
[11:42:58] The core could not find the work files specified. Removing from queue
[11:42:58] Deleting current work unit & continuing...
[11:43:02] + Attempting to get work packet
[11:43:02] - Connecting to assignment server
[11:43:15] - Successful: assigned to (171.64.122.125).
[11:43:15] + News From Folding@Home: Welcome to Folding@Home
[11:43:15] Loaded queue successfully.
[11:43:16] - Deadline time not received.
[11:43:17] + Closed connections
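
The "log stops being written" warning sign described above could be checked automatically. The following is only a minimal sketch, not part of the client: it assumes the client flushes its log often enough that the file's modification time advances with each entry, and the 45 minute threshold and the function names are hypothetical choices for illustration.

```python
import os
import time


def seconds_since_last_write(log_path, now=None):
    """Return how many seconds ago the log file was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(log_path)


def log_looks_stalled(log_path, max_idle_seconds=45 * 60, now=None):
    """True if the log has been idle for longer than max_idle_seconds.

    The default threshold is an assumption; it should be set to a few
    times the normal frame time of the work unit being processed.
    """
    return seconds_since_last_write(log_path, now) > max_idle_seconds
```

Calling `log_looks_stalled` on the client's log file from a scheduled job would flag the stall before the unit reaches 100%, which is the point at which a restart was still able to save the work.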

_________________
2009/Oct: Jetway JNC81-LF * 4850e naked under fanless Xigmatek Apache * Antec mini Skeleton w/Nexus 120mm PWM fan * Delta 90W brick w/Skeleton DC-DC board * WD2500BEVT 250Gb blue


 Post subject:
PostPosted: Wed May 07, 2003 7:01 am 
Patron of SPCR

Joined: Thu Apr 10, 2003 1:56 am
Posts: 316
I've had a lot of similar problems too. Just a question: is your CPU running overclocked, or have you adjusted the memory timings manually? And do you have the problems with all cores or only gromacs?

The gromacs core is VERY sensitive. I had my system overclocked to the limit while still being really 100% stable. But every gromacs core I got went wrong. After running a few hours the frame time would suddenly go from 10 mins to 2 hours, or a frame would not finish at all anymore. After restarting, it would either give the work_files_missing error or simply an early_unit_end, and it would download a new WU. After looking around on the FAH forums I reset my memory timings to default and throttled my CPU back just a tad. Since then all my problems are gone. (Well, at least where gromacs is concerned :))

_________________
XP 2800@2270mhz+SLK800A+Nexus fan, Aopen 8RDA+@414mhz, Samsung 80gb on foam, XFX6800GT with a Nexus 120mm fan taped onto it, Nexus 4090 PSU, Chieftec Dragon+Nexus fan@1200rpm


Last edited by Wrah on Wed May 07, 2003 7:16 am, edited 1 time in total.

 Post subject:
PostPosted: Wed May 07, 2003 7:15 am 
*Lifetime Patron*

Joined: Sun Mar 09, 2003 12:27 pm
Posts: 1465
Location: Reading.England.EU
No overclocking: that box is a Thunderbird 1.4 and I never could overclock it. Also memory timings (PC133 SDRAM) are nothing adventurous: 2 sticks of Crucial CL2 running at standard SPD settings.

I have to admit I undervolt that CPU: spec is 1.75V and I often set the jumpers down so MBM reports 1.70V. Have never traced any problem to that, but coincidentally I set it to stock VCore yesterday before the log I posted, so it can't be that. (Was trying to debug a hang in The SIMS, played by some of my clan: turns out it was a corrupt family :D )

Also should have mentioned I had the first Deleted units when running the Graphical Client version.

Also should have mentioned that I have disabled hdd power down (figured maybe stuff was getting lost in hdd cache) with no effect.

Also should have mentioned that my other 2 boxes (1 of which is overclocked mildly - XP2100 to XP2700) have never deleted a unit. So I can set up OK sometimes!

Am convinced it is something to do with the box but am stumped.

 Post subject:
PostPosted: Wed May 07, 2003 7:16 am 

Joined: Fri Apr 04, 2003 6:35 am
Posts: 539
Location: Cambridgeshire, England
It might be a bug with the Fahcore_xx.exe?
I noticed when I got dealt a heavy WU (1245 steps!) that my client stopped recording at 21% done (I let it work all night with no progress). I noticed in the logfile that it was using FahCore_78.exe and I thought "hey that's new!". So I gave it until Monday, and when it still hadn't finished I just deleted the whole install and reinstalled the client from scratch. Now it's using FahCore_65.exe and there's no problem.

Might have been a corrupted core? Maybe a bug?

_________________
Sonata (foamed)Bezel mod
Seasonic S12-330, 2x120mm Yate Loon D12SL-12@1000RPM, P4-2.6C w/PAL8942 80mm L1A@10v, 1GB Mushkin DDR400, ATI 9800Pro 128MB w/ZM80A, Samsung SATA 160Gb


 Post subject:
PostPosted: Wed May 07, 2003 7:24 am 
*Lifetime Patron*

Joined: Sun Mar 09, 2003 12:27 pm
Posts: 1465
Location: Reading.England.EU
Wrah wrote:
And do you have the problems with all cores or only gromacs?
With all cores. Looking through the 10 unit record, I have a deleted Genome, Tinker & 2 Gromacs.

The deleted Tinker was Project 632, I also have a successful Tinker for Project 632.

The deleted Gromacs are Projects 540 & 537, while Projects 670 & 542 completed OK.

Based on your o/c problem I may disable some of the silent options on that system for a few days to see if it helps. CPU (or at least diode in socket) is only running 43C @1.75V but let me see if I can blast it into mid 30s for a while.

 Post subject:
PostPosted: Wed May 07, 2003 7:39 am 
Patron of SPCR

Joined: Thu Apr 10, 2003 1:56 am
Posts: 316
The o/c problem I read about was specific to the gromacs core though.
Or you can run your system with some very safe settings, something like fsb back to 100. At least to eliminate one of the hardware components being a possible cause.
You can also try looking around on the F@H forum.

 Post subject:
PostPosted: Wed May 07, 2003 7:43 am 
Patron of SPCR

Joined: Thu Apr 10, 2003 1:56 am
Posts: 316
Just did a bit of reading there (sigh, should actually be working, but can't resist the f@h addiction), and it seems they just got new versions of some cores out with bugfixes. If you delete the core exe files from the folding@home directory, you will force it to download the new versions. See if that helps.
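
For anyone who would rather not delete the cores outright, here is a rough sketch that moves them aside instead, so they can be put back if the fresh download turns out no better. It assumes the cores are named like FahCore_78.exe (as in the log above) and that the client is stopped first; the function name and both directory arguments are hypothetical.

```python
import glob
import os
import shutil


def set_aside_cores(client_dir, backup_dir):
    """Move every FahCore_*.exe out of client_dir and return the moved names.

    Moving rather than deleting is a deliberate swap: the old core files
    stay available in backup_dir in case they need to be restored.
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    pattern = os.path.join(client_dir, "FahCore_*.exe")
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        shutil.move(path, os.path.join(backup_dir, name))
        moved.append(name)
    return moved
```

With the core files gone from the client directory, the next start should behave as described above and fetch the current versions.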

 Post subject:
PostPosted: Wed May 07, 2003 8:09 am 
*Lifetime Patron*

Joined: Sun Mar 09, 2003 12:27 pm
Posts: 1465
Location: Reading.England.EU
Wrah wrote:
Just did a bit of reading there (sigh, should actually be working, but can't resist the f@h addiction), and it seems they just got new versions of some cores out with bugfixes. If you delete the core exe files from the folding@home directory, you will force it to download the new versions. See if that helps.
Thanks - and to Mr Smartepants. Have decided to go for the D&C when it finishes the massive 6 point WU it is currently on.

 Post subject: Calling Ghostbusters
PostPosted: Fri May 09, 2003 3:47 am 
*Lifetime Patron*

Joined: Sun Mar 09, 2003 12:27 pm
Posts: 1465
Location: Reading.England.EU
OK, so I uninstalled the graphical client version, deleted the corexx.exe as well as the log. Everything was looking great: figured that it was a bit early (only 1.5 WU) to assume & post success.

Then decided to have a look at my time per frame: occasionally EM has got excited that a frame was taking too long, but it was recovering automatically. Extracted the log into Excel and calc'd the time per frame - sure as hell the occasional frame was taking abnormally long. E.g. on the current WU (Protein 385, p540_BBA5_N in water) the usual time per frame is 13 minutes 20-30 seconds. The abnormal frames are 24 to 27 minutes. Then noticed that in fact yesterday afternoon the log had stopped at frame 301 (on a previous Tinker) and restarted at frame 310: the machine had hung while the family were playing and had to be rebooted.
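
The Excel step can also be done with a short script. This is only a sketch: it assumes progress lines of the form seen in the Gromacs extract earlier in the thread ("[10:11:23] Completed 225000 out of 250000 steps (90)"), other cores may log progress differently, and the 1.5x median cut-off for "abnormally long" is an arbitrary assumption.

```python
import re
from statistics import median

# Matches progress lines such as "[10:11:23] Completed 225000 out of 250000 steps (90)".
STEP_LINE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+) steps"
)


def frame_times(log_lines):
    """Return (steps_completed, seconds_since_previous_progress_line) pairs."""
    results = []
    previous = None
    for line in log_lines:
        text = line.strip()
        if text.startswith("--- Opening Log file"):
            # A client restart: do not count the downtime as frame time.
            previous = None
            continue
        match = STEP_LINE.match(text)
        if not match:
            continue
        hours, minutes, seconds, done, _total = map(int, match.groups())
        stamp = hours * 3600 + minutes * 60 + seconds
        if previous is not None:
            # Log timestamps carry no date, so wrap around midnight.
            results.append((done, (stamp - previous) % 86400))
        previous = stamp
    return results


def slow_frames(times, factor=1.5):
    """Return the entries that took longer than factor times the median."""
    if not times:
        return []
    typical = median(seconds for _, seconds in times)
    return [(done, secs) for done, secs in times if secs > factor * typical]
```

Feeding the lines of the log file to `frame_times` and the result to `slow_frames` lists the frames that stand out, which is the same comparison as the spreadsheet.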

Then noticed that the 'slow' frames are regularly every 2 hours, and yesterday's hang/reboot was on a 2 hour boundary. (The 2 hours is offset from the time F@H starts, not clock time. So before the hang the slow frames were 12h16, 14h16, 16h16 with reboot at 18h16, since the reboot the slow frames are 01h20, 03h30 etc.)
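
The 2 hour spacing is easy to confirm with a few lines of arithmetic. A small sketch, using only the times quoted above for the run before the hang (three slow frames plus the reboot); the helper names are hypothetical.

```python
def parse_clock(text):
    """Convert a time written like '12h16' into minutes since midnight."""
    hours, minutes = text.split("h")
    return int(hours) * 60 + int(minutes)


def gaps_in_minutes(clock_times):
    """Return the gap between consecutive times, wrapping at midnight."""
    stamps = [parse_clock(t) for t in clock_times]
    return [(later - earlier) % 1440 for earlier, later in zip(stamps, stamps[1:])]


# Slow frames at 12h16, 14h16, 16h16 and the reboot at 18h16.
before_hang = ["12h16", "14h16", "16h16", "18h16"]
print(gaps_in_minutes(before_hang))  # [120, 120, 120]
```

A constant gap that follows the client's start time rather than the wall clock points at something on a timer relative to boot or login, not at a job scheduled for a fixed time of day.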

So tried to 'watch' the system coming up to a 'slow' frame, and sure as hell the hdd suddenly starts thrashing. Move the mouse, it stops. After a minute or so the thrashing restarts; check Task Manager or whatever, it stops, and the frame completes in reasonable time.

Now this hdd thrashing is a symptom I have tried to hit before: it predates my F@H enrollment. In fact I noticed it some weeks ago and have rebuilt that system (from formatted hdd) because I figured it was some trash that had been picked up and missed by the virus scanner. Still there. My current suspects are stuff loaded from the registry at boot for Office XP Professional (a full/all components load):
MDM.exe (Machine Debug Manager) or
MOSearch.exe (Microsoft Office Search Service)

Any ideas/thoughts welcome: I am going to kill these processes to test if they are the cause, then try uninstalling the 'full' Office XP. Previously I removed MDM from the registry, but it seems the next time we fired up Word or some such it got put back!
