Affinity Changer - run two F@H SMP clients on a quad core!

cd8uk
Patron of SPCR
Posts: 47
Joined: Wed Apr 18, 2007 12:55 pm

Post by cd8uk » Mon Dec 03, 2007 2:49 am

Per my slight hijacking of another thread here:
I found this little program, http://distributed.org.ua/forum/index.p ... topic=1149, which I've been testing on XP 32 and Vista 64. As long as you have a quad core, it lets you run two simultaneous instances of F@H SMP and, somehow, the average PPD increases by up to ~40%. My XP 32 Q6600 @ 2.66GHz has gone from ~2,400 PPD to 2 * 1,600 PPD on project 2653. My new G0 Vista 64 Q6600 is showing similar gains, but Vista seems a little flakier with SMP: if you turn the client off, it trashes all the checkpoints and starts again from the beginning!
The only caveat seems to be that you need at least 2GB RAM.
One other thing: I have now gone from 1 quad and 3 dual-core folders to just 2 quads, and not only has my PPD risen (or will, once I finish setting up my new server), but my energy usage has dropped!

cd8uk
Patron of SPCR
Posts: 47
Joined: Wed Apr 18, 2007 12:55 pm

Post by cd8uk » Tue Dec 04, 2007 12:50 am

In response to Aristide1's comment here:
:shock: Hey, this info belongs over on the Folding Subject. Also - do you know anyone else doing this? If I recall correctly, some people were running SMP with P4's that have VT, but I don't have an average PPD for them.
And how many watts are your Q6600's using?
The only people I know doing this are on the original Ukrainian thread. I only found that thread by luck through Google. I had originally planned to use another option: two VMware Ubuntu64 clients, each manually assigned to one physical core (2 threads), in order to 'fool' my machine into running dual instances of F@H SMP. The Affinity Changer utility seemed like a lot less hassle.

As for my Q6600's I have two :D
the first is my gaming rig with a B3 Q6600 @ 2.66GHz @ 1.2v cooled by a not-quite-so-passive-anymore Reserator 2, 3GB RAM, an 8800GTS 320, 74GB Raptor, DVD-RW, Asus P5B-E, XP Pro 32 etc. I don't have current power readings for this but seem to remember it being around 240w (I blame the B3 and 8800GTS) running 2*F@H SMP
the second is my new server which has a G0 Q6600 @ 1.1v @ 2.6GHz, 4*500GB WD, 4*250GB Hitachi, 1*80GB Samsung and 1*40GB 2.5" Toshiba HDD along with 4*1GB DDR2 RAM, DVD-RW, G33M-DS2R with onboard VGA, Vista 64 etc. This uses only ~175w (peaking at ~220w when all the drives spin up) whilst running 2*F@H SMP.

So, to restate, using this utility seems to give a 30-40% performance boost running F@H SMP without significantly more power (maybe a couple of watts due to increased RAM usage) or extra money spent.

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Tue Dec 04, 2007 6:33 am

I did not comment initially on this thread because I didn't grasp the implications of what you were saying. My mistake. Now more questions:

1. Where would this product leave the new AMD 3 core processor? 4 threads on 3 cores?

http://www.dailytech.com/AMD+Tricore+Pr ... le9891.htm

2. It seems that better performance (or at least more points 8) ) can be had by making sure each core has 2 threads at its disposal. This is a given when SMP is run on a dual core. The question I ask, then, would be of benefit to people with single cores: could one use this program (or could it be modified) to run two single-thread instances of FAH on a single-core PC? As noted elsewhere, I seem to recall people running SMP on P4's that had VT.

3. Two instances of anything are not going to start and finish at the same time, so it seems the processor will stay pretty busy while one WU finishes and goes through upload/download processing, because it continues to work on the other WU. This is also good for when the FAH server is down or busy.

4. Does this mean that if I have a server board with 2 quads I could have 4 instances of SMP? 8)

cd8uk
Patron of SPCR
Posts: 47
Joined: Wed Apr 18, 2007 12:55 pm

Post by cd8uk » Tue Dec 04, 2007 7:31 am

Yeah, after posting I did think that I had not made the title clear enough. Should've started it 'Free F@H points...' :)

1. Not sure, but I would hope that the creator would do something to fix the 4 threads on 3 cores issue (not all of the four threads are equal anyway - just check the CPU time column in task manager).

2. I doubt that this would work. The P4's with HyperThreading have always been able to trick F@H into running two non-SMP threads, so I think that this is a red herring for single core users.

3. I have not been able to test this (i.e. be around when a WU finishes) so can't say for certain. I would hope that the service would notice that only 1 SMP is running (in case of downtime, uploading/downloading etc.) and so not automatically reassign affinity, which would give only 50% usage.
I must add though that SMP does seem a bit flakier (i.e. shut one SMP down and the other can have a tantrum, restart and trash all checkpoints done to date!) so I would suggest this for dedicated folders only.

4. Not sure, but I would hope so, and even if not, that this would be on the development list. In any case you could always revert to a flavour of my original plan and have 4 Linux 64 VMs with two cores assigned to each, giving 4 simultaneous SMPs.

Sorry that I haven't been able to give a huge amount of extra info but I've only been running simultaneous F@H SMPs for a few days. It would be good to find out other people's experience with this method of folding.

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Wed Dec 12, 2007 5:50 pm

More info on the good work from the Ukraine:

http://foldingforum.org/viewtopic.php?t=136

avi_dan
Posts: 66
Joined: Tue Nov 16, 2004 11:44 am
Location: Seattle, WA, USA

Post by avi_dan » Fri Dec 14, 2007 12:12 pm

OK, so I have one SMP client and the affinity changer installed.

How do I install and run a second SMP instance?

Thanks

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Fri Dec 14, 2007 4:50 pm

CD8UK will have to answer that. :?

avi_dan
Posts: 66
Joined: Tue Nov 16, 2004 11:44 am
Location: Seattle, WA, USA

Post by avi_dan » Mon Dec 17, 2007 10:27 am

Never Mind, I think I got it.

I am getting a 40-50% improvement in two dual proc systems.

Avi

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Mon Dec 17, 2007 3:29 pm

avi_dan wrote:I am getting a 40-50% improvement in two dual proc systems.

Avi
:shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock: :shock:

avi_dan
Posts: 66
Joined: Tue Nov 16, 2004 11:44 am
Location: Seattle, WA, USA

Post by avi_dan » Tue Dec 18, 2007 1:53 pm

Let me rephrase that.

I am currently running two SMP clients on each dual proc machine (HT enabled Xeons) and it is a 40-50% improvement over running one SMP client on the same machine.

The affinity utility helps out a little by making the individual processes not collide with one another in the CPU core.

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Tue Dec 18, 2007 8:35 pm

avi_dan wrote:Let me rephrase that.
I like how it was phrased just fine. 8)
avi_dan wrote:I am currently running two SMP clients on each dual proc machine (HT enabled Xeons) and it is a 40-50% improvement over running one SMP client on the same machine.
For a total of 4 cores?
avi_dan wrote:The affinity utility helps out a little by making the individual processes not collide with one another in the CPU core.
I thought the thermal paste was holding the processors in place. :shock:

cd8uk
Patron of SPCR
Posts: 47
Joined: Wed Apr 18, 2007 12:55 pm

Post by cd8uk » Wed Dec 19, 2007 1:11 am

I'm glad to hear that someone else at SPCR has been able to get dual SMP's running.

For info, the only steps you have to take to get the dual SMPs working correctly (after installing Affinity Changer itself) are:
1. make sure that each instance has its own separate folder
2. ensure that instance one has a different machine id to instance two
3. when installing FAH for the second instance say 'no' to the message "uninstall previous version"

I've finally gotten my two Q6600s (1 @ default, 1 @ 3GHz) running properly (you wouldn't believe how much grief a RocketRaid 2300, a G33M-DS2R, Vista 64, Prime95 and a virus scan could cause) and according to FAHMON my PPD should be around the 7k mark. Nice :D

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Wed Dec 19, 2007 6:22 pm

cd8uk wrote:... (you wouldn't believe how much grief a RocketRaid 2300, a G33M-DS2R, Vista 64, Prime95 and a virus scan could cause)
Vista and grief in the same sentence; no, nothing hard to understand there.

cd8uk wrote:... and according to FAHMON my PPD should be around the 7k mark. Nice :D
Each? 8)

You will be bright red soon. The first.

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Thu Dec 20, 2007 5:41 pm

avi_dan wrote:The affinity utility helps out a little by making the individual processes not collide with one another in the CPU core.
1. Should this also not help with dual processors running a single instance of SMP?

2. Should this not help any multi-processor system? Period.

avi_dan
Posts: 66
Joined: Tue Nov 16, 2004 11:44 am
Location: Seattle, WA, USA

Post by avi_dan » Fri Dec 21, 2007 11:01 am

aristide1 wrote:
avi_dan wrote:Let me rephrase that.
I like how it was phrased just fine. 8)
Nobdy s prfekt 8)
aristide1 wrote:
avi_dan wrote:I am currently running two SMP clients on each dual proc machine (HT enabled Xeons) and it is a 40-50% improvement over running one SMP client on the same machine.
For a total of 4 cores?
Well, each machine has two hyperthreading-enabled Xeons. So it appears as 4 cores, but it's only two real cores. It still works out OK because of the advantage of a separate CPU over an additional core in the same die.
aristide1 wrote:
avi_dan wrote:The affinity utility helps out a little by making the individual processes not collide with one another in the CPU core.
I thought the thermal paste was holding the processors in place. :shock:
processes != processors :lol:

avi_dan
Posts: 66
Joined: Tue Nov 16, 2004 11:44 am
Location: Seattle, WA, USA

Post by avi_dan » Fri Dec 21, 2007 11:23 am

aristide1 wrote:
avi_dan wrote:The affinity utility helps out a little by making the individual processes not collide with one another in the CPU core.
1. Should this also not help with dual processors running a single instance of SMP?

2. Should this not help any multi-processor system? Period.
The affinity change makes sure that each SMP fah_core process gets assigned to one single core all the time instead of changing to the next available core. This helps out very little, but there should be a gain also when running a single SMP instance. (Even though the author of the utility said that there is no gain in single SMP scenarios)

VanWaGuy
Posts: 299
Joined: Mon Sep 11, 2006 1:01 am
Location: Vancouver Wa USA

Post by VanWaGuy » Sat Dec 22, 2007 10:48 am

Darn it cd8uk,

2 quads? I just recently bought 1, with the illusion that it would put me comfortably at the top of the SPCR folding list (except on days when Bolek goes point crazy :) ). Very nice!

On mine so far, I have basically just thrown the hardware together and started it folding. I am running 2 instances of the SMP client, but top is still showing that I am usually about 20% idle. I thought that folks running only one instance were seeing that sort of idle time, and that the second instance ate up most of it.

I have also seen a lot of conflicting advice on whether setting processor affinity helps or not under Linux, but I do have a few days off coming up, and a folding machine to tune.

cd8uk
Patron of SPCR
Posts: 47
Joined: Wed Apr 18, 2007 12:55 pm

Post by cd8uk » Thu Dec 27, 2007 5:30 am

Yeah, 2 Quads. Sorry! I made a decision that it would be more efficient for me to have 2 Quads running as opposed to several Duos/X2s. The lovely little Affinity Changer has only helped take the efficiency to the next level.

With regards to CPU usage, a single Quad running a single SMP under XP/Vista should definitely run at 100%, whilst with Affinity Changer installed it'll be 50% for 1 SMP and 100% for 2 SMPs.

I'm about to do some testing on an X2 4400+ under XP-32 and Vista-64 (both with and without Affinity Changer running) and also Ubuntu 64 to see if I can find any significant performance variations. I'll post back here in the next few days.

VanWaGuy
Posts: 299
Joined: Mon Sep 11, 2006 1:01 am
Location: Vancouver Wa USA

Post by VanWaGuy » Thu Dec 27, 2007 1:30 pm

I have heard that under Ubuntu you should get more ppd, but it is under Ubuntu where I still have 20% idle even with 2 instances.

GentleGiant
Friend of SPCR
Posts: 34
Joined: Sat Oct 16, 2004 3:21 pm
Location: Socorro, New Mexico, USA

Post by GentleGiant » Thu Dec 27, 2007 8:39 pm

I have heard that under Ubuntu you should get more ppd, but it is under Ubuntu where I still have 20% idle even with 2 instances.
Yup - both true :o My machine is a Dell 2900 with dual Xeon 5140 2.33GHz dual-core CPUs. Over the last few days I've been switching OSes.

With Windows Server 2003, my CPU utilization is pegged at 100% and my frame time for a p2653 smp unit is 11m50s.

Under Xubuntu 7.10 there's 25-30% idle, but my frame time on p2653 is 9m50s ---- that's 20% faster! Edit: these times are consistent averages over at least 5 units each, so it's not just unit-to-unit variation.

The FaH developers have said MPI under Windows doesn't work as well. I suspect the Windows version has the threads in a busy loop checking to see if the others have completed their parts, but the Mac and Linux versions of the inter-process communication put the threads to sleep, leaving things idle for other processes.

VanWaGuy
Posts: 299
Joined: Mon Sep 11, 2006 1:01 am
Location: Vancouver Wa USA

Post by VanWaGuy » Fri Dec 28, 2007 7:05 pm

The Windows client is also 32-bit. If they released a 64-bit client, it might at least close the performance gap.

VanWaGuy
Posts: 299
Joined: Mon Sep 11, 2006 1:01 am
Location: Vancouver Wa USA

Post by VanWaGuy » Sat Dec 29, 2007 8:07 pm

I experimented today with setting the CPU affinity of the two instances of folding that I am running on my Q6600.

As I said above, I was running about 20% idle before. I did:

taskset -c 0,1 ./FaH
for one instance, and
taskset -c 2,3 ./FaH
for the other (FaH is a file that contains ./Fah6 -smp -local -forceasm -verbosity 9). This sets the first one to use processors 0 and 1, and the other 2 and 3.

After this, my idle is now down to about 15%.
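
For anyone wanting to try the same split with a different number of cores or instances, here is a rough Python sketch that just builds the taskset command lines. The function names are made up for illustration, and it assumes the logical CPUs should simply be divided into equal contiguous groups (0,1 and 2,3 on a quad); whether those pairs actually share a cache depends on how your system numbers its cores.

```python
from typing import List, Sequence


def split_cpus(cpus: Sequence[int], instances: int) -> List[List[int]]:
    """Split logical CPU ids into equal contiguous groups, one per client."""
    if instances <= 0 or len(cpus) % instances != 0:
        raise ValueError("CPU count must divide evenly among instances")
    size = len(cpus) // instances
    return [list(cpus[i * size:(i + 1) * size]) for i in range(instances)]


def taskset_commands(cpus: Sequence[int], instances: int,
                     script: str = "./FaH") -> List[str]:
    """Build one 'taskset -c <cpu list> <script>' line per client instance."""
    return [
        "taskset -c {} {}".format(",".join(str(c) for c in group), script)
        for group in split_cpus(cpus, instances)
    ]


if __name__ == "__main__":
    # Quad core, two SMP clients: prints the two command lines used above.
    for line in taskset_commands(range(4), 2):
        print(line)
```

Each printed line still has to be run from that instance's own folder.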

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Sun Dec 30, 2007 7:10 pm

VanWaGuy wrote:I experimented today with setting the CPU affinity of the two instances of folding that I am running on my Q6600.

As I said above, I was running about 20% idle before. I did:

taskset -c 0,1 ./FaH for one instance (FaH is a file that contains ./Fah6 -smp -local -forceasm -verbosity 9) and taskset -c 2,3 ./FaH. This sets the first one to use processors 0 and 1, and the other 2 and 3.

After this, my idle is now down to about 15%.
Just 5%? :?

VanWaGuy
Posts: 299
Joined: Mon Sep 11, 2006 1:01 am
Location: Vancouver Wa USA

Post by VanWaGuy » Mon Dec 31, 2007 12:02 am

Of course I would love almost 0% idle, but I am not going to turn down a free 5%.

GentleGiant
Friend of SPCR
Posts: 34
Joined: Sat Oct 16, 2004 3:21 pm
Location: Socorro, New Mexico, USA

Results vary by WU project

Post by GentleGiant » Mon Dec 31, 2007 2:25 pm

The benefit of running two instances at once varies a lot between the different SMP projects. Some of them leave less free CPU than others.

If both of my WU's are p2653, there's a 25% increase in Pts/day when running two (frame time goes from 9:50 to 15:45). Production goes from 2570 to 3220 PPD.

With a p2653 and a p2608 together the increase is less than 10% (frame times are 17:05 and 16:10, from 9:50 or 8:03 alone) - total PPD is just 2720. P2608 by itself leaves just 15% free cpu, rather than the 25-30% p2653 does.

I'd also say frame times and points/day are a better measure than cpu% free - I could add a third instance and free cpu would drop but I'm pretty sure PPD would also drop because of cache and memory system contention.

Getting Pts/day from frame times (a WU is 100 frames):
(Unit Pt. Value) / (100 * Frame Time in Seconds / (3600 * 24))

Frame time 9:52 on a p2653:
(1760 pts) / (100 * 592 secs / (3600 * 24 secs/day)) = ~2570 pts/day
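
The same arithmetic as a small Python sketch, in case anyone wants to plug in their own frame times. The function names are only illustrative; the 1760-point value and the frame times are the p2653 figures quoted in this post, and 100 frames per WU is assumed as in the formula.

```python
SECONDS_PER_DAY = 3600 * 24
FRAMES_PER_WU = 100  # assumed, as in the formula above


def frame_seconds(frame_time: str) -> int:
    """Convert a 'minutes:seconds' frame time such as '9:52' to seconds."""
    minutes, seconds = frame_time.split(":")
    return int(minutes) * 60 + int(seconds)


def points_per_day(unit_points: float, frame_time: str) -> float:
    """Points per day for one client finishing WUs at the given frame time."""
    wu_days = FRAMES_PER_WU * frame_seconds(frame_time) / SECONDS_PER_DAY
    return unit_points / wu_days


if __name__ == "__main__":
    single = points_per_day(1760, "9:52")        # one client alone
    double = 2 * points_per_day(1760, "15:45")   # two clients side by side
    print(round(single), round(double), round(100 * (double / single - 1)))
```

With those inputs it comes out at roughly 2570 PPD for one client and roughly 3220 PPD for two, i.e. about a 25% gain, which matches the figures above.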

I haven't yet tried messing with pinning any of the client threads to a particular CPU, but I may try it and see. Linux doesn't make it as easy to do this as Windows does with the Task Manager. My guess is the optimal arrangement of the threads for clients A and B might be (A1, B4), (A2, B3), (A3, B2), (A4, B1) --- pairing the threads that use more of the CPU with those that use less.

Any comments?

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Re: Results vary by WU project

Post by aristide1 » Mon Dec 31, 2007 4:38 pm

GentleGiant wrote:If both of my WU's are p2653, there's a 25% increase on Pts/day while running two (frame time goes from 9:50 to 15:45). Production goes from 2570 to 3220 PPD.
Actually when you consider you dropped the cpu count from 4 to 2 the increase in time is not that high, certainly no 50%.

I want to see times on a Phenom. :shock:

Dellguy08
Posts: 2
Joined: Thu Dec 27, 2007 10:35 am

Post by Dellguy08 » Tue Jan 01, 2008 10:57 am

aristide1 wrote:
VanWaGuy wrote:I experimented today with setting the CPU affinity of the two instances of folding that I am running on my Q6600.

As I said above, I was running about 20% idle before. I did:

taskset -c 0,1 ./FaH for one instance (FaH is a file that contains ./Fah6 -smp -local -forceasm -verbosity 9) and taskset -c 2,3 ./FaH. This sets the first one to use processors 0 and 1, and the other 2 and 3.

After this, my idle is now down to about 15%.
Just 5%? :?
Overclocked procs will see more. Look at the results
http://foldingforum.org/viewtopic.php?f=12&t=124

aristide1
*Lifetime Patron*
Posts: 4284
Joined: Fri Apr 04, 2003 6:21 pm
Location: Undisclosed but sober in US

Post by aristide1 » Fri Jan 11, 2008 11:57 am

Dellguy08 wrote:Overclocked procs will see more. Look at the results
http://foldingforum.org/viewtopic.php?f=12&t=124
This was a rather frustrating read. The concern with deadlines there takes a very narrow viewpoint. A point I wanted to make, which was never addressed, can't be added to the conversation because the thread is locked. :x

The concern with deadlines is limited in scope to the time elapsed since the WU was assigned, which is not good. What I am saying here is that there's probably a queue of WUs that need to be completed. Suppose you run WUs 1 through 10 on your quad. You run them consecutively, which means they complete one after another. But suppose you run them concurrently, i.e. in parallel?

1. It's true WU 1 will finish later with 2 cores than with 4 cores.
2. Right off the bat WU 2 starts immediately. That's a huge head start.

When running 2 SMPs each WU will not take twice as long, but more like 40% longer, so:

3. WU 2 will finish sooner than if run consecutively.
4. WU 3 will start a little sooner.
5. WU 4 will start much sooner.

Now stretch this compound savings of time to 100+ WUs and you see WUs completing before they would otherwise have even started, and the longer you do this, the greater the savings. How big an improvement? An extra 1000 PPD means that roughly every 2 days I finish one more WU than I would running one SMP on a quad. That WU started and completed before it would otherwise have even started.
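
A quick Python sketch of that queue argument, under simplifying assumptions: every WU takes the same time T alone on the quad, two WUs run side by side each take a fixed multiple of T (1.4 here, per the "40% longer" estimate; GentleGiant's 9:50 to 15:45 frame times work out closer to 1.6), and upload/download time is ignored. The function names are only for illustration.

```python
import math


def sequential_finish(k: int, wu_time: float) -> float:
    """Finish time of the k-th WU when WUs run one after another."""
    return k * wu_time


def concurrent_finish(k: int, wu_time: float, slowdown: float = 1.4) -> float:
    """Finish time of the k-th WU when WUs run two at a time.

    WUs are processed in pairs; each pair takes slowdown * wu_time.
    """
    return math.ceil(k / 2) * slowdown * wu_time


if __name__ == "__main__":
    T = 1.0  # time for one WU alone on the quad, in arbitrary units
    for k in (1, 2, 3, 4, 10, 100):
        print(k, sequential_finish(k, T), round(concurrent_finish(k, T), 1))
```

In this model only WU 1 finishes later than it would have; every WU from the second onward finishes earlier, and the gap keeps growing with the length of the queue.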

The argument for running only 1 SMP per quad has almost no credibility to me.

And on top of that, even my E6400, admittedly running at 3.2 GHz, finishes a WU in less than 24 hours, as would 2 cores in a quad running at the same speed. Actually the quad would be faster because it has a much larger cache than my processor (6MB versus 2MB) .

To penalize people for not completing within the 24 hour preferred window would be to penalize anybody who purchased a slow dual processor. Better to determine which processors (regardless of count) can complete a preferred WU in the preferred time and allot those WUs accordingly.


One other confusing item - if communications that occur over the NB are problematic, it would seem there would be one pair of processors that is significantly better to combine than the others, i.e. only 1 pairing would be anywhere near optimal, and yet that's not seen to a large extent.

aliceana
Posts: 1
Joined: Wed Jun 24, 2009 2:39 am
Location: NYC

Post by aliceana » Thu Jun 25, 2009 1:39 am

I need to change the SMPS of my PC because I think it is giving me a restart problem. Is there any hardware guy here who can suggest an SMPS for my AMD Athlon 2800 with 256 MB RAM and MSI motherboard model K8MM-V? I am running XP.

NeilBlanchard
Moderator
Posts: 7681
Joined: Mon Dec 09, 2002 7:11 pm
Location: Maynard, MA, Eaarth

Post by NeilBlanchard » Thu Jun 25, 2009 2:19 am

Welcome to SPCR,

Can you explain what SMPS is?
