Do you know how much of a boost SSE and -advmethods give you?

A forum just for SPCR's folding team... by request.

Moderators: NeilBlanchard, Ralf Hutter, sthayashi, Lawrence Lee

ColdFlame
Posts: 451
Joined: Wed May 21, 2003 9:39 pm
Location: Somewhere in Time

Post by ColdFlame » Tue Nov 25, 2003 1:40 am

I ran some experiments on a P3 and an AMD 2000+ XP, and it seems that when SSE and -advmethods are not enabled, folding is about 2-3 times slower. Has anyone encountered this behavior before?

haysdb
Patron of SPCR
Posts: 2425
Joined: Fri Aug 29, 2003 11:09 pm
Location: Earth

Post by haysdb » Tue Nov 25, 2003 2:15 am

-advmethods only influences what kind of Work Units you are assigned. With it, you will get mostly Gromacs. Without it, you will get a higher percentage of Tinkers. That is what I understand anyway.

How do you turn off SSE? It's not -forceasm, as that's something different.

I haven't done these experiments myself, but if you tell me what you are doing, I will try the same thing on one of my machines.

David

ColdFlame
Posts: 451
Joined: Wed May 21, 2003 9:39 pm
Location: Somewhere in Time

Post by ColdFlame » Tue Nov 25, 2003 1:54 pm

Here is the sequence of events that led me to believe that those switches affect performance by a lot.

I have an AMD 2000+ XP machine and a Pentium 3 1.05 GHz machine. I also have a Celeron 2.4 GHz. Initially I was running them all with the -advmethods and -forceasm switches. Then my AMD started to lock up. Hard lock. I posted here and there and got a reply saying that -forceasm forces SSE, and that AMD plus SSE is not a good combination for F@H. So I disabled SSE on the AMD.

Immediately I noticed that my AMD 2000+ had become slower than my 1 GHz P3. "Weird!" I thought.

Then I tried FireDaemon on my P3 machine and got frame times about 2.5-3.0 times slower than when running from a command line, as measured by EMIII.

Well, I blamed FireDaemon at first, but then I realized that I had forgotten to specify -advmethods and -forceasm in the service startup parameters. Once I specified them, my frame times went back to what they used to be.
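A forgotten switch in a service definition is easy to overlook. As a rough sketch only, a check like the one below could flag it before the client is started. The two switches come from this thread; the executable name `fah_client.exe` and the function name `missing_switches` are hypothetical placeholders, not anything taken from the F@H client or FireDaemon.

```python
# Switches this thread relies on (taken from the posts above).
EXPECTED_SWITCHES = ("-advmethods", "-forceasm")


def missing_switches(startup_args):
    """Return the expected switches that are absent from the argument list."""
    present = set(startup_args)
    return [switch for switch in EXPECTED_SWITCHES if switch not in present]


# Hypothetical service command line; "fah_client.exe" is a placeholder name.
service_args = ["fah_client.exe", "-advmethods"]
print(missing_switches(service_args))
```

An empty list means both switches are present in the startup parameters.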

I also started forcing those parameters on the AMD and got the performance I was expecting, about 1.5-2.0 times faster than the 1 GHz P3.

And -forceasm does force SSE. I tried it myself, people on the folding forums told me it does, and the readme for the 4.0 beta client says so.

haysdb
Patron of SPCR
Posts: 2425
Joined: Fri Aug 29, 2003 11:09 pm
Location: Earth

Post by haysdb » Tue Nov 25, 2003 5:50 pm

ColdFlame wrote: -forceasm does force SSE, I tried it myself, people on folding forums told me it does and readme for 4.0 beta client says so.

OK, I will buy that in the scenario you describe, where without the -forceasm option the assembly language optimizations might not be used. With a P4, the optimizations will be used (AFAIK) regardless of whether -forceasm is specified, so it's not precisely an "enable SSE" flag.

Are you running Beta 4? I believe the meaning of -forceasm is different in Beta 4, and there is now an SSE option of some kind.

David

ColdFlame
Posts: 451
Joined: Wed May 21, 2003 9:39 pm
Location: Somewhere in Time

Post by ColdFlame » Tue Nov 25, 2003 6:06 pm

I'm not running 4.0 Beta yet because I don't know if it is Beta1, Beta2 or Alpha :) I might try though because they seem to give partial credits.

The situation with SSE, as I understand it, is that they evaluate whether it is "okay" to use SSE on a particular system and use it only if that evaluation succeeds. Since there were issues with AMD and SSE, I think they disabled SSE on AMD by default.

haysdb
Patron of SPCR
Posts: 2425
Joined: Fri Aug 29, 2003 11:09 pm
Location: Earth

Post by haysdb » Wed Nov 26, 2003 2:15 am

My desktop PC spontaneously rebooted earlier this evening, and after the reboot I noticed the expected completion time for the current WU had more than doubled. I checked the log and saw:

[09:15:57] - Working with standard loops on this execution.
[09:15:57] - Previous termination of core was improper.

Recalling that -forceasm had something to do with the behavior of the client following a crash, I stopped the client, added this switch, and restarted it. The log says:

[10:02:47] - Assembly optimizations manually forced on.
[10:02:47] - Not checking prior termination.
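For anyone who wants to check their own machines, here is a minimal sketch that scans log lines for the two messages quoted above. The message strings are copied from these excerpts; the function name `optimization_state` and the state labels are made up for illustration, and other client versions may word their logs differently.

```python
def optimization_state(log_lines):
    """Classify the loop mode from the last relevant line in the log."""
    state = "unknown"
    for line in log_lines:
        if "Working with standard loops" in line:
            state = "standard loops"
        elif "Assembly optimizations manually forced on" in line:
            state = "assembly forced on"
    return state


# Log excerpts quoted in this post.
before = [
    "[09:15:57] - Working with standard loops on this execution.",
    "[09:15:57] - Previous termination of core was improper.",
]
after = [
    "[10:02:47] - Assembly optimizations manually forced on.",
    "[10:02:47] - Not checking prior termination.",
]

print(optimization_state(before))
print(optimization_state(after))
```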

The estimated completion time dropped from 13:42 to 6:22. This supports ColdFlame's original premise that -forceasm can increase performance by more than a factor of 2. As I pointed out in a recent post, it won't ALWAYS increase performance, since the optimizations will sometimes be used without having to force them on, but this is another scenario where it pays to include this switch when firing up the client.
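The arithmetic behind the "factor of 2" can be sketched as follows. It assumes the two estimates are in hours:minutes; the ratio comes out the same if they are minutes:seconds. The helper names `to_minutes` and `speedup` are just for illustration.

```python
def to_minutes(estimate):
    """Convert an 'H:MM' estimate into total minutes."""
    hours, minutes = estimate.split(":")
    return int(hours) * 60 + int(minutes)


def speedup(before, after):
    """Ratio of the old estimate to the new one."""
    return to_minutes(before) / to_minutes(after)


# 13:42 is 822 minutes and 6:22 is 382 minutes.
ratio = speedup("13:42", "6:22")
print(f"{ratio:.2f}x")
```

That works out to roughly 2.15 times faster with the assembly optimizations forced on.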

David
