C2D and motherboards for number crunching?

All about them.

Moderators: NeilBlanchard, Ralf Hutter, sthayashi, Lawrence Lee

Post Reply
aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

C2D and motherboards for number crunching?

Post by aqm consultant » Wed Nov 08, 2006 7:08 pm

I'm at it again --
Built an AMD x2 4800+ box in the Spring (Asus w/passive NB, stock cpu HSF, passive graphics, P150 w/120mm Nexus in the back and 2 92mm in the drive bays, Q-fan enabled). Did have to RMA the S0511... Neo HE 430 to get one that wouldn't shut down at full load. It's currently running at 100% CPU on both cores 2 feet away with the door off (started about 12 hours ago), and I can't hear it. :D But I need more CPU clock ticks. . .

I'm debating various C2D options (including the quad core that's to be released on Nov 14), but am getting very confused about motherboards and BIOS issues. Are those with the 975x NB the only ones that can monitor temperature directly from the core for fan control etc.? If not, what are good options?

Also, CPU HSF options? Looks like the HR-01 would be good, assuming the duct will clear a heatpipe cooler on the mobo in a P150 (yeah, I don't mind it looks like a 1950's refrigerator).

I'm partial to Asus boards (after about 5 or 6), but their 975x has at least some reported BIOS glitches (drive recognition, . . .).

TIA

roystarman
Posts: 12
Joined: Sat Oct 28, 2006 3:44 am

number crunching

Post by roystarman » Thu Nov 09, 2006 2:09 am

I think C2D is the way to go for now, until the quads hit the street. I am not convinced that the 975 chipset is better than the 965. The biggest differences appear to be IDE support and the new ICH8/R southbridge: the 965 is newer and pairs with the ICH8/R but drops native IDE, while the 975 keeps IDE but doesn't get the ICH8. Anandtech recently reviewed several 965 boards. The ASUS, ABIT and GIGABYTE solutions appear to be the best of the lot. These were mid-priced value boards, about $150 US, and they all compared pretty equally. C2Ds seem to be very good overclockers. I would get the E6600 since it is the cheapest one with the full 4MB of L2 cache.

NeoNSX
Posts: 80
Joined: Sun Sep 10, 2006 5:04 am

Post by NeoNSX » Fri Nov 10, 2006 4:48 am

As an Asus P5W DH (975X) owner, the road to Asus producing a good BIOS for the P5W has been a looooong one, but the latest 1503 is pretty good. I've also had extensive experience with the P5B Deluxe, and it's probably slightly better than the P5W DH: cheaper to boot, too. While I'm not aware of the P5W having drive recognition issues, most P5W owners know it's picky about the RAM you use... this is the most annoying part of owning a P5W. :(

While Quad Core sounds great on paper, it's going to COST you a kidney on release. You may need to sell any children you have too, to raise the money. "Cheaper" quad CPUs probably won't be on the market until mid-2007. So if you really want cycles, I'd go with roystarman's suggestion: buy an E6600 and overclock it to 3GHz. That's about the sweet spot at the moment without going into extreme OC'ing.

Out of curiosity, what specifically are you doing that requires more CPU clock cycles? 3D rendering/CAD/video editing/playing Minesweeper?

Shining Arcanine
Friend of SPCR
Posts: 502
Joined: Sat Oct 23, 2004 2:02 pm

Post by Shining Arcanine » Fri Nov 10, 2006 9:00 am

NeoNSX wrote:While Quad Core sounds great on paper, it's going to COST you a kidney on release. You may need to sell any children you have too to raise the money. "Cheaper" Quad CPU's probably won't be on the market until mid-2007. So if you really want cycles, i'd go with roystarman's suggestion: buy an E6600 but then overclock to 3GHz. That's about the sweet spot at the moment without going into extreme OC'ing.
http://techreport.com/forums/viewtopic.php?t=45325

According to that thread at TechReport.com and another thread at XtremeSystems.org, the Xeon 32x0 series is pin-compatible with socket 775 desktop motherboards. The two CPUs in the series both cost less and consume less power (thanks to lower-voltage operation) than the QX6700 that is going to be released on November 14. There is one person at XtremeSystems.org running a Xeon 3220 in a P5B Deluxe, if I recall correctly.

Aqm consultant might want to wait until the Xeon 3210 is available. It should be overclockable, so any additional performance he would get from the QX6700 he could get by overclocking. Given that he is considering a QX6700, the approximate $690 price tag should not be too hard a pill to swallow.

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Fri Nov 10, 2006 12:02 pm

Thanks guys. I wasn't aware of the Xeons, but it looks like I'd have to wait till January. Sounds like maybe I should consider a P5B and an E6600 or E6700 for now, and then upgrade to a quad core in a few months, since apparently there'd be no need to replace the P5B. 'Course I'd then have the joy of finding out whether WinXP's activation (thanks, Bill) will refuse to work, thinking it's now on a different machine. :roll:

Any recommendations for RAM that will use the full bandwidth before and after upgrade? Is this like the A8N-SLI where populating all four DDR slots requires slowing the RAM access?

NeoNSX: What I'm running is numerical simulation models of physical and chemical transport and transformations in the environment. LOTS of floating point math (including trig and other transcendental functions) for lots of variables in each of lots of time steps. Fair amount of memory access, but very little I/O. 24 hours or more on both cores of the x2 at 100% (splitting a single simulation into two halves) is not uncommon.

[edited]

Shining Arcanine
Friend of SPCR
Posts: 502
Joined: Sat Oct 23, 2004 2:02 pm

Post by Shining Arcanine » Fri Nov 10, 2006 2:07 pm

A target price range probably would help, but since you are talking about buying an E6600 or E6700 and then upgrading to Quad Core after that, my recommendation is pretty much any matched pair of memory from Corsair's XMS2 Dominator series; here is the one that is currently the cheapest:

http://www.zipzoomfly.com/jsp/ProductDe ... de=85026-4

Memory from Corsair's XMS2 Dominator series should stay current as long as you use DDR2. There is currently a shortage of the Micron ICs used in them, so it might be a few months before they become available.

soulwarrior
Posts: 4
Joined: Sun Nov 12, 2006 2:09 pm

Post by soulwarrior » Sun Nov 12, 2006 2:26 pm

aqm consultant wrote: NeoNSX: What I'm running is numerical simulation models of physical and chemical transport and transformations in the environment. LOTS of floating point math (including trig and other transcendental functions) for lots of variables in each of lots of time steps. Fair amount of memory access, but very little I/O. 24 hours or more on both cores of the x2 at 100% (splitting a single simulation into two halves) is not uncommon.
I just wonder if the Core 2 Duo is really better at number crunching. I found a benchmark which aims to test CPUs in this area: http://r-goetz.de/RGBench/bench.shtm. There the Core 2 Duo didn't perform as well as the Athlons. Even in the ScienceMark benchmark, the x2 performs quite well: http://www.extremetech.com/article2/0,1 ... 652,00.asp
I'm looking at which CPU to buy, mostly for phylogenetic calculations, so I ask myself whether the Core 2 Duo would really be better suited.
PS: I am a biologist, so I depend on the expertise of computer scientists in this area ;)

EsaT
Posts: 473
Joined: Sun Aug 13, 2006 1:53 am
Location: 61.6° N, 29.5° E - Finland

Post by EsaT » Mon Nov 13, 2006 11:02 am

soulwarrior wrote: I just wonder if the Core 2 Duo is really better at number crunching. I found a benchmark which aims to test CPUs in this area: http://r-goetz.de/RGBench/bench.shtm. There the Core 2 Duo didn't perform as well as the Athlons.
As if there weren't enough synthetic benchmarks already... and that one can't even run on multiple cores.
soulwarrior wrote: Even in the ScienceMark benchmark, the x2 performs quite well: http://www.extremetech.com/article2/0,1 ... 652,00.asp
I'm looking at which CPU to buy, mostly for phylogenetic calculations, so I ask myself whether the Core 2 Duo would really be better suited.
PS: I am a biologist, so I depend on the expertise of computer scientists in this area ;)
As a general rule, never trust the results of only one reviewer on these choices; always compare multiple reviews and look for consistent behaviour.

Now, ScienceMark is actually one of those rare areas where AMD's architecture is slightly faster... but ScienceMark definitely doesn't use the newer SSE extensions, so with optimization the results might be different.
Also, is the same behaviour visible in real-world applications?

3D rendering is one of the most number-crunching-intensive applications, and there Conroe wins...
Especially in the Lightwave rendering test, which probably uses well-optimized code with SSE extensions, the AMD64 gets literally crushed by Conroe. Judging from that result alone, you'd think the AMD64 was a totally inferior CPU.

So until AMD gets something new out, I see no reason to buy an AMD processor if highest performance is the priority, especially considering how far affordably priced Conroes (like the E6600) can be overclocked.



As for highly parallel scientific computing, GPUs will probably soon take that job over from CPUs, as in Folding@Home, especially now that DX10 GPUs are designed as much for general-purpose parallel processing as for graphics.

Tzupy
*Lifetime Patron*
Posts: 1561
Joined: Wed Jan 12, 2005 10:47 am
Location: Bucharest, Romania

Post by Tzupy » Mon Nov 13, 2006 11:09 am

From my experience, my new C2D E6600 is not significantly faster than my previous Winnie 3200 when running x87 single-threaded code.

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Mon Nov 13, 2006 2:52 pm

While the range of benchmarks with modern code optimized for current processors indicate a clear edge for Conroe, it’s less clear that this can be extrapolated to legacy code. In my case, the code for various models (over 50,000 lines in one case) has a regulatory history, and any significant modifications would require extensive testing and documentation to demonstrate that the code revisions don’t alter the results under each of a large set of options and inputs.

FWIW, preliminary benchmarking (carrying a stripped-down simulation into local computer stores w/ Conroe and x2 boxes) shows the x2s will likely hold their own very nicely. As Tzupy says, single-threaded apps don't speed up much on these, but they only use half of the CPU. I tend to run two instances at a time, with processor affinity assigned to one core or the other, and with priority dropped to "below normal" so that the machine doesn't lock out other processes. For benchmarking, I set up two batch jobs and run both simultaneously in this way, giving 100% CPU on both cores. Preliminary results show the following speeds relative to my x2 4800+ box:

E6300 @ 1866 MHz – 0.70
x2 4200+ @ 2204 MHz – 0.91
x2 5000+ @ 2505 MHz – 1.08

I've not yet found an E6400 or higher to check. Assuming speed is roughly proportional to the CPU clock (not perfect, but probably close), an X6800 Conroe at 2930 MHz would come in at 1.09 relative to the 4800+. Hardly worth the price without some serious overclocking.

There are two confounding factors I've yet to resolve. First is whether the 4MB cache on the E6600 and higher will speed things up, and whether sharing of the L2 cache affects anything. I also suspect the x2's on-die memory controller gives it an edge, but again, I can't tell without testing. Second, there is about 16% overhead in setting up the benchmark (loading inputs into arrays prior to initiating the simulation). FSB speed and HDD controllers, as well as on-chip cache, could affect how much this varies between CPUs. I'm revising the benchmarking system to identify overhead separately. Will report back on this and other Conroes as I find them.
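The two-instance scheme described above (one process pinned to each core, at below-normal priority) can be sketched in modern Python; this is a minimal illustration only, where run_half is a hypothetical stand-in for one simulation half, and the pinning calls are the Linux equivalents of XP's Task Manager "Set Affinity" plus below-normal priority:

```python
import multiprocessing
import os

def run_half(core, steps):
    """One half of a split simulation, pinned to a single core at low
    priority.  sched_setaffinity/nice are Linux APIs; on Windows the
    same effect comes from Set Affinity and 'below normal' priority."""
    try:
        os.sched_setaffinity(0, {core})   # pin this process to one core
        os.nice(10)                       # roughly "below normal" priority
    except (AttributeError, OSError):
        pass                              # not available here; run unpinned
    total = 0.0
    for t in range(steps):                # stand-in for the time-step loop
        total += (t % 7) * 0.5
    return total

if __name__ == "__main__":
    # Two independent halves, one per core, run simultaneously.
    with multiprocessing.Pool(2) as pool:
        results = pool.starmap(run_half, [(0, 100000), (1, 100000)])
    print(len(results))
```

With both workers compute-bound and pinned to different cores, this drives both cores to 100% without starving interactive processes, which mirrors the batch-job setup above.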

OmegaEntropy
Posts: 18
Joined: Tue Sep 26, 2006 3:31 pm

Post by OmegaEntropy » Mon Nov 13, 2006 7:45 pm

If you're doing number crunching where errors in computation matter, I'd assume you're:
1) not overclocking
2) using ECC RAM.

If the code has regulatory issues attached to it, I hope the above is the case (I work in the medical device field so I have some regulatory experience).

So, that said, can you take advantage of 4 cores? If you know the relative performance of each core/platform (A64 vs Core) and the scaling with MHz (and potentially scaling with cores/die if memory bandwidth is an issue), it seems just a matter of doing the math to determine what gives you the best performance. As all out performance seems more important than $ here (or performance/$), do not forget 2 socket systems or AMD's upcoming 4x4 platform (2 weeks?).

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Mon Nov 13, 2006 11:57 pm

I've never overclocked, but the output is spatial (2D) and RAM errors that might be prevented by ECC would be painfully obvious (generating reruns). Multiple sensitivity runs and graphical evaluation are standard. I've looked at dual socket options a bit, but the price (given that multiple runs and evaluation are standard) is hard to justify.

Is 4x4 really weeks away?

lm
Friend of SPCR
Posts: 1251
Joined: Wed Dec 17, 2003 6:14 am
Location: Finland

Post by lm » Tue Nov 14, 2006 3:47 am

NVIDIA is making it possible to do general-purpose calculation on their GPUs. Should give great performance on floating-point stuff.

soulwarrior
Posts: 4
Joined: Sun Nov 12, 2006 2:09 pm

Post by soulwarrior » Tue Nov 14, 2006 5:44 am

lm wrote:NVIDIA is making it possible to do general purpose calculation on their gpu. Should give great performance on floating point stuff.
Same for ATI: http://ati.amd.com/companyinfo/events/S ... index.html

jaganath
Posts: 5085
Joined: Tue Sep 20, 2005 6:55 am
Location: UK

Post by jaganath » Tue Nov 14, 2006 6:27 am

Is 4x4 really weeks away?
apparently it's being launched today:

http://www.engadget.com/2006/11/10/amds ... bout-1000/
Over the last few months, we've enjoyed snippets of information about AMD's new 4x4 platform, but until today we didn't know the speed of the chips, nor their prices. TG Daily just got all the info from a company event in Munich; the 4x4 will launch on November 14, with three different models of 125-watt FX processors: the 2.6GHz FX-72, 2.8GHz FX-74, and 3.0GHz FX-76.

Mariner
Friend of SPCR
Posts: 260
Joined: Thu Jan 06, 2005 11:25 am

Post by Mariner » Tue Nov 14, 2006 9:17 am

soulwarrior wrote:
lm wrote:NVIDIA is making it possible to do general purpose calculation on their gpu. Should give great performance on floating point stuff.
Same for ATI: http://ati.amd.com/companyinfo/events/S ... index.html
Yep, but the new G80 from NVidia is an absolute beast even in comparison to the R580 from ATI and is much more flexible. The G80 will offer ridiculous levels of general purpose processing power. No more than you'd expect from a chip with almost 700 million transistors though!

I don't doubt that the forthcoming R600 from ATI will offer pretty incredible performance in this field also.

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Tue Nov 14, 2006 4:03 pm

The stream computing option looks interesting if I can figure out a way to pass calculations for non-conflicting array locations efficiently. Looks like NVIDIA's Cg Toolkit might make this doable. However, I wonder if the processing advantage won't be wiped out by the need to pass all the data over the PCI bus. I haven't found anything on the web other than general assertions that the potential of the approach is huge. Obviously, the more processing you can do per array chunk passed over, the better off you are.

Anybody seen any real-world results? I'd hate to cough up $700 for a video card and spend a week programming, only to find that there's no improvement.
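The bus-overhead worry reduces to simple arithmetic: the card only wins if its compute time plus the round-trip transfer time still beats the CPU. A back-of-envelope sketch, where gpu_wins, the ~4 GB/s bus figure, and the 20x speedup are all illustrative assumptions, not measured values:

```python
# Back-of-envelope check on whether bus transfers wipe out a GPU speedup.
# All numbers are assumptions for illustration (~4 GB/s for a PCIe x16
# link of that era, a hypothetical 20x compute speedup on the card).
def gpu_wins(bytes_per_step, cpu_secs_per_step, gpu_speedup,
             bus_bytes_per_sec=4e9):
    transfer = bytes_per_step / bus_bytes_per_sec      # time on the bus
    gpu_compute = cpu_secs_per_step / gpu_speedup      # time on the card
    return gpu_compute + transfer < cpu_secs_per_step  # still a net win?

# Say ~1 MB moves per time step:
print(gpu_wins(1e6, 0.01, 20))    # long steps: transfer is negligible -> True
print(gpu_wins(1e6, 1e-4, 20))    # tiny steps: the bus dominates -> False
```

The crossover depends entirely on how much compute each transferred chunk carries, which is exactly the "larger the chunks, the better" point above.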

NeoNSX
Posts: 80
Joined: Sun Sep 10, 2006 5:04 am

Post by NeoNSX » Tue Nov 14, 2006 7:41 pm

aqm consultant wrote:I'd hate to cough up $700 for a video card and spend a week programming, only to find that there's no improvement.
if you bought a G80, you wouldn't spend a week programming... you'd be HARDCORE GAMING!!! :lol: :P

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Wed Nov 15, 2006 12:06 am

Sorry. Not me. I don't even have any games.
My kids do, and they're cool, but this is business . . .

cycleback
Posts: 22
Joined: Mon Sep 26, 2005 2:54 pm

Post by cycleback » Wed Nov 15, 2006 12:20 am

Are you doing finite element calculations or something similar? I've been facing the same dilemma over whether the C2D or x2 is the better choice, and I've been unable to find good benchmarks for these types of simulations.

jaganath
Posts: 5085
Joined: Tue Sep 20, 2005 6:55 am
Location: UK

Post by jaganath » Wed Nov 15, 2006 12:51 am

Anybody seen any real-world results? I'd hate to cough up $700 for a video card and spend a week programming, only to find that there's no improvement.
Folding@Home uses stream computing, they're seeing a 20-40X improvement.

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Fri Nov 17, 2006 3:08 pm

cycleback wrote:Are you doing finite element calculations or something similar? I have been facing a dilemma over the C2D or x2 is a better choice also and I have been unable to find some good benchmarks for these types of simulations.
The math isn't that exotic -- deterministic calculations (trig, exponentials, etc.) operating on large input arrays (typically ~2000 x 10 and 4000 x 4, single and double precision) that don't vary within a single time step, producing incremental changes in large output arrays (~4000 x 20, double precision), all held in RAM. I can parallelize an inner loop in a way that I think would minimize the traffic over the PCI bus in stream computing, but "minimizing" is relative. A lot of stuff still has to move back and forth.
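As a rough stand-in for one of these kernels (handy for quick timing on a candidate machine), here is a hedged Python sketch; time_step and its math are purely illustrative, with only the array shapes taken from the description above:

```python
import math

def time_step(inputs, outputs):
    """One illustrative time step: trig/exp over inputs that are constant
    within the step, accumulated incrementally into a larger output
    array.  The real models' math is more involved; shapes follow the
    description in the thread."""
    rows, cols = len(outputs), len(outputs[0])
    for i, row in enumerate(inputs):
        # Transcendental-heavy inner work on one input row
        s = sum(math.sin(x) + math.exp(-abs(x)) for x in row)
        outputs[i % rows][i % cols] += s   # incremental update
    return outputs

inputs = [[(i + j) * 0.001 for j in range(10)] for i in range(2000)]  # ~2000 x 10
outputs = [[0.0] * 20 for _ in range(4000)]                           # ~4000 x 20
time_step(inputs, outputs)
```

Timing many repetitions of time_step on each box would give a crude per-machine comparison of exactly this trig/exp workload, independent of I/O.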

I modified the benchmarking approach mentioned above so I could identify separately the setup overhead from the actual processing time. Didn't rerun the x2 4200+, but confirmed the previous results relative to my x2 4800+ (S939):

E6300 -- 0.68
x2 5000+ -- 1.08

Still looking for an E6600 or higher to test whether the larger L2 cache helps. If it doesn't, then for this work the C2Ds lose the battle. I've got a friend with a dual-Xeon Mac Pro who'll test on that. Might give some insights.

jaganath
Posts: 5085
Joined: Tue Sep 20, 2005 6:55 am
Location: UK

Post by jaganath » Fri Nov 17, 2006 3:38 pm

Still looking for an E6600 or higher to test to see if the larger L2 cache helps. If it doesn't, then for this work, C2Ds lose the battle.
C2Ds OC well, if overclocking is an option for this task.

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Fri Nov 17, 2006 4:50 pm

jaganath wrote:C2D's OC well, if overclocking is an option for this task.
So it's reported. With the X6800 priced at about 2.5x an x2 5200+, it's still pretty hard to justify. Maybe see if an E6600 can get to 3.5GHz? Without a liquid nitrogen cooler? :roll:

jaganath
Posts: 5085
Joined: Tue Sep 20, 2005 6:55 am
Location: UK

Post by jaganath » Fri Nov 17, 2006 5:30 pm

Maybe see if an E6600 can get to 3.5GHz? Without a liquid nitrogen cooler?
This guy got an E6600 to 3.5G using only a CNPS9500:

Anandtech E6600 overclocking thread
For anyone interested, here is documentation of my experiences overclocking the E6600 chip. Full system specs:

Asus P5B Wifi/Deluxe mainboard, BIOS 507, included chipset fan installed
Intel Core 2 Duo E6600 Retail (B2 revision, stepping 6)
2 GB (2x 1 GB) G.Skill DDR2-800 CL4 RAM
Enermax Liberty 500W PSU
ATI X1900XT 512 MB graphics card
2x WD Raptor 150 GB's in RAID-0
4x Seagate 7200.10 320 GB's in RAID-5
NEC 3550a burner
Creative X-Fi Platinum
Zalman CNPS9500 AT w/ Arctic Silver 5
4x 120 mm case fan
Antec P180b case

...
so because of 1 or more of these things, the system now achieves a much better overclock. I have found that:

at 3.5 GHz (437x8) the system is prime95 stable.
Also this guy:

http://www.nindeals.com.au/showthread.php?t=1199

3.5G w/ a Thermalright SI-120.

cmthomson
Posts: 1266
Joined: Sun Oct 09, 2005 8:35 am
Location: Pleasanton, CA

Post by cmthomson » Fri Nov 17, 2006 5:53 pm

Running an E6600 at 3.2GHz is trivial. With a Ninja and any old fan, 3.3 is easy. Lots of folks willing to put up with relatively loud fans get 3.4-3.6 GHz. Liquid or phase-change cooling isn't necessary with Conroe.

Tzupy
*Lifetime Patron*
Posts: 1561
Joined: Wed Jan 12, 2005 10:47 am
Location: Bucharest, Romania

Post by Tzupy » Mon Nov 20, 2006 10:10 am

If your floating-point calculations have a lot of sin, cos and exp, then the AMDs are probably the better choice if those are executed as x87 instructions.
However, if the compiler approximates those instructions with several faster ones, especially SSE2 ones, then you should get a big boost from the C2D.
BTW, the arrays are not that large (my stuff sometimes uses 10x-20x that much data). In my experience, if the code has a lot of trig functions, the cache size doesn't matter.

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Tue Dec 05, 2006 12:45 pm

I solicited benchmarking volunteers on another thread (viewtopic.php?t=36291). Thanks to pyogenes for running a number of tests on an E6600 at stock, overclocked and underclocked speeds, as well as varying FSB and memory speeds. Neither the FSB nor the memory speed made any difference, and run times scale exactly inversely with CPU clock speed relative to an E6300. His E6600 OC'd to 3.3 GHz was exactly 3.3/1.86 times faster than the E6300. This means that for this application, the L2 cache has no effect on Intel performance. Still hoping to find an x2 5200+ and/or FX-60 or FX-62 volunteer to see whether L2 cache size helps there. Short version: on a price/performance basis, an OC'd E6300 is likely the sweet spot if you're willing to OC. If not, or if low idle power consumption is important, the x2 is probably the way to go. Not that it means much, but on a "throughput per CPU cycle" basis, the AMDs are about 22% faster than C2Ds on this kind of math.
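The per-cycle comparison is just the relative score with the clock ratio divided out; a sketch using scores quoted earlier in this thread (note that with these particular inputs the edge comes out closer to 18%, so the exact figure depends on which scores and clocks are plugged in):

```python
# "Throughput per CPU cycle" from relative benchmark scores: divide out
# the clock ratio.  Reference is the x2 4800+ at 2400 MHz (score 1.00).
def per_cycle(score, clock_mhz, ref_clock_mhz=2400.0):
    return score * ref_clock_mhz / clock_mhz

conroe = per_cycle(0.68, 1866)    # E6300, revised benchmark score
amd = per_cycle(1.08, 2505)       # x2 5000+ score and clock from this thread
print(round(amd / conroe, 2))     # AMD's per-cycle edge with these inputs
```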

Again, thanks to pyogenes!

aqm consultant
Posts: 88
Joined: Mon Jun 20, 2005 1:11 pm
Location: California

Post by aqm consultant » Mon Dec 18, 2006 2:08 pm

One more data point. I broke down and built an Asus M2N-SLI Deluxe with AMD x2 5200+. Benchmark times are identical to the x2 5000+ (same CPU clock speed, but half the L2 cache of the 5200). Conclusion remains the same -- for this code, it's strictly CPU cycles.

System info if you're interested since everything went together easily with no problems:
Antec P150 w/ Neo HE 430 (SN S0604xxxxx, April 2006 version)
M2N-SLI Deluxe
AMD x2 5200+ w/stock HSF
Samsung SP2504C 250 GB SATA2
2x1GB OCZ Platinum Rev 2
Asus EN7600GS Silent 512MB
120 mm Nexus replacing Antec Tricool
2x92 mm Nexus fans on the front.

Asus Probe says the CPU hits 61-62 degrees C with both cores fully loaded with CPUBurn with CnQ set to "Optimal" (CPU fan at ~2600 RPM). Not too bad on noise, but I think the stock solid aluminum HSF (small w/no heat pipes) is about to get replaced by a Scythe Mine . . . .

Post Reply