I have read the whole article and the forum posts about disabling one core in each Bulldozer module and benchmarking that against disabling half of the modules. The difference is as expected: it shows some promise for lightly threaded apps and, almost as important from a technical standpoint, it highlights an area that could improve the BD line of CPUs dramatically in the future if that "problem" were sorted out.
Here is the article in question.
http://techreport.com/articles.x/21865
The XtremeSystems thread is also interesting.
http://www.xtremesystems.org/forums/sho ... eaded-Perf
And here is an article in French that is also interesting and looks at the same problem with a different set of benchmarks.
http://www.hardware.fr/articles/842-9/e ... e-cmt.html
Converted to English via Google Translate.
http://translate.googleusercontent.com/ ... FCrUfkpGQA
The "problem" appears to be the scheduler, or possibly the entire front end of each module. In a perfect world the scheduler (and the other front-end components) would let the FPU and both integer units run as fast under full load on all three computational units at once as each one does individually with the other two idle. In reality this is not possible, often because of the cache and threads being moved around (cache data movement). However, as you will see from the links above, there are some serious gains to be made from using one core per module versus two cores per module on half as many modules, even though the latter gets full turbo boost and has an obvious clock-speed advantage. This is the point to highlight: when two cores on one module are working hard, they are a lot slower than two cores in two different modules.
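For anyone who wants to try a rough version of this comparison without disabling cores, here is a minimal sketch that builds the two sets of cores and pins a process to one of them. It is only an approximation of what the tests did, and it rests on assumptions that are mine, not the articles': that the OS numbers the two cores of a module adjacently (0 and 1 in the first module, 2 and 3 in the second, and so on), and that you are on Linux, where Python's `os.sched_setaffinity` is available. The helper names are made up for illustration.

```python
import os

# Assumption: two integer cores per Bulldozer module, numbered adjacently
# by the OS (cores 0,1 = module 0; cores 2,3 = module 1; ...).
CORES_PER_MODULE = 2


def one_core_per_module(modules):
    """First core of each module, e.g. 4 modules -> {0, 2, 4, 6}."""
    return {m * CORES_PER_MODULE for m in range(modules)}


def both_cores_of_modules(modules):
    """Both cores of the first `modules` modules, e.g. 2 -> {0, 1, 2, 3}."""
    return set(range(modules * CORES_PER_MODULE))


def pin_current_process(cpus):
    """Restrict the current process to the given logical CPUs (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process


if __name__ == "__main__":
    # Four threads spread over four modules vs. packed into two modules.
    print(sorted(one_core_per_module(4)))
    print(sorted(both_cores_of_modules(2)))
```

Calling `pin_current_process(one_core_per_module(4))` before starting a four-thread benchmark, and then repeating the run with `both_cores_of_modules(2)`, would give the two cases being compared. Note that pinning does not change turbo behaviour the way actually disabling cores or modules does, so the numbers will not match the articles exactly.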
As you would imagine, this is an obvious problem with the BD design, and it is why some people say that the BD design is basically hyper-threading. HT it is not: it has full-blown computational units but shares the front end. As these tests have shown, the front end is simply not adequate to keep both integer cores (and perhaps the FPU as well?) working efficiently. It does look, though, as if the BD architecture was "originally designed" to save die space by removing front-end components whilst NOT reducing per-core performance; that is what all of AMD's announcements over the years stated until about 12 months ago. This may be wishful thinking (it probably is), but it stands to reason that AMD ran into an unexpected problem with the CPU that has not yet been overcome.
If AMD closed by half the gap between running a single core per module and running two cores per module, that would be an instant gain of about 5% when running on only half of the cores, and so in theory could give a 10% gain when all 8 cores are active. That is a pretty big gain to make without a gigantic change to the design (at least I don't think it is, or should be, a massive design change; more a matter of ironing out the bugs or beefing up the front end a little).
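The back-of-envelope arithmetic here can be written out as a short sketch. The 10% gap is not a measured figure from the articles; it is simply what the 5% estimate implies if half of the gap is closed. The doubling for the all-cores case just mirrors the estimate in this post and is not derived from any model of the hardware. The function name is made up for illustration.

```python
def gain_from_closing_gap(gap_percent, fraction_closed):
    """Percentage gain if `fraction_closed` of the gap is recovered."""
    return gap_percent * fraction_closed


# Assumption: a 10% gap, implied by "closing half of it gives 5%".
implied_gap_percent = 10.0

half_cores_gain = gain_from_closing_gap(implied_gap_percent, 0.5)

# The post's own extrapolation: double the gain with all 8 cores active.
all_cores_estimate = 2 * half_cores_gain

if __name__ == "__main__":
    print(f"Gain on half of the cores: {half_cores_gain:.1f}%")
    print(f"Estimated gain on all 8 cores: {all_cores_estimate:.1f}%")
```

Plugging in the gap actually measured in a given benchmark in place of `implied_gap_percent` would show how sensitive the estimate is to the workload.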
There is one other possibility that has popped up a few times on forums (with not a shred of evidence as yet, only speculation): a colossal cache cock-up that is constantly moving data from the L1 cache of one core in a module to the other core in the same module via the L2 cache. As you can imagine, this would hammer the L1 and L2 caches and affect both cores (and, through the cache abuse, possibly the FPU as well?). It would also mean that the cores spend more time waiting for data and less time processing it, whilst still appearing to run at 100%. If this does turn out to be true, rectifying it should also yield a large performance bonus, should be doable without a great deal of work on the CPU design itself, and could be fixed with a newer stepping (Phenom 1, anybody?).
If any of the above is true and AMD removes a digit from its anus and deals with this, BD might still be a contender in performance terms. A price drop will still be needed for price/performance. It might even make a drop in power consumption possible if it does turn out to be a cache-thrashing issue, as cache can suck a lot of power (which would also explain why there is such a huge difference in power use between idle and load). Clock speeds will still need to go up, and single-threaded performance would hopefully be brought up to the level of the PIIs. Fingers crossed, all of that would make a future BD CPU an attractive option for many people.
Andy