The idea is to allocate threads to cores in the order 0, 2, 4, 6, 1, 3, 5, 7. That way, when you're running four threads or fewer, each thread has exclusive access to all the resources in a module (such as the front-end and L2) instead of having to share them. Turbo frequency is something the scheduler needs to consider as well. The scheduler in Win 8 is module aware, and it looks like it will offer some pretty solid improvements in single/lightly threaded tasks by distributing threads across Bulldozer modules better than Win 7 does. Tom's Hardware saw an 8-12% performance improvement in WoW, depending on resolution (link). AMD also had a slide that showed a 2-10% improvement in gaming performance with Win 8 (link). The most CPU limited of the games AMD tested, Left 4 Dead 2, saw a 10% improvement in FPS. The more GPU limited games saw a modest improvement, between 2-4%. Still, this is very promising, because single/lightly threaded performance was one of the areas where BD really fell short of my expectations (that and power consumption). That's why I won't bother with Bulldozer now, but if new steppings (not talking about Piledriver, but B3 revision silicon) can rein in power consumption a bit and software optimizations can improve performance a fair amount, I think it will be a much better option.
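To make the 0, 2, 4, 6, 1, 3, 5, 7 order concrete, here's a rough Python sketch. It assumes cores are numbered so that each module owns two consecutive IDs (0-1, 2-3, and so on); the function names are made up for illustration, and this is not how the Windows scheduler is actually implemented.

```python
def module_aware_order(num_modules, cores_per_module=2):
    """Return core IDs in the order a module-aware scheduler might fill them.

    Assumption: module m owns core IDs m*cores_per_module .. m*cores_per_module + (cores_per_module - 1).
    """
    order = []
    # First pass takes one core from every module; the next pass takes
    # the second core of each module, and so on.
    for slot in range(cores_per_module):
        for module in range(num_modules):
            order.append(module * cores_per_module + slot)
    return order


def placement(num_threads, num_modules=4, cores_per_module=2):
    """Cores used by the first num_threads threads (hypothetical helper)."""
    return module_aware_order(num_modules, cores_per_module)[:num_threads]


if __name__ == "__main__":
    print(module_aware_order(4))
    print(placement(4))
```

With four modules, the first four threads land on cores 0, 2, 4 and 6, so no two of them share a module. Sharing only starts with the fifth thread.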
Also worth noting is that HyperThreading has the exact same limitation. For example, with a HyperThreaded CPU, for optimal performance you would not want to send one thread to a physical core and then another thread to the second logical core on that same physical core. This would lead to a situation where the two threads are fighting for resources within the core. In theory the performance hit would be greater than with Bulldozer, which duplicates a lot of resources within a module, while very little is duplicated with HT. The optimal scheduling is to send each thread to its own physical core whenever possible (in the case of a quad core w/HT, when you're running four or fewer threads). All SMT-style schemes have this pitfall, whether it's HyperThreading or the module approach. That's why the scheduler needs to be aware of logical cores in HyperThreading, just like it needs to be aware of modules, for optimal performance. The difference is that Windows has been HyperThreading aware since XP, I believe, while the module approach is new, so schedulers and other software haven't had nearly a decade to be optimized for it. But once they are, it will help.
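The same rule works for HyperThreading and for modules if you hand the scheduler a map of which logical CPUs share hardware. Below is a minimal Python sketch of that idea: pick the free logical CPU whose physical core (or module) currently has the fewest threads on it. The `core_of` mapping and the function name are assumptions for illustration only; a real OS scheduler also weighs things like turbo, cache state and power.

```python
from collections import Counter


def pick_logical_cpus(num_threads, core_of):
    """Choose logical CPUs for num_threads threads, spreading across physical cores.

    core_of: dict mapping logical CPU ID -> physical core (or module) ID.
    This mapping is an assumed input; real systems expose it through
    OS-specific topology interfaces.
    """
    load = Counter()          # threads already placed on each physical core
    free = sorted(core_of)    # logical CPUs not yet handed out
    chosen = []
    for _ in range(min(num_threads, len(free))):
        # Prefer the least-loaded physical core; break ties by lowest CPU ID.
        cpu = min(free, key=lambda c: (load[core_of[c]], c))
        chosen.append(cpu)
        free.remove(cpu)
        load[core_of[cpu]] += 1
    return chosen


if __name__ == "__main__":
    # Quad core with HT, siblings numbered as pairs (0,1), (2,3), (4,5), (6,7).
    paired = {cpu: cpu // 2 for cpu in range(8)}
    print(pick_logical_cpus(4, paired))

    # Same chip, but siblings numbered (0,4), (1,5), (2,6), (3,7).
    split = {cpu: cpu % 4 for cpu in range(8)}
    print(pick_logical_cpus(4, split))
```

Either way the numbering works out, four threads end up on four different physical cores, and siblings only get doubled up once every core already has a thread.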