dhanson865 wrote:
That's a lot of stop signs / red lights. It's as if the light is red 50% of the time. The RAM could be twice as fast and still not be as fast as the GDDR5 used on a discrete card.
Rough numbers
DDR3 1333 20 GB/s
DDR3 1600 25 GB/s
DDR3 1866 30 GB/s
---DDR3 2000 32 GB/s
DDR3 2133 34 GB/s
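For reference, the rough numbers above line up with the theoretical peak for a dual-channel setup. A minimal sketch, assuming two channels and a 64-bit (8-byte) bus per channel; the function name and defaults are mine, not something stated in the quote:

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes dual-channel operation and a 64-bit bus per channel;
    real-world throughput will be lower than this figure.
    """
    return transfer_rate_mts * 1e6 * bus_bytes * channels / 1e9


for rate in (1333, 1600, 1866, 2000, 2133):
    print(f"DDR3 {rate}: {peak_bandwidth_gbs(rate):.1f} GB/s")
```

These are upper bounds only; the quoted figures are rounded versions of the same calculation.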
Once again, this all makes perfect sense. The bandwidth increases by 20%. The frequency also increases 20%, so the delay per operation shrinks in proportion (again, assuming the same per-clock timings).
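To put a number on the "same per-clock timings" assumption, here is a rough sketch that converts a CAS latency in cycles to nanoseconds. The CL9 value and the 1333/1600 pair are illustrative picks of mine (chosen because the two speeds differ by about 20%), not figures from the test being discussed:

```python
def cas_latency_ns(transfer_rate_mts, cas_cycles):
    """Convert a CAS latency given in clock cycles to nanoseconds."""
    # DDR transfers twice per clock, so the clock in MHz is half the MT/s rating.
    clock_mhz = transfer_rate_mts / 2
    return cas_cycles / clock_mhz * 1000


slow = cas_latency_ns(1333, 9)  # hypothetical CL9 kit
fast = cas_latency_ns(1600, 9)  # same per-clock timing, higher clock
print(f"{slow:.2f} ns vs {fast:.2f} ns, ratio {slow / fast:.2f}")
```

With identical per-clock timings, the delay scales with the clock and nothing more, which is why I would not expect latency to account for anything beyond the same 20%.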
My issue is this:
All of those "red lights" should resolve at best 20% faster. There is some small overhead in dealing with a saturated bus (to put it in thread parlance, you have to clear the pipeline, swap the registers, deal with the stack, etc.). But in real-world computing, the time spent setting up a given operation on a bus (or, in time-multiplexed channel schemes, the time allotted to "reserving a spot", which is not even needed in the two-controller case) is relatively small compared to the transfer time itself (i.e., with a 5-hour stop light cycle, the start/stop inefficiencies would be irrelevant).
This breaks down if the engine is written in a way that very frequently forces the controller to change pages (i.e., every GPU operation depends on a CPU result, and vice versa, which would obviously bottleneck discrete graphics setups as well). The cases where you might see a significant decrease in overhead would require the channel to be non-saturated. But a non-saturated channel reduces the bandwidth-related performance increase you would expect, and would (in almost any real-world case, though obviously not all, as the test shows) drop you below a 20% increase, because going from a saturated to an unsaturated state increases efficiency but not overall throughput.
The thing that is bothering me is that we are getting a better-than-ideal performance increase. If I told you I could double the efficiency of your car's engine, and then you get three times the mileage, something is up.
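To spell out what I mean by "better than ideal", here is a back-of-the-envelope sketch in the style of Amdahl's law. It assumes a frame's time splits cleanly into a memory-bound part and everything else, and that only the memory-bound part speeds up; the function and the fractions are hypothetical, not measurements:

```python
def frame_speedup(memory_bound_fraction, memory_speedup=1.2):
    """Overall speedup when only the memory-bound share of the work gets faster.

    memory_bound_fraction: share of frame time limited by memory (0.0 to 1.0).
    memory_speedup: how much faster the memory-bound part becomes (1.2 = 20%).
    """
    remaining = (1 - memory_bound_fraction) + memory_bound_fraction / memory_speedup
    return 1 / remaining


for fraction in (0.25, 0.5, 0.75, 1.0):
    print(f"{fraction:.0%} memory-bound: {frame_speedup(fraction):.3f}x")
```

Under this model the result never exceeds 1.2x, and it only reaches 1.2x when the workload is entirely memory-bound. Anything above that means the same-work-per-frame assumption is wrong somewhere.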
EDIT: Hmm, one possibility that occurred to me: I assumed that the same operations were performed on a per-frame basis. This is quite possibly not the case? IIRC, many games have a server frame rate (state rate?) which is fixed, so perhaps the total number of operations required per frame is not fixed: the fixed-rate work would be spread across more rendered frames as the frame rate rises. This would explain the performance increase.