ridge point moves far to the right, as it's much harder to hit the roof of single-precision
performance because it is so much higher. Note that the arithmetic intensity of the kernel is based
on the bytes that go to main memory, not the bytes that go to cache memory. Thus, caching
can change the arithmetic intensity of a kernel on a particular computer, presuming that most
references really go to the cache. The Rooflines help explain the relative performance in this
case study. Note also that this bandwidth is for unit-stride accesses in both architectures. Real
gather-scatter addresses that are not coalesced are slower on the GTX 280 and on the Core i7,
as we shall see.
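
The relationships described above can be sketched in a few lines of Python. This is a minimal illustration of the standard Roofline formulas, not a measurement of either machine: the peak rates and bandwidth used below are hypothetical round numbers, and the function names and the simple cache-hit model (every hit removes its bytes from main-memory traffic) are assumptions made for the example.

```python
def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which the memory roof meets the compute roof."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: the lower of the compute roof and bandwidth * intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def arithmetic_intensity(flops, bytes_referenced, cache_hit_fraction=0.0):
    """FLOPs per byte that actually goes to main memory.

    Assumed model: a fraction of the referenced bytes is served by the cache
    and never reaches main memory. The fraction must be below 1.0.
    """
    dram_bytes = bytes_referenced * (1.0 - cache_hit_fraction)
    return flops / dram_bytes


# Hypothetical machine: same memory bandwidth, two different compute roofs.
BANDWIDTH = 128.0   # GB/s, unit-stride (assumed value)
PEAK_DP = 80.0      # GFLOP/s, double precision (assumed value)
PEAK_SP = 640.0     # GFLOP/s, single precision (assumed value)

# A higher roof pushes the ridge point to the right.
dp_ridge = ridge_point(PEAK_DP, BANDWIDTH)
sp_ridge = ridge_point(PEAK_SP, BANDWIDTH)

# Caching raises the arithmetic intensity seen by main memory.
ai_no_cache = arithmetic_intensity(1000, 4000)
ai_cached = arithmetic_intensity(1000, 4000, cache_hit_fraction=0.75)

print(dp_ridge, sp_ridge)
print(ai_no_cache, ai_cached)
print(attainable_gflops(PEAK_DP, BANDWIDTH, ai_no_cache))
print(attainable_gflops(PEAK_DP, BANDWIDTH, ai_cached))
```

With these assumed numbers the single-precision ridge point sits eight times further right than the double-precision one, and the cached kernel moves from the bandwidth-limited slope to the double-precision roof.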