OpenCL

cadorino
Journeyman III

Strange global memory bandwidth for HD 7970

Hi to all,

I ran the GlobalMemoryBandwidth sample on an AMD HD 7970 and the results are the following:

Platform 0 : Advanced Micro Devices, Inc.
Platform found : Advanced Micro Devices, Inc.
Selected Platform Vendor : Advanced Micro Devices, Inc.
Device 0 : Tahiti Device ID is 005EF838
Build Options are : -D DATATYPE=float4 -D OFFSET=16384

Global Memory Read
AccessType      : single
VectorElements  : 4
Bandwidth       : 1672.16 GB/s

Global Memory Read
AccessType      : linear
VectorElements  : 4
Bandwidth       : 1756.49 GB/s

Global Memory Read
AccessType      : linear(uncached)
VectorElements  : 4
Bandwidth       : 220.769 GB/s

Global Memory Write
AccessType      : linear
VectorElements  : 4
Bandwidth       : 501.668 GB/s

The question is: how can the linear read bandwidth be over 1 terabyte per second if the maximum theoretical bandwidth is about 260 GB/s?

The question that naturally follows is: are global buffer reads cached?
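
For reference, the comparison behind this question can be sketched in a few lines. The bus width and data rate below are assumed HD 7970 reference specifications (they are not stated in this thread); the measured values are copied from the sample output above. Any result above the DRAM peak cannot be coming from off-chip memory alone.

```python
# Peak DRAM bandwidth = bus width in bytes * effective data rate per pin.
# ASSUMPTION: HD 7970 reference specs, not taken from this thread.
BUS_WIDTH_BITS = 384     # memory bus width
DATA_RATE_GBPS = 5.5     # effective GDDR5 data rate per pin, in Gbit/s


def peak_dram_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Theoretical peak off-chip bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps


peak = peak_dram_bandwidth_gbs(BUS_WIDTH_BITS, DATA_RATE_GBPS)

# Bandwidths reported by the GlobalMemoryBandwidth run above, in GB/s.
measured = {
    "read single": 1672.16,
    "read linear": 1756.49,
    "read linear (uncached)": 220.769,
    "write linear": 501.668,
}

for name, bw in measured.items():
    ratio = bw / peak
    verdict = "exceeds DRAM peak" if ratio > 1.0 else "within DRAM peak"
    print(f"{name:24s} {bw:8.2f} GB/s  {ratio:5.2f}x peak  ({verdict})")
```

Only the uncached read stays below the assumed DRAM peak, which is what one would expect if the other tests are served largely from on-chip caches.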

Thank you very much!

Accepted Solutions
dmeiser
Elite

Re: Strange global memory bandwidth for HD 7970

No, that's a cache hit.

A typical cache contention situation would be the following:

- Wavefront 1 reads data from global memory and brings it into the cache.

- Wavefront 2 then reads data from global memory, brings it into the cache, and evicts wavefront 1's data.

- Wavefront 1 then has to read from global memory again, because its data is no longer in the cache.

If only one wavefront is running on a compute unit, the second read by wavefront 1 would hit in the cache.
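
The scenario above can be reproduced with a toy model. This is only a sketch: it assumes a tiny fully associative cache with LRU replacement, which is a simplification chosen for illustration and not a description of the real Tahiti L1. The names `ToyLRUCache`, `run`, `wf1`, and `wf2` are hypothetical.

```python
from collections import OrderedDict


class ToyLRUCache:
    """Tiny fully associative LRU cache that counts hits and misses."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1                    # would go to global memory
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used


def run(accesses, capacity_lines=4):
    """Replay a sequence of cache-line reads and return (hits, misses)."""
    cache = ToyLRUCache(capacity_lines)
    for line in accesses:
        cache.read(line)
    return cache.hits, cache.misses


wf1 = [0, 1, 2, 3]        # cache lines touched by wavefront 1
wf2 = [10, 11, 12, 13]    # cache lines touched by wavefront 2

# Wavefront 1 alone: its second pass is served from cache.
alone = run(wf1 + wf1)
# Wavefront 2 runs in between and evicts wavefront 1's lines.
contended = run(wf1 + wf2 + wf1)

print("alone     (hits, misses):", alone)
print("contended (hits, misses):", contended)
```

In this model the second pass of wavefront 1 hits on every line when it runs alone, and misses on every line once wavefront 2 has pushed its data out.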

If you anticipate that cache contention like this will limit performance in your code, you can limit the number of wavefronts executing on any CU by allocating so much local memory (LDS, called shared memory in CUDA) that only one wavefront fits on a CU.

On the other hand, when lots of wavefronts access the same data you can get huge effective bandwidth gains by having many wavefronts execute on the same CU. Many of their reads will then be cached.
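
A rough way to see the size of this effect is a simple average-time model. This is a sketch under stated assumptions: hits and misses are treated as strictly serial, each of N wavefronts reads the same data once so that only the first read misses, and both bandwidth figures are illustrative placeholders rather than measured values. The function names are hypothetical.

```python
def effective_bandwidth_gbs(dram_gbs, cache_gbs, hit_rate):
    """Delivered bandwidth when a fraction hit_rate of the bytes come from cache.

    Simplified model: time per byte is the hit/miss weighted average,
    with no overlap between cache and DRAM traffic.
    """
    time_per_gb = hit_rate / cache_gbs + (1.0 - hit_rate) / dram_gbs
    return 1.0 / time_per_gb


def shared_read_hit_rate(num_wavefronts):
    """Hit rate if N wavefronts read the same data and only the first one misses."""
    return (num_wavefronts - 1) / num_wavefronts


# ASSUMPTION: illustrative bandwidths in GB/s, not taken from this thread.
DRAM_GBS = 264.0
CACHE_GBS = 1900.0

for n in (1, 2, 4, 16, 64):
    h = shared_read_hit_rate(n)
    bw = effective_bandwidth_gbs(DRAM_GBS, CACHE_GBS, h)
    print(f"{n:3d} wavefronts  hit rate {h:.3f}  effective {bw:7.1f} GB/s")
```

With a single wavefront the model returns the DRAM figure, and the effective bandwidth climbs toward the cache figure as more wavefronts share the same data.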

10 Replies
nou
Exemplar

Strange global memory bandwidth for HD 7970

Yes, it is cached.

cadorino
Journeyman III

Re: Strange global memory bandwidth for HD 7970

Is Tahiti the first architecture that caches global memory in addition to texture memory?

cadorino
Journeyman III

Re: Strange global memory bandwidth for HD 7970

Another question:
How is the L1 cache partitioned among the wavefronts/work-items scheduled on a CU?

registerme
Journeyman III

Re: Strange global memory bandwidth for HD 7970

As I understand it, you should compare the result against the L1 cache bandwidth rather than the global memory bandwidth. When the data is cached, reads are much faster.
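
To make that comparison concrete, here is a minimal estimate of the aggregate L1 bandwidth. The compute unit count, engine clock, and bytes per clock per CU are assumed HD 7970 (Tahiti) figures and are not stated anywhere in this thread, so treat the result as a ballpark only.

```python
# ASSUMPTION: HD 7970 (Tahiti) figures, not taken from this thread.
COMPUTE_UNITS = 32
ENGINE_CLOCK_GHZ = 0.925
L1_BYTES_PER_CLOCK_PER_CU = 64


def aggregate_l1_bandwidth_gbs(compute_units, clock_ghz, bytes_per_clock):
    """Peak L1 read bandwidth summed over all compute units, in GB/s."""
    return compute_units * clock_ghz * bytes_per_clock


l1_peak = aggregate_l1_bandwidth_gbs(
    COMPUTE_UNITS, ENGINE_CLOCK_GHZ, L1_BYTES_PER_CLOCK_PER_CU
)

# Linear read bandwidth reported by the sample in the first post, in GB/s.
measured_linear_read = 1756.49
fraction_of_l1_peak = measured_linear_read / l1_peak

print(f"estimated L1 peak : {l1_peak:.1f} GB/s")
print(f"measured linear   : {measured_linear_read:.2f} GB/s")
print(f"fraction of peak  : {fraction_of_l1_peak:.2f}")
```

Under these assumptions the measured linear read sits just below the estimated L1 peak, which is consistent with the reads being served from cache.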

mikewolf_gkd
Journeyman III

Re: Strange global memory bandwidth for HD 7970

On Tahiti, the L1 caches both read and write data for global memory. Before Tahiti, the L1 could not cache write data.

mikewolf_gkd
Journeyman III

Re: Strange global memory bandwidth for HD 7970

There is one L1 per CU, so a wavefront (64 work-items) uses one L1 cache. When different work-items fetch data from global memory, I think the reads/writes are coalesced: each time, data is brought in for 16 work-items, and this data can be cached in L1.

cadorino
Journeyman III

Re: Strange global memory bandwidth for HD 7970

Well, but is the entire L1 shared between the work-groups scheduled on a CU, or is it partitioned among them?
Thank you!

dmeiser
Elite

Re: Strange global memory bandwidth for HD 7970

Yes, there is cache contention between different wavefronts scheduled on a single CU.

cadorino
Journeyman III

Re: Strange global memory bandwidth for HD 7970

So if two work-items in two different wavefronts access the same buffer element, two different pages are transferred into the L1 cache. Right? (Or rather, the same page is transferred twice.)
