settle
Challenger

Number of Global Memory Banks?

The appendices of the AMD APP Programming Guide list hardware specs for various graphics cards.  The list includes the number of channels and the number of LDS banks; however, there is no information about the number of global memory banks.  Where or how can I find this information?

0 Likes
7 Replies

Although the programming guide says that Cypress has 16 banks in global memory, no corresponding numbers are available for other devices.

1. I am also interested in knowing which conflict is more severe: a bank conflict or a channel conflict?

2. Can there be a situation where a bank conflict occurs but a channel conflict does not?

3. It would also be very helpful if someone could say how much memory a wavefront should access (in a coalesced fashion) to get maximum throughput.

0 Likes

While we're waiting for someone at AMD to address our questions, maybe this will help answer your first question.  In the AMD APP SDK 2.6 you can run some tests on linear reads (uncached) by setting the offset to increasing powers of two: AMD APP > samples > opencl > benchmarks > GlobalMemoryOptimizations.  The first significant decrease in bandwidth is probably the channel conflict.  If the bandwidth then levels off before significantly decreasing a second time, I would say that second drop is the bank conflict.  I tried this method, but my results were not as conclusive as I anticipated due to the latency-hiding nature of the GPU scheduler.

I can take a stab at your second question.  A bank or channel conflict serializes the conflicting memory accesses.  If the banks and channels have the same frequencies, then serializing at the top (bank conflict) causes the next lower step (channel) to be serialized, but the converse is not necessarily true.

0 Likes
kcarney
Staff

Could you provide a link to the AMD APP programming guide that you are referring to?

I'm hoping I can find the person at AMD who worked on that guide, and that person can answer your questions.

Thanks!

Kristen

0 Likes

Hi Kristen,

Here's a link to the guide I mentioned: AMD Accelerated Parallel Processing OpenCL™ Programming Guide (v1.3f)

Thanks!

0 Likes

Thanks, settle, for your response.

I am not convinced that GlobalMemoryBandwidth can answer my questions. The graph approach you mentioned looks good.

Please share any results you have. I will also try to study the global memory behaviour on my end.

0 Likes

Here's a chart and the raw data from the tests I ran on my AMD Radeon HD 6750M (6 compute units, 1 GB GDDR RAM, 4 channels, 128-bit memory bus width).

Offset (Bytes)    Bandwidth (GB/s)
16                48.2738
32                49.0174
64                48.6044
128               48.3058
256               48.4109
512               48.0643
1024              48.4455
2048              48.4947
4096              48.5194
8192              47.4413
16384             40.0958
32768             37.8450
65536             38.8238
131072            37.7734
262144            35.5792
524288            35.6806
1048576           35.9321
2097152           36.2016

It seems that a first drop starts at 8 KB and continues until 32 KB, and then a second drop starts and finishes at 256 KB.  Now I've got to find the link between these numbers and the hardware.
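
As a rough way to locate those drops programmatically, here is a minimal Python sketch that scans the measurements above and flags every step where the bandwidth falls by more than a chosen fraction of the previous value. The function name `find_drops` and the 5% threshold are assumptions made for illustration; they do not come from the SDK sample or the programming guide.

```python
# Offset (bytes) -> measured bandwidth (GB/s), copied from the table above.
MEASUREMENTS = [
    (16, 48.2738), (32, 49.0174), (64, 48.6044), (128, 48.3058),
    (256, 48.4109), (512, 48.0643), (1024, 48.4455), (2048, 48.4947),
    (4096, 48.5194), (8192, 47.4413), (16384, 40.0958), (32768, 37.8450),
    (65536, 38.8238), (131072, 37.7734), (262144, 35.5792),
    (524288, 35.6806), (1048576, 35.9321), (2097152, 36.2016),
]


def find_drops(measurements, threshold=0.05):
    """Return (previous_offset, offset, relative_drop) for each step whose
    bandwidth falls by more than `threshold` relative to the previous step.

    The default threshold of 5% is an arbitrary illustrative choice.
    """
    drops = []
    for (prev_off, prev_bw), (off, bw) in zip(measurements, measurements[1:]):
        relative_drop = (prev_bw - bw) / prev_bw
        if relative_drop > threshold:
            drops.append((prev_off, off, relative_drop))
    return drops


if __name__ == "__main__":
    for prev_off, off, rel in find_drops(MEASUREMENTS):
        print(f"{prev_off:>8} -> {off:>8} bytes: bandwidth down {rel:.1%}")
```

With the 5% threshold assumed here, the flagged steps end at offsets of 16384, 32768, and 262144 bytes, which lines up with the two drops described above. A different threshold would flag a different set of steps, so this only summarizes the data; it does not identify which hardware feature causes each drop.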

0 Likes

Here's what I heard back:

"Yes, that’s correct for NI and SI.

The Global Data Share (GDS) is 64 kbyte in size and is constructed from 32 register files, each 512 deep x 32 bits wide, with one write and one read port via pseudo two-port memories.  The memory block will be constructed to have an interleaved bank address such that the lower 5 bits of the dword address will be the bank select and the upper 9 bits will be the address within the bank.  The memory will be constructed such that all 32 banks can be read, written, or both in one clock."
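
To make the quoted address split concrete, here is a small Python sketch that decodes a GDS byte address into a bank index and an address within that bank, using only the numbers in the quote (32 banks, 5 bank-select bits, 9 in-bank bits, 32-bit dwords). The function name `gds_bank_and_row` and the constant names are hypothetical; this is an illustration of the described layout, not AMD code.

```python
BANK_BITS = 5     # lower 5 bits of the dword address select one of 32 banks
ROW_BITS = 9      # upper 9 bits address one of 512 entries within the bank
DWORD_BYTES = 4   # each entry is 32 bits wide


def gds_bank_and_row(byte_address):
    """Split a GDS byte address into (bank, address_within_bank).

    Assumes the interleaving described in the quote above; the helper
    itself is hypothetical.
    """
    dword = byte_address // DWORD_BYTES
    if not 0 <= dword < (1 << (BANK_BITS + ROW_BITS)):
        raise ValueError("address is outside the 64 KB GDS")
    bank = dword & ((1 << BANK_BITS) - 1)
    row = dword >> BANK_BITS
    return bank, row


if __name__ == "__main__":
    # Consecutive dwords land in consecutive banks; a stride of
    # 32 dwords (128 bytes) lands in the same bank every time.
    for addr in (0, 4, 8, 128, 256):
        print(addr, gds_bank_and_row(addr))
```

Under this layout, consecutive dwords fall in different banks, while addresses 128 bytes apart share a bank. The sizes are consistent with the quote: 32 banks x 512 entries x 4 bytes = 64 KB.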

Cheers!

Kristen

0 Likes