ravikeshri
Journeyman III

How to calculate number of processing element in a GPU?

Hi,

I want to know how to calculate the number of processing elements in a GPU from the GPU's specifications and the OpenCL APIs.

This link (http://developer.amd.com/documentation/articles/pages/opencl-and-the-ati-stream-v2.0-beta.aspx ) says that the ATI Radeon HD 5870 GPU has 20 compute units and 16 processing elements per compute unit, so the total number of processing elements is 320.

The ATI Radeon HD 5870 specification (http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-5870/Pages/ati-radeon-hd-5870-specifications.aspx) says that there are 1600 Stream Processing Units in the GPU.

Does this mean that every processing element requires 5 Stream Processing Units in this GPU?

Thanks,

Ravi

nou
Exemplar

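Each SIMD core is a group of 16 five-wide (5D) units, and the 5870 has 20 such cores. So 20 × 16 = 320 processing elements, each with 5 ALUs, gives the 1600 stream processing units in the specification.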
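The arithmetic behind this answer, and behind the 1600 figure in the question, can be sketched as a three-level count. This is a simplified model using only the HD 5870 figures quoted in this thread; the constant and function names are illustrative. In OpenCL only the top level is reported directly (`clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS`), so the other two factors have to come from the vendor's hardware documentation.

```python
# Simplified model of the HD 5870 shader hierarchy, using the numbers from this thread.
COMPUTE_UNITS = 20   # SIMD cores (what CL_DEVICE_MAX_COMPUTE_UNITS reports)
PES_PER_CU = 16      # 5-wide units per SIMD core
LANES_PER_PE = 5     # ALUs ("stream processing units") per 5-wide unit


def hierarchy(cus, pes_per_cu, lanes_per_pe):
    """Return (processing elements, stream processing units)."""
    pes = cus * pes_per_cu
    return pes, pes * lanes_per_pe


pes, spus = hierarchy(COMPUTE_UNITS, PES_PER_CU, LANES_PER_PE)
print(f"{pes} processing elements, {spus} stream processing units")
# prints "320 processing elements, 1600 stream processing units"
```

Read the other way, 1600 / 320 = 5 is the width of each processing element, which is the ratio the question asks about.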

To fully utilize the GPU you must execute at least 64 threads per SIMD core, because 64 threads (one wavefront) are executed in 4 waves of 16 threads.
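A rough picture of "64 threads in 4 waves of 16" is a mapping from each work-item of one wavefront to a (cycle, lane) pair on the 16-wide SIMD. This is a simplified sketch that assumes consecutive work-items are issued together; `issue_schedule` is a hypothetical helper, not an OpenCL or driver API.

```python
WAVEFRONT_SIZE = 64  # work-items per wavefront, as stated in this thread
SIMD_WIDTH = 16      # processing elements per SIMD core


def issue_schedule(wavefront_size=WAVEFRONT_SIZE, simd_width=SIMD_WIDTH):
    """Simplified model: map each work-item of one wavefront to (cycle, lane).

    Assumes consecutive work-items are issued together, simd_width per cycle.
    """
    return {wi: divmod(wi, simd_width) for wi in range(wavefront_size)}


schedule = issue_schedule()
cycles = WAVEFRONT_SIZE // SIMD_WIDTH
print(cycles)                                   # 4
print(schedule[0], schedule[17], schedule[63])  # (0, 0) (1, 1) (3, 15)
```

Every (cycle, lane) slot is used exactly once, which is why 64 is the natural minimum per SIMD core.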

Hi Nou,

Thank you very much. Just to confirm my understanding: say a GPU has 5 compute units (SIMD cores). Then we should have 5 * 64 = 320 threads for optimal performance. Also, a thread is the same as a work-item, so to make the best use of this GPU we should have 320 global work-items.

Have I understood this correctly? If you can point me to a reference that covers these topics, I would really appreciate it.

Thanks again for your time on this.

Ravi
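The calculation in the question can be written as a small helper. This is a sketch under stated assumptions: `compute_units` would come from `CL_DEVICE_MAX_COMPUTE_UNITS` on a real device, the wavefront size of 64 is taken from this thread, and `min_work_items` and `round_up` are hypothetical helpers (padding a problem size to whole wavefronts is a common choice, not a requirement stated here).

```python
WAVEFRONT_SIZE = 64  # assumed wavefront size for this GPU family


def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def min_work_items(compute_units, wavefront_size=WAVEFRONT_SIZE):
    """Work-items needed to give every SIMD core exactly one full wavefront."""
    return compute_units * wavefront_size


print(min_work_items(5))               # 320
print(min_work_items(20))              # 1280 (HD 5870)
print(round_up(1000, WAVEFRONT_SIZE))  # 1024
```

This only gives each SIMD core something to run; whether it is enough for good performance is a separate question.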

320 threads will occupy every SIMD unit, but that doesn't mean it's enough to achieve peak efficiency. There are latencies in the shader core, so if you have only the minimal number of groups, you have no way of absorbing any extra latency incurred (such as from memory accesses).

Ideally, you'd like to get four wavefronts per SIMD to help cover latency. For your ASIC with 5 SIMDs, that means four times as many threads as you stated: 4 × 320 = 1280.

-Jeff
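Jeff's rule of thumb can be turned into a quick check: how many work-items give each SIMD four wavefronts, and how many wavefronts per SIMD a given launch size provides. This is a simplified sketch; the helper names are hypothetical, the wavefront size of 64 is assumed, and it ignores register and local-memory usage, which in practice limit how many wavefronts can be resident on a SIMD at once.

```python
WAVEFRONT_SIZE = 64  # assumed wavefront size for this GPU family


def work_items_for(compute_units, wavefronts_per_simd=4, wavefront_size=WAVEFRONT_SIZE):
    """Work-items needed so every SIMD core gets `wavefronts_per_simd` wavefronts."""
    return compute_units * wavefronts_per_simd * wavefront_size


def avg_wavefronts_per_simd(global_size, compute_units, wavefront_size=WAVEFRONT_SIZE):
    """Average wavefronts per SIMD core for a launch of `global_size` work-items.

    Simplified: assumes an even spread and ignores register/LDS limits.
    """
    wavefronts = -(-global_size // wavefront_size)  # ceiling division
    return wavefronts / compute_units


print(work_items_for(5))                 # 1280
print(avg_wavefronts_per_simd(320, 5))   # 1.0
print(avg_wavefronts_per_simd(1280, 5))  # 4.0
```

With 320 work-items each SIMD has a single wavefront and nothing to switch to while it waits on memory; at 1280 it has four.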

Hi Jeff,

Thank you very much. I understand it now. Thanks!

Ravi