Archives Discussions

mordock2012 · ‎05-01-2013

Hello,

I am looking for the lowest level hardware details that I can get for some GPGPU research that I am doing.

I have seen the specs for the hardware and I'd like to know more than they tell me.

For instance:

How much register file is available per compute unit? I've read this to be 256KB for the desktop version of the 5870. Is it the same for the laptop version?

How many compute units (SIMD engines) are in the Mobility version of the GPU?

How much OpenCL local memory is there per compute unit on the laptop version of the card?

What is the proper name for a "core"? I have heard the terms Threading Processor, Multiprocessor, and Streaming core all applied to AMD technology. Which is it?

What is the official name for the 5 processing elements inside a "core"? The research that I have read is inconsistent in the naming.

Lastly, your engines are touted as "SIMD" but branching within OpenCL code is possible. What occurs during a branch in OpenCL? If a branch occurs in the code, is SIMD no longer used?

Providing detail or the location of any of these details would be fantastic.

Thank you,

Coby Soss

himanshu_gautam · ‎05-02-2013

How much register file is available per compute unit? I've read this to be 256KB for the desktop version of the 5870. Is it the same for the laptop version?
How many compute units (SIMD engines) are in the Mobility version of the GPU?
How much OpenCL local memory is there per compute unit on the laptop version of the card?

You can run clinfo on your laptop to find all of this, and more. AFAIK, the Mobility HD 5870 is equivalent to the desktop HD 5770, so you can also check that spec in the AMD OpenCL Programming Guide.
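As a rough illustration of pulling those numbers out of clinfo, here is a minimal Python sketch. It assumes clinfo prints one "label: value" pair per line; the sample text, its labels, and its values are hypothetical placeholders, not output captured from a real device, and `parse_clinfo` is a helper invented for this example.

```python
import re

# Hypothetical excerpt of clinfo-style output. The labels and values are
# placeholders for illustration, not measurements from a real device.
SAMPLE_CLINFO = """\
  Name:                          ExampleDevice
  Max compute units:             10
  Local memory size:             32768
  Max work group size:           256
"""

def parse_clinfo(text):
    """Collect 'label: value' lines into a dict keyed by label."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+?)\s*$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2)
    return fields

info = parse_clinfo(SAMPLE_CLINFO)
compute_units = int(info["Max compute units"])
local_mem_kib = int(info["Local memory size"]) // 1024
print(f"{compute_units} compute units, {local_mem_kib} KiB local memory")
```

On a real machine the text could come from `subprocess.run(["clinfo"], capture_output=True, text=True).stdout`. The exact labels vary between clinfo builds, so the keys used above may need adjusting.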

What is the proper name for a "core"? I have heard the terms Threading Processor, Multiprocessor, and Streaming core all applied to AMD technology. Which is it?
What is the official name for the 5 processing elements inside a "core"? The research that I have read is inconsistent in the naming.

AMD uses the OpenCL terminology. GPU "cores" should be called Compute Units (CUs). As per OpenCL, "processing elements" is the right term for the 5 units, although the equivalent AMD term is VLIW5.

Lastly, your engines are touted as "SIMD" but branching within OpenCL code is possible. What occurs during a branch in OpenCL? If a branch occurs in the code, is SIMD no longer used?
Providing detail or the location of any of these details would be fantastic.

Branching is certainly possible, and it does not hurt performance as long as there is no branch divergence. In case of divergence, the SIMD lanes are not used efficiently, because some of them are masked (those where the branch condition was false).
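To make "no branch divergence" concrete, here is a small Python sketch that groups work-items into wavefronts and counts how many wavefronts contain both outcomes of a branch condition. It assumes a wavefront size of 64 and that consecutive work-item IDs fall into the same wavefront; `count_divergent_wavefronts` is a name made up for this example, and the code only models the idea rather than the hardware.

```python
WAVEFRONT_SIZE = 64  # assumed number of work-items per wavefront

def count_divergent_wavefronts(predicates, wavefront_size=WAVEFRONT_SIZE):
    """Count wavefronts whose work-items disagree on the branch condition."""
    divergent = 0
    for start in range(0, len(predicates), wavefront_size):
        wavefront = predicates[start:start + wavefront_size]
        if any(wavefront) and not all(wavefront):
            divergent += 1
    return divergent

work_items = range(256)  # four wavefronts of 64 work-items

# Condition that is uniform inside each wavefront: no divergence.
uniform = [(gid // WAVEFRONT_SIZE) % 2 == 0 for gid in work_items]
# Condition that alternates between neighbouring work-items:
# every wavefront contains both outcomes.
alternating = [gid % 2 == 0 for gid in work_items]

print(count_divergent_wavefronts(uniform))      # 0
print(count_divergent_wavefronts(alternating))  # 4
```

Both conditions send half of the work-items down each path, but only the second one splits work-items inside a wavefront, which is the case that costs performance.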

mordock2012 · ‎05-06-2013

Hello Himanshu,

Thank you for the very helpful response.

I was wondering if you could elaborate further on what you mean by "masking" of SIMDs.

Coby.

himanshu_gautam · ‎05-07-2013

Masking of SIMD lanes relates to how conditionals execute on GPUs. Consider a wavefront of 64 work-items, and assume that 63 of them take the "if" branch of a conditional and 1 takes the "else" branch. The wavefront first masks the "else" work-item, and the complete wavefront executes the "if" block. Results computed for the masked work-item are not written back. After that, the "else" block executes with the 63 work-items masked and just one enabled. Again, the complete wavefront executes the "else" block.

This is how conditionals affect application performance: if a wavefront diverges at a conditional, it executes both blocks one after the other, so the conditional takes almost double the time when the two blocks are of similar length.
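The masking described above can be sketched in Python as a simple simulation. This is only a model of the idea, not of the actual hardware: `run_wavefront` and its parameters are invented for illustration, and each "pass" stands for the whole wavefront stepping through one block.

```python
WAVEFRONT_SIZE = 64

def run_wavefront(values, condition, if_block, else_block):
    """Execute an if/else for one wavefront using masking: every lane
    steps through each taken path, but only enabled lanes keep results."""
    mask = [condition(v) for v in values]
    results = list(values)
    passes = 0

    if any(mask):  # at least one lane takes the "if" path
        passes += 1
        for lane, v in enumerate(values):
            computed = if_block(v)      # all lanes compute
            if mask[lane]:              # only unmasked lanes write back
                results[lane] = computed

    if not all(mask):  # at least one lane takes the "else" path
        passes += 1
        for lane, v in enumerate(values):
            computed = else_block(v)
            if not mask[lane]:
                results[lane] = computed

    return results, passes

values = list(range(WAVEFRONT_SIZE))
# 63 lanes take the "if" path; the last lane takes the "else" path.
results, passes = run_wavefront(
    values,
    condition=lambda v: v < 63,
    if_block=lambda v: v * 2,
    else_block=lambda v: -v,
)
print(passes)                                 # 2
print(results[0], results[62], results[63])   # 0 124 -63
```

With a condition that is the same for all 64 lanes, only one of the two passes runs, which is why a branch without divergence costs nothing extra in this model.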

mordock2012 · ‎05-08-2013

Hello Himanshu,

Thanks. I believe that I understand the workings of the graphics card much better now.

There is one more question I have with regard to OpenCL terminology.

The OpenCL specification discusses the concepts of SIMD and SPMD (Single Program Multiple Data).

I was initially under the impression that it was possible for each thread in a wavefront to have its own instruction pointer, simply because divergence is possible. Is it correct for me to conclude that all the threads always have the same instruction pointer and that divergence is handled by masking?

If this is the case, does the card always operate in SIMD mode (every thread is processing the same instruction at any point in time)?

Coby.

himanshu_gautam · ‎05-08-2013

As I understand it, every compute unit has its own program counter. So technically, every stream core in a compute unit executes the same instruction at any given time.

AMD Radeon Mobility 5870 low level details