dstokac
Journeyman III

Information from CL_DEVICE_* doesn't correspond to my GPGPU?

Using device.getInfo<A>(), one can query device info. For an HD 4770 on a Linux machine I get the following information:

1) A=CL_DEVICE_NAME : ATI RV770

Shouldn't it be RV740?

2) A=CL_DEVICE_GLOBAL_MEM_SIZE: 128 MB

Shouldn't it be 512 MB?

3) A=CL_DEVICE_LOCAL_MEM_TYPE: 2,

where 2==CL_GLOBAL.

Shouldn't it be CL_LOCAL==1?

Doesn't __local correspond to the shared memory of each SIMD, which is 16 KB in size?

4) A=CL_DEVICE_GLOBAL_MEM_CACHE_TYPE | CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE | CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 0

Should those values all be zero for my device?
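
For anyone comparing their own output against the numbers above, here is a minimal C++ sketch that turns the raw values into readable names. It assumes the numeric values from the Khronos cl.h header (CL_LOCAL = 0x1, CL_GLOBAL = 0x2, CL_NONE = 0x0, CL_READ_ONLY_CACHE = 0x1, CL_READ_WRITE_CACHE = 0x2) and copies them into local constants so it builds without an OpenCL SDK. The helper names (local_mem_type_name, cache_type_name, bytes_to_mib) are made up for illustration and are not part of any OpenCL API.

```cpp
#include <cstdint>
#include <string>

// Assumption: these mirror the values in the Khronos cl.h header.
constexpr std::uint32_t kClLocal = 0x1;           // CL_LOCAL
constexpr std::uint32_t kClGlobal = 0x2;          // CL_GLOBAL
constexpr std::uint32_t kClNone = 0x0;            // CL_NONE
constexpr std::uint32_t kClReadOnlyCache = 0x1;   // CL_READ_ONLY_CACHE
constexpr std::uint32_t kClReadWriteCache = 0x2;  // CL_READ_WRITE_CACHE

// Hypothetical helper: name for a CL_DEVICE_LOCAL_MEM_TYPE value.
std::string local_mem_type_name(std::uint32_t value) {
    switch (value) {
        case kClLocal:  return "CL_LOCAL";
        case kClGlobal: return "CL_GLOBAL";
        default:        return "unknown";
    }
}

// Hypothetical helper: name for a CL_DEVICE_GLOBAL_MEM_CACHE_TYPE value.
std::string cache_type_name(std::uint32_t value) {
    switch (value) {
        case kClNone:           return "CL_NONE";
        case kClReadOnlyCache:  return "CL_READ_ONLY_CACHE";
        case kClReadWriteCache: return "CL_READ_WRITE_CACHE";
        default:                return "unknown";
    }
}

// Hypothetical helper: converts a byte count (as returned for
// CL_DEVICE_GLOBAL_MEM_SIZE) to whole mebibytes, rounding down.
std::uint64_t bytes_to_mib(std::uint64_t bytes) {
    return bytes / (1024ull * 1024ull);
}
```

With the values reported in this post, local_mem_type_name(2) gives "CL_GLOBAL" and cache_type_name(0) gives "CL_NONE", i.e. the runtime is reporting no global memory cache at all rather than a cache of size zero.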

0 Likes
6 Replies

1) The RV740 and RV770 are essentially equivalent as far as OpenCL is concerned. I think CAL also reports the 740 as a 770, and OpenCL reports what CAL reports.
2) The global memory size is not how much memory is on your board but how much memory is currently allocated on the board for OpenCL kernel execution.
3) On the 7XX series of cards, __local is emulated in global memory, since the 7XX does not have fully read/write local memory like the 8XX does.
4) Not sure what you mean here.
0 Likes

Micah, thanks for the quick reply!

It raised a few other questions, though:

1) Is the emulation of __local by global memory in the 7xx architecture permanent, or is it just an artifact of the beta version of your OpenCL implementation?

2) If this emulation is permanent, does it mean that OpenCL on 7xx architecture is seriously crippled?

3) If I need more than 128 MB, and I do, does it mean that the space will be dynamically allocated, or do I need to adjust some environment variable?

4) In point 4) of the previous post, I meant that for each of the mentioned device properties one gets zero, i.e. global memory is not cached. Are there devices where global memory is cached?

0 Likes

Originally posted by: MicahVillmow  3) on the 7XX series of cards, __local is emulated in global memory since the 7XX does not have fully read/write local memory like 8XX does


Would it be possible for the compiler to detect the special case where the kernel only performs writes to get_local_id(0), i.e.

__kernel void mykernel(__local float4* shmem, ...) {
    // Each work-item writes only to its own slot.
    shmem[get_local_id(0)] = val;
}

(which is supported by RV770) and use shared memory, rather than global memory in that case?  For predictable performance, it would be nice if the user could be warned of such hardware support violations resulting in reduced performance.  Perhaps in Stream Kernel Analyzer, for example.
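
To make the restricted pattern concrete, here is a small sketch in plain C++ (not OpenCL) that emulates one work-group doing a sum reduction in which work-item lid only ever writes shmem[lid], while reads may touch any slot. The function name group_sum_owner_writes and the sequential loops standing in for work-items are assumptions made for illustration. It says nothing about what the AMD compiler actually does.

```cpp
#include <cstddef>
#include <vector>

// Emulates a single work-group whose size equals input.size().
// Invariant: "work-item" lid writes only to shmem[lid]; it may read any slot.
float group_sum_owner_writes(const std::vector<float>& input) {
    const std::size_t n = input.size();
    std::vector<float> shmem(n);

    // Phase 1: every work-item stores its own value in its own slot.
    for (std::size_t lid = 0; lid < n; ++lid) {
        shmem[lid] = input[lid];
    }

    // Phase 2: tree reduction. In each pass the active work-item lid reads
    // shmem[lid + stride] (owned by another work-item) but writes only
    // shmem[lid]. The slot being read is never written in the same pass,
    // so the in-place update is safe here; a real kernel would need a
    // barrier between passes.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t lid = 0; lid + stride < n; lid += 2 * stride) {
            shmem[lid] = shmem[lid] + shmem[lid + stride];
        }
    }

    return n > 0 ? shmem[0] : 0.0f;
}
```

A kernel that sticks to this shape never needs one work-item to write into another work-item's slot, which is the special case I am asking the compiler to detect.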

0 Likes

@ Micah:

I'm waiting for a 2 GB 5870 to come out. In the meantime, are the 5750 and 5770 equivalent to a 5870 in terms of hardware support for OpenCL (hardware shmem support, etc.)?

0 Likes

One difference I know of: the 5700 series doesn't have double-precision (DP) support.

Shared memory is 32 KB for both 58xx and 57xx series cards.

0 Likes

dstokac,
1) There is currently no plan to move __local from Global to LDS memory as the hardware restrictions are quite severe.
3) The runtime will reallocate for you; there is nothing special that you need to do.
4) The values that are reported are the correct values for our current GPU implementation.
0 Likes