6 Replies Latest reply on Nov 2, 2009 3:14 PM by MicahVillmow

    Information from CL_DEVICE_* doesn't correspond to my GPGPU?

    dstokac

      Using device.getInfo<A>() one can query device info. For HD4770 on a linux machine I get the following information:

      1) A=CL_DEVICE_NAME : ATI RV770

      Shouldn't it be RV740?

      2) A=CL_DEVICE_GLOBAL_MEM_SIZE: 128 MB

      Shouldn't it be 512 MB?

      3) A=CL_DEVICE_LOCAL_MEM_TYPE: 2,

      where 2==CL_GLOBAL.

      Shouldn't it be CL_LOCAL==1?

      Doesn't __local correspond to the shared memory of each SIMD, which is 16 KB in size?

       

      4) A=CL_DEVICE_GLOBAL_MEM_CACHE_TYPE | CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE | CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 0

      Should those values really be zero for my device?

        • Information from CL_DEVICE_* doesn't correspond to my GPGPU?
          MicahVillmow
          1) The RV740 and RV770 are essentially equivalent as far as OpenCL is concerned. I think CAL also reports the 740 as a 770, and OpenCL reports what CAL reports.
          2) The global memory size is not the amount of memory on your board, but the amount currently allocated on the board for OpenCL kernel execution.
          3) On the 7XX series of cards, __local is emulated in global memory, since the 7XX does not have fully read/write local memory like the 8XX does.
          4) Not sure what you mean here.
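The numeric values discussed in points 3) and 4) can be turned into readable names with a small helper. The following is a minimal sketch, not part of any SDK: the function names `localMemTypeName` and `globalMemCacheTypeName` are hypothetical, and the constants are copied from the Khronos `cl.h` header so that the snippet compiles without an OpenCL installation. In a real program one would include `<CL/cl.h>` and pass in the results of `device.getInfo<...>()`.

```cpp
#include <cassert>
#include <string>

// Values mirrored from the Khronos cl.h header (assumption: copied here only
// so the sketch builds without an OpenCL SDK; real code should use the macros).
constexpr unsigned kClLocal          = 0x1;  // CL_LOCAL
constexpr unsigned kClGlobal         = 0x2;  // CL_GLOBAL
constexpr unsigned kClNone           = 0x0;  // CL_NONE
constexpr unsigned kClReadOnlyCache  = 0x1;  // CL_READ_ONLY_CACHE
constexpr unsigned kClReadWriteCache = 0x2;  // CL_READ_WRITE_CACHE

// Hypothetical helper: name the value returned for CL_DEVICE_LOCAL_MEM_TYPE.
inline std::string localMemTypeName(unsigned value) {
    switch (value) {
        case kClLocal:  return "CL_LOCAL";   // dedicated on-chip local memory
        case kClGlobal: return "CL_GLOBAL";  // __local is backed by global memory
        default:        return "unknown";
    }
}

// Hypothetical helper: name the value returned for CL_DEVICE_GLOBAL_MEM_CACHE_TYPE.
inline std::string globalMemCacheTypeName(unsigned value) {
    switch (value) {
        case kClNone:           return "CL_NONE";  // no global memory cache reported
        case kClReadOnlyCache:  return "CL_READ_ONLY_CACHE";
        case kClReadWriteCache: return "CL_READ_WRITE_CACHE";
        default:                return "unknown";
    }
}
```

With the values reported in this thread, `localMemTypeName(2)` yields `"CL_GLOBAL"` and `globalMemCacheTypeName(0)` yields `"CL_NONE"`, which matches the explanation that __local is emulated in global memory and that no global memory cache is reported.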
            • Information from CL_DEVICE_* doesn't correspond to my GPGPU?
              dstokac

              Micah, thanks for the quick reply!

              It raised a few other questions, though:

              1) Is the emulation of __local by global memory on the 7xx architecture permanent, or is it just an artifact of the beta version of your OpenCL implementation?

              2) If this emulation is permanent, does it mean that OpenCL on the 7xx architecture is seriously crippled?

              3) If I need more than 128 MB, and I do, does it mean that the space will be dynamically allocated, or do I need to adjust some environment variable?

              4) In point 4) of the previous post, I meant that each of the mentioned device properties returns zero, i.e. global memory is not cached. Are there devices where global memory is cached?

              • Information from CL_DEVICE_* doesn't correspond to my GPGPU?
                emuller

                 

                Originally posted by: MicahVillmow  3) on the 7XX series of cards, __local is emulated in global memory since the 7XX does not have fully read/write local memory like 8XX does


                Would it be possible for the compiler to detect the special case where each work-item writes only to its own slot, indexed by get_local_id(0), i.e.

                __kernel void mykernel(__local float4* shmem, ...) {

                shmem[get_local_id(0)] = val;

                }

                (which is supported by RV770) and use shared memory rather than global memory in that case? For predictable performance, it would be nice if the user could be warned when a kernel violates such hardware restrictions and performance is reduced as a result. Perhaps in Stream Kernel Analyzer, for example.
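The access pattern described above can be modelled on the CPU to make the restriction concrete. This is only an illustrative sketch in plain C++, not OpenCL code and not a description of what the AMD compiler does: the function `ownerWriteThenSum` is hypothetical, and the two loops stand in for the two phases of a work-group separated by a barrier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of one work-group (assumption: sequential loops
// stand in for work-items; the end of the first loop plays the role of a barrier).
inline float ownerWriteThenSum(const std::vector<float>& input) {
    const std::size_t groupSize = input.size();
    std::vector<float> shmem(groupSize, 0.0f);  // stands in for __local memory

    // Phase 1: work-item `lid` writes only to shmem[lid] (owner-only write).
    for (std::size_t lid = 0; lid < groupSize; ++lid) {
        shmem[lid] = input[lid] * 2.0f;
    }

    // Phase 2: after the barrier, a work-item may read any slot.
    float total = 0.0f;
    for (std::size_t slot = 0; slot < groupSize; ++slot) {
        total += shmem[slot];
    }
    return total;
}
```

A kernel that stays within phase 1 for its writes matches the pattern in the post; a kernel that writes to `shmem[someOtherIndex]` does not, and that is the case the post suggests the tools could warn about.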


              • Information from CL_DEVICE_* doesn't correspond to my GPGPU?
                MicahVillmow
                dstokac,
                1) There is currently no plan to move __local from global memory to LDS memory, as the hardware restrictions are quite severe.
                3) The runtime will reallocate for you; there is nothing special that you should need to do.
                4) The values that are reported are the correct values for our current GPU implementation.