4 Replies Latest reply on Apr 5, 2012 1:55 AM by cantallo

    Radeon 4850 resources

    tanq
      Have questions about 4850 memory

      I've just built my first OpenCL program. Now I want to make it do something useful, but first I want to know the practical potential of my card. Some questions remain after reading the documentation.

      1. Memory sizes. I read that each compute unit contains 16 cores and each core has 1024 32-bit registers. Is that correct?

      2. On my card (Radeon 4850) only 1/4 of the global memory is available. Why? clinfo reports a 128 MB max allocation size, but 512 MB is actually installed.

      3. What about memory for program code? Code memory requirements are not mentioned in the documents, but obviously a compiled kernel needs some memory. What are the limits on code size?

      4. When a kernel executes for too long, my Windows XP x64 shuts down the video driver. I continue listening to Winamp with both monitors turned off. Putting the computer to sleep and waking it helps; no reboot is required.
      Will it help if I split the total work into several clEnqueueNDRangeKernel() calls?

       

        • Radeon 4850 resources
          nou

          1. The Radeon 5870 has 20 SIMD units, each containing 16 5D units. Each 5D unit has 4x256x4 registers of 32 bits each (16 KB), so the whole chip has 20 x 16 x 16 KB = 5 MB (about 5.2 million bytes) of registers.

          2. Yes, you can allocate a 128 MB buffer, and one device has half of its memory available for buffers.

          3. Code size is negligible, a few kB. There should be no practical limit on code size.

          4. Yes, splitting will help. A kernel can't run past the watchdog timer of approximately 5 seconds. There is a way to disable this watchdog, but it is not recommended.
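A minimal sketch of the splitting idea from point 4, written as plain C so it can be read without an OpenCL setup. The names `launch_fn`, `launch_in_chunks`, `tally` and `tally_launch` are hypothetical; the callback is a stand-in for "set the offset as an ordinary kernel argument, call clEnqueueNDRangeKernel() for `count` work-items, and wait for completion". Passing the offset as a kernel argument is an assumption made here because OpenCL 1.0 requires the `global_work_offset` parameter to be NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one short kernel launch covering
   work-items [offset, offset + count). */
typedef void (*launch_fn)(size_t offset, size_t count, void *user);

/* Split `total` work-items into launches of at most `chunk` items each,
   so that no single launch runs long enough to trip the watchdog.
   Returns the number of launches issued. */
size_t launch_in_chunks(size_t total, size_t chunk, launch_fn launch, void *user)
{
    size_t launches = 0;
    assert(chunk > 0);
    for (size_t offset = 0; offset < total; offset += chunk) {
        size_t count = total - offset;      /* items still to do */
        if (count > chunk)
            count = chunk;                  /* cap this launch */
        launch(offset, count, user);
        launches++;
    }
    return launches;
}

/* Example callback: only tallies what would have been launched. */
struct tally { size_t items; size_t largest; };

void tally_launch(size_t offset, size_t count, void *user)
{
    struct tally *t = user;
    (void)offset;                           /* unused in this stand-in */
    t->items += count;
    if (count > t->largest)
        t->largest = count;
}
```

In a real host program `chunk` would normally be a multiple of the work-group size, since OpenCL 1.x requires the global size to be evenly divisible by the local size, and it would be tuned so that one launch stays well under the watchdog limit.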

          • Radeon 4850 resources
            tanq

            > 2. Yes, you can allocate a 128 MB buffer, and one device has half of its memory available for buffers.

            I don't understand. What do you mean by "one device"? I'm using a Radeon 4850; it's a single GPU, not an X2. And only 128 MB of the 512 MB is available for buffers. It's normal for some memory to be reserved for system needs, but where has 3/4 of the memory gone? Maybe it is a bug in the ATI Stream drivers?

             

             

              • Radeon 4850 resources
                himanshu.gautam

                tanq,

                Right now the whole memory of a device is not available for OpenCL computations.

                Also, a considerable amount of memory is required for other applications.

                  • Re: Radeon 4850 resources
                    cantallo

                    Yes. One important fact is that global memory is somehow banked (on both AMD and NVIDIA GPUs): if you need to allocate, say, 90% of the global memory, you have to allocate 4 buffers and switch between them within your kernel.

                     

                    I needed that for radar image synthesis, and the solution I used was to allocate 1 to 4 buffers depending on the size (from below 25% to above 75% of global memory) and generate kernel code similar to this:

                     

                    __kernel __attribute__((work_group_size_hint(64,4,1)))
                    void preprecomp2plus(__global float *restrict buf,
                                         __global float *restrict buf2,
                                         __global float *restrict buf3,
                                         __global float *restrict buf4,
                                         const int bufoffset,...)
                    {
                        __private int i;
                        __global float *pbuf;

                        // Circular buffer of 16384 lines (hence the 16383 mask) of 23424 floats each:
                        // too big to fit into the 768 MB single allocation, but it fits into the
                        // 3 GB global memory in 4 sub-buffers of 4096 lines.
                        i = (bufoffset + get_global_id(1)) & 16383;

                        // Select the sub-buffer that holds line i and make i relative to it.
                        pbuf = buf;
                        if (i >= 4096)
                        {
                            i -= 4096;
                            pbuf = buf2;
                        }
                        if (i >= 4096)
                        {
                            i -= 4096;
                            pbuf = buf3;
                        }
                        if (i >= 4096)
                        {
                            i -= 4096;
                            pbuf = buf4;
                        }

                        // rangeoffset is presumably one of the elided kernel arguments.
                        i = i*23424 + rangeoffset + get_local_id(0);

                        ... (work with pbuf[i] ...
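The buffer selection in the kernel above can be checked on the host with a small plain C sketch. It uses the same constants as the post (16384 lines in total, 4096 lines per sub-buffer), but `line_slot` and `locate_line` are hypothetical names, and division and remainder replace the chain of `if` tests; for line indices in the range 0..16383 both forms give the same result.

```c
#include <assert.h>

#define LINES_TOTAL   16384   /* circular buffer length, hence the & 16383 mask */
#define LINES_PER_BUF 4096    /* lines held by each of the four sub-buffers */

/* Which sub-buffer a line lives in, and its line index inside that buffer. */
struct line_slot {
    int buf;    /* 0..3, corresponding to buf, buf2, buf3, buf4 */
    int line;   /* 0..4095 */
};

/* Map (bufoffset + row) onto the circular buffer, then onto a sub-buffer. */
struct line_slot locate_line(int bufoffset, int row)
{
    struct line_slot s;
    int i;
    assert(bufoffset >= 0 && row >= 0);
    i = (bufoffset + row) & (LINES_TOTAL - 1);   /* wrap around the ring */
    s.buf  = i / LINES_PER_BUF;                  /* same as the if chain */
    s.line = i % LINES_PER_BUF;
    return s;
}
```

For example, `locate_line(16000, 500)` wraps around the ring to line 116 of the first sub-buffer, and `locate_line(0, 4096)` lands on line 0 of the second one.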

                     

                     

                    If you look at the OpenCL specification, you'll see that the minimum "max allocation size" (CL_DEVICE_MAX_MEM_ALLOC_SIZE) should be 25% of the total global memory...
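As a rough host-side companion to the remark about the allocation limit, the following plain C sketch estimates how many sub-buffers a line-oriented data set needs so that no single allocation exceeds the device limit. `sub_buffers_needed` is a hypothetical helper; in a real program the limit would come from clGetDeviceInfo() with CL_DEVICE_MAX_MEM_ALLOC_SIZE rather than being passed in by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of sub-buffers such that each one holds only whole lines
   and stays within max_alloc. All sizes are in bytes. */
uint64_t sub_buffers_needed(uint64_t lines, uint64_t line_bytes, uint64_t max_alloc)
{
    uint64_t lines_per_buf;
    assert(line_bytes > 0 && line_bytes <= max_alloc);
    lines_per_buf = max_alloc / line_bytes;               /* whole lines that fit */
    return (lines + lines_per_buf - 1) / lines_per_buf;   /* ceiling division */
}
```

With the figures from the post (16384 lines of 23424 floats, a 768 MB limit) this gives a lower bound of 2 sub-buffers. The kernel in the post uses four sub-buffers of 4096 lines instead, which also stays under the limit and keeps the index arithmetic on powers of two.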