5 Replies Latest reply on Apr 29, 2010 3:23 AM by masm32

    ATI OpenCL on CPU (ATI Stream SDK 2.01)


      I'm unhappy at the moment. After digging through forums, tutorials and documentation all day long, I think there is a bug in the ATI SDK.

      I'm trying to calculate some float4 color values and convert them to unsigned int, so that I can store the result in a __global unsigned int* buffer.

      This compiles fine and is essentially the same as a raytracing function I found, so I expected it to work (maybe I'll test it on my notebook with an NVIDIA card). But every time I call enqueueNDRangeKernel I get an exception. It occurs on a movaps assembler instruction, which suggests that the memory is not aligned correctly (the "a" in movaps stands for aligned). Unfortunately, everything I do to align the memory fails (or maybe it doesn't fail). The buffer is allocated through OpenCL.


      The function looks like this:

      __kernel void convert(__global uint* p_Dst, __global uint* p_Src1, __global uint* p_Src2)
      {
          uint pos = get_global_id(0);

          float4 l_Val1 = uintToFloat4(p_Src1[pos]);
          float4 l_Val2 = uintToFloat4(p_Src2[pos]);

          float4 l_Ret = float4ToUint(dosomething(l_Val1, l_Val2));

          // If I comment float4ToUint out and set l_Ret to some immediate value, everything works.
          p_Dst[pos] = l_Ret;
      }



      I hope someone knows what my mistake is (I hope it's my mistake, because I'm new to OpenCL).


      Best Regards,


        • ATI OpenCL on CPU (ATI Stream SDK 2.01)

          Does life get better if you change

          float4 l_Ret = float4ToUint(...)

          to

          uint l_Ret = ...

          Ohh... and what does float4ToUint really do?  You posted too small a code snippet for us to tell what is going on here.

          -- M. Reilly -- not an AMD/ATI employee.... just a fellow developer...



            • ATI OpenCL on CPU (ATI Stream SDK 2.01)

              Ohh, you are right, it's actually uint.


              I've found that the crash happens if I try something like this:

              uint float4ToUint(float4 v)
              {
                  uint ret = (uint) (v.x * 255.f);
                  return ret;
              }


              Normally I do it this way:

              return ((uint)(rgba.w*255.0f)<<24) | ((uint)(rgba.z*255.0f)<<16) | ((uint)(rgba.y*255.0f)<<8) | (uint)(rgba.x*255.0f);

              But even a simple cast to uint seems to produce this crash, always at some SSE instruction that expects aligned memory.
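
For reference, the packing expression above can be written out in plain C on the host to check the expected bit layout independently of any OpenCL runtime. This is only a sketch: `clamp01` and `pack_rgba8` are hypothetical helper names that do not appear in the original code, and the clamping step is an added assumption. Without it, a channel value above 1.0 would spill into the neighbouring byte.

```c
#include <stdint.h>

/* Clamp a float to [0, 1]. NaN maps to 0 because the first comparison fails. */
static float clamp01(float x)
{
    if (!(x > 0.0f)) return 0.0f;
    if (x > 1.0f) return 1.0f;
    return x;
}

/* Hypothetical helper: pack four normalized channels into 0xAABBGGRR,
   the same bit layout as the one-line expression in the post. */
static uint32_t pack_rgba8(float r, float g, float b, float a)
{
    uint32_t ur = (uint32_t)(clamp01(r) * 255.0f);
    uint32_t ug = (uint32_t)(clamp01(g) * 255.0f);
    uint32_t ub = (uint32_t)(clamp01(b) * 255.0f);
    uint32_t ua = (uint32_t)(clamp01(a) * 255.0f);
    return (ua << 24) | (ub << 16) | (ug << 8) | ur;
}
```

Comparing the values this produces with what the kernel writes to the destination buffer is one way to tell a conversion bug apart from a memory problem.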


              I've seen the same way of converting floats to uint in a raytracer, and its authors claimed it works (I have not tested their code). I would like to use images instead, but images are not supported.


              Best regards



                • ATI OpenCL on CPU (ATI Stream SDK 2.01)


                  I've tested it on my notebook with the NVIDIA OpenCL implementation, and the code works. I assume it's a bug in ATI OpenCL (the CPU implementation).

                  The disassembly shows that the pointer is fetched from the stack and used in an SSE or SSE3 instruction that requires 16-byte memory alignment. I still have to calculate the alignment of this pointer's address, but I'm sure it is not aligned. This happens in the temporary dynamic link library that the OpenCL implementation compiles.
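
Checking whether an address meets the 16-byte requirement only needs a little integer arithmetic on the pointer value. The following is a minimal sketch in plain C; `is_aligned` is a hypothetical helper name, not part of any SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if ptr is a multiple of `alignment` bytes, 0 otherwise.
   `alignment` must be a non-zero power of two (16 for movaps operands). */
static int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

In a debugger the same test can be done by eye: a 16-byte aligned address ends in the hexadecimal digit 0.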

                  A little annoying is that the temp folder (on my Windows machine) is full of compiled DLLs. It would be good if they were deleted when the OpenCL implementation shuts down, so that the temp folder does not fill up with hundreds of DLLs that will never be used again.

                  To be honest, this bug, the lack of image support in ATI OpenCL, and the time that has passed since OpenCL was defined have led me to switch to CUDA instead. I believe CUDA is far more stable, because it has been around longer and NVIDIA has more experience with GPU computing languages. If the problems are fixed in some future implementation of OpenCL, I'll look at it again.


                  Thank you and best regards,


                    • ATI OpenCL on CPU (ATI Stream SDK 2.01)

                      Those DLLs are deleted when you properly release the kernel and the program.

                      • ATI OpenCL on CPU (ATI Stream SDK 2.01)

                        Last posting to this topic.

                        I've found the problem at last. Memory has to be aligned on 16-byte boundaries, for all data types.

                        I had set the alignment on the host side with __declspec(align(16)) for the data types and on the device side with __attribute__ ((aligned(16))), and allocated the memory on the device side with the flag CL_MEM_COPY_HOST_PTR. But the data was accessed outside of the 16-byte alignment, so the SSE instruction crashed, even though all the data was copied correctly. To make sure that all data is aligned correctly I now use #pragma pack(push,16). This works on the CPU. So it was all my fault, due to wrong alignment.
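
A portable way to express the host-side requirement described above is C11 `alignas` together with `aligned_alloc`. The sketch below is not the code from this thread: `Color4` and `alloc_colors` are hypothetical names, and it assumes a C11 toolchain (MSVC offers `_aligned_malloc` in place of `aligned_alloc`). As far as I know, `#pragma pack` limits the maximum alignment of struct members rather than raising it, so the sketch uses explicit alignment specifiers instead.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof */
#include <stdlib.h>    /* aligned_alloc, free */

/* Hypothetical host-side element type: four floats on a 16-byte boundary. */
typedef struct {
    alignas(16) float rgba[4];
} Color4;

/* Compile-time checks that the layout is what SSE aligned loads expect. */
static_assert(alignof(Color4) == 16, "Color4 must be 16-byte aligned");
static_assert(sizeof(Color4) % 16 == 0, "array elements must stay aligned");

/* Allocate `count` elements on a 16-byte boundary; release with free().
   aligned_alloc needs the size to be a multiple of the alignment, which
   holds here because sizeof(Color4) is a multiple of 16. */
static Color4 *alloc_colors(size_t count)
{
    return aligned_alloc(16, count * sizeof(Color4));
}
```

A buffer obtained this way could then be handed to the OpenCL buffer creation call as the host pointer. Whether that is sufficient for a particular SDK version is an assumption that would need to be verified.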


                        Best regards,