3 Replies Latest reply on Nov 23, 2011 9:34 AM by himanshu.gautam

    2D buffers

    Anon5710

      Hi, I'm trying to teach myself OpenCL.

      I was able to write a vector-add OpenCL program, and I would like to do the same thing with a matrix (element-wise addition).

      I've looked at clCreateImage2D in the spec; however, it seems that for two int matrices this is not the way to do it.

      Am I right to think that I should still use clCreateBuffer and flatten my two matrices into two longer arrays?

      Also, some examples regarding NDRange, work-group size, etc. would be appreciated; I find these concepts rather hard.

        • 2D buffers
          Meteorhead

          General rules of writing GPU code:

          1. Create a serial version to verify the correctness of the parallel algorithm. (You might skip this if it is extremely simple.)

          2. Make it work. (able to compile and run)

          3. Make it right. (produce actually correct results, not garbage)

          4. Make it fast. (optimize only at this point)

          Golden rule: premature optimization is the root of all evil!

          If you find buffers to be simpler than images, use them. When you are bored of buffers, start experimenting with images. If too many concepts are new, simplify things and fall back on the ones you understand and know to work. If something is buggy and you introduced 3-4 new elements at once, you'll have no idea where you went wrong.
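
          A minimal sketch of the buffer approach in plain C, assuming row-major storage. The names flat_index and matrix_add_serial are made up for illustration; this is also the kind of serial version rule 1 asks for, which you can compare GPU results against.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major position of element (row, col) in a matrix with `cols` columns. */
size_t flat_index(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* Serial reference: c = a + b, element by element.
   a, b and c are rows * cols ints each, stored as flat row-major arrays. */
void matrix_add_serial(const int *a, const int *b, int *c,
                       size_t rows, size_t cols)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t row = 0; row < rows; ++row) {
        for (size_t col = 0; col < cols; ++col) {
            size_t i = flat_index(row, col, cols);
            c[i] = a[i] + b[i];
        }
    }
}
```

          Note that a C array declared as int m[2][3] is already contiguous and row-major, so &m[0][0] can be treated as a flat array of 6 ints without copying.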

          The idea of NDRange will not get simpler, but it is really no black magic. You should think of it as the specification of how many threads you want to launch. The global work size is the total number of threads you want to launch, and the local work size is the size of the groups those threads are divided into; threads in the same group have a type of memory that they can share. (Naturally, a group cannot be arbitrarily large.) For simple algorithms, where threads need not communicate with each other, you can safely disregard the local work size: when calling clEnqueueNDRangeKernel, set the corresponding argument to NULL and let the implementation decide how to group the threads.
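
          To make the global/local picture concrete, here is a rough sketch in plain C that emulates a 2D NDRange with ordinary loops. It is not real OpenCL host code: work_item, mat_add_item, run_ndrange_2d, group_id_of and local_id_of are hypothetical names, and the struct only stands in for what get_global_id(0) and get_global_id(1) would return inside a kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ids a kernel reads via
   get_global_id(0) and get_global_id(1). */
typedef struct {
    size_t global_id[2];
} work_item;

/* Body of one work-item: it adds exactly one matrix element. */
void mat_add_item(work_item item, const int *a, const int *b, int *c,
                  size_t cols)
{
    size_t col = item.global_id[0];
    size_t row = item.global_id[1];
    size_t i = row * cols + col;
    c[i] = a[i] + b[i];
}

/* Serial emulation of a 2D NDRange with global[0] = columns and
   global[1] = rows. A real device runs the work-items in parallel
   and in no guaranteed order. */
void run_ndrange_2d(const size_t global[2],
                    const int *a, const int *b, int *c)
{
    for (size_t y = 0; y < global[1]; ++y) {
        for (size_t x = 0; x < global[0]; ++x) {
            work_item item = { { x, y } };
            mat_add_item(item, a, b, c, global[0]);
        }
    }
}

/* With a local work size and no global offset, each dimension splits as
   global_id = group_id * local + local_id. */
size_t group_id_of(size_t global_id, size_t local)
{
    assert(local > 0);
    return global_id / local;
}

size_t local_id_of(size_t global_id, size_t local)
{
    assert(local > 0);
    return global_id % local;
}
```

          On a real device the two loops disappear. Assuming a command queue and kernel you have already created (called queue and kernel here), the launch would look roughly like clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL, 0, NULL, NULL), where the NULL after global is the local work size left to the implementation.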

          Hope that helped.

          Do read AMD's OpenCL Programming Guide, as very simple questions such as this usually do not get answered, but just get pointed to some tutorial or guide. (It is not laziness, and people won't say "RTFM", but we've all had to learn OpenCL through tutorials and guides ourselves, so if many people did it before, it cannot be that hard.)