3 Replies Latest reply on Nov 30, 2013 1:15 PM by nou

    Problem using zero-copy OpenCL buffers on a discrete GPU

    arvin99

      Hello everybody,

       

      I have a question about OpenCL, specifically about zero-copy buffers. I wrote an OpenCL program for an assignment and I am having difficulty using zero-copy buffers correctly.

      I use a pre-pinned buffer (CL_MEM_ALLOC_HOST_PTR) and map it with clEnqueueMapBuffer, for both the host-to-device and the device-to-host direction, expecting to get zero-copy behavior.

      I use the AMD APP Profiler to measure the transfers, but the results look strange.

      For the host-to-device direction, the zero-copy buffer gives an unexpected result: the profiler reports a transfer with a large transfer size, just as it does for an ordinary pinned buffer. The device-to-host direction behaves as expected: for the zero-copy buffer the profiler reports no transfer (NA).
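      When comparing the two directions, it can help to turn a reported transfer size and duration into an effective bandwidth, so a supposedly zero-copy transfer can be compared with a pinned-buffer copy. Below is a minimal sketch in plain C. The function names are hypothetical, and the nanosecond unit is an assumption matching what OpenCL event profiling reports for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes in one size x size matrix of 32-bit floats,
 * the same value as sizeof(cl_float)*size*size in the code below. */
size_t matrix_bytes(size_t size)
{
    return sizeof(float) * size * size;
}

/* Hypothetical helper: effective bandwidth in GB/s (10^9 bytes per second).
 * Timestamps are assumed to be in nanoseconds, so bytes per nanosecond
 * is numerically equal to GB/s. */
double effective_bandwidth_gbps(size_t bytes, uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns > start_ns); /* an empty interval has no meaningful rate */
    return (double)bytes / (double)(end_ns - start_ns);
}
```

      For example, with size = 1024 each matrix occupies matrix_bytes(1024) = 4,194,304 bytes; if a transfer of that size takes about 1 ms, the effective bandwidth is roughly 4.2 GB/s, which is in the range of a real copy over PCIe rather than a zero-copy access.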

       

      Here is the result:

      [Screenshot: Untitled.png, AMD APP Profiler results]

       

       

      I am using AMD APP SDK 2.6 on Windows 7 (64-bit).

      I use a discrete GPU (AMD Radeon HD 6630M) as my OpenCL device. Is this happening because I am using a discrete GPU rather than an integrated one?

      My discrete GPU supports virtual memory (VM).

       

      Here is the code:

      // Allocate device memory for input and output in host-accessible (pre-pinned) memory
        d_A = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float)*size*size, NULL, &err);
        d_B = clCreateBuffer(context, CL_MEM_READ_ONLY  | CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float)*size*size, NULL, &err);
        d_C = clCreateBuffer(context, CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, sizeof(cl_float)*size*size, NULL, &err);

       

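      // Blocking map (CL_TRUE): returns a host pointer into each input buffer.
      // clEnqueueMapBuffer already returns void*, so no cast is needed.
        void* mapPtrA = clEnqueueMapBuffer(queue, d_A, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float)*size*size, 0, NULL, NULL, &err);
        void* mapPtrB = clEnqueueMapBuffer(queue, d_B, CL_TRUE, CL_MAP_WRITE, 0, sizeof(cl_float)*size*size, 0, NULL, NULL, &err);

      // Fill the mapped regions with the input matrices
        memcpy(mapPtrA, matrixA, sizeof(cl_float)*size*size);
        memcpy(mapPtrB, matrixB, sizeof(cl_float)*size*size);

      // Unmap so the buffers are handed back to the device before the kernel runs
        clEnqueueUnmapMemObject(queue, d_A, mapPtrA, 0, NULL, NULL);
        clEnqueueUnmapMemObject(queue, d_B, mapPtrB, 0, NULL, NULL);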

       

      // MatrixMul (not shown) launches the matrix multiplication kernel
        MatrixMul(d_A, d_B, d_C, size);

       

      // Blocking map of the output buffer for reading, copy the result out, then unmap
        void* mapPtrC = clEnqueueMapBuffer(queue, d_C, CL_TRUE, CL_MAP_READ, 0, sizeof(cl_float)*size*size, 0, NULL, NULL, &err);
        memcpy(matrixC, mapPtrC, sizeof(cl_float)*size*size);
        clEnqueueUnmapMemObject(queue, d_C, mapPtrC, 0, NULL, NULL);

       

        err = clReleaseMemObject(d_A);
        err = clReleaseMemObject(d_B);
        err = clReleaseMemObject(d_C);
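      Whatever the profiler shows, it is worth confirming that matrixC is actually correct after the map/unmap round trip. Below is a minimal CPU reference sketch in plain C. It assumes (not confirmed by the code above) that MatrixMul computes C = A * B for square, row-major size x size matrices; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference, assuming C = A * B on row-major n x n matrices. */
void matmul_ref(const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Counts elements whose difference exceeds tol relative to max(1, |want|).
 * A tolerance is used because the GPU may sum in a different order,
 * so results need not be bit-identical to the CPU. */
size_t count_mismatches(const float *got, const float *want, size_t count, float tol)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; ++i) {
        float diff = got[i] - want[i];
        float scale = want[i] < 0.0f ? -want[i] : want[i];
        if (diff < 0.0f)
            diff = -diff;
        if (scale < 1.0f)
            scale = 1.0f;
        if (diff > tol * scale)
            ++bad;
    }
    return bad;
}
```

      After the memcpy from mapPtrC, one could call matmul_ref(matrixA, matrixB, expected, size) with a hypothetical host array expected of size*size floats, then check that count_mismatches(matrixC, expected, size*size, 1e-4f) returns 0.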