4 Replies Latest reply on Mar 9, 2012 11:35 AM by laobrasuca

    Is there a limit on the number of CL buffers created/acquired from GL buffers?

    laobrasuca
      Screen freezes...

      About a while ago I posted here a problem I had when manipulating, say, a big number of CL buffers created/acquired from GL buffers. Back there, after upgrading to the SDK 2.4/CCC 11.4 combo (radeon hd5870), the code attached would make your screen freeze whenever the number buffer passes 128 and the kernel have at least one of these buffers as argument (I should mention that was not a problem before this combo). Now, after the SDK 2.5/CCC 11.7 combo I could finally create/acquire more than 128 buffers, but not much more. In fact, these days I observed that when creating/acquiring more than, say, 350 buffers (or something around this), and when using at least one of these buffers in a kernel, the screen will freeze (even if I don't really access data from these buffers, as you can see in the example code attached).

      So, all this to say that at least from the SDK 2.4/CCC 11.4 combo on, I cannot create/acquire more than a given number of buffers from GL buffers, otherwise I'll have screen frozen... My question is: why? Is this a problem in my code? Is this a bug in drivers? In the documentation, I could not find anything talking about a limit on the number of buffers I can create/acquire from GL (as long as I don't hit the limit of memory available  - which is not the case). As I said in other post, if the buffers are purely CL buffers, I don't have such limitation with the number of buffers I can create.

      I hope AMD driver/sdk team can clarify this and propose a solution if I'm wrongly using the sdk/drivers or release a bug fix if it is an internal driver/sdk problem. Could you test the example code attached with your internal sources and tell me it you can reproduce this same error?

      thank you in advance.

      #include <string> #include "GL/glew.h" #include "GL/glut.h" #include <CL/cl.h> #include <Cl/cl_gl.h> #include <windows.h> void initGLUT(int argc, char *argv[]); int InitializeComponents(void); int SetCLPlataform(void); int SetCLContext(void); int SetCLDevices(void); int SetCLCommandQueue(void); int SetCLProgram(void); int SetCLKernel(void); int generateGLBuffers(void); int CreateCLAndAcquireGLBuffers(void); int RunTestKernel(void); cl_context context; cl_context_properties * cprops; cl_platform_id platform; cl_device_id device_test; cl_command_queue commandQueue; cl_program program_test; cl_kernel kernel_test; int NbOfBuffers = 350; GLuint GL_IndicesBuffer, * GL_PositionBuffers; cl_mem CL_IndicesBuffer, * CL_PositionBuffers; #define KERNEL_HAVE_ARG_IN 1 int main(int argc, char * argv[]) { cl_uint status; initGLUT(argc, argv); if ((status = InitializeComponents()) != 0) return 1; if ((status = generateGLBuffers()) != 0) return 2; if ((status = CreateCLAndAcquireGLBuffers()) != 0) return 3; if ((status = RunTestKernel()) != 0) return 4; printf("All went fine!\n"); return 0; } void initGLUT(int argc, char *argv[]) { glutInit(&argc, argv); glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH); glutInitWindowSize(800, 600); glutCreateWindow("Test"); glewInit(); } int InitializeComponents(void) { cl_int status = 0; if (SetCLPlataform() != 0) return 1; if (SetCLContext() != 0) return 2; if (SetCLDevices() != 0) return 3; if (SetCLCommandQueue() != 0) return 4; if ((status = SetCLProgram()) != 0) return 5; if ((status = SetCLKernel()) != 0) return 6; return 0; } int SetCLPlataform(void) { cl_int status; cl_uint numPlatforms; status = clGetPlatformIDs(0, NULL, &numPlatforms); if(status != CL_SUCCESS) return 1; cl_platform_id* platforms; if(numPlatforms > 0) { platforms = new cl_platform_id[numPlatforms]; status = clGetPlatformIDs(numPlatforms, platforms, NULL); if(status != CL_SUCCESS) return 2; for(unsigned int i=0; i < numPlatforms; ++i) { char pbuff[100]; status = clGetPlatformInfo( platforms[i], CL_PLATFORM_VENDOR, sizeof(pbuff), pbuff, NULL); if(status != CL_SUCCESS) return 3; platform = platforms[i]; if(!strcmp(pbuff, "Advanced Micro Devices, Inc.")) break; } } else return 4; return 0; } // Create openCL context from the openGL one int SetCLContext(void) { cl_int status; cl_context_properties cpsGL[] = { CL_CONTEXT_PLATFORM, (cl_context_properties) platform, CL_WGL_HDC_KHR, (intptr_t) wglGetCurrentDC(), CL_GL_CONTEXT_KHR, (intptr_t) wglGetCurrentContext(), 0}; cprops = (NULL == platform) ? NULL : cpsGL; if (cprops == NULL) return 1; // Create context for GPU type device context = clCreateContextFromType(cprops, CL_DEVICE_TYPE_GPU, NULL, NULL, &status); if(status != CL_SUCCESS) return 2; return 0; } // Identify and set opencl devices, if any int SetCLDevices(void) { cl_int status; size_t deviceListSize; ///////////////////////////////////////////////////////////////// // First, get the size of device list data ///////////////////////////////////////////////////////////////// status = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, NULL, &deviceListSize); if(status != CL_SUCCESS) return 1; if(deviceListSize == 0) return 2; ///////////////////////////////////////////////////////////////// // Now, get the device list data ///////////////////////////////////////////////////////////////// cl_device_id* devices = (cl_device_id *)malloc(deviceListSize); status = clGetContextInfo(context, CL_CONTEXT_DEVICES, deviceListSize, devices, NULL); if(status != CL_SUCCESS) return 3; ///////////////////////////////////////////////////////////////// // Identify GPU device, if any. ///////////////////////////////////////////////////////////////// device_test = devices[0]; free(devices); return 0; } // Create the command-queue data structure to coordinate execution of the kernels on the device int SetCLCommandQueue(void) { cl_int status; //Create command-queue and enable commands profiling commandQueue = clCreateCommandQueue(context, device_test, CL_QUEUE_PROFILING_ENABLE /*NULL*/, &status); if(status != CL_SUCCESS) return 1; return 0; } // Create and build program int SetCLProgram(void) { cl_int status; ///////////////////////////////////////////////////////////////// // Load the cl kernel string ///////////////////////////////////////////////////////////////// #if KERNEL_HAVE_ARG_IN std::string test_kernel = "\n\ __kernel void test( __global uint* idx) \n\ { \n\ return; \n\ }\0"; #else std::string test_kernel = "\n\ __kernel void test() \n\ { \n\ return; \n\ }\0"; #endif ///////////////////////////////////////////////////////////////// // Build ///////////////////////////////////////////////////////////////// const char * source = test_kernel.c_str(); program_test = clCreateProgramWithSource(context, 1, &source, NULL, &status); if(status != CL_SUCCESS) return 1; status = clBuildProgram(program_test, 1, &device_test, NULL, NULL, NULL); if(status != CL_SUCCESS) return 2; return 0; } // Create kernel int SetCLKernel(void) { cl_int status; kernel_test = clCreateKernel(program_test, "test", &status); if(status != CL_SUCCESS) return 1; return 0; } int generateGLBuffers(void) { int status; // Triangle index buffer glGenBuffers(1, &GL_IndicesBuffer); status = glGetError(); if (status != 0) return 1; glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, GL_IndicesBuffer); status = glGetError(); if (status != 0) return 2; glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(GLuint) * 3, NULL, GL_DYNAMIC_DRAW); status = glGetError(); if (status != 0) return 3; GL_PositionBuffers = new GLuint[NbOfBuffers - 1]; for (int k = 0; k < (NbOfBuffers - 1); k++) { // Vertex position buffers glGenBuffers(1, &GL_PositionBuffers[k]); status = glGetError(); if (status != 0) return 4; glBindBuffer(GL_ARRAY_BUFFER, GL_PositionBuffers[k]); status = glGetError(); if (status != 0) return 5; glBufferData(GL_ARRAY_BUFFER, sizeof(float) * 3 * 3, NULL, GL_DYNAMIC_DRAW); status = glGetError(); if (status != 0) return 6; } glFinish(); status = glGetError(); if (status != 0) return 7; return 0; } int CreateCLAndAcquireGLBuffers(void) { cl_int status; CL_PositionBuffers = new cl_mem[NbOfBuffers - 1]; memset(CL_PositionBuffers, 0, sizeof(cl_mem) * (NbOfBuffers - 1)); for (int k = 0; k < (NbOfBuffers - 1); k++) { CL_PositionBuffers[k] = clCreateFromGLBuffer(context, CL_MEM_READ_WRITE, GL_PositionBuffers[k], &status); if(status != CL_SUCCESS) return 1; } status = clEnqueueAcquireGLObjects(commandQueue, NbOfBuffers - 1, CL_PositionBuffers, 0, NULL, NULL); if(status != CL_SUCCESS) return 2; CL_IndicesBuffer = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, GL_IndicesBuffer, &status); if(status != CL_SUCCESS) return 3; status = clEnqueueAcquireGLObjects(commandQueue, 1, &CL_IndicesBuffer, 0, 0, NULL); if(status != CL_SUCCESS) return 4; status = clFinish(commandQueue); if(status != CL_SUCCESS) return 5; return 0; } int RunTestKernel(void) { cl_int status; cl_event events; cl_ulong startTime, endTime, Test_kernel_time; #if KERNEL_HAVE_ARG_IN status = clSetKernelArg(kernel_test, 0, sizeof(cl_mem), (void *)&CL_IndicesBuffer); if (status != CL_SUCCESS) return 1; #endif size_t g = 1, l = 1; status = clEnqueueNDRangeKernel(commandQueue, kernel_test, 1, NULL, &g, &l, 0, NULL, &events); if(status != CL_SUCCESS) return 2; if ((status = clWaitForEvents(1, &events)) != 0) return 1; clGetEventProfilingInfo(events, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, NULL); clGetEventProfilingInfo(events, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, NULL); Test_kernel_time = endTime - startTime; if ((status = clReleaseEvent(events)) != 0) return 1; return 0; }

        • Is there a limit on the number of CL buffers created/acquired from GL buffers?
          genaganna

           

          Originally posted by: laobrasuca About a while ago I posted here a problem I had when manipulating, say, a big number of CL buffers created/acquired from GL buffers. Back there, after upgrading to the SDK 2.4/CCC 11.4 combo (radeon hd5870), the code attached would make your screen freeze whenever the number buffer passes 128 and the kernel have at least one of these buffers as argument (I should mention that was not a problem before this combo). Now, after the SDK 2.5/CCC 11.7 combo I could finally create/acquire more than 128 buffers, but not much more. In fact, these days I observed that when creating/acquiring more than, say, 350 buffers (or something around this), and when using at least one of these buffers in a kernel, the screen will freeze (even if I don't really access data from these buffers, as you can see in the example code attached).

           

          So, all this to say that at least from the SDK 2.4/CCC 11.4 combo on, I cannot create/acquire more than a given number of buffers from GL buffers, otherwise I'll have screen frozen... My question is: why? Is this a problem in my code? Is this a bug in drivers? In the documentation, I could not find anything talking about a limit on the number of buffers I can create/acquire from GL (as long as I don't hit the limit of memory available  - which is not the case). As I said in other post, if the buffers are purely CL buffers, I don't have such limitation with the number of buffers I can create.

           

          I hope AMD driver/sdk team can clarify this and propose a solution if I'm wrongly using the sdk/drivers or release a bug fix if it is an internal driver/sdk problem. Could you test the example code attached with your internal sources and tell me it you can reproduce this same error?

           

          thank you in advance.

           

          Laobrasuca,

          Thank you for reporting issue. I am able to reproduce the issue and reported to developers.  I will get back to you once i get a reply.

          • Re: Is there a limit on the number of CL buffers created/acquired from GL buffers?
            laobrasuca

            a little update here, I've tested the same code on a Radeon HD 6970 and all goes fine. Maybe drivers for VLIW4 make things a little bit different from VLIW5, duno.

            • Re: Is there a limit on the number of CL buffers created/acquired from GL buffers?
              laobrasuca

              yet another update, the test code works just fine for the HD5870 with Linux (Debian Wheezy - test version - and CCC 11.12).