6 Replies Latest reply on Apr 5, 2011 5:18 PM by trinitrotoluene

    System hangs while any OpenCL activity is running (not just the display)

    bjano

      Hello,

      I have a 32-bit Ubuntu 10.10 system using the 2.3 SDK, with an HD 5670 and an Intel CPU.

      My problem is that whenever I run any OpenCL program, the system freezes completely until the GPU has finished. I have read the recent similar thread, but my situation is different: not only the display stops responding, but absolutely everything. The most obvious sign is that the music playing in the background stops, which should have nothing to do with the video card. I also can't stop a running program, because the mouse and keyboard do not respond. When the program finishes, everything goes back to normal.

      I am not using any OpenGL in my program. I also tried a few of the samples (NBody, for example), and they don't hang the system.

      What could be causing this?

      I wondered whether the OpenCL library might be deciding to run the kernels on the CPU, leaving it unable to deal with anything else, but I have checked many times that the GPU's device ID is the one selected.

        • System hangs while any OpenCL activity is running (not just the display)
          tweenk

          The keyboard and mouse input might not be working because the X server is waiting for some graphics operation to finish before it processes the input events. The same goes for the music player.

          • System hangs while any OpenCL activity is running (not just the display)
            genaganna

             

            Bjano,

                     You said the SDK samples run without any hang. Could you paste your code here?

              • System hangs while any OpenCL activity is running (not just the display)
                bjano

                 

                  You said the SDK samples run without any hang. Could you paste your code here?


                It happens with any code that I compile. I haven't tried building the samples, though; I only ran the binaries I found in the SDK's directory.

                Here is a test that runs (and hangs the system) for 4 seconds on my computer. I made it by modifying one of the examples I found when learning the basics; it simply multiplies two huge matrices. I have copied only the part of the code that I believe is relevant.

                edit: or here is the code in its entirety: http://www2.bjano.hu/OpenCL_test2.zip

                 

                // Create kernel object
                kernel = clCreateKernel(program, "mmul", &err);
                if (err != CL_SUCCESS || kernel == 0)
                    showError(err, "create kernel object");

                // Allocate memory on device
                cl_mem dev_x = clCreateBuffer(clenv.context, CL_MEM_READ_ONLY, sizeof(float) * MATSIZE * MATSIZE, NULL, NULL);
                cl_mem dev_y = clCreateBuffer(clenv.context, CL_MEM_READ_ONLY, sizeof(float) * MATSIZE * MATSIZE, NULL, NULL);
                cl_mem dev_mres = clCreateBuffer(clenv.context, CL_MEM_WRITE_ONLY, sizeof(float) * MATSIZE * MATSIZE, NULL, NULL);
                if (dev_x == 0 || dev_y == 0 || dev_mres == 0)
                    showError(0, "allocate device memory");

                // Create random data on host...
                float * x = malloc(MATSIZE * MATSIZE * sizeof(float));
                float * y = malloc(MATSIZE * MATSIZE * sizeof(float));
                for (int i = 0; i < MATSIZE * MATSIZE; ++i) {
                    x[i] = (float)rand() / RAND_MAX;
                    y[i] = (float)rand() / RAND_MAX;
                }

                // ... and write it to memory objects
                err = clEnqueueWriteBuffer(clenv.commands, dev_x, CL_TRUE, 0, sizeof(float) * MATSIZE * MATSIZE, x, 0, NULL, NULL);
                err |= clEnqueueWriteBuffer(clenv.commands, dev_y, CL_TRUE, 0, sizeof(float) * MATSIZE * MATSIZE, y, 0, NULL, NULL);
                if (err != CL_SUCCESS)
                    showError(0, "write data to device memory");
                endTimer("write buffer");
                startTimer();

                // Set kernel arguments
                err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &dev_x);
                err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &dev_y);
                err |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &dev_mres);
                if (err != CL_SUCCESS)
                    showError(err, "set kernel arguments");

                // Enqueue kernel execution
                const size_t work_items[2] = {MATSIZE, MATSIZE};
                const size_t local_work_items[2] = {1, 64};
                err = clEnqueueNDRangeKernel(clenv.commands, kernel, 2, NULL, work_items, local_work_items, 0, NULL, NULL);
                if (err)
                    showError(0, "execute kernel");

                // Wait for all commands in queue to finish
                clFinish(clenv.commands);
                endTimer("kernel execute");
                startTimer();

                // Read results from device memory to host memory
                float * result = malloc(sizeof(float) * MATSIZE * MATSIZE);
                err = clEnqueueReadBuffer(clenv.commands, dev_mres, CL_TRUE, 0, sizeof(float) * MATSIZE * MATSIZE, result, 0, NULL, NULL);
                if (err != CL_SUCCESS)
                    showError(err, "read result from device");

                // kernel code:
                #define MATSIZE 2048
                __kernel void mmul(__global float* ma, __global float* mb, __global float* mres)
                {
                    const int row = get_global_id(0);
                    const int col = get_global_id(1);
                    __local float alocal[MATSIZE];
                    async_work_group_copy(alocal, ma + row * MATSIZE, MATSIZE, 0);
                    __local float* a = alocal;
                    // __global float* a = ma + row * MATSIZE;
                    __global float* b = mb + col;
                    __private float acc = 0.0f;
                    for (int i = 0; i < MATSIZE; i++) {
                        acc += *a * *b;
                        a += 1;
                        b += MATSIZE;
                    }
                    mres[row * MATSIZE + col] = acc;
                };

                  • System hangs while any OpenCL activity is running (not just the display)
                    himanshu.gautam

                    bjano,

                    Ubuntu 10.10 is not a supported Operating System.

                    Still, can you build and run any SDK sample (MatrixMultiplication, if you like) and see whether it also hangs your system? You could also try a supported OS and see whether the hang occurs there.

                      • System hangs while any OpenCL activity is running (not just the display)
                        trinitrotoluene

                        I think the GUI hang is normal. Your program hangs my GUI for 2 seconds, but everything else keeps working. I have run both your test program and the AMD OpenCL sample MatrixMulDouble -x 2048 -y 2048 -z 2048.

                        Here is a brief result from sprofile for your test program:

                         

                         

                        mmul__k1_Cypress1 time = 2717 ms   ALUBusy: 5.93%

                        Here is the brief result from MatrixMulDouble:

                         

                         

                        mmmKernel_local__k1_Cypress1 time = 52 ms  ALUBusy:45.48%
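
                        To put the two profiler timings on a common scale, here is a rough sketch of the arithmetic. It assumes both runs multiply 2048x2048 matrices with the classic algorithm (2*N^3 floating-point operations); the function names are illustrative and come from neither program. MatrixMulDouble works in double precision, so this is an order-of-magnitude comparison only.

```c
#include <assert.h>
#include <stdio.h>

/* Floating-point operations for a classic n x n matrix multiply:
   n*n outputs, each needing n multiplies and n adds. */
double matmul_flops(double n) {
    return 2.0 * n * n * n;
}

/* Throughput in GFLOP/s for a kernel that took `ms` milliseconds. */
double gflops(double flops, double ms) {
    assert(ms > 0.0);
    return flops / (ms * 1e-3) / 1e9;
}

/* Prints the throughput implied by the two sprofile timings quoted above.
   Assumption: both kernels performed 2 * 2048^3 operations. */
void print_comparison(void) {
    double f = matmul_flops(2048.0);
    printf("mmul:            %.1f GFLOP/s\n", gflops(f, 2717.0));
    printf("mmmKernel_local: %.1f GFLOP/s\n", gflops(f, 52.0));
}
```

                        Under those assumptions, calling print_comparison() gives roughly 6.3 GFLOP/s for mmul and roughly 330 GFLOP/s for mmmKernel_local, which matches the 2717 ms versus 52 ms ratio of about 52x.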

                        I think that your test program suffers too much from cache misses, so the ALUs are not very busy. I have written a simple test program like yours for the CPU with OpenMP, using two algorithms: the "naive" way and the cache-optimized way described in the book Computer Architecture: A Quantitative Approach.

                        Here are the results on a Phenom II X6 with two 2048x2048 matrices multiplied:

                        Naive: 56 s

                        Cache optimized: 6.28 s => ~9x faster. Imagine the gain on the GPU.
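
                        For reference, the difference between the two approaches can be sketched in plain C as below. This is a minimal illustration, not the program that produced the timings above: it assumes row-major square matrices and a block size that divides n, it leaves out OpenMP, and the function names and the parameter bs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive triple loop: the inner loop walks b down a column (stride n),
   so consecutive accesses to b land on different cache lines. */
void matmul_naive(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}

/* Blocked (tiled) version: combines bs x bs tiles so that the tiles of
   a, b and c in use can stay in cache while they are reused. */
void matmul_blocked(const float *a, const float *b, float *c,
                    size_t n, size_t bs) {
    assert(bs > 0 && n % bs == 0);          /* assumption: bs divides n */
    memset(c, 0, n * n * sizeof(float));    /* c accumulates partial sums */
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs)
                for (size_t i = ii; i < ii + bs; i++)
                    for (size_t k = kk; k < kk + bs; k++) {
                        float aik = a[i * n + k];
                        /* innermost loop is unit-stride in both b and c */
                        for (size_t j = jj; j < jj + bs; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}
```

                        Both functions compute the same product; only the order in which memory is touched differs. The column-strided walk over b in matmul_naive is the same access pattern as the b += MATSIZE step in the mmul kernel posted earlier in the thread.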

                        I will try that on the GPU with OpenCL.