6 Replies Latest reply on Jan 12, 2010 8:12 AM by omkaranathan

    Multiple devices in one context generate system GUI freeze in debug mode

    ibird

      To divide some work into pieces and execute them on more than one device, from one or more platforms, I have created some classes that find all platforms and the devices inside them. I then create ONE context for ALL devices in each selected platform, and compile the kernels for all devices.

       

      The classes call the kernel several consecutive times, once for each piece of work, and the results are good, with no crashes or errors. But in debug mode the system freezes a lot, and debugging the code in this condition is a pain.
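      The classes themselves are not shown in this post. As a rough illustration of what "dividing the work into pieces" can look like on the host side, here is a minimal sketch. The names `WorkPiece` and `splitWork` are hypothetical and are not taken from the original classes; the sketch only computes contiguous ranges and makes no OpenCL calls.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one piece of work: a contiguous range of items.
struct WorkPiece {
    std::size_t offset;  // index of the first item in this piece
    std::size_t count;   // number of items in this piece
};

// Split `total` items into `pieces` contiguous ranges whose sizes differ by
// at most one item. Returns an empty vector when `pieces` is zero.
std::vector<WorkPiece> splitWork(std::size_t total, std::size_t pieces)
{
    std::vector<WorkPiece> result;
    if (pieces == 0) return result;

    const std::size_t base = total / pieces;   // minimum size of every piece
    const std::size_t extra = total % pieces;  // the first `extra` pieces get one more item
    std::size_t offset = 0;
    for (std::size_t i = 0; i < pieces; ++i) {
        const std::size_t count = base + (i < extra ? 1 : 0);
        result.push_back(WorkPiece{offset, count});
        offset += count;
    }
    return result;
}
```

      Each piece could then be handed to one device's command queue, with one kernel call per piece as described above.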

      So I tried to find where the problem is, and I have extracted a simple program that reproduces it.

       

      Compile the program with:

      g++ -g -c main.cpp
      g++ main.o -o main -lOpenCL

      Open the binary with gdb or ddd (I use ddd), put a breakpoint at line 188 and run the program until it breaks. From the breakpoint, single-step until the end of the program. The whole system GUI should freeze in one of the clReleaseMemObject calls, except for the mouse pointer, which continues to move.

      This problem disappears if I create ONE context for EACH device (code not included).

      Is the problem in my implementation/code, or is it a bug?

      I am on Linux, Ubuntu 9.04.

      I would be grateful if someone could test the code and give me feedback or an answer to the question.

      (The code assumes that all OpenCL calls succeed and that #define ATI_PLATFORM is set to the index of the ATI platform in the list.)
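      If that assumption is in doubt while debugging, a small helper can make failing calls visible. The following is only a sketch: `statusName` and `checkStatus` are hypothetical names, and plain `int` is used in place of `cl_int` so that the snippet compiles without the OpenCL headers. The numeric values mirror a few of the error constants defined in CL/cl.h (CL_SUCCESS is 0); the list is deliberately incomplete.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: map a few OpenCL status codes to their names.
// The values mirror the constants in CL/cl.h; codes not listed here
// fall through to a generic label.
inline const char* statusName(int status)
{
    switch (status) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -30: return "CL_INVALID_VALUE";
        case -34: return "CL_INVALID_CONTEXT";
        default:  return "unlisted OpenCL error";
    }
}

// Throw an exception naming the failing call when status is not CL_SUCCESS.
inline void checkStatus(int status, const std::string& call)
{
    if (status != 0) {
        throw std::runtime_error(call + " failed: " + statusName(status));
    }
}
```

      A call such as `checkStatus(status, "clCreateContext");` placed after each OpenCL call would then report the first failure instead of silently continuing.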

      The code does nothing useful: it finds the platforms and selects (manually) the ATI platform, creates a context with all devices in the platform (CPU and GPU), compiles a kernel, creates a command queue for each device, creates 7 buffers, deletes the buffers, and releases everything.

       

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <iostream>
      #include <vector>
      #include <CL/cl.h>

      #define KERNEL_TEST 0
      #define ATI_PLATFORM 0

      char kernelsourcedot[] =
          "/*\n"
          " * placeholder for a kernel which does a scalar product for every row of\n"
          " * the two input matrices and stores it in the corresponding output array;\n"
          " * here it only writes 0 to the output\n"
          " */\n"
          "__kernel void testkernel(__global float * output, __global float * input, const int width)\n"
          "{\n"
          "    int globID = get_global_id(0);\n"
          "    output[globID] = 0;\n"
          "}";

      struct pdevices
      {
          bool use;
          cl_device_id devID;
          cl_command_queue commandQueue;
          cl_device_type type;
          cl_mem inputBuffer[6];
          cl_mem outputBuffer[1];
          pdevices() {}
          ~pdevices() {}
      };

      struct platform
      {
          bool use;
          cl_platform_id platID;
          cl_context context;
          cl_kernel kernels[1];
          cl_program program[1]; /* one program one kernel */
          pdevices devices[2];   /* room for two devices (CPU and GPU) */
      };

      platform platforms[1];

      int main(int argc, char ** argv)
      {
          int width[6] = {1,2,4,1,2,4};
          cl_int status = 0;
          cl_uint nplat = 0;
          cl_uint num_devices;

          /* Get all platforms and devices info */

          /* Get number of platforms */
          status = clGetPlatformIDs(0, NULL, &nplat);
          if (status != CL_SUCCESS) return 0;

          /* Get all platform IDs */
          cl_platform_id * platformst = new cl_platform_id [nplat];
          status = clGetPlatformIDs(nplat, platformst, NULL);
          if (status != CL_SUCCESS) return 0;

          /* Store the selected (ATI) platform */
          platforms[0].platID = platformst[ATI_PLATFORM];
          platforms[0].use = true;

          /* Get the number of devices in the platform */
          status = clGetDeviceIDs(platforms[0].platID, CL_DEVICE_TYPE_ALL, 0, NULL, &num_devices);
          if (status != CL_SUCCESS) return 0;

          /* Get all device IDs in the platform */
          cl_device_id * devicest = new cl_device_id [num_devices];
          status = clGetDeviceIDs(platforms[0].platID, CL_DEVICE_TYPE_ALL, num_devices, devicest, NULL);
          if (status != CL_SUCCESS) return 0;

          /* For each device, get the device type and store it */
          for (int j = 0 ; j < (int)num_devices ; j++)
          {
              platforms[0].devices[j].devID = devicest[j];
              platforms[0].devices[j].use = true;

              status = clGetDeviceInfo(devicest[j], CL_DEVICE_TYPE, sizeof(cl_device_type),
                                       &platforms[0].devices[j].type, NULL);
              if (status != CL_SUCCESS) return 0;
          }

          delete [] devicest;
          delete [] platformst;

          /* Create a context for the platform */
          cl_context_properties prop[3];
          prop[0] = CL_CONTEXT_PLATFORM;
          prop[1] = (cl_context_properties)platforms[0].platID;
          prop[2] = 0;

          ////////////// More than one device in context ////////////////
          cl_device_id * devid = new cl_device_id [num_devices];
          for (int j = 0 ; j < (int)num_devices ; j++)
          {
              devid[j] = platforms[0].devices[j].devID;
          }

          platforms[0].context = clCreateContext(prop, num_devices, devid, NULL, NULL, &status);
          if (status != CL_SUCCESS) return 0;

          delete [] devid;

          /* For each device in the platform, create a command queue */
          for (int j = 0 ; j < (int)num_devices ; j++)
          {
              platforms[0].devices[j].commandQueue =
                  clCreateCommandQueue(platforms[0].context, platforms[0].devices[j].devID, 0, &status);
              if (status != CL_SUCCESS) return 0;
          }

          /* Load the source */
          size_t sourceSize = sizeof(kernelsourcedot) - 1;
          const char * source = kernelsourcedot;
          platforms[0].program[KERNEL_TEST] =
              clCreateProgramWithSource(platforms[0].context, 1, &source, &sourceSize, &status);

          /* Build KERNEL_TEST for all devices in the context */
          status = clBuildProgram(platforms[0].program[KERNEL_TEST], 0, NULL, NULL, NULL, NULL);

          /* Get a kernel object for the kernel with the given name */
          platforms[0].kernels[KERNEL_TEST] =
              clCreateKernel(platforms[0].program[KERNEL_TEST], "testkernel", &status);

          /* Create the 7 buffers (CPU devices only) */
          for (int j = 0 ; j < (int)num_devices ; j++)
          {
              if (platforms[0].devices[j].type != CL_DEVICE_TYPE_CPU) continue;

              platforms[0].devices[j].inputBuffer[0] = clCreateBuffer(platforms[0].context,
                  CL_MEM_READ_ONLY, sizeof(cl_int) * width[0] * 3, NULL, &status);
              if (status != CL_SUCCESS) return 0;

              platforms[0].devices[j].inputBuffer[1] = clCreateBuffer(platforms[0].context,
                  CL_MEM_READ_ONLY, sizeof(cl_int) * width[1] * 4, NULL, &status);
              if (status != CL_SUCCESS) return 0;

              platforms[0].devices[j].inputBuffer[2] = clCreateBuffer(platforms[0].context,
                  CL_MEM_READ_ONLY, sizeof(cl_int) * width[2], NULL, &status);
              if (status != CL_SUCCESS) return 0;

              platforms[0].devices[j].inputBuffer[3] = clCreateBuffer(platforms[0].context,
                  CL_MEM_READ_ONLY, sizeof(cl_int) * width[3] * 3, NULL, &status);
              if (status != CL_SUCCESS) return 0;

              platforms[0].devices[j].inputBuffer[4] = clCreateBuffer(platforms[0].context,
                  CL_MEM_READ_ONLY, sizeof(cl_int) * width[4] * 4, NULL, &status);
              if (status != CL_SUCCESS) return 0;

              platforms[0].devices[j].inputBuffer[5] = clCreateBuffer(platforms[0].context,
                  CL_MEM_READ_ONLY, sizeof(cl_int) * width[5], NULL, &status);
              if (status != CL_SUCCESS) return 0;

              platforms[0].devices[j].outputBuffer[0] = clCreateBuffer(platforms[0].context,
                  CL_MEM_WRITE_ONLY, sizeof(cl_int) * width[3], NULL, &status);
              if (status != CL_SUCCESS) return 0;
          }

          /* Release the buffers */
          for (int j = 0 ; j < (int)num_devices ; j++)
          {
              if (platforms[0].devices[j].type != CL_DEVICE_TYPE_CPU) continue;

              status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[0]);
              status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[1]);
              status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[2]);
              status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[3]);
              status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[4]);
              status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[5]);
              status = clReleaseMemObject(platforms[0].devices[j].outputBuffer[0]);
          }

          /* Deinitialize kernels and programs */
          for (int k = 0 ; k < 1 ; k++)
          {
              status = clReleaseKernel(platforms[0].kernels[k]);
              status = clReleaseProgram(platforms[0].program[k]);
          }

          /* Deinitialize command queues */
          for (int j = 0 ; j < (int)num_devices ; j++)
              status = clReleaseCommandQueue(platforms[0].devices[j].commandQueue);

          /* Deinitialize platform context */
          status = clReleaseContext(platforms[0].context);
      }

        • Multiple devices in one context generate system GUI freeze in debug mode
          MicahVillmow
          ibird,
          It seems you might have found another way to reproduce the issue from this thread.
          http://forums.amd.com/forum/me...id=390&threadid=122767
          • Multiple devices in one context generate system GUI freeze in debug mode
            genaganna

             


            ibird,

            I am able to run your attached code with ddd without any problem. My system details are as follows:

                OS : OpenSUSE 10.3 64 bit

                DDD : GNU DDD 3.3.11 (x86_64-suse-linux-gnu)

                CPU : Phenom Quad-core

                GPU : Juniper

                OpenCL SDK : ati-stream-sdk-v2.0-lnx64

                Catalyst : 9.12 Hotfix

            Could you please give us your system details like above? Do you have more than one GPU?