Archives Discussions

ibird · ‎12-28-2009

To divide some work into pieces and execute them on more than one device from one or more platforms, I have created some classes that find all platforms and the devices inside them. I then create ONE context for ALL devices in each selected platform, and compile the kernels for all devices.
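The classes themselves are not shown in this post. As a rough illustration of the "divide the work into pieces" step only, here is a minimal sketch; `WorkPiece` and `splitWork` are hypothetical names and not part of the real code. It assumes the work is a one-dimensional range split into nearly equal contiguous slices, one per device.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: one contiguous slice of the global work range.
struct WorkPiece {
    std::size_t offset;  // index of the first work-item in this piece
    std::size_t count;   // number of work-items in this piece
};

// Split `total` work-items into `pieces` nearly equal contiguous slices.
// The first (total % pieces) slices receive one extra work-item.
std::vector<WorkPiece> splitWork(std::size_t total, std::size_t pieces) {
    std::vector<WorkPiece> out;
    if (pieces == 0) return out;  // nothing to split across
    const std::size_t base = total / pieces;
    const std::size_t extra = total % pieces;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < pieces; ++i) {
        WorkPiece p;
        p.offset = offset;
        p.count = base + (i < extra ? 1 : 0);
        out.push_back(p);
        offset += p.count;
    }
    return out;
}
```

Each piece would then be handed to one device's command queue, with one kernel call per piece.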

The classes call the kernel several consecutive times, once for each piece of work, and the results are good, with no crashes or errors. But in debugging mode I get a lot of system freezes, and debugging the code in this condition is a pain.

So I tried to find where the problem is, and I have extracted a simple piece of code that reproduces it.

Compile the program with:

g++ -g -c main.cpp
g++ main.o -o main -lOpenCL

Open the binary with gdb or ddd (I use ddd), put a breakpoint at line 188, and run the program until it breaks. From the breakpoint, single-step until the end of the program. The whole system GUI should freeze in one of the clReleaseMemObject calls, except for the mouse pointer, which continues to move.

This problem disappears if I create ONE context for EACH device (code not included).
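Since that variant is not included, here is a minimal sketch of the idea only, not the actual code. `createContextPerDevice` and the `createOne` callback are hypothetical names; the OpenCL call in the comment is the standard clCreateContext, used with a device count of 1.

```cpp
#include <cstddef>
#include <vector>

// Create one context per device by calling `createOne` once for each device.
// `createOne` is a hypothetical callback. With OpenCL (assuming <CL/cl.h>,
// a valid `prop` property array, and C++11) it could be:
//   [&](cl_device_id dev) {
//       cl_int status;
//       return clCreateContext(prop, 1, &dev, NULL, NULL, &status);
//   }
// so that every context holds exactly one device.
template <typename Context, typename Device, typename CreateFn>
std::vector<Context> createContextPerDevice(const std::vector<Device>& devices,
                                            CreateFn createOne) {
    std::vector<Context> contexts;
    contexts.reserve(devices.size());
    for (std::size_t i = 0; i < devices.size(); ++i) {
        contexts.push_back(createOne(devices[i]));  // one device, one context
    }
    return contexts;
}
```

With OpenCL this would be called as createContextPerDevice<cl_context>(deviceList, callback). Programs, kernels and buffers belong to a context, so with this layout they have to be created once per context instead of once for all devices.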

Is the problem in my implementation/code, or is it a bug?

I am on Linux, Ubuntu 9.04.

I would be grateful if someone could test the code and give me feedback or an answer to the question.

(The code assumes that all OpenCL calls succeed and that #define ATI_PLATFORM holds the index of the ATI platform in the platform list.)
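If setting ATI_PLATFORM by hand is inconvenient, the index could instead be looked up from the platform vendor strings. The sketch below is only an illustration: `findPlatformByVendor` is a hypothetical helper, the vendor strings are assumed to have been read beforehand with clGetPlatformInfo and CL_PLATFORM_VENDOR, and the exact text the ATI Stream SDK reports (for example something containing "Advanced Micro Devices") is an assumption to check on your own system.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the index of the first vendor string that contains `needle`,
// or -1 if no platform matches.
// `vendors` is assumed to hold one CL_PLATFORM_VENDOR string per platform,
// in the same order as the IDs returned by clGetPlatformIDs.
int findPlatformByVendor(const std::vector<std::string>& vendors,
                         const std::string& needle) {
    for (std::size_t i = 0; i < vendors.size(); ++i) {
        if (vendors[i].find(needle) != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;  // no matching platform found
}
```

The returned index would take the place of the hard-coded ATI_PLATFORM value; a result of -1 means the platform is not installed and the program should stop.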

The code does nothing useful: it finds the platforms and selects (manually) the ATI platform, creates a context with all devices in the platform (CPU and GPU), compiles a kernel, creates a command queue for each device, creates 7 buffers, deletes the buffers, and releases everything.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <iostream>
#include <CL/cl.h>
#include <vector>

#define KERNEL_TEST 0
#define ATI_PLATFORM 0

char kernelsourcedot[] =
    "/*\n"
    " * kernel which computes a scalar product for every row of the two input matrices\n"
    " * and stores it in the corresponding output array\n"
    " */\n"
    "\n"
    "\n"
    "__kernel void testkernel(__global float * output, __global float * input, const int width)\n"
    "{\n"
    "\n"
    "    int globID = get_global_id(0);\n"
    "\n"
    "    output[globID] = 0;\n"
    "\n"
    "}";

struct pdevices
{
    bool use;
    cl_device_id devID;
    cl_command_queue commandQueue;
    cl_device_type type;
    cl_mem inputBuffer[6];
    cl_mem outputBuffer[1];
    pdevices() {}
    ~pdevices() {}
};

struct platform
{
    bool use;
    cl_platform_id platID;
    cl_context context;
    cl_kernel kernels[1];
    cl_program program[1]; /* one program, one kernel */
    pdevices devices[2];   /* assumes at most 2 devices (CPU and GPU) */
};

platform platforms[1];

int main(int argc, char ** argv)
{
    int width[6] = {1,2,4,1,2,4};
    cl_int status = 0;
    cl_uint nplat = 0;
    cl_uint num_devices;

    /* Get all platforms and devices info */

    /* Get number of platforms */
    status = clGetPlatformIDs(0, NULL, &nplat);
    if (status != CL_SUCCESS) return 0;

    /* Get all platform IDs */
    cl_platform_id * platformst = new cl_platform_id [nplat];
    status = clGetPlatformIDs(nplat, platformst, NULL);
    if (status != CL_SUCCESS) return 0;

    /* Select the ATI platform and store it */
    platforms[0].platID = platformst[ATI_PLATFORM];
    platforms[0].use = true;

    /* Get the number of devices in the platform */
    status = clGetDeviceIDs(platforms[0].platID, CL_DEVICE_TYPE_ALL, 0, NULL, &num_devices);
    if (status != CL_SUCCESS) return 0;

    /* Get all device IDs in the platform */
    cl_device_id * devicest = new cl_device_id [num_devices];
    status = clGetDeviceIDs(platforms[0].platID, CL_DEVICE_TYPE_ALL, num_devices, devicest, NULL);
    if (status != CL_SUCCESS) return 0;

    /* For each device get the device info and store it */
    for (int j = 0 ; j < (int)num_devices ; j++)
    {
        platforms[0].devices[j].devID = devicest[j];
        platforms[0].devices[j].use = true;

        /* Get the device type */
        status = clGetDeviceInfo(devicest[j], CL_DEVICE_TYPE, sizeof(cl_device_type), &platforms[0].devices[j].type, NULL);
        if (status != CL_SUCCESS) return 0;
    }

    delete [] devicest;
    delete [] platformst;

    /* Create a context for the platform */
    cl_context_properties prop[3];
    prop[0] = CL_CONTEXT_PLATFORM;
    prop[1] = (cl_context_properties)platforms[0].platID;
    prop[2] = 0;

    ////////////// More than one device in context ////////////////
    cl_device_id * devid = new cl_device_id [num_devices];
    for (int j = 0 ; j < (int)num_devices ; j++)
    {
        devid[j] = platforms[0].devices[j].devID;
    }

    platforms[0].context = clCreateContext(prop, num_devices, devid, NULL, NULL, &status);
    if (status != CL_SUCCESS) return 0;

    delete [] devid;

    /* For each device in the platform create a command queue */
    for (int j = 0 ; j < (int)num_devices ; j++)
    {
        platforms[0].devices[j].commandQueue = clCreateCommandQueue(platforms[0].context, platforms[0].devices[j].devID, 0, &status);
        if (status != CL_SUCCESS) return 0;
    }

    /* Load the sources */
    size_t sourceSize = sizeof(kernelsourcedot) - 1;
    const char * source = kernelsourcedot;
    platforms[0].program[KERNEL_TEST] = clCreateProgramWithSource(platforms[0].context, 1, &source, &sourceSize, &status);

    /* Build KERNEL_TEST for all devices in the context */
    status = clBuildProgram(platforms[0].program[KERNEL_TEST], 0, NULL, NULL, NULL, NULL);

    /* Get a kernel object for the kernel with the given name. If an error occurs the kernel is not used;
     * we will check the kernel for NULL later when we try to use it */
    platforms[0].kernels[KERNEL_TEST] = clCreateKernel(platforms[0].program[KERNEL_TEST], "testkernel", &status);

    for (int j = 0 ; j < (int)num_devices ; j++)
    {
        /* The device must be active and of the selected type */
        if (platforms[0].devices[j].type != CL_DEVICE_TYPE_CPU) continue;

        platforms[0].devices[j].inputBuffer[0] = clCreateBuffer(platforms[0].context, CL_MEM_READ_ONLY, sizeof(cl_int) * width[0] * 3, NULL, &status);
        if (status != CL_SUCCESS) return 0;
        platforms[0].devices[j].inputBuffer[1] = clCreateBuffer(platforms[0].context, CL_MEM_READ_ONLY, sizeof(cl_int) * width[1] * 4, NULL, &status);
        if (status != CL_SUCCESS) return 0;
        platforms[0].devices[j].inputBuffer[2] = clCreateBuffer(platforms[0].context, CL_MEM_READ_ONLY, sizeof(cl_int) * width[2], NULL, &status);
        if (status != CL_SUCCESS) return 0;
        platforms[0].devices[j].inputBuffer[3] = clCreateBuffer(platforms[0].context, CL_MEM_READ_ONLY, sizeof(cl_int) * width[3] * 3, NULL, &status);
        if (status != CL_SUCCESS) return 0;
        platforms[0].devices[j].inputBuffer[4] = clCreateBuffer(platforms[0].context, CL_MEM_READ_ONLY, sizeof(cl_int) * width[4] * 4, NULL, &status);
        if (status != CL_SUCCESS) return 0;
        platforms[0].devices[j].inputBuffer[5] = clCreateBuffer(platforms[0].context, CL_MEM_READ_ONLY, sizeof(cl_int) * width[5], NULL, &status);
        if (status != CL_SUCCESS) return 0;
        platforms[0].devices[j].outputBuffer[0] = clCreateBuffer(platforms[0].context, CL_MEM_WRITE_ONLY, sizeof(cl_int) * width[3], NULL, &status);
        if (status != CL_SUCCESS) return 0;
    }

    for (int j = 0 ; j < (int)num_devices ; j++)
    {
        if (platforms[0].devices[j].type != CL_DEVICE_TYPE_CPU) continue;

        status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[0]);
        status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[1]);
        status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[2]);
        status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[3]);
        status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[4]);
        status = clReleaseMemObject(platforms[0].devices[j].inputBuffer[5]);
        status = clReleaseMemObject(platforms[0].devices[j].outputBuffer[0]);
    }

    /* Release kernels and programs */
    for (int k = 0 ; k < 1 ; k++)
    {
        status = clReleaseKernel(platforms[0].kernels[k]);
        status = clReleaseProgram(platforms[0].program[k]);
    }

    /* Release the command queues */
    for (int j = 0 ; j < (int)num_devices ; j++)
        status = clReleaseCommandQueue(platforms[0].devices[j].commandQueue);

    /* Release the platform context */
    status = clReleaseContext(platforms[0].context);
}

MicahVillmow · ‎12-28-2009

ibird,
It seems you might have found another way to reproduce the issue from this thread.
http://forums.amd.com/forum/me...id=390&threadid=122767

ibird · ‎12-28-2009

Maybe, but the freeze is different.

In the first topic the GUI freezes with the mouse pointer included; in this second topic the freeze does not include the mouse pointer.

genaganna · ‎12-29-2009


ibird,

I am able to run your attached code in ddd without any problem. My system details are as follows:

OS : OpenSUSE 10.3 64 bit

DDD : GNU DDD 3.3.11 (x86_64-suse-linux-gnu)

CPU : Phenom Quad-core

GPU : Juniper

OpenCL SDK : ati-stream-sdk-v2.0-lnx64

Catalyst : 9.12 Hotfix

Could you please give us your system details like above? Do you have more than one GPU?

ibird · ‎12-29-2009

OS : Linux ubuntu 9.04 2.6.28-17-generic #58-Ubuntu SMP i686 GNU/Linux

DDD : GNU DDD 3.3.11 (i486-pc-linux-gnu)

CPU : Intel Q8200

GPU : ATI 4870 RV770

OpenCL SDK : ati-stream-sdk-v2.0-lnx32

Catalyst : fglrx 8.67, beta driver

I will test with the 9.12 driver.

ibird · ‎12-29-2009

Tested with 9.12 Hotfix (fglrx 8.682.2); it still freezes.

omkaranathan · ‎01-12-2010

ibird,

I'm able to reproduce the issue.

Developers are looking into it, thanks for reporting.


Multiple devices in one context generate system GUI freeze in debug mode