4 Replies Latest reply on Mar 24, 2010 3:17 PM by omkaranathan

    Limited kernel size/number and atomics extension problems

    purzel42

      Hi,

      I am working on a research project in which we use OpenCL on ATI cards (Radeon HD 5770), currently with ATI Stream SDK 2.01.

      I experienced two strange behaviours:

      1. Trying to build a .cl file with 8 (or more) kernels resulted in an error from the OpenCL compiler (error code -2, CL_DEVICE_NOT_AVAILABLE) which doesn't make a lot of sense to me.

      I worked around this by splitting my .cl file into two files with 4 kernels each and building each file into a dedicated cl_program object.

      Is there a limit to the number of kernels or the total size of a cl_program object?

       

      2. We use local atomic operations (atom_add) and therefore included the appropriate pragma, cl_khr_local_int32_base_atomics. However, this resulted in a compiler error (error: can't find an instance for opencl builtin ... atom_add).

      Changing the pragma to cl_khr_global_int32_base_atomics solved this. Any ideas why, especially since the HD 5770 supports local atomics?

      Thanks in advance for any answers.

      Felix

        • Limited kernel size/number and atomics extension problems
          omkaranathan

           

          Originally posted by: purzel42

          1. Trying to build a .cl file with 8 (or more) kernels resulted in an error from the OpenCL compiler (error code -2, CL_DEVICE_NOT_AVAILABLE) which doesn't make a lot of sense to me.

          Is there a limit to the number of kernels or the total size of a cl_program object?

          There is no such limit. Could you post your code (host and kernel) so that I can try to reproduce the issue?

          Originally posted by: purzel42

          2. We use local atomic operations (atom_add) and therefore included the appropriate pragma cl_khr_local_int32_base_atomics. However, this resulted in a compiler error (error: can't find an instance for opencl builtin ... atom_add).

          This could be solved by changing the pragma to cl_khr_global_int32_base_atomics. Any ideas about this? Especially as the HD 5770 supports local atomics.

          Does the device query list the extension?

          If yes, could you post the code?
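
          A minimal sketch of such a check, assuming the extension list has already been read as a string with clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...). The helper name hasExtension is hypothetical and not part of OpenCL; it only tokenizes the space-separated list.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenCL function).
// Returns true if `name` appears as a whole token in the space-separated
// extension list, which is the format of the CL_DEVICE_EXTENSIONS string.
// Whole-token comparison avoids false positives from a plain substring
// search when one extension name is a prefix of another.
bool hasExtension(const std::string &extensions, const std::string &name)
{
    assert(!name.empty());
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name)
            return true;
    }
    return false;
}
```

          With the device's extension string in a variable ext, hasExtension(ext, "cl_khr_local_int32_base_atomics") would answer the question above.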

            • Limited kernel size/number and atomics extension problems
              purzel42

               

              Originally posted by: omkaranathan

              There is no such limit. Could you post your code (host and kernel) so that I can try to reproduce the issue?

              Does the device query list the extension?

              If yes, could you post the code?


              Hi,

              I have attached code that reproduces the error (we use 64-bit Linux and compile with g++).

              When I remove the float/double constants,

              Build returned error: Internal error: Link failed.
              Make sure the system setup is correct.

              is shown instead of the warning.

              However, removing the foo functions results in a correct compilation.

               

              Regarding the second question, I have to admit it was our fault. We wrongly used the _local_ pragma although atom_add was applied to a __global parameter, which worked fine on one platform but not on the other. _global_ seems to be the correct pragma in that case.

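
              To illustrate the point: in OpenCL 1.0, atom_add on a __global int pointer is provided by cl_khr_global_int32_base_atomics and atom_add on a __local int pointer by cl_khr_local_int32_base_atomics, so the pragma should match the address space of the pointer that is actually passed. A minimal host-side sketch that picks the matching pragma line follows; AddressSpace, atomicsExtensionFor and enablePragmaFor are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: address space of the pointer passed to atom_add.
enum class AddressSpace { Global, Local };

// Name of the OpenCL 1.0 extension that declares the 32-bit base atomics
// (atom_add, atom_sub, ...) for pointers in the given address space.
std::string atomicsExtensionFor(AddressSpace space)
{
    switch (space) {
        case AddressSpace::Global: return "cl_khr_global_int32_base_atomics";
        case AddressSpace::Local:  return "cl_khr_local_int32_base_atomics";
    }
    assert(false && "unknown address space");
    return std::string();
}

// Pragma line to place at the top of the .cl source before atom_add is used.
std::string enablePragmaFor(AddressSpace space)
{
    return "#pragma OPENCL EXTENSION " + atomicsExtensionFor(space) + " : enable";
}
```

              For a kernel that calls atom_add on a __global int* argument, enablePragmaFor(AddressSpace::Global) yields the line that belongs in the kernel source.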

              regards

              Felix

              #include <CL/cl.h>
              #include <fstream>
              #include <iostream>
              #include <vector>

              cl_device_id simDevice;
              cl_context simContext;
              cl_command_queue simCmdQueue;
              cl_program simProgram;
              cl_platform_id simPlatform;

              // Reads the whole file into a newly allocated buffer; the caller owns *code.
              int loadKernelCode(const char *filename, char **code, size_t *codeSize)
              {
                  std::ifstream file_stream;
                  file_stream.open(filename, std::ios::in);
                  if (!file_stream.is_open())
                      return -1;
                  file_stream.seekg(0, std::ios_base::end);
                  *codeSize = file_stream.tellg();
                  file_stream.seekg(0, std::ios_base::beg);
                  *code = new char[*codeSize];
                  file_stream.read(*code, *codeSize);
                  file_stream.close();
                  return 0;
              }

              // Selects the first GPU device of the first platform.
              void setOpenCLDevice()
              {
                  cl_uint num_platforms = 0;
                  clGetPlatformIDs(0, NULL, &num_platforms);
                  std::vector<cl_platform_id> platforms(num_platforms);
                  clGetPlatformIDs(num_platforms, &platforms[0], NULL);
                  simPlatform = platforms[0];

                  cl_uint num_devices = 0;
                  clGetDeviceIDs(simPlatform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);
                  std::vector<cl_device_id> devices(num_devices);
                  clGetDeviceIDs(simPlatform, CL_DEVICE_TYPE_GPU, num_devices, &devices[0], NULL);
                  simDevice = devices[0];
              }

              void setupOpenCLEnvironment(void)
              {
                  cl_int err = CL_SUCCESS;
                  simContext = clCreateContext(NULL, 1, &(simDevice), NULL, NULL, &err);
                  simCmdQueue = clCreateCommandQueue(simContext, simDevice, CL_QUEUE_PROFILING_ENABLE, &err);
              }

              void buildOpenCLKernels(void)
              {
                  cl_int err = CL_SUCCESS;
                  const size_t num_kernel_files = 1;
                  const char *kernel_files[num_kernel_files] = {"forum_test.cl"};
                  char *kernel_code[num_kernel_files] = {NULL};
                  size_t kernel_code_size[num_kernel_files] = {0};

                  int result = 0;
                  for (size_t i = 0; i < num_kernel_files; i++) {
                      result = loadKernelCode(kernel_files[i], &(kernel_code[i]), &(kernel_code_size[i]));
                      if (result != 0)
                          break;
                  }
                  if (result != 0) {
                      for (size_t i = 0; i < num_kernel_files; i++)
                          if (kernel_code[i] != NULL)
                              delete[] kernel_code[i];
                      return;  // do not pass freed buffers to clCreateProgramWithSource
                  }

                  simProgram = clCreateProgramWithSource(simContext, num_kernel_files, (const char**) kernel_code, kernel_code_size, &err);
                  for (size_t i = 0; i < num_kernel_files; i++)
                      if (kernel_code[i] != NULL)
                          delete[] kernel_code[i];
                  if (err != CL_SUCCESS)
                      return;

                  err = clBuildProgram(simProgram, 1, &(simDevice), NULL, NULL, NULL);
                  if (err != CL_SUCCESS) {
                      // Query the log size first, then fetch and print the build log.
                      size_t logSize = 0;
                      err = clGetProgramBuildInfo(simProgram, simDevice, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
                      char *buildLog = new char[logSize + 1];
                      err |= clGetProgramBuildInfo(simProgram, simDevice, CL_PROGRAM_BUILD_LOG, logSize, buildLog, NULL);
                      buildLog[logSize] = '\0';
                      if (err == CL_SUCCESS)
                          std::cerr << "Build returned error: " << buildLog << std::endl;
                      else
                          std::cerr << "Build returned error.\n" << std::endl;
                      delete[] buildLog;
                  }

                  // Note: this prints a cl_build_status value, not a cl_int error code.
                  cl_build_status status;
                  clGetProgramBuildInfo(simProgram, simDevice, CL_PROGRAM_BUILD_STATUS, sizeof (cl_build_status), &status, NULL);
                  if (status != CL_BUILD_SUCCESS)
                      std::cerr << "build status: " << status << std::endl;
              }

              void cleanup()
              {
                  if (simProgram != NULL)
                      clReleaseProgram(simProgram);
                  if (simCmdQueue != NULL)
                      clReleaseCommandQueue(simCmdQueue);
                  if (simContext != NULL)
                      clReleaseContext(simContext);
              }

              int main(int argc, char **argv)
              {
                  setOpenCLDevice();
                  setupOpenCLEnvironment();
                  buildOpenCLKernels();
                  cleanup();
                  return 0;
              }

              //****************
              // .cl file
              //****************
              __kernel void foo1() {}
              __kernel void foo2() {}
              __kernel void foo3() {}
              __kernel void foo4() {}
              __kernel void foo5() {}
              __kernel void foo6() {}

              float2 foo7(const float2 Mom, const float m0_2)
              {
                  const float rc2 = (float) (8.854188e-12);
                  float2 result;
                  result.x = 1.0f;
                  result.y = rc2;
                  return result;
              }

              __kernel void foo8(const int igx, const int igy, __global int* lpic, __global int* cc, __global int* ppic)
              {
                  int2 Cell;
                  Cell.x = (get_group_id(0) << 4) + get_local_id(0);
                  Cell.y = (get_group_id(1) << 4) + get_local_id(1);
                  int iCell = Cell.y * igx + Cell.x;
                  int ilp = lpic[iCell];
                  int iPar = ilp;
                  int iPrev = ppic[iPar];
                  while (iPrev != -1) {
                      if (cc[iPrev]) {
                          iPrev = ppic[iPrev];
                      } else {
                          ppic[iPar] = iPrev;
                          iPar = iPrev;
                          iPrev = ppic[iPar];
                      }
                  }
                  ppic[iPar] = -1;
              }
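
              A side note on the "-2" from the first question, offered as a possibility rather than a diagnosis: the host code above prints the value of CL_PROGRAM_BUILD_STATUS, which is a cl_build_status and not a cl_int error code. In CL/cl.h the build status -2 is CL_BUILD_ERROR, while the error code -2 is CL_DEVICE_NOT_AVAILABLE, so the two are easy to mix up. The sketch below decodes both kinds of value. The constants are copied from CL/cl.h under local, hypothetical names (prefixed with k) so that it compiles without the OpenCL headers, and buildStatusName and errorCodeName are illustrative helpers, not part of OpenCL.

```cpp
#include <cassert>
#include <string>

// cl_build_status values, mirrored from CL/cl.h (local names are assumptions).
constexpr int kBuildSuccess    =  0;  // CL_BUILD_SUCCESS
constexpr int kBuildNone       = -1;  // CL_BUILD_NONE
constexpr int kBuildError      = -2;  // CL_BUILD_ERROR
constexpr int kBuildInProgress = -3;  // CL_BUILD_IN_PROGRESS

// A few cl_int error codes, mirrored from CL/cl.h (not a complete list).
constexpr int kSuccess              =   0;  // CL_SUCCESS
constexpr int kDeviceNotFound       =  -1;  // CL_DEVICE_NOT_FOUND
constexpr int kDeviceNotAvailable   =  -2;  // CL_DEVICE_NOT_AVAILABLE
constexpr int kCompilerNotAvailable =  -3;  // CL_COMPILER_NOT_AVAILABLE
constexpr int kBuildProgramFailure  = -11;  // CL_BUILD_PROGRAM_FAILURE

// Name of a value returned for CL_PROGRAM_BUILD_STATUS.
std::string buildStatusName(int status)
{
    switch (status) {
        case kBuildSuccess:    return "CL_BUILD_SUCCESS";
        case kBuildNone:       return "CL_BUILD_NONE";
        case kBuildError:      return "CL_BUILD_ERROR";
        case kBuildInProgress: return "CL_BUILD_IN_PROGRESS";
        default:               return "unknown build status";
    }
}

// Name of a cl_int error code returned by an API call such as clBuildProgram.
std::string errorCodeName(int err)
{
    switch (err) {
        case kSuccess:              return "CL_SUCCESS";
        case kDeviceNotFound:       return "CL_DEVICE_NOT_FOUND";
        case kDeviceNotAvailable:   return "CL_DEVICE_NOT_AVAILABLE";
        case kCompilerNotAvailable: return "CL_COMPILER_NOT_AVAILABLE";
        case kBuildProgramFailure:  return "CL_BUILD_PROGRAM_FAILURE";
        default:                    return "unlisted error code";
    }
}
```

              Read as a build status, -2 simply means that the build failed, which would be consistent with the build log being the place to look for the actual cause.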

              • Limited kernel size/number and atomics extension problems
                purzel42

                 

                Originally posted by: omkaranathan

                Does the device query list the extension?

                If yes, could you post the code?

                By the way: yes, it is listed in the device query.