17 Replies Latest reply on Dec 8, 2011 11:22 AM by niello

    Kernel with image2d_t is not being executed on CPU

    hduregger

      Everything works fine on the GPU, but on the CPU the kernel is not executed at all. All the other kernels in my application's pipeline, which use OpenGL shared buffer textures instead of images, seem to execute fine on the CPU.

      I check whether it executes with printf. In gdb, the breakpoint in this kernel is also never hit (a breakpoint in a different kernel was hit, so the debugging procedure itself should be correct), even though the kernel function exists.

      The kernel takes "write_only image2d_t" arguments. Even if the image2d_t is never accessed, the kernel is not executed when the application runs with a CPU OpenCL context. When I comment out the kernel argument assignments in the client code and the image parameters in the kernel, the kernel executes on the CPU (and the printf message is printed).

      The image passed as a kernel argument is a 2D texture shared from OpenGL, and it is acquired correctly.

      Sorry for pasting the code like this; the JavaScript on this page doesn't seem to work correctly.

       


          // Image objects created from the shared OpenGL 2D textures.
          cl_mem clImage01 = image01->getImageShared();
          cl_mem clImage23 = image23->getImageShared();

          // Acquire the shared GL objects before any OpenCL command uses them.
          CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &clImage01, 0, NULL, NULL) );
          CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &clImage23, 0, NULL, NULL) );
          CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &sharedPotentialBuffer, 0, NULL, NULL) );

          // Arguments 1-3; argument 0 (BUFFER_SIDE_LENGTH) is not set in this excerpt.
          CL_ERROR( clSetKernelArg(kernelGradient, 1, sizeof(cl_mem), &sharedPotentialBuffer ) );
          CL_ERROR( clSetKernelArg(kernelGradient, 2, sizeof(cl_mem), &clImage01 ) );
          CL_ERROR( clSetKernelArg(kernelGradient, 3, sizeof(cl_mem), &clImage23 ) );

          // 1D launch: threadCount work-items in work-groups of workGroupSize.
          CL_PROFILE(kernelGradientProfilingId, isProfiling, queue,
                  CL_ERROR( clEnqueueNDRangeKernel(queue, kernelGradient, 1, NULL, &threadCount, &workGroupSize, 0, NULL, &event) )
          );

          // Hand the shared objects back to OpenGL.
          CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &sharedPotentialBuffer, 0, NULL, NULL) );
          CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &clImage01, 0, NULL, NULL) );
          CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &clImage23, 0, NULL, NULL) );
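One cheap thing to rule out for a launch like the one above: in OpenCL 1.x, when a local work size is passed to clEnqueueNDRangeKernel, the global size must be an exact multiple of it, and the local size must not exceed CL_DEVICE_MAX_WORK_GROUP_SIZE; otherwise the call returns CL_INVALID_WORK_GROUP_SIZE. That would surface as an error code rather than a silent no-op, so it is not necessarily the cause here. A minimal sketch of the usual round-up, assuming threadCount and workGroupSize are size_t values computed elsewhere; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round a 1D global work size up to the next multiple of the local
 * (work-group) size, as OpenCL 1.x requires when a local size is given.
 * Work-items past the real element count must then exit early in the
 * kernel, e.g. with: if (get_global_id(0) >= count) return;            */
static size_t round_up_global(size_t count, size_t local)
{
    assert(local > 0);
    return ((count + local - 1) / local) * local;
}

/* Returns 1 if (global, local) is a valid 1D launch on a device whose
 * CL_DEVICE_MAX_WORK_GROUP_SIZE is max_wg, else 0.                      */
static int valid_launch(size_t global, size_t local, size_t max_wg)
{
    return local > 0 && local <= max_wg && global % local == 0;
}
```

For example, 1000 work-items with a work-group size of 64 would be launched as 1024, with the last 24 work-items returning immediately.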



      #pragma OPENCL EXTENSION cl_amd_printf : enable
      ...
      kernel void gradient(const      uint      BUFFER_SIDE_LENGTH,
                           global     float4*   potentialIn,
                           write_only image2d_t gradientOut01,
                           write_only image2d_t gradientOut23)
      {
      ...
      printf("test\n");
      }


      Radeon 6950, Ubuntu 10.10, Catalyst 11.5, APP SDK 2.4, 64-bit

      EDIT: Oops, forgot the question.


      Any idea what is going on?

        • hduregger

          (oh my, working on Saturdays is not a good idea...)

          The CPU is an AMD Phenom II X4 920.

            • himanshu.gautam

              Are you able to run the SimpleGL sample from the SDK? Also, please post the clinfo output.

                • hduregger

                   

                  ./samples/opencl/bin/x86_64/SimpleGL --device cpu
                  runs fine.

                   

                  Number of platforms:                 1
                    Platform Profile:                 FULL_PROFILE
                    Platform Version:                 OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
                    Platform Name:                 AMD Accelerated Parallel Processing
                    Platform Vendor:                 Advanced Micro Devices, Inc.
                    Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


                    Platform Name:                 AMD Accelerated Parallel Processing
                  Number of devices:                 2
                    Device Type:                     CL_DEVICE_TYPE_GPU
                  ...

                    Device Type:                     CL_DEVICE_TYPE_CPU
                    Device ID:                     4098
                    Max compute units:                 4
                    Max work items dimensions:             3
                      Max work items[0]:                 1024
                      Max work items[1]:                 1024
                      Max work items[2]:                 1024
                    Max work group size:                 1024
                    Preferred vector width char:             16
                    Preferred vector width short:             8
                    Preferred vector width int:             4
                    Preferred vector width long:             2
                    Preferred vector width float:             4
                    Preferred vector width double:         0
                    Native vector width char:             16
                    Native vector width short:             8
                    Native vector width int:             4
                    Native vector width long:             2
                    Native vector width float:             4
                    Native vector width double:             0
                    Max clock frequency:                 2800Mhz
                    Address bits:                     64
                    Max memory allocation:             2147483648
                    Image support:                 Yes
                    Max number of images read arguments:         128
                    Max number of images write arguments:         8
                    Max image 2D width:                 8192
                    Max image 2D height:                 8192
                    Max image 3D width:                 2048
                    Max image 3D height:                 2048
                    Max image 3D depth:                 2048
                    Max samplers within kernel:             16
                    Max size of kernel argument:             4096
                    Alignment (bits) of base address:         1024
                    Minimum alignment (bytes) for any datatype:     128
                    Single precision floating point capability
                      Denorms:                     Yes
                      Quiet NaNs:                     Yes
                      Round to nearest even:             Yes
                      Round to zero:                 Yes
                      Round to +ve and infinity:             Yes
                      IEEE754-2008 fused multiply-add:         No
                    Cache type:                     Read/Write
                    Cache line size:                 64
                    Cache size:                     65536
                    Global memory size:                 4154441728
                    Constant buffer size:                 65536
                    Max number of constant args:             8
                    Local memory type:                 Global
                    Local memory size:                 32768
                    Kernel Preferred work group size multiple:     1
                    Error correction support:             0
                    Unified memory for Host and Device:         1
                    Profiling timer resolution:             1
                    Device endianess:                 Little
                    Available:                     Yes
                    Compiler available:                 Yes
                    Execution capabilities:                
                      Execute OpenCL kernels:             Yes
                      Execute native function:             Yes
                    Queue properties:                
                      Out-of-Order:                 No
                      Profiling :                     Yes
                    Platform ID:                     0x7f6f4745a800
                    Name:                         AMD Phenom(tm) II X4 920 Processor
                    Vendor:                     AuthenticAMD
                    Driver version:                 2.0
                    Profile:                     FULL_PROFILE
                    Version:                     OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)

                    Extensions:                     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf



                  I also explicitly check for image support for the CL_RGBA / CL_FLOAT format, and the CPU device reports it as supported.
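For reference, clGetSupportedImageFormats(context, flags, image_type, num_entries, image_formats, num_image_formats) fills an array of cl_image_format structs, each holding an image_channel_order and an image_channel_data_type, for the given memory flags, so the list returned for CL_MEM_WRITE_ONLY may differ from the one for CL_MEM_READ_ONLY. A minimal sketch of the lookup step, using a stand-in struct and hypothetical enum names instead of the real CL/cl.h types so it compiles without the OpenCL headers:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for cl_channel_order / cl_channel_type; real
 * code would use CL_RGBA, CL_FLOAT, ... from CL/cl.h.                  */
enum order { ORDER_RGBA, ORDER_INTENSITY };
enum dtype { TYPE_FLOAT, TYPE_UNORM_INT8, TYPE_UNSIGNED_INT8 };

/* Mirrors the two fields of cl_image_format. */
struct image_format {
    enum order channel_order;
    enum dtype channel_data_type;
};

/* Linear search of the array that clGetSupportedImageFormats would
 * have filled in; returns 1 if the requested pair is present, else 0.  */
static int format_supported(const struct image_format *formats, size_t n,
                            enum order o, enum dtype t)
{
    for (size_t i = 0; i < n; ++i) {
        if (formats[i].channel_order == o && formats[i].channel_data_type == t)
            return 1;
    }
    return 0;
}
```

In real code the array is usually obtained by calling clGetSupportedImageFormats twice: first with num_entries = 0 and image_formats = NULL to get the count, then again with an array of that size.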

              • niello

                I have the same problem.
                I cannot test on a GPU, but on the CPU the kernel is not executed (tested with SDK 2.4; I just reinstalled with 2.5 and get the same thing).
                The image is copied to the device and can be read back (I read back what I had tried to write). The image I read back after clEnqueueNDRangeKernel is the same as without calling it (gray with black stripes when debugging from the IDE, a few white dots on a gray background when running the compiled program), so clEnqueueNDRangeKernel apparently does nothing. The memory the kernel should write to is reset to zero. OpenCL reports no errors.

                Similar code using a buffer works.



                Is this a bug?



                P.S. I'm writing in Delphi.

                Sorry for my English.

                //executing
                Size[0] := Width;
                Size[1] := Height;
                FStatus := clEnqueueNDRangeKernel(CommandQueue, Kernel, 2, nil, @Size, nil, 0, nil, nil);

                //OpenCL kernel:
                __constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

                __kernel void simple_image(read_only image2d_t src_image, write_only image2d_t dst_image)
                {
                    int2 coord = (int2)(get_global_id(0), get_global_id(1));
                    uint4 pixel = read_imageui(src_image, sampler, coord);
                    write_imageui(dst_image, coord, pixel);
                }
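To tell "the kernel never ran" apart from "the kernel ran but wrote the wrong thing", it can help to compute on the host what simple_image should produce and compare it with the data read back via clEnqueueReadImage (a row_pitch of 0 there gives tightly packed rows). With CLK_NORMALIZED_COORDS_FALSE, CLK_FILTER_NEAREST and in-range integer coordinates, the sampler returns texel (x, y) unchanged, so the expected output is a plain copy. A minimal sketch, assuming both images are width x height in CL_RGBA + CL_UNSIGNED_INT8 (four bytes per texel); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host reference for simple_image: the work-item with
 * (get_global_id(0), get_global_id(1)) == (x, y) copies texel (x, y).
 * Assumes 4 uint8 channels per texel, row-major, no row padding.       */
static void simple_image_reference(const uint8_t *src, uint8_t *dst,
                                   size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            for (size_t c = 0; c < 4; ++c)
                dst[(y * width + x) * 4 + c] = src[(y * width + x) * 4 + c];
}

/* Returns how many texels differ in at least one channel. */
static size_t count_mismatches(const uint8_t *expected, const uint8_t *actual,
                               size_t width, size_t height)
{
    assert(expected != NULL && actual != NULL);
    size_t bad = 0;
    for (size_t i = 0; i < width * height; ++i) {
        for (size_t c = 0; c < 4; ++c) {
            if (expected[i * 4 + c] != actual[i * 4 + c]) {
                ++bad;
                break;
            }
        }
    }
    return bad;
}
```

If every texel mismatches and the read-back buffer is all zeros, that is consistent with the kernel not having written anything at all.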

                  • antzrhere

                    If I read this correctly, I suspect I am experiencing a similar problem.

                    When I pass a GL shared object (as an image2d_t) to a kernel that writes to that image and run it on the CPU device, the kernel appears not to execute. I haven't debugged this yet, but the image is not written to, and the kernel (which should take about 0.5 s) appears to take almost no time at all. Everything works fine on the GPU, where I get the correct output. Importantly, it did work fine on the CPU before I added GL interop.
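One way to quantify "almost zero execution time": if the queue is created with CL_QUEUE_PROFILING_ENABLE, clGetEventProfilingInfo on the kernel's event returns CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END as cl_ulong device timestamps in nanoseconds. A minimal sketch of the arithmetic, with uint64_t standing in for cl_ulong; the 0.5 s expectation comes from the post above, while the 1% threshold and the helper names are arbitrary, hypothetical choices.

```c
#include <assert.h>
#include <stdint.h>

/* CL_PROFILING_COMMAND_START/END are nanosecond timestamps (cl_ulong);
 * uint64_t stands in for cl_ulong here.                                */
static double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return (double)(end_ns - start_ns) / 1.0e6;
}

/* Flags a run that took under 1% of the expected time, which is more
 * consistent with a skipped kernel than with a fast one. The 1% cutoff
 * is an arbitrary choice for illustration.                              */
static int suspiciously_fast(double measured_ms, double expected_ms)
{
    return measured_ms < 0.01 * expected_ms;
}
```

With an expected runtime of 500 ms, anything under 5 ms would be flagged.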

                      • genaganna

                         

                        Originally posted by: antzrhere

                         

                        Could you please copy your code here?

                          • thesmileman

                            I can confirm that none of my kernels that take write_only image2d_t arguments are executed on the CPU. Even empty kernels containing only a printf are not executed.

                            Notes:

                            I get CL_SUCCESS for compilation as well as execution.

                            I am using OpenGL interop as well.

                            I have observed this problem on an AMD Phenom X6, an Intel Core 2 Duo and a Core i5 when using the AMD OpenCL driver.

                            I CAN run the exact same code successfully using the Intel OpenCL driver (on the Intel CPUs, obviously) as well as on NVIDIA GPUs and AMD GPUs.

                            Kernels using OpenGL VBO sharing work correctly.
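Note that CL_SUCCESS from clEnqueueNDRangeKernel only means the command was queued. Whether it actually finished can be checked afterwards by waiting on the returned event (clWaitForEvents) and querying clGetEventInfo with CL_EVENT_COMMAND_EXECUTION_STATUS, which yields CL_QUEUED, CL_SUBMITTED, CL_RUNNING, CL_COMPLETE, or a negative error code if the command terminated abnormally. A minimal sketch of interpreting that value; the enum below mirrors the CL/cl.h values under local, hypothetical names so the sketch compiles without the headers, and int32_t stands in for cl_int.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the CL/cl.h command execution-status values. */
enum {
    STATUS_COMPLETE  = 0x0,  /* CL_COMPLETE  */
    STATUS_RUNNING   = 0x1,  /* CL_RUNNING   */
    STATUS_SUBMITTED = 0x2,  /* CL_SUBMITTED */
    STATUS_QUEUED    = 0x3   /* CL_QUEUED    */
};

/* Interprets the value returned for CL_EVENT_COMMAND_EXECUTION_STATUS:
 *   1  -> the command completed normally,
 *   0  -> still queued, submitted or running,
 *  -1  -> terminated abnormally (negative error code).                  */
static int check_status(int32_t status)
{
    if (status < 0)
        return -1;
    return status == STATUS_COMPLETE ? 1 : 0;
}
```

A status of CL_COMPLETE together with an unchanged output image would point at the kernel body or its arguments rather than at the enqueue itself.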

                             

                        • genaganna

                           

                          Originally posted by: niello

                           



                          Niello,

                          There is a SimpleImage sample in the SDK that uses the same kernel, and SimpleImage works fine. There could be a problem with the Delphi code. Please contact the developer of the Delphi wrapper.

                            • niello

                               

                              Originally posted by: genaganna

                               



                              I created the Delphi headers (http://code.google.com/p/delphi-opencl/ is the old version; I am now creating an OOP version for OpenCL 1.1). My code is not the same as the SDK sample (it is object-oriented), so I am looking for the difference. Until now I have only tested on the CPU.
                              I will report back as soon as I understand what is happening.

                                • chevydevil

                                  I believe the problem lies in the OpenGL/OpenCL sharing context. I am sharing textures/images between the APIs, and gdb, for example, cannot read the image data after clCreateFromGLTexture..., neither from the OpenGL context nor from the OpenCL context. If I create an image with clCreateImage..., everything works fine.

                                   

                                  EDIT: Forget what I said. It's a problem with the gDEBugger VS 2010 plugin. The standalone version shows that my image looks the way it should, and drawing it onto a quad confirms that. Sorry.

                                  • niello

                                    I had some free time at home and found the cause.
                                    I was using CL_UNORM_INT8 + CL_INTENSITY and did not notice that this format is probably not supported (I have now tried CL_RGBA + CL_UNSIGNED_INT8, and it runs). (I was not using GL sharing.)
                                    I will keep reading the specification.
                                    Thank you, and sorry for taking your time.
                                    Powered by Delphi :)
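A related detail from the OpenCL C specification that may explain this: read_imageui and write_imageui are only defined for images whose channel data type is CL_UNSIGNED_INT8, CL_UNSIGNED_INT16 or CL_UNSIGNED_INT32. read_imagei/write_imagei pair with the CL_SIGNED_INT* types, and read_imagef/write_imagef with the normalized, half and float types. The simple_image kernel posted earlier in the thread uses read_imageui/write_imageui, so with a CL_UNORM_INT8 image its results are undefined, whereas CL_RGBA + CL_UNSIGNED_INT8 matches. A minimal sketch of that mapping, with hypothetical local enum names standing in for the cl_channel_type constants:

```c
#include <assert.h>

/* Hypothetical stand-ins for the cl_channel_type constants in CL/cl.h. */
enum dtype {
    T_SNORM_INT8, T_SNORM_INT16, T_UNORM_INT8, T_UNORM_INT16,
    T_SIGNED_INT8, T_SIGNED_INT16, T_SIGNED_INT32,
    T_UNSIGNED_INT8, T_UNSIGNED_INT16, T_UNSIGNED_INT32,
    T_HALF_FLOAT, T_FLOAT
};

/* Which family of OpenCL C image built-ins is defined for a type. */
enum access_fn { FN_IMAGEF, FN_IMAGEI, FN_IMAGEUI };

static enum access_fn required_access(enum dtype t)
{
    switch (t) {
    case T_SIGNED_INT8: case T_SIGNED_INT16: case T_SIGNED_INT32:
        return FN_IMAGEI;   /* read_imagei / write_imagei   */
    case T_UNSIGNED_INT8: case T_UNSIGNED_INT16: case T_UNSIGNED_INT32:
        return FN_IMAGEUI;  /* read_imageui / write_imageui */
    default:
        return FN_IMAGEF;   /* read_imagef / write_imagef: normalized, half, float */
    }
}
```

Checking the image's channel data type against the built-ins a kernel calls, before enqueueing, turns this kind of silent undefined behavior into an explicit error in the host code.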