hduregger
Journeyman III

Kernel with image2d_t is not being executed on CPU

Everything works fine on the GPU. But on the CPU the kernel is not being executed at all. All other kernels in the pipeline of my application, which do not use images (but do use OpenGL shared buffer textures), seem to execute fine on the CPU.

I check for execution with printf. With gdb, too, the program never hits the breakpoint in the kernel (a breakpoint in a different kernel was hit successfully, so the debugging procedure should be correct), even though the kernel function exists.

The kernel takes a "write_only image2d_t" as an argument. Even if the image2d_t is never accessed, the kernel does not get executed when running the application with a CPU OpenCL context. When I comment out the kernel argument assignment in the client code and also the image arguments in the kernel, the kernel executes on the CPU (and the printf message is printed).

The image passed as kernel argument is a texture 2D shared from OpenGL. The memory is getting acquired correctly.

Sorry for pasting code like this, the javascript doesn't seem to work correctly here.


    cl_mem clImage01 = image01->getImageShared();
    cl_mem clImage23 = image23->getImageShared();

    CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &clImage01, 0, NULL, NULL) );
    CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &clImage23, 0, NULL, NULL) );
    CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &sharedPotentialBuffer, 0, NULL, NULL) );

    CL_ERROR( clSetKernelArg(kernelGradient, 1, sizeof(cl_mem), &sharedPotentialBuffer ) );
    CL_ERROR( clSetKernelArg(kernelGradient, 2, sizeof(cl_mem), &clImage01 ) );
    CL_ERROR( clSetKernelArg(kernelGradient, 3, sizeof(cl_mem), &clImage23 ) );

    CL_PROFILE(kernelGradientProfilingId, isProfiling, queue,
            CL_ERROR( clEnqueueNDRangeKernel(queue, kernelGradient, 1, NULL, &threadCount, &workGroupSize, 0, NULL, &event) )
    );

    CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &sharedPotentialBuffer, 0, NULL, NULL) );
    CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &clImage01, 0, NULL, NULL) );
    CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &clImage23, 0, NULL, NULL) );



#pragma OPENCL EXTENSION cl_amd_printf : enable
...
kernel void gradient(const      uint      BUFFER_SIDE_LENGTH,
                     global     float4*   potentialIn,
                     write_only image2d_t gradientOut01,
                     write_only image2d_t gradientOut23)
{
...
printf("test\n");
}


Radeon 6950, Ubuntu 10.10, Catalyst 11.5, APP SDK 2.4, 64bit

EDIT: oopsa, forgot the question.


Any idea what is going on?

0 Likes
17 Replies
hduregger
Journeyman III

(oh my, working on Saturdays is not a good idea...)

CPU is an AMD Phenom II X4 920

0 Likes

Are you able to run the SimpleGL sample from the SDK? Also, please post the clinfo output.

0 Likes

./samples/opencl/bin/x86_64/SimpleGL --device cpu
runs fine.

Number of platforms:                 1
  Platform Profile:                 FULL_PROFILE
  Platform Version:                 OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:                 Advanced Micro Devices, Inc.
  Platform Extensions:                 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices


  Platform Name:                 AMD Accelerated Parallel Processing
Number of devices:                 2
  Device Type:                     CL_DEVICE_TYPE_GPU
...

  Device Type:                     CL_DEVICE_TYPE_CPU
  Device ID:                     4098
  Max compute units:                 4
  Max work items dimensions:             3
    Max work items[0]:                 1024
    Max work items[1]:                 1024
    Max work items[2]:                 1024
  Max work group size:                 1024
  Preferred vector width char:             16
  Preferred vector width short:             8
  Preferred vector width int:             4
  Preferred vector width long:             2
  Preferred vector width float:             4
  Preferred vector width double:         0
  Native vector width char:             16
  Native vector width short:             8
  Native vector width int:             4
  Native vector width long:             2
  Native vector width float:             4
  Native vector width double:             0
  Max clock frequency:                 2800Mhz
  Address bits:                     64
  Max memory allocation:             2147483648
  Image support:                 Yes
  Max number of images read arguments:         128
  Max number of images write arguments:         8
  Max image 2D width:                 8192
  Max image 2D height:                 8192
  Max image 3D width:                 2048
  Max image 3D height:                 2048
  Max image 3D depth:                 2048
  Max samplers within kernel:             16
  Max size of kernel argument:             4096
  Alignment (bits) of base address:         1024
  Minimum alignment (bytes) for any datatype:     128
  Single precision floating point capability
    Denorms:                     Yes
    Quiet NaNs:                     Yes
    Round to nearest even:             Yes
    Round to zero:                 Yes
    Round to +ve and infinity:             Yes
    IEEE754-2008 fused multiply-add:         No
  Cache type:                     Read/Write
  Cache line size:                 64
  Cache size:                     65536
  Global memory size:                 4154441728
  Constant buffer size:                 65536
  Max number of constant args:             8
  Local memory type:                 Global
  Local memory size:                 32768
  Kernel Preferred work group size multiple:     1
  Error correction support:             0
  Unified memory for Host and Device:         1
  Profiling timer resolution:             1
  Device endianess:                 Little
  Available:                     Yes
  Compiler available:                 Yes
  Execution capabilities:                
    Execute OpenCL kernels:             Yes
    Execute native function:             Yes
  Queue properties:                
    Out-of-Order:                 No
    Profiling :                     Yes
  Platform ID:                     0x7f6f4745a800
  Name:                         AMD Phenom(tm) II X4 920 Processor
  Vendor:                     AuthenticAMD
  Driver version:                 2.0
  Profile:                     FULL_PROFILE
  Version:                     OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)

  Extensions:                     cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf



I even check explicitly for image support for CL_RGBA with CL_FLOAT, and it is reported as supported on the CPU.

0 Likes

The image/texture is created like this

 

    glGenTextures(1, &texture);
    GL_ERROR;

    glBindTexture(GL_TEXTURE_2D, texture);
    GL_ERROR;

    glTexParameteri(target, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexParameteri(target, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
    glTexParameteri(target, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(target, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    GL_ERROR;

    glTexImage2D(target, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, NULL);
    GL_ERROR;

    image = clCreateFromGLTexture2D(context, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, texture, &status);
    CL_ERROR(status);

0 Likes

According to your clinfo output, your CPU supports images.

I suspect that printf itself may be the problem (there have been a few reports of printf not working properly). Can you confirm that the kernel really does not get executed?

 

0 Likes

Case 1:

  • not using printf at all and also not including the line to enable the use of the extension for printf
  • the kernel is not being executed.
  • I can see this by rendering the textures with OpenGL, they both contain random data.
  • When debugging with gdb the break point at __OpenCL_gradient_kernel is never hit.

Case 2:

  • As soon as I comment out the kernel image arguments (write_only image2d_t...) and the lines writing to the images (write_imagef) and also the setting of the arguments in the client code (clSetKernelArg)
  • __OpenCL_gradient_kernel can be hit.
  • I can also step through the kernel in that case.

This is really weird, somehow the runtime seems to ignore the kernel as soon as it uses image2d_t on the CPU. It puts up the __OpenCL_gradient_kernel, but never calls it...

0 Likes

It looks like a bug. Can you please post a test case so that we can reproduce it at our end?

0 Likes

I've uploaded the source of a simple test application that reproduces the issue here:

http://www.filesonic.com/file/1209760124/OpenCL-image-CPU.tar.gz

(Sorry for the file hoster link, but I think it is more practical than pasting the files as text here. I'm not even sure whether we can paste several code attachments. Attaching files would also be a nice forum feature.)

Please read the README.txt; it describes the steps to reproduce.

 

 

0 Likes
niello
Journeyman III

I have the same problem.
I cannot test on a GPU, but on the CPU the kernel is not executed (tested with 2.4; I just reinstalled with 2.5 and see the same thing).
Copying to the image and reading it back works (I read back what I had tried writing). But the image read back after clEnqueueNDRangeKernel looks the same as it does without the call (gray with black stripes when run from the debugger, a few white dots on a gray background when running the compiled program), so clEnqueueNDRangeKernel apparently does nothing. The memory being written to is reset to zero. OpenCL does not report any errors.

Similar code using a buffer works.



Is this a bug?



P.S. I am writing in Delphi.

Sorry for my English.

    // executing
    Size[0] := Width;
    Size[1] := Height;
    FStatus := clEnqueueNDRangeKernel(CommandQueue, Kernel, 2, nil, @Size, nil, 0, nil, nil);

    // OpenCL kernel:
    __constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

    __kernel void simple_image(read_only image2d_t src_image, write_only image2d_t dst_image)
    {
        int2 coord = (int2)(get_global_id(0), get_global_id(1));
        uint4 pixel = read_imageui(src_image, sampler, coord);
        write_imageui(dst_image, coord, pixel);
    }

0 Likes

If I read this correctly, I suspect I am experiencing a similar problem.

When using a GL shared object (as image2d_t) if I pass this memory object as an argument to a kernel (that writes to this image) and try to execute on the CPU device the kernel appears not to execute. I've never bothered debugging this, but the image is not written to and the kernel (which should take about 0.5sec to execute) appears to have almost zero execution time. However everything works fine when running on the GPU and I get the correct output results. Importantly, it did work fine on the CPU before I added GL interop.

0 Likes

Originally posted by: antzrhere If I read this correct, I suspect I also experience a similar problem.

 


Could you please copy your code here?

0 Likes

I can confirm that none of my kernels which contain write only image2d images are getting executed on the cpu. Even empty kernels with printfs are not executed.

Notes:

I get CL_SUCCESS for compilation as well as execution.

I am using OpenGL interop as well.

I have observed this problem on an AMD Phenom X6, an Intel Core 2 Duo, and a Core i5 when using the AMD OpenCL driver.

I CAN run the exact same code successfully using the Intel OpenCL driver (on the Intel CPUs, obviously) as well as on Nvidia GPUs and AMD GPUs.

Kernels using OpenGL VBO sharing work correctly.

 

0 Likes


Niello,

The SDK includes a SimpleImage sample that uses the same kernel, and SimpleImage works fine. There could be a problem with the Delphi code. Please contact the developer of the Delphi wrapper.

0 Likes

Originally posted by: genaganna

Niello,

 

The SDK includes a SimpleImage sample that uses the same kernel, and SimpleImage works fine. There could be a problem with the Delphi code. Please contact the developer of the Delphi wrapper.

 



I am the one who writes the Delphi headers (http://code.google.com/p/delphi-opencl/ has the old version; I am now creating an OOP version for OpenCL 1.1). My code is not structured like the SDK sample (it is based on OOP), so I am looking for the difference. So far I have tested only on the CPU.
I will report back as soon as I understand what is happening.

0 Likes

I believe the problem lies within the OpenGL/OpenCL sharing context. I am trying to share textures/images between the APIs and, for example, gdb can't read the image data after clcreatefromgltexture..., neither from the OpenGL context nor from the OpenCL context. If I create an image with clcreateimage..., everything works fine.

 

EDIT: Forget what I said. It's a problem with the gDEBugger VS 2010 plugin. The standalone version tells me my image looks like it should, and drawing it onto a quad confirms that. Sorry.

0 Likes

So, back to the original problem: does anyone from AMD have an idea why this is occurring?

 

Is anyone using OpenGL Interop and an image2d write only kernel with a CPU device successfully?

0 Likes

I had some free time and found the cause of the problem in my case.
I was using CL_UNORM_INT8 + CL_INTENSITY and did not notice that this combination is probably not supported (I have now tried CL_RGBA + CL_UNSIGNED_INT8, and it runs). (I was not using GL sharing.)
I will continue reading the specification.
Thank you, and sorry for taking your time.
Powered by Delphi)

0 Likes