Everything works fine on the GPU, but on the CPU the kernel is not executed at all. All the other kernels in my application's pipeline, which do not use images (they use OpenGL-shared buffer textures instead), seem to execute fine on the CPU.
I check for execution with printf. In gdb, the program also never stops at the breakpoint inside the kernel (a breakpoint in a different kernel was hit, so the debugging procedure should be correct). The kernel function does exist, though.
The kernel takes a "write_only image2d_t" argument. Even if the image2d_t is never accessed, the kernel does not get executed when the application runs with a CPU OpenCL context. When I comment out the kernel argument assignment in the client code and the corresponding arguments in the kernel, the kernel executes on the CPU (and the printf message is printed).
The image passed as a kernel argument is a 2D texture shared from OpenGL. The memory is acquired correctly.
Sorry for pasting the code like this; the forum's JavaScript doesn't seem to work correctly here.
cl_mem clImage01 = image01->getImageShared();
cl_mem clImage23 = image23->getImageShared();
CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &clImage01, 0, NULL, NULL) );
CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &clImage23, 0, NULL, NULL) );
CL_ERROR( clEnqueueAcquireGLObjects(queue, 1, &sharedPotentialBuffer, 0, NULL, NULL) );
CL_ERROR( clSetKernelArg(kernelGradient, 1, sizeof(cl_mem), &sharedPotentialBuffer ) );
CL_ERROR( clSetKernelArg(kernelGradient, 2, sizeof(cl_mem), &clImage01 ) );
CL_ERROR( clSetKernelArg(kernelGradient, 3, sizeof(cl_mem), &clImage23 ) );
CL_PROFILE(kernelGradientProfilingId, isProfiling, queue,
CL_ERROR( clEnqueueNDRangeKernel(queue, kernelGradient, 1, NULL, &threadCount, &workGroupSize, 0, NULL, &event) )
);
CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &sharedPotentialBuffer, 0, NULL, NULL) );
CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &clImage01, 0, NULL, NULL) );
CL_ERROR( clEnqueueReleaseGLObjects(queue, 1, &clImage23, 0, NULL, NULL) );
#pragma OPENCL EXTENSION cl_amd_printf : enable
...
kernel void gradient(const uint BUFFER_SIDE_LENGTH,
global float4* potentialIn,
write_only image2d_t gradientOut01,
write_only image2d_t gradientOut23)
{
...
printf("test\n");
}
Radeon 6950, Ubuntu 10.10, Catalyst 11.5, APP SDK 2.4, 64bit
EDIT: oopsa, forgot the question.
Any idea what is going on?
(oh my, working on Saturdays is not a good idea...)
CPU is an AMD Phenom II X4 920
Are you able to run the SimpleGL sample from the SDK? Also, please post the clinfo output.
It runs fine:
./samples/opencl/bin/x86_64/SimpleGL --device cpu
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
...
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 4
Native vector width double: 0
Max clock frequency: 2800Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 65536
Global memory size: 4154441728
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f6f4745a800
Name: AMD Phenom(tm) II X4 920 Processor
Vendor: AuthenticAMD
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 AMD-APP-SDK-v2.4 (595.10)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_media_ops cl_amd_popcnt cl_amd_printf
The image/texture is created like this:
glGenTextures(1, &texture); GL_ERROR;
glBindTexture(GL_TEXTURE_2D, texture); GL_ERROR;
glTexParameteri(target, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(target, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(target, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(target, GL_TEXTURE_MIN_FILTER, GL_LINEAR); GL_ERROR;
glTexImage2D(target, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, NULL); GL_ERROR;
image = clCreateFromGLTexture2D(context, CL_MEM_READ_WRITE, GL_TEXTURE_2D, 0, texture, &status); CL_ERROR(status);
According to your clinfo output, your CPU device supports images.
I suspect printf might be causing issues (a few cases of printf not working properly have been reported). Can you confirm that the kernel really does not get executed?
Case 1:
Case 2:
This is really weird: somehow the runtime seems to ignore the kernel as soon as it uses image2d_t on the CPU. It creates the __OpenCL_gradient_kernel function, but never calls it...
It looks like a bug. Can you please post a test case so we can reproduce it at our end?
I've uploaded the source of a simple test application that reproduces the issue here:
http://www.filesonic.com/file/1209760124/OpenCL-image-CPU.tar.gz
(Sorry for the file-hoster link, but I think it is more practical than pasting the files here as text. I'm not even sure we can paste several code attachments. Attaching files would also be a nice forum feature.)
Please read README.txt; it describes the steps to reproduce the issue.
I have the same problem.
I cannot check on the GPU, but on the CPU the kernel is not executed (tested on SDK 2.4; I just reinstalled 2.5, same thing).
The image is copied and read back (I read back what I tried to write). The image I read back looks the same whether or not clEnqueueNDRangeKernel is called (gray with black stripes; when debugging the compiled program, a few white dots on a gray background), so clEnqueueNDRangeKernel is apparently not executed. The memory being written to is reset to zero. OpenCL does not report any errors.
Similar code works with a buffer.
Is this a bug?
P.S. I'm writing in Delphi.
Sorry for my English.
// Executing:
Size[0] := Width;
Size[1] := Height;
FStatus := clEnqueueNDRangeKernel(CommandQueue, Kernel, 2, nil, @Size, nil, 0, nil, nil);

// OpenCL kernel:
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

__kernel void simple_image(read_only image2d_t src_image, write_only image2d_t dst_image)
{
    int2 coord = (int2)(get_global_id(0), get_global_id(1));
    uint4 pixel = read_imageui(src_image, sampler, coord);
    write_imageui(dst_image, coord, pixel);
}
If I read this correctly, I suspect I am experiencing a similar problem.
When I pass a GL shared object (as image2d_t) to a kernel that writes to this image and try to execute it on the CPU device, the kernel appears not to execute. I've never bothered debugging this, but the image is not written to, and the kernel (which should take about 0.5 s to execute) appears to have almost zero execution time. However, everything works fine on the GPU and I get the correct results. Importantly, it worked fine on the CPU before I added GL interop.
Originally posted by: antzrhere (quoted above)
Could you please copy your code here?
I can confirm that none of my kernels that take write_only image2d_t arguments are executed on the CPU. Even empty kernels containing just a printf are not executed.
Notes:
I get CL_SUCCESS for compilation as well as execution.
I am using OpenGL interop as well.
I have observed this problem on an AMD Phenom X6, an Intel Core 2 Duo, and a Core i5 when using the AMD OpenCL driver.
I CAN run the exact same code successfully with the Intel OpenCL driver (on the Intel CPUs, obviously) as well as on NVIDIA and AMD GPUs.
Kernels using OpenGL VBO sharing work correctly.
Originally posted by: niello (quoted above)
Niello,
There is a SimpleImage sample in the SDK that uses the same kernel, and SimpleImage works fine. There could be a problem with the Delphi code. Please contact the Delphi wrapper developer.
Originally posted by: genaganna (quoted above)
I am the one creating the Delphi headers (http://code.google.com/p/delphi-opencl/ has the old version; I am now creating an OOP version for OpenCL 1.1). My code is not the same as the SDK's (it is OOP-based), so I am looking at the differences. So far I have only tested on the CPU.
I will report back as soon as I understand what was happening.
I believe the problem lies within the OpenGL/OpenCL sharing context. I am trying to share textures/images between the APIs, and, for example, gDEBugger can't read the image data after clCreateFromGLTexture..., neither from the OpenGL context nor from the OpenCL context. If I create an image with clCreateImage..., everything works fine.
EDIT: Forget what I said. It's a problem with the gDEBugger VS 2010 plugin. The standalone version shows that my image looks the way it should, and drawing it onto a quad confirms that. Sorry.
So, back to the original problem: does anyone from AMD have an idea why this is occurring?
Is anyone using OpenGL Interop and an image2d write only kernel with a CPU device successfully?
I had some free time at home and found the cause.
I used CL_UNORM_INT8 + CL_INTENSITY but did not notice that this format is probably not supported (I have now tried CL_RGBA + CL_UNSIGNED_INT8, and it runs). (I did not use GL sharing.)
I will continue to read the specification.
Thank you and sorry for your time.
Powered by Delphi)