Hello,
I cannot compile or run my OpenCL kernel with image buffers on the ATI 5870. Are there any known bugs?
I am using ATI Stream SDK 2.01 (OpenCL) on an AMD desktop running 64-bit Windows 7.
Thanks, --Yariv
Images are not supported currently.
When is the image support implemented in OpenCL for GPUs?
When can we expect the release of this important feature?
We can't give an exact date, but it should be in the next few months.
I am attempting to emulate images, because I need to use atomics to completely redesign my heuristic, which should improve performance by at least an order of magnitude. My 8800GTX can no longer be used. My options are:
- Use my MacBook Pro. Easy, since Java is my host language, but it does not have a 30" display.
- Buy a new Nvidia. It's a weird time to buy Nvidia. Want to wait.
- Make use of my 4890.
While checking out the 4890 option, I am sharing how I wanted to emulate images. It works when you actually have images, but fails on Nvidia when you force it to believe it has none.
The error is:
GeForce 8800 GTX: :25: error: cannot codegen this l-value expression yet
int4 charImgVec = READ_IMAGE_I_2D(charImage , (int2) (300, 0) , charImgSz );
BTW, the 25 is the line number.
Can someone here see if this compiles for ATI, before I bother to reconfigure my system? Thanks!
#ifdef TRUE_IMAGES
    const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST;
    #define READ_IMAGE_I_2D(image, coord, sz) read_imagei(image, sampler, coord)
    #define READ_IMAGE_F_2D(image, coord, sz) read_imagef(image, sampler, coord)
    #define READ_IMAGE_I_3D(image, coord, sz) read_imagei(image, sampler, coord)
    #define READ_IMAGE_F_3D(image, coord, sz) read_imagef(image, sampler, coord)
#else
    #define READ_IMAGE_I_2D(image, coord, sz) convert_int4 (vload4((size_t) ((coord.s0 + (coord.s1 * sz.s0)) * 4), image) )
    #define READ_IMAGE_F_2D(image, coord, sz) convert_float4(vload4((size_t) ((coord.s0 + (coord.s1 * sz.s0)) * 4), image) )
    #define READ_IMAGE_I_3D(image, coord, sz) convert_int4 (vload4((size_t) ((coord.s0 + (coord.s1 * sz.s0) + (coord.s2 * sz.s1 * sz.s0)) * 4), image) )
    #define READ_IMAGE_F_3D(image, coord, sz) convert_float4(vload4((size_t) ((coord.s0 + (coord.s1 * sz.s0) + (coord.s2 * sz.s1 * sz.s0)) * 4), image) )
#endif
kernel void main(
#ifdef TRUE_IMAGES
    __read_only image2d_t charImage
  , __read_only image2d_t intImage
  , __read_only image3d_t float3dImage
#else
    global const char * charImage
  , global const int * intImage
  , global const float * float3dImage
#endif
  , global float *output) {
    int2 charImgSz = (int2) (600, 1);
    int4 charImgVec = READ_IMAGE_I_2D(charImage , (int2) (300, 0) , charImgSz );
    int2 intImgSz = (int2) (8192, 2);
    int4 intImgVec = READ_IMAGE_I_2D(intImage , (int2) (230, 1) , intImgSz );
    int4 float3dImgSz = (int4) (2048, 376, 5, 0);
    float4 float3dImgVec = READ_IMAGE_F_3D(float3dImage, (int4) (17, 23, 3, 0), float3dImgSz);
}
Well, I tried the above code on OSX. You have never seen so many errors! I do not think these preprocessors are up to the job, or I am doing something wrong.
Anyway, I went back to CUDA, got less aggressive, and it compiled, as shown below. I have not actually run a kernel yet, because I will not actually implement it this way.
I assemble my kernels on the fly, which lets me pass fewer parameters, store the source in the Java class where it is easiest to maintain, and obfuscate. I'll write Java functions that generate the source inline. I'll probably generate it using the # conditionals, so I do not have to know at assembly time whether a device supports images.
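To make the "decide late" idea concrete, here is a minimal sketch in plain C (the poster's host language is Java; C is used here only for illustration). It assumes the kernel body is assembled once with both branches present, and that the image/no-image decision is made per device by prepending a "#define TRUE_IMAGES" line. assemble_source and has_images are hypothetical names; in real host code the flag would typically come from querying CL_DEVICE_IMAGE_SUPPORT with clGetDeviceInfo, and passing "-D TRUE_IMAGES" in the clBuildProgram options is an alternative to editing the source.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes the final kernel source into out.
 * When has_images is non-zero, a "#define TRUE_IMAGES" line is placed
 * in front of the already assembled body, which selects the image
 * branch of every #ifdef TRUE_IMAGES block in that body.
 * Returns the number of characters written, or -1 if out is too small. */
static int assemble_source(int has_images, const char *body,
                           char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%s%s",
                     has_images ? "#define TRUE_IMAGES\n" : "", body);
    if (n < 0 || (size_t)n >= out_size)
        return -1;   /* encoding error or truncated output */
    return n;
}
```

The body string never changes between devices, so the Java (or C) code that generates it does not need to know anything about image support.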
It would still help if someone could compile it on ATI. I am snowed in, so I need to shovel, and am probably done working today. Thanks!
const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_NONE | CLK_FILTER_NEAREST;
kernel void main(
#ifdef TRUE_IMAGES
    __read_only image2d_t charImage
  , __read_only image2d_t intImage
  , __read_only image3d_t float3dImage
#else
    global const char * charImage
  , global const int * intImage
  , global const float * float3dImage
#endif
  , global float *output) {
    int2 charImgSz = (int2) (600, 1);
    int2 charImgCoord = (int2) (300, 0);
#ifdef TRUE_IMAGES
    int4 charImgVec = read_imagei(charImage, sampler, charImgCoord);
#else
    int4 charImgVec = convert_int4 (vload4((size_t) ((charImgCoord.s0 + (charImgCoord.s1 * charImgSz.s0)) * 4), charImage) );
#endif
    int2 intImgSz = (int2) (8192, 2);
    int2 intImgCoord = (int2) (230, 1);
#ifdef TRUE_IMAGES
    int4 intImgVec = read_imagei(intImage, sampler, intImgCoord);
#else
    int4 intImgVec = convert_int4 (vload4((size_t) ((intImgCoord.s0 + (intImgCoord.s1 * intImgSz.s0)) * 4), intImage) );
#endif
    int4 float3dImgSz = (int4) (2048, 376, 5, 0);
    int4 float3dCoord = (int4) ( 17, 23, 3, 0);
#ifdef TRUE_IMAGES
    int4 float3dImgVec = read_imagei(float3dImage, sampler, intImgCoord);
#else
    int4 float3dImgVec = vload4((size_t) ((float3dCoord.s0 + (float3dCoord.s1 * float3dImgSz.s0) + (float3dCoord.s2 * float3dImgSz.s1 * float3dImgSz.s0)) * 4), float3dImage);
#endif
}
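A note on the buffer fallback used in the listings above: OpenCL's vload4(offset, p) reads four elements starting at p + offset * 4, so the offset it takes is already a texel index rather than an element index. If the emulated image is stored as four channels per texel, the extra "* 4" in those expressions would step four times too far. The plain C sketch below models that addressing. The names int4_t, vload4_model, read_texel_2d and read_texel_3d are hypothetical and used only for illustration, and the layout (row-major, four ints per texel) is an assumption, not something stated in the thread.

```c
#include <stddef.h>

/* Four-component integer texel, standing in for OpenCL's int4. */
typedef struct { int s0, s1, s2, s3; } int4_t;

/* Models OpenCL vload4(offset, p): four elements read from p + offset * 4. */
static int4_t vload4_model(size_t offset, const int *p)
{
    const int *q = p + offset * 4;
    int4_t v = { q[0], q[1], q[2], q[3] };
    return v;
}

/* Texel (x, y) of a row-major 2D "image" with four ints per texel.
 * The texel index is passed straight to the load, with no extra "* 4". */
static int4_t read_texel_2d(const int *buf, int x, int y, int width)
{
    size_t texel = (size_t)x + (size_t)y * (size_t)width;
    return vload4_model(texel, buf);
}

/* Same idea for a 3D "image": slices of width * height texels. */
static int4_t read_texel_3d(const int *buf, int x, int y, int z,
                            int width, int height)
{
    size_t texel = (size_t)x
                 + (size_t)y * (size_t)width
                 + (size_t)z * (size_t)width * (size_t)height;
    return vload4_model(texel, buf);
}
```

If the buffer instead holds one element per pixel, a scalar load at the element index would be the matching operation; which layout applies depends on how the host fills the buffer.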
Thanks! I am numb, but I am back.
I am looking at the results. At first I thought that, in all the switching of code back and forth between emulation and true images, I had left some invalid code on the true-image side and never tested that way again afterward. The preprocessor then made sure the compiler never even saw that code.
Sure enough, if I add this as the first line to turn images on:
#define TRUE_IMAGES
Micah,
But look at one of the 3 tests (one without that error). The error msg is:
390.cl(17): warning: variable "charImgSz" was declared but never referenced
int2 charImgSz = (int2) (600, 1);
^
The partial code is:
int2 charImgSz = (int2) (600, 1);
int2 charImgCoord = (int2) (300, 0);
#ifdef TRUE_IMAGES
int4 charImgVec = read_imagei(charImage, sampler, charImgCoord);
#else
int4 charImgVec = convert_int4 (vload4((size_t) ((charImgCoord.s0 + (charImgCoord.s1 * charImgSz.s0)) * 4), charImage) );
#endif
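The fragment above can be reduced to a plain C check of which preprocessor branch is compiled. This is only a sketch under the assumption that the OpenCL front end follows ordinary C preprocessor rules; char_img_index and img_width are hypothetical stand-ins for the kernel fragment and for charImgSz.s0. With TRUE_IMAGES undefined, the #else branch is the one compiled and the size variable is referenced there, so a "declared but never referenced" warning for it would suggest that the #ifdef branch was the one taken.

```c
/* TRUE_IMAGES is deliberately left undefined in this file. */

/* Returns the flat texel index on the buffer path, or -1 on the image
 * path, where img_width is declared but never referenced. */
static int char_img_index(int x, int y)
{
    int img_width = 600;            /* plays the role of charImgSz.s0 */
#ifdef TRUE_IMAGES
    (void)x;
    (void)y;
    return -1;                      /* image path: img_width is unused */
#else
    return x + y * img_width;       /* buffer path: img_width is used */
#endif
}
```

Compiling the same file with TRUE_IMAGES defined (for example through a -D option) switches the function to the first branch, and a compiler with unused-variable warnings enabled would be expected to flag img_width.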
Now if there is no "#define TRUE_IMAGES" line, which there is not, then the "#ifdef TRUE_IMAGES" should fail and the "#else" version should be submitted to the compiler. charImgSz is used in the second version.
jcpalmer,
An easy way to test the kernel is to run it through Stream KernelAnalyzer (SKA). You can just copy and paste your kernel into the tool. As long as you have ATI Stream v2.01 installed (CPU mode is OK), it will work. It doesn't even require an ATI graphics card in your machine.