Hello everybody,
I just discovered that one of my kernels which ran fine with SDK 2.2 doesn't have its global const floats initialized with SDK 2.3:
I'm debugging the CPU version of my kernel and the debugger shows me that the value of my global "const float centerFactor = 0.75f * 0.75f;" variable declared inside the .cl-File evaluates to 0.0f. The results of the kernels calculations support this observation.
When i'm running the same kernel with SDK 2.2 the debugger shows that centerFactor evaluates to 0.5625 correctly and the calculations of the kernel are correct.
Has anybody observerd similar behaviour or can reproduce this?
If it's global and not scoped to a particular kernel, you should declare it as __constant float centerFactor.
Tried that - the problem persists.
Just to make it clear I attached some skeletons of what I had before and what I changed:
// Before: const float centerFactor = 0.75f * 0.75f; __kernel void ScaleUp2(__global float* dst, __global float* src, int srcWidth, int srcHeight, int dstWidth) { ... // centerFactor evaluates to 0.0f here ... } // Now: __constant float centerFactor = 0.75f * 0.75f; __kernel void ScaleUp2(__global float* dst, __global float* src, int srcWidth, int srcHeight, int dstWidth) { ... // centerFactor evaluates to 0.0f here ... }
Tried something else: In the following code the debugger shows the three float variables switching from some random numbers to 0.0f when the numbers should be assigned.
In some other kernels constants work like they should - same for SDK 2.2 - weird...
__kernel void ScaleUp2(__global float* dst, __global float* src, int srcWidth, int srcHeight, int dstWidth) { float centerFactor = 0.5625f; float manhattenFactor = 0.1875f; float diagonalFactor = 0.0625f; // All three floats evaluate to 0.0f here... }
xmainframex,
I tried to reproduce it on my end but failed.
My system config is :
SDK 2.3
Driver 10.12
CPU AMD Athlon II x4 630 processor 2.81GHz
GPU HD 5770 (Juniper)
Os Windows Vista Ultimate 64 bit
I suggest you to send a precise testcase a streamdeveloper@amd.com.
Please mention your System configuration also.
Unfortunately I'm not able to create a precise testcase: While trying to find the significant difference between the kernels with properly working constants and the ones which fail, I transformed the failing kernel completely into one that works: Exactly the same CL-Code and the same order of building the program, creating the kernel and setting the arguments.
However, nothing changed: One of the two exactly equal kernels runs properly, the other does not. The only difference is that one kernel is build and run before the other - the working kernel is the second one.
There are some more things I'd like to mention:
That's my system config:
SDK 2.3
Driver 10.11
CPU Intel Core Duo P8700 (2x2.53 GHz)
GPU Mobility HD 4670 (RV730)
OS Gentoo Linux
Maybe one of you can reproduce this problem now or make more qualified guesses about how to solve it than I can...
Please post the kernel you mentioned above.(which does not work the first time but works the second time.)
Can you try it on some newer GPU boards? I will try to reproduce on the same family GPU you are using currently.
@himanshu.gautam: Below is the kernel you requested together with the code I used for building and running it. It's (as its name says) for upscaling float greyscale pictures, so I guess it is clear what the buffers should contain and how big they should be.
Due to the fact that the three const float factors evaluate to 0.0f, the dst image is black after running the kernel - no matter what the src image contained.
Unfortunately I don't have other AMD/ATI GPU boards available. I guess it is easier to narrow down the problem on CPU anyway, isn't it?
/// Kernel code: /////////////////////// #ifdef __DEBUG #pragma OPENCL EXTENSION cl_amd_printf : enable #endif const float centerFactor = 0.75f * 0.75f; // Debugger shows that this is 0.0f on CPU const float manhattenFactor = 0.75f * 0.25f; // Debugger shows that this is 0.0f on CPU const float diagonalFactor = 0.25f * 0.25f; // Debugger shows that this is 0.0f on CPU __kernel void ScaleUp2(__global float* dst, __global float* src, int srcWidth, int srcHeight, int dstWidth) { int srcPosX = get_global_id(0); int srcPosY = get_global_id(1); if(srcPosX >= srcWidth) return; if(srcPosY >= srcHeight) return; int srcOffsXMin = max(0, srcPosX - 1); int srcOffsXMax = min(srcWidth - 1, srcPosX + 1); int srcOffsYMin = max(0, (srcPosY-1) * srcWidth); int srcOffsYMed = srcPosY * srcWidth; int srcOffsYMax = min((srcHeight-1) * srcWidth, (srcPosY+1) * srcWidth); float v11 = src[srcOffsYMin + srcOffsXMin]; float v12 = src[srcOffsYMin + srcPosX]; float v13 = src[srcOffsYMin + srcOffsXMax]; float v21 = src[srcOffsYMed + srcOffsXMin]; float v22 = src[srcOffsYMed + srcPosX]; float v23 = src[srcOffsYMed + srcOffsXMax]; float v31 = src[srcOffsYMax + srcOffsXMin]; float v32 = src[srcOffsYMax + srcPosX]; float v33 = src[srcOffsYMax + srcOffsXMax]; dst += 2 * srcPosY * dstWidth + 2 * srcPosX; *dst = diagonalFactor * v11 + manhattenFactor * v12 + manhattenFactor * v21 + centerFactor * v22; dst++; *dst = manhattenFactor * v12 + diagonalFactor * v13 + centerFactor * v22 + manhattenFactor * v23; dst += dstWidth - 1; *dst = manhattenFactor * v21 + centerFactor * v22 + diagonalFactor * v31 + manhattenFactor * v32; dst++; *dst = centerFactor * v22 + manhattenFactor * v23 + manhattenFactor * v32 + diagonalFactor * v33; } /// Build/run host code: /////////////////////// // Create program and kernel std::ifstream file((KernelPath + "ImageOps/ScaleUp2.cl").c_str()); fail(file.is_open(), "OpenClSift: ImageOps/ScaleUp2.cl not found"); std::string prog((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>()); cl::Program::Sources source(1, std::make_pair(prog.c_str(), prog.length() + 1)); pScaleUp = new cl::Program(env->Context, source); buildProgram(pScaleUp, env->Devices); kScaleUp = new cl::Kernel(*pScaleUp, "ScaleUp2"); kScaleUp->setArg(0, *dst); kScaleUp->setArg(1, *src); kScaleUp->setArg(2, srcWidth); kScaleUp->setArg(3, srcHeight); kScaleUp->setArg(4, 2 * srcWidth); cl::Event done; env->Queue.enqueueNDRangeKernel(*kScaleUp, cl::NullRange, CL_SUFFICE2D(cl::NDRange(srcWidth, srcHeight)), WorkgroupSize, NULL, &done); done.wait(); // dst buffer is completely 0.0f if I download it here...
What is this "debugger" you mentioned? I thought only Nvidia had a GPU debugger and then only for CUDA applications...
@rick.weber: As I mentioned, the problem appears both on CPU and GPU. I could only debug on CPU and used gdb for that - created breakpoints at __OpenCL_KernelName_kernel functions after the kernel was created.
Due to the fact that the results of the kernel are equal (and wrong) on CPU and GPU, I expect the GPU kernel to suffer form the same problem.
@MicahVillmow: It all seems to become more and more inconsistent to me. What would be the expected behaviour when I try to run this code? According to some of the replies I'd guess that there should be a compiler error or so, but I observed silent errors or everything running fine so far (post from 01/14/2011 07:26 AM).
Btw., as I mentioned above declaring the floats inside of ScaleUp2 doesn't change anything - so how do all those observations fit together...!?
There was a warning that the compiler changed the adress space to constant - so the code should have been legal afterwards...
I changed my algorithm so that I am no more dependend on the failing kernel. All other kernels work well with SDK 2.3 and the results are as expected now - no idea about what else to try to fix the failing kernel...
Micah,
Does const actually change the address space to the constant space (e.g. const is a synonym for __constant) or does it just tell the compiler the variable doesn't actually chage after initialization?
const float centerFactor = 0.75f * 0.75f; // <-- this is illegal OpenCL code. You cannot have globally scoped variables in the private address space.
And what happens if I do this?
__kernel void MyKernel ( const __global float *src, __global float *dst, const float k ) { const size_t gid = get_global_id(0); dst[gid] = src[gid] + k; }
@bubu: Not as compact as the original code but your code snipped works as expected...
Originally posted by: XmainframeX @bubu: Not as compact as the original code but your code snipped works as expected...
Good!