cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

XmainframeX
Journeyman III

const floats in kernel not initialized

Hello everybody,

I just discovered that one of my kernels which ran fine with SDK 2.2 doesn't have its global const floats initialized with SDK 2.3:

I'm debugging the CPU version of my kernel and the debugger shows me that the value of my global "const float centerFactor = 0.75f * 0.75f;" variable declared inside the .cl-File evaluates to 0.0f. The results of the kernels calculations support this observation.

When i'm running the same kernel with SDK 2.2 the debugger shows that centerFactor evaluates to 0.5625 correctly and the calculations of the kernel are correct.

 

Has anybody observerd similar behaviour or can reproduce this?

0 Likes
18 Replies
rick_weber
Adept II

If it's global and not scoped to a particular kernel, you should declare it as __constant float centerFactor.

0 Likes

Tried that - the problem persists.

Just to make it clear I attached some skeletons of what I had before and what I changed:

// Before: const float centerFactor = 0.75f * 0.75f; __kernel void ScaleUp2(__global float* dst, __global float* src, int srcWidth, int srcHeight, int dstWidth) { ... // centerFactor evaluates to 0.0f here ... } // Now: __constant float centerFactor = 0.75f * 0.75f; __kernel void ScaleUp2(__global float* dst, __global float* src, int srcWidth, int srcHeight, int dstWidth) { ... // centerFactor evaluates to 0.0f here ... }

0 Likes

Tried something else: In the following code the debugger shows the three float variables switching from some random numbers to 0.0f when the numbers should be assigned.

In some other kernels constants work like they should - same for SDK 2.2 - weird...

__kernel void ScaleUp2(__global float* dst, __global float* src, int srcWidth, int srcHeight, int dstWidth) { float centerFactor = 0.5625f; float manhattenFactor = 0.1875f; float diagonalFactor = 0.0625f; // All three floats evaluate to 0.0f here... }

0 Likes

xmainframex,

I tried to reproduce it on my end but failed.

My system config is :
SDK 2.3
Driver 10.12
CPU AMD Athlon II x4 630 processor 2.81GHz
GPU HD 5770 (Juniper)
Os Windows Vista Ultimate 64 bit

I suggest you to send a precise testcase a streamdeveloper@amd.com.

Please mention your System configuration also.

0 Likes

Unfortunately I'm not able to create a precise testcase: While trying to find the significant difference between the kernels with properly working constants and the ones which fail, I transformed the failing kernel completely into one that works: Exactly the same CL-Code and the same order of building the program, creating the kernel and setting the arguments.

However, nothing changed: One of the two exactly equal kernels runs properly, the other does not. The only difference is that one kernel is build and run before the other - the working kernel is the second one.

 

There are some more things I'd like to mention:

  • Evaluating to the wrong value also happens for "written" constants, i.e. if I have a "__global float* dst" and I do "*dst = 0.5f;", the target of dst evaluates to 0.0f as well.
  • It seems like not all floats evaluate to 0.0f: They are always rounded down to the next lower integer: 1.5f evaluates to 1 for example.
  • The problem appears equally on GPU and CPU.

 

That's my system config:

SDK 2.3
Driver 10.11
CPU Intel Core Duo P8700 (2x2.53 GHz)
GPU Mobility HD 4670 (RV730)
OS Gentoo Linux

 

Maybe one of you can reproduce this problem now or make more qualified guesses about how to solve it than I can...

0 Likes

Please post the kernel you mentioned above.(which does not work the first time but works the second time.)

Can you try it on some newer GPU boards? I  will try to reproduce on the same family GPU you are using currently. 

0 Likes

@himanshu.gautam: Below is the kernel you requested together with the code I used for building and running it. It's (as its name says) for upscaling float greyscale pictures, so I guess it is clear what the buffers should contain and how big they should be.

Due to the fact that the three const float factors evaluate to 0.0f, the dst image is black after running the kernel - no matter what the src image contained.

 

Unfortunately I don't have other AMD/ATI GPU boards available. I guess it is easier to narrow down the problem on CPU anyway, isn't it?

 

/// Kernel code: /////////////////////// #ifdef __DEBUG #pragma OPENCL EXTENSION cl_amd_printf : enable #endif const float centerFactor = 0.75f * 0.75f; // Debugger shows that this is 0.0f on CPU const float manhattenFactor = 0.75f * 0.25f; // Debugger shows that this is 0.0f on CPU const float diagonalFactor = 0.25f * 0.25f; // Debugger shows that this is 0.0f on CPU __kernel void ScaleUp2(__global float* dst, __global float* src, int srcWidth, int srcHeight, int dstWidth) { int srcPosX = get_global_id(0); int srcPosY = get_global_id(1); if(srcPosX >= srcWidth) return; if(srcPosY >= srcHeight) return; int srcOffsXMin = max(0, srcPosX - 1); int srcOffsXMax = min(srcWidth - 1, srcPosX + 1); int srcOffsYMin = max(0, (srcPosY-1) * srcWidth); int srcOffsYMed = srcPosY * srcWidth; int srcOffsYMax = min((srcHeight-1) * srcWidth, (srcPosY+1) * srcWidth); float v11 = src[srcOffsYMin + srcOffsXMin]; float v12 = src[srcOffsYMin + srcPosX]; float v13 = src[srcOffsYMin + srcOffsXMax]; float v21 = src[srcOffsYMed + srcOffsXMin]; float v22 = src[srcOffsYMed + srcPosX]; float v23 = src[srcOffsYMed + srcOffsXMax]; float v31 = src[srcOffsYMax + srcOffsXMin]; float v32 = src[srcOffsYMax + srcPosX]; float v33 = src[srcOffsYMax + srcOffsXMax]; dst += 2 * srcPosY * dstWidth + 2 * srcPosX; *dst = diagonalFactor * v11 + manhattenFactor * v12 + manhattenFactor * v21 + centerFactor * v22; dst++; *dst = manhattenFactor * v12 + diagonalFactor * v13 + centerFactor * v22 + manhattenFactor * v23; dst += dstWidth - 1; *dst = manhattenFactor * v21 + centerFactor * v22 + diagonalFactor * v31 + manhattenFactor * v32; dst++; *dst = centerFactor * v22 + manhattenFactor * v23 + manhattenFactor * v32 + diagonalFactor * v33; } /// Build/run host code: /////////////////////// // Create program and kernel std::ifstream file((KernelPath + "ImageOps/ScaleUp2.cl").c_str()); fail(file.is_open(), "OpenClSift: ImageOps/ScaleUp2.cl not found"); std::string prog((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>()); cl::Program::Sources source(1, std::make_pair(prog.c_str(), prog.length() + 1)); pScaleUp = new cl::Program(env->Context, source); buildProgram(pScaleUp, env->Devices); kScaleUp = new cl::Kernel(*pScaleUp, "ScaleUp2"); kScaleUp->setArg(0, *dst); kScaleUp->setArg(1, *src); kScaleUp->setArg(2, srcWidth); kScaleUp->setArg(3, srcHeight); kScaleUp->setArg(4, 2 * srcWidth); cl::Event done; env->Queue.enqueueNDRangeKernel(*kScaleUp, cl::NullRange, CL_SUFFICE2D(cl::NDRange(srcWidth, srcHeight)), WorkgroupSize, NULL, &done); done.wait(); // dst buffer is completely 0.0f if I download it here...

0 Likes

What is this "debugger" you mentioned? I thought only Nvidia had a GPU debugger and then only for CUDA applications...

0 Likes

@rick.weber: As I mentioned, the problem appears both on CPU and GPU. I could only debug on CPU and used gdb for that - created breakpoints at __OpenCL_KernelName_kernel functions after the kernel was created.

Due to the fact that the results of the kernel are equal (and wrong) on CPU and GPU, I expect the GPU kernel to suffer form the same problem.

0 Likes

const float centerFactor = 0.75f * 0.75f; // Debugger shows that this is 0.0f on CPU <-- this is illegal OpenCL code. You cannot have globally scoped variables in the private address space.
0 Likes

@MicahVillmow: It all seems to become more and more inconsistent to me. What would be the expected behaviour when I try to run this code? According to some of the replies I'd guess that there should be a compiler error or so, but I observed silent errors or everything running fine so far (post from 01/14/2011 07:26 AM).

Btw., as I mentioned above declaring the floats inside of ScaleUp2 doesn't change anything - so how do all those observations fit together...!?

0 Likes

I would expect it to error in the compiler. Since it is illegal code and the compiler doesn't catch it, the results are undefined and cannot be relied upon.
0 Likes

There was a warning that the compiler changed the adress space to constant - so the code should have been legal afterwards...

I changed my algorithm so that I am no more dependend on the failing kernel. All other kernels work well with SDK 2.3 and the results are as expected now - no idea about what else to try to fix the failing kernel...

0 Likes

Micah,

Does const actually change the address space to the constant space (e.g. const is a synonym for __constant) or does it just tell the compiler the variable doesn't actually chage after initialization?

0 Likes

I talked with our frontend engineer and the compiler will change the address space to constant. const itself is not a synonym for constant, but this was a common enough mistake that instead of emitting a compiler error, we emit a compiler warning and then make the change.
0 Likes

const float centerFactor = 0.75f * 0.75f; // <-- this is illegal OpenCL code. You cannot have globally scoped variables in the private address space.

And what happens if I do this?

 

 

__kernel void MyKernel ( const __global float *src, __global float *dst, const float k ) { const size_t gid = get_global_id(0); dst[gid] = src[gid] + k; }

0 Likes

@bubu: Not as compact as the original code but your code snipped works as expected...

0 Likes

Originally posted by: XmainframeX @bubu: Not as compact as the original code but your code snipped works as expected...

 

Good!

0 Likes