Valgrind reports error for get_global_id(0)

Hi all,

I wrote some OpenCL code that I'm running either on a CPU (for development) or on a GPU.

To debug a problem, I ran it through Valgrind and it detected errors in AMD Stream.

I created a test case and uploaded it as a GitHub gist:

Basically, I get these errors:

    Execute OpenCL kernel...
    ==14001== Thread 3:
    ==14001== Use of uninitialised value of size 8
    ==14001==    at 0x64B0FBF: ??? (in /opt/amdstream/lib/x86_64/
    ==14001== Use of uninitialised value of size 8
    ==14001==    at 0xCC50355: __OpenCL_Test_OpenCL_kernel (
    ==14001==    by 0x645EC90: ??? (in /opt/amdstream/lib/x86_64/
    ==14001==    by 0x645F492: ??? (in /opt/amdstream/lib/x86_64/
    ==14001==    by 0x64B308B: ??? (in /opt/amdstream/lib/x86_64/
    ==14001==    by 0x64B119C: ??? (in /opt/amdstream/lib/x86_64/
    ==14001==    by 0x5B36D3F: start_thread (in /lib/
    ==14001==    by 0x58A2AEC: clone (in /lib/

Line 8 of the kernel file is the printf:

    unsigned int i = get_global_id(0);
    printf("i = %3d  ", i);


Could that be a bug in AMD Stream? I'm using v2.4 on ArchLinux.



Check out the Kernel Analyzer.


I don't think you can do a printf in the kernel; I get errors if I try, anyway.



Take what Valgrind says with a grain of salt in OpenCL applications; there's a lot of driver interaction that confuses it. Also, your printf format string is technically incorrect: i is an unsigned int, so you should use %u.
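To illustrate the %u point in plain host-side C (this is ordinary C, not OpenCL C): get_global_id(0) returns a size_t, and the kernel stores it in an unsigned int, so the matching printf conversion is %u rather than %d. The helper name format_index is hypothetical and only used for this sketch; in the kernel itself the equivalent change would simply be replacing %3d with %3u.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a work-item index the way the kernel's debug
 * printf intends. The value is an unsigned int, so the conversion is %u;
 * the field width of 3 is kept from the original format string.
 * Returns the number of characters snprintf would have written. */
int format_index(char *buf, size_t n, unsigned int i)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "i = %3u  ", i);
}
```

With %u, large indices such as 4000000000 print as positive numbers instead of relying on a signed conversion for an unsigned argument.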

@hayabusaxps: You can use printf from kernels running on the CPU. See Appendix A.8.6 of the AMD APP Programming Guide.

The earlier reply about Valgrind is correct. Valgrind generates a lot of false errors with OpenCL kernels running on GPUs. Do you otherwise face any problem with the kernel?


printf can be used on both the CPU and the GPU. See section A.8 of the Programming Guide.


Thanks all for the comments.

@hayabusaxps: I'll check out the kernel analyser, but since I'm on Linux it might take some time before I can try it. Also, printf() does work if you enable AMD's extension (#pragma OPENCL EXTENSION cl_amd_printf : enable), at least when running on the CPU. Good catch on the unsigned int. The link you gave explains why Valgrind can report memory leaks: memory is allocated and then passed to the driver, which might confuse Valgrind into thinking it was never freed even though the driver did free it. But invalid accesses and uses of uninitialized values are different things.

@himanshu.gautam: I understand that Valgrind can be confused when running on the GPU, but the OpenCL device I'm using is a CPU. The kernel is dead simple (you can look it up yourself at the link in the original post): it's a simple addition of two vectors, only 13 lines long.


After debugging my own code, I realised there might be a memory leak in AMD Stream. Again, I tried reproducing it with the simple test code (posted here:). To measure the memory used by the process, I use the following:

    while [ 1 ]; do pidof -x opencl_test | xargs ps u -p | tail -1 | awk '{print ""$6" KiB    "$6/1024" MiB    "$6/1024**2" GiB"}' ; sleep 0 ;done 2> /dev/null

Basically, I run "ps u -p" in a loop to extract the memory usage (RSS) of the process and watch how it evolves. The posted code reaches around 55MB after 100000 kernel launches. If I replace the "for" loop with a "while (true)", memory usage increases indefinitely. Can anybody reproduce that?
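If you'd rather measure from inside the test program than poll ps from a shell, a minimal Linux-only sketch is to read the VmRSS line of /proc/self/status, which reports roughly the same resident-set size as the RSS column ($6) of ps u. The function names read_rss_kib and growth_per_launch_bytes are hypothetical; sampling read_rss_kib() before and after the launch loop gives an average growth per launch. For example, 55 MiB of growth over 100000 launches would be about 577 bytes per launch.

```c
#include <assert.h>
#include <stdio.h>

/* Return the resident set size of the calling process in KiB, or -1 on
 * failure. Linux-specific: parses the "VmRSS:" line of /proc/self/status. */
long read_rss_kib(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    long kib = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines that do not start with "VmRSS:" fail the match and leave kib unchanged. */
        if (sscanf(line, "VmRSS: %ld kB", &kib) == 1)
            break;
    }
    fclose(f);
    return kib;
}

/* Average RSS growth per kernel launch, in bytes, between two samples. */
double growth_per_launch_bytes(long rss_before_kib, long rss_after_kib,
                               unsigned long launches)
{
    assert(launches > 0);
    return (double)(rss_after_kib - rss_before_kib) * 1024.0 / (double)launches;
}
```

A steadily positive per-launch figure that does not level off, as with the "while (true)" variant, points at something allocated on every launch and never freed.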



I can also see a memory leak when running my real program. After ~230 000 time steps, it gets killed because it uses more than 512MB, even though it started with less than 1MB. I thus cannot use OpenCL on my CPU for now...



I do not see any memory leak on my system. System config: Vista 64-bit, SDK 2.4 with the 11.4 driver, Juniper GPU, AMD Athlon II X4 CPU.

Please specify your system configuration: CPU, GPU, SDK, driver and OS.

Also check whether you see a memory leak when running the SDK samples.


My system configuration is:

ArchLinux x86_64, kernel v2.6.38.4, AMD Stream 2.4, running on an Intel Core2 Quad CPU Q6600 (no GPU).

I ran the NBody code example (AMD-APP-SDK-v2.4-lnx64/samples/opencl/cl/app/NBody) and did not see a leak.


Generally, systems with at least one AMD product are supported by the AMD APP SDK. That said, since the NBody sample doesn't show any leak, it might be something in your code.


The code is quite simple. See line 225 of main.cpp: there is a loop that launches kernels and waits for each one to finish.

The only difference from the samples was the "event" variable (the last argument to clEnqueueNDRangeKernel). I changed it from "&event" to "NULL" and the leak disappeared.
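A likely explanation for that result: when you pass &event as the last argument of clEnqueueNDRangeKernel, the runtime creates an event object and hands a reference to it back to you, and that object is only freed once clReleaseEvent drops its reference count to zero (and the command has completed). Launching in a loop without ever calling clReleaseEvent therefore leaks one event per launch, while passing NULL means no event is returned at all. The self-contained C below is only a toy model of that reference counting; toy_event, toy_enqueue, toy_release and launch_loop are hypothetical stand-ins, not OpenCL calls. In real code you could keep &event and call clReleaseEvent(event) after clWaitForEvents(1, &event).

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model (hypothetical names, NOT the OpenCL API) of a reference-counted
 * event object handed back to the caller by an enqueue call. */
typedef struct {
    int refcount;
} toy_event;

static long live_events = 0; /* event objects allocated and not yet freed */

/* Stand-in for an enqueue call: creates an event only if asked for one. */
int toy_enqueue(toy_event **event_out)
{
    if (event_out == NULL)
        return 0;                 /* caller passed NULL: no event created */
    toy_event *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->refcount = 1;              /* this reference belongs to the caller */
    live_events++;
    *event_out = e;
    return 0;
}

/* Stand-in for clReleaseEvent: drop one reference, free the object at zero. */
void toy_release(toy_event *e)
{
    assert(e != NULL && e->refcount > 0);
    if (--e->refcount == 0) {
        free(e);
        live_events--;
    }
}

/* Launch n "kernels". want_event mimics passing &event instead of NULL;
 * release mimics calling clReleaseEvent(event) after each launch. */
void launch_loop(unsigned long n, int want_event, int release)
{
    for (unsigned long k = 0; k < n; k++) {
        toy_event *ev = NULL;
        if (toy_enqueue(want_event ? &ev : NULL) != 0)
            break;                /* allocation failed */
        if (ev != NULL && release)
            toy_release(ev);
    }
}
```

In the model, requesting an event without releasing it leaves one live object per launch, which matches the steady growth seen with "&event"; either releasing each event or passing NULL keeps the count flat.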