
OpenCL printf() limited to 65536 lines?

Question asked by mrbpix on Mar 30, 2017
Latest reply on Apr 5, 2017 by dipak

I spent hours using printf() statements to debug a complex OpenCL program, and eventually reduced the bug to this very simple, very short program based on the HelloWorld sample that ships with the AMD APP SDK (14 lines of OpenCL, ~170 lines of C): https://pastebin.com/raw/8N4Ms2JT

 

On Linux, compile with:

$ g++ -I/opt/AMDAPPSDK-3.0/include  -o main.o -c main.cpp
$ g++ -o hello main.o -rdynamic -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -lOpenCL

The kernel is launched with 8192 threads (work-items). Each thread prints one "thread started" line followed by 32 integers (with a dummy use of the input/output buffers to prevent the compiler from optimizing out the while loop):

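__kernel void helloworld(__global uchar* in, __global uchar* out)
{
    uint        tid = get_global_id(0);
    uint        step = 32;                /* integers printed per thread */
    uint        i = tid * step;           /* first integer for this thread */
    uint        j = (tid + 1) * step;     /* one past the last integer */
    /* %u matches the uint arguments (the original used %d) */
    printf("%u: thread started\n", tid);
    while (i < j)
      {
        printf("integer %u\n", i);
        i++;
      }
    /* dummy use of the buffers so the loop is not optimized out */
    out[tid] = in[tid] + i;
}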

However, the output of some threads is missing (threads 0-1279):

$ ./hello | grep thread | sort -n | head
1280: thread started
1281: thread started
1282: thread started
1283: thread started
1284: thread started
...

If I change the local work size from 64 to 128 (see the local_work_size variable), the output is correct:

$ ./hello | grep thread | sort -n | head
0: thread started
1: thread started
2: thread started
3: thread started
...

While counting the lines output by the kernel, I noticed that the root of the problem could be that the OpenCL printf() implementation is limited to printing at most 65536 lines (per kernel invocation?). The output would differ with the local work size because the work groups would be scheduled in a different order: in the failing case, the first work groups would be scheduled AFTER 65536 lines have already been printed, which would explain why their output is missing. In my original real-world program, it seems that if a thread attempts to printf() a line after 65536 lines have already been printed, it silently dies/stops.


System details:
Ubuntu 16.04, 64-bit
AMDGPU-PRO driver 16.60.3
AMD APP SDK 3.0
Radeon RX 480
