cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

mrbpix
Journeyman III

OpenCL printf() limited to 65536 lines?

I spent hours reducing a bug in a complex OpenCL using printf() statements to debug to this very simple, very short program based on the HelloWorld sample that ships with the AMD APP SDK (14 lines of OpenCL, ~170 lines of C): https://pastebin.com/raw/8N4Ms2JT

On Linux: compile with:

$ g++ -I/opt/AMDAPPSDK-3.0/include  -o main.o -c main.cpp
$ g++ -o hello main.o -rdynamic -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -lOpenCL

The kernel launches 8192 threads and each thread prints 32 integers (with a dummy use of the input/output buffers to prevent the compiler from optimizing out the while loop):

__kernel void helloworld(__global uchar* in, __global uchar* out)
{
    uint        tid = get_global_id(0);
    uint        step = 32;
    uint        i = tid * step;
    uint        j = (tid + 1) * step;
    printf("%d: thread started\n", tid);
    while (i < j)
      {
        printf("integer %d\n", i);
        i++;
      }
    out[tid] = in[tid] + i;
}

However the output of some threads is missing (threads 0-1279 are missing):

$ ./hello | grep thread | sort -n | head
1280: thread started
1281: thread started
1282: thread started
1283: thread started
1284: thread started
...

If I change the local work size from 64 to 128 (see local_work_size variable) the output is correct:

$ ./hello | grep thread | sort -n | head
0: thread started
1: thread started
2: thread started
3: thread started
...

While counting the lines output by the kernel, I noticed it seems the root of the problem could be that the OpenCL printf() implementation is limited to printing at most 65536 lines (per kernel invocation?). The output would be different depending on the local work size because in one case the first work groups would be scheduled differently (eg. scheduled AFTER 65536 lines have been printed out) and this would be why their output is missing. In my original real-world program it seems that if a thread attempts to printf() a line after 65536 lines have already been printed out, then it silently dies/stops.


System details:
Ubuntu 16.04, 64-bit
AMDGPU-PRO driver 16.60.3
AMD APP SDK 3.0
Radeon RX 480

0 Likes
1 Solution
dipak
Big Boss

Hi Marc,

There is a size limit for printf buffer. The actual buffer size is implementation dependent and may vary device to device. As per OpenCL 1.2 spec, the minimum size is 1MB (for FULL profile). You can find out the size by calling clGetDeviceInfo  with parameter CL_DEVICE_PRINTF_BUFFER_SIZE. Another point is, depending on the implementation, this printf buffer might be shared with other kernels too (if running any), thus may reduce the effective size for any particular kernel.

Regarding the order of printf output, there is no guarantee and it can differ run to run even for same kernel settings. That's why sometimes it is hard to debug using printf if many threads are running and generating lots of outputs.

Regards,

View solution in original post

0 Likes
7 Replies
dipak
Big Boss

Hi Marc,

There is a size limit for printf buffer. The actual buffer size is implementation dependent and may vary device to device. As per OpenCL 1.2 spec, the minimum size is 1MB (for FULL profile). You can find out the size by calling clGetDeviceInfo  with parameter CL_DEVICE_PRINTF_BUFFER_SIZE. Another point is, depending on the implementation, this printf buffer might be shared with other kernels too (if running any), thus may reduce the effective size for any particular kernel.

Regarding the order of printf output, there is no guarantee and it can differ run to run even for same kernel settings. That's why sometimes it is hard to debug using printf if many threads are running and generating lots of outputs.

Regards,

0 Likes

Thank you, I somehow missed this 1MB limit when reading the docs. I didn't mention it but the test program linked in my first post actually segfaults. Is this normal behavior when exceeding 1MB printed?

0 Likes

It should not generate a sefault for exceeding the buffer limit. The printf outputs might be ignored i.e. not get printed.

Regards,

0 Likes

Well I hereby submit a bug report then The code I attached is pretty much the HelloWorld sample from the AMD APP SDK and it segfaults.

0 Likes

Thanks for reporting it and sharing the repro. May I assume below points regarding the attached code?

  1. no segfault if printf statements are removed from the above kernel code.
  2. no segfault if printf statements are there but less number of threads are launched thus producing less printf output.

Please confirm.

0 Likes

1. Yes.

2. Yes (no segfault with 1536 threads, segfaults with 2048 threads.)

0 Likes

Thanks for the confirmation. I've already opened a ticket against it.

Regards,

0 Likes