4 Replies Latest reply on Aug 21, 2014 8:21 PM by kphillisjr

    printf doesn't work for first group


      I'm running a kernel with the following code:


      printf("glid=%d,%d; grid=%d,%d; lid=%d,%d; lsize=%d,%d\n",
      get_global_id(0), get_global_id(1), get_group_id(0), get_group_id(1),
      get_local_id(0), get_local_id(1), get_local_size(0), get_local_size(1));


      When running with global work size 16x16 and local work size 8x8 I get the following output:


      Using platform vendor: Advanced Micro Devices, Inc.

      glid=8,0; grid=1,0;     lid=0,0; lsize=8,8

      glid=9,0; grid=1,0;     lid=1,0; lsize=8,8

      glid=10,0; grid=1,0;    lid=2,0; lsize=8,8

      glid=11,0; grid=1,0;    lid=3,0; lsize=8,8

      glid=12,0; grid=1,0;    lid=4,0; lsize=8,8

      glid=13,0; grid=1,0;    lid=5,0; lsize=8,8

      glid=14,0; grid=1,0;    lid=6,0; lsize=8,8

      glid=15,0; grid=1,0;    lid=7,0; lsize=8,8

      glid=8,1; grid=1,0;     lid=0,1; lsize=8,8

      glid=9,1; grid=1,0;     lid=1,1; lsize=8,8

      glid=10,1; grid=1,0;    lid=2,1; lsize=8,8

      glid=11,1; grid=1,0;    lid=3,1; lsize=8,8

      glid=12,1; grid=1,0;    lid=4,1; lsize=8,8

      glid=13,1; grid=1,0;    lid=5,1; lsize=8,8

      glid=14,1; grid=1,0;    lid=6,1; lsize=8,8

      glid=15,1; grid=1,0;    lid=7,1; lsize=8,8

      glid=8,2; grid=1,0;     lid=0,2; lsize=8,8

      glid=9,2; grid=1,0;     lid=1,2; lsize=8,8

      glid=10,2; grid=1,0;    lid=2,2; lsize=8,8

      glid=11,2; grid=1,0;    lid=3,2; lsize=8,8

      glid=12,2; grid=1,0;    lid=4,2; lsize=8,8

      glid=13,2; grid=1,0;    lid=5,2; lsize=8,8

      glid=14,2; grid=1,0;    lid=6,2; lsize=8,8

      glid=15,2; grid=1,0;    lid=7,2; lsize=8,8

      glid=8,3; grid=1,0;     lid=0,3; lsize=8,8

      glid=9,3; grid=1,0;     lid=1,3; lsize=8,8

      glid=10,3; grid=1,0;    lid=2,3; lsize=8,8

      glid=11,3; grid=1,0;    lid=3,3; lsize=8,8

      glid=12,3; grid=1,0;    lid=4,3; lsize=8,8

      glid=13,3; grid=1,0;    lid=5,3; lsize=8,8

      glid=14,3; grid=1,0;    lid=6,3; lsize=8,8

      glid=15,3; grid=1,0;    lid=7,3; lsize=8,8

      glid=8,4; grid=1,0;     lid=0,4; lsize=8,8

      glid=9,4; grid=1,0;     lid=1,4; lsize=8,8

      glid=10,4; grid=1,0;    lid=2,4; lsize=8,8

      glid=11,4; grid=1,0;    lid=3,4; lsize=8,8

      glid=12,4; grid=1,0;    lid=4,4; lsize=8,8

      glid=13,4; grid=1,0;    lid=5,4; lsize=8,8

      glid=14,4; grid=1,0;    lid=6,4; lsize=8,8

      glid=15,4; grid=1,0;    lid=7,4; lsize=8,8

      glid=8,5; grid=1,0;     lid=0,5; lsize=8,8

      glid=9,5; grid=1,0;     lid=1,5; lsize=8,8

      glid=10,5; grid=1,0;    lid=2,5; lsize=8,8

      glid=11,5; grid=1,0;    lid=3,5; lsize=8,8

      glid=12,5; grid=1,0;    lid=4,5; lsize=8,8

      glid=13,5; grid=1,0;    lid=5,5; lsize=8,8

      glid=14,5; grid=1,0;    lid=6,5; lsize=8,8

      glid=15,5; grid=1,0;    lid=7,5; lsize=8,8

      glid=8,6; grid=1,0;     lid=0,6; lsize=8,8

      glid=9,6; grid=1,0;     lid=1,6; lsize=8,8

      glid=10,6; grid=1,0;    lid=2,6; lsize=8,8

      glid=11,6; grid=1,0;    lid=3,6; lsize=8,8

      glid=12,6; grid=1,0;    lid=4,6; lsize=8,8

      glid=13,6; grid=1,0;    lid=5,6; lsize=8,8

      glid=14,6; grid=1,0;    lid=6,6; lsize=8,8

      glid=15,6; grid=1,0;    lid=7,6; lsize=8,8

      glid=8,7; grid=1,0;     lid=0,7; lsize=8,8

      glid=9,7; grid=1,0;     lid=1,7; lsize=8,8

      glid=10,7; grid=1,0;    lid=2,7; lsize=8,8

      glid=11,7; grid=1,0;    lid=3,7; lsize=8,8

      glid=12,7; grid=1,0;    lid=4,7; lsize=8,8

      glid=13,7; grid=1,0;    lid=5,7; lsize=8,8

      glid=14,7; grid=1,0;    lid=6,7; lsize=8,8

      glid=15,7; grid=1,0;    lid=7,7; lsize=8,8

      glid=0,8; grid=0,1;     lid=0,0; lsize=8,8

      glid=1,8; grid=0,1;     lid=1,0; lsize=8,8

      glid=2,8; grid=0,1;     lid=2,0; lsize=8,8

      glid=3,8; grid=0,1;     lid=3,0; lsize=8,8

      glid=4,8; grid=0,1;     lid=4,0; lsize=8,8

      glid=5,8; grid=0,1;     lid=5,0; lsize=8,8

      glid=6,8; grid=0,1;     lid=6,0; lsize=8,8

      glid=7,8; grid=0,1;     lid=7,0; lsize=8,8

      glid=0,9; grid=0,1;     lid=0,1; lsize=8,8

      glid=1,9; grid=0,1;     lid=1,1; lsize=8,8

      glid=2,9; grid=0,1;     lid=2,1; lsize=8,8

      glid=3,9; grid=0,1;     lid=3,1; lsize=8,8

      glid=4,9; grid=0,1;     lid=4,1; lsize=8,8

      glid=5,9; grid=0,1;     lid=5,1; lsize=8,8

      glid=6,9; grid=0,1;     lid=6,1; lsize=8,8

      glid=7,9; grid=0,1;     lid=7,1; lsize=8,8

      glid=0,10; grid=0,1;    lid=0,2; lsize=8,8

      glid=1,10; grid=0,1;    lid=1,2; lsize=8,8

      glid=2,10; grid=0,1;    lid=2,2; lsize=8,8

      glid=3,10; grid=0,1;    lid=3,2; lsize=8,8

      glid=4,10; grid=0,1;    lid=4,2; lsize=8,8

      glid=5,10; grid=0,1;    lid=5,2; lsize=8,8

      glid=6,10; grid=0,1;    lid=6,2; lsize=8,8

      glid=7,10; grid=0,1;    lid=7,2; lsize=8,8

      glid=0,11; grid=0,1;    lid=0,3; lsize=8,8

      glid=1,11; grid=0,1;    lid=1,3; lsize=8,8

      glid=2,11; grid=0,1;    lid=2,3; lsize=8,8

      glid=3,11; grid=0,1;    lid=3,3; lsize=8,8

      glid=4,11; grid=0,1;    lid=4,3; lsize=8,8

      glid=5,11; grid=0,1;    lid=5,3; lsize=8,8

      glid=6,11; grid=0,1;    lid=6,3; lsize=8,8

      glid=7,11; grid=0,1;    lid=7,3; lsize=8,8

      glid=0,12; grid=0,1;    lid=0,4; lsize=8,8

      glid=1,12; grid=0,1;    lid=1,4; lsize=8,8

      glid=2,12; grid=0,1;    lid=2,4; lsize=8,8

      glid=3,12; grid=0,1;    lid=3,4; lsize=8,8

      glid=4,12; grid=0,1;    lid=4,4; lsize=8,8

      glid=5,12; grid=0,1;    lid=5,4; lsize=8,8

      glid=6,12; grid=0,1;    lid=6,4; lsize=8,8

      glid=7,12; grid=0,1;    lid=7,4; lsize=8,8

      glid=0,13; grid=0,1;    lid=0,5; lsize=8,8

      glid=1,13; grid=0,1;    lid=1,5; lsize=8,8

      glid=2,13; grid=0,1;    lid=2,5; lsize=8,8

      glid=3,13; grid=0,1;    lid=3,5; lsize=8,8

      glid=4,13; grid=0,1;    lid=4,5; lsize=8,8

      glid=5,13; grid=0,1;    lid=5,5; lsize=8,8

      glid=6,13; grid=0,1;    lid=6,5; lsize=8,8

      glid=7,13; grid=0,1;    lid=7,5; lsize=8,8

      glid=0,14; grid=0,1;    lid=0,6; lsize=8,8

      glid=1,14; grid=0,1;    lid=1,6; lsize=8,8

      glid=2,14; grid=0,1;    lid=2,6; lsize=8,8

      glid=3,14; grid=0,1;    lid=3,6; lsize=8,8

      glid=4,14; grid=0,1;    lid=4,6; lsize=8,8

      glid=5,14; grid=0,1;    lid=5,6; lsize=8,8

      glid=6,14; grid=0,1;    lid=6,6; lsize=8,8

      glid=7,14; grid=0,1;    lid=7,6; lsize=8,8

      glid=0,15; grid=0,1;    lid=0,7; lsize=8,8

      glid=1,15; grid=0,1;    lid=1,7; lsize=8,8

      glid=2,15; grid=0,1;    lid=2,7; lsize=8,8

      glid=3,15; grid=0,1;    lid=3,7; lsize=8,8

      glid=4,15; grid=0,1;    lid=4,7; lsize=8,8

      glid=5,15; grid=0,1;    lid=5,7; lsize=8,8

      glid=6,15; grid=0,1;    lid=6,7; lsize=8,8

      glid=7,15; grid=0,1;    lid=7,7; lsize=8,8

      glid=8,8; grid=1,1;     lid=0,0; lsize=8,8

      glid=9,8; grid=1,1;     lid=1,0; lsize=8,8

      glid=10,8; grid=1,1;    lid=2,0; lsize=8,8

      glid=11,8; grid=1,1;    lid=3,0; lsize=8,8

      glid=12,8; grid=1,1;    lid=4,0; lsize=8,8

      glid=13,8; grid=1,1;    lid=5,0; lsize=8,8

      glid=14,8; grid=1,1;    lid=6,0; lsize=8,8

      glid=15,8; grid=1,1;    lid=7,0; lsize=8,8

      glid=8,9; grid=1,1;     lid=0,1; lsize=8,8

      glid=9,9; grid=1,1;     lid=1,1; lsize=8,8

      glid=10,9; grid=1,1;    lid=2,1; lsize=8,8

      glid=11,9; grid=1,1;    lid=3,1; lsize=8,8

      glid=12,9; grid=1,1;    lid=4,1; lsize=8,8

      glid=13,9; grid=1,1;    lid=5,1; lsize=8,8

      glid=14,9; grid=1,1;    lid=6,1; lsize=8,8

      glid=15,9; grid=1,1;    lid=7,1; lsize=8,8

      glid=8,10; grid=1,1;    lid=0,2; lsize=8,8

      glid=9,10; grid=1,1;    lid=1,2; lsize=8,8

      glid=10,10; grid=1,1;   lid=2,2; lsize=8,8

      glid=11,10; grid=1,1;   lid=3,2; lsize=8,8

      glid=12,10; grid=1,1;   lid=4,2; lsize=8,8

      glid=13,10; grid=1,1;   lid=5,2; lsize=8,8

      glid=14,10; grid=1,1;   lid=6,2; lsize=8,8

      glid=15,10; grid=1,1;   lid=7,2; lsize=8,8

      glid=8,11; grid=1,1;    lid=0,3; lsize=8,8

      glid=9,11; grid=1,1;    lid=1,3; lsize=8,8

      glid=10,11; grid=1,1;   lid=2,3; lsize=8,8

      glid=11,11; grid=1,1;   lid=3,3; lsize=8,8

      glid=12,11; grid=1,1;   lid=4,3; lsize=8,8

      glid=13,11; grid=1,1;   lid=5,3; lsize=8,8

      glid=14,11; grid=1,1;   lid=6,3; lsize=8,8

      glid=15,11; grid=1,1;   lid=7,3; lsize=8,8

      glid=8,12; grid=1,1;    lid=0,4; lsize=8,8

      glid=9,12; grid=1,1;    lid=1,4; lsize=8,8

      glid=10,12; grid=1,1;   lid=2,4; lsize=8,8

      glid=11,12; grid=1,1;   lid=3,4; lsize=8,8

      glid=12,12; grid=1,1;   lid=4,4; lsize=8,8

      glid=13,12; grid=1,1;   lid=5,4; lsize=8,8

      glid=14,12; grid=1,1;   lid=6,4; lsize=8,8

      glid=15,12; grid=1,1;   lid=7,4; lsize=8,8

      glid=8,13; grid=1,1;    lid=0,5; lsize=8,8

      glid=9,13; grid=1,1;    lid=1,5; lsize=8,8

      glid=10,13; grid=1,1;   lid=2,5; lsize=8,8

      glid=11,13; grid=1,1;   lid=3,5; lsize=8,8

      glid=12,13; grid=1,1;   lid=4,5; lsize=8,8

      glid=13,13; grid=1,1;   lid=5,5; lsize=8,8

      glid=14,13; grid=1,1;   lid=6,5; lsize=8,8

      glid=15,13; grid=1,1;   lid=7,5; lsize=8,8

      glid=8,14; grid=1,1;    lid=0,6; lsize=8,8

      glid=9,14; grid=1,1;    lid=1,6; lsize=8,8

      glid=10,14; grid=1,1;   lid=2,6; lsize=8,8

      glid=11,14; grid=1,1;   lid=3,6; lsize=8,8

      glid=12,14; grid=1,1;   lid=4,6; lsize=8,8

      glid=13,14; grid=1,1;   lid=5,6; lsize=8,8

      glid=14,14; grid=1,1;   lid=6,6; lsize=8,8

      glid=15,14; grid=1,1;   lid=7,6; lsize=8,8

      glid=8,15; grid=1,1;    lid=0,7; lsize=8,8

      glid=9,15; grid=1,1;    lid=1,7; lsize=8,8

      glid=10,15; grid=1,1;   lid=2,7; lsize=8,8

      glid=11,15; grid=1,1;   lid=3,7; lsize=8,8

      glid=12,15; grid=1,1;   lid=4,7; lsize=8,8

      glid=13,15; grid=1,1;   lid=5,7; lsize=8,8

      glid=14,15; grid=1,1;   lid=6,7; lsize=8,8

      glid=15,15; grid=1,1;   lid=7,7; lsize=8,8


      (This is a simplified kernel. The full one also does other work, and that work is done correctly for the first work group; the group just doesn't print anything.)
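
For anyone checking the log above: assuming no global offset, each global ID is the group ID times the local size plus the local ID, per dimension. With 16x16 global and 8x8 local sizes that gives four work groups and 256 expected lines, while the log has 192 (three groups of 64, nothing from group 0,0). Here is a minimal host-side sketch of that arithmetic in plain C, not OpenCL C; the names `ItemIds`, `ids_from_global`, `group_count`, `items_per_group` and `expected_line_count` are made up for illustration.

```c
#include <assert.h>

/* Sizes taken from the post: 16x16 global work size, 8x8 local work size. */
enum { GLOBAL_X = 16, GLOBAL_Y = 16, LOCAL_X = 8, LOCAL_Y = 8 };

/* Hypothetical record of what get_group_id / get_local_id would report. */
typedef struct {
    int group_x, group_y;
    int local_x, local_y;
} ItemIds;

/* Splits a global ID into group and local IDs, assuming a zero global offset. */
ItemIds ids_from_global(int gx, int gy)
{
    ItemIds ids;
    assert(gx >= 0 && gx < GLOBAL_X && gy >= 0 && gy < GLOBAL_Y);
    ids.group_x = gx / LOCAL_X;
    ids.group_y = gy / LOCAL_Y;
    ids.local_x = gx % LOCAL_X;
    ids.local_y = gy % LOCAL_Y;
    return ids;
}

/* Number of work groups in the whole range. */
int group_count(void)
{
    return (GLOBAL_X / LOCAL_X) * (GLOBAL_Y / LOCAL_Y);
}

/* Number of work-items (and so printf lines) per work group. */
int items_per_group(void)
{
    return LOCAL_X * LOCAL_Y;
}

/* Total number of lines the kernel should print, one per work-item. */
int expected_line_count(void)
{
    return group_count() * items_per_group();
}
```

Calling `ids_from_global(8, 0)` gives group 1,0 and local ID 0,0, which matches the first line of the log.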


      I'm using Catalyst 14.7 beta on Windows 7 and a Radeon R9 280.

        • Re: printf doesn't work for first group

          Well, I don't really know how they implement printf in their driver, but I think it is done with what Direct3D 11 calls an Append/Consume Buffer (I don't think there is a name for that in OpenCL). I think Append/Consume Buffers use shared memory (LDS), so maybe you are running out of shared or buffer memory because of your printfs.

          Maybe try printing shorter messages.

          Or try implementing your own printf: allocate a big buffer of, say, numWorkgroups * workgroupSize * numBytesPerThread bytes, and have each thread write its string (with some ASCII conversion) into the numBytesPerThread bytes dedicated to it. Then read the buffer back after the kernel has executed and print the string from each thread.
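
A rough sketch of that idea, written as plain C so it can be tried on the host. In a real kernel, `buf` would be a `__global char *` argument and `item` would be derived from the global ID. Everything here is an assumption for illustration: the 32-byte slot size and the names `BYTES_PER_ITEM`, `put_uint`, `write_slot` and `dump_slots`. The digits are produced by hand because OpenCL C has no snprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { BYTES_PER_ITEM = 32 };  /* hypothetical slot size per work-item */

/* Appends the decimal digits of v to dst starting at pos; returns new pos. */
static int put_uint(char *dst, int pos, unsigned v)
{
    char tmp[10];  /* enough for a 32-bit unsigned value */
    int n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0)
        dst[pos++] = tmp[--n];
    return pos;
}

/* Stand-in for the kernel side: work-item `item` writes "item:value"
 * as NUL-terminated text into its own slot of the shared buffer. */
void write_slot(char *buf, unsigned item, unsigned value)
{
    char *slot = buf + (size_t)item * BYTES_PER_ITEM;
    int pos = 0;
    pos = put_uint(slot, pos, item);
    slot[pos++] = ':';
    pos = put_uint(slot, pos, value);
    assert(pos < BYTES_PER_ITEM);
    slot[pos] = '\0';
}

/* Host side: after reading the buffer back, print one line per work-item. */
void dump_slots(const char *buf, unsigned item_count, FILE *out)
{
    for (unsigned i = 0; i < item_count; ++i) {
        const char *slot = buf + (size_t)i * BYTES_PER_ITEM;
        /* Each slot must contain a terminator before it is used as a string. */
        assert(memchr(slot, '\0', BYTES_PER_ITEM) != NULL);
        fprintf(out, "%s\n", slot);
    }
}
```

On the host you would allocate item_count * BYTES_PER_ITEM bytes, zero them, run the kernel, read the buffer back (for example with clEnqueueReadBuffer) and then call `dump_slots(buf, item_count, stdout)`. Since every work-item owns a fixed slot, nothing is dropped when many work-items write at once, at the cost of a fixed maximum message length.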

          • Re: printf doesn't work for first group

            Thanks for reporting. I'll try to reproduce it and let you know my findings.



            • Re: printf doesn't work for first group


              I tried to reproduce the problem using the following setup, but it worked fine for me. Please find the output attached.

              Catalyst 14.7 beta (14.20.1004-140811a-174673E)

              HD 7900 series Tahiti XT (sorry, I don't have the Radeon R9 280 card right now)

              Windows 7 64-bit

              APP SDK 2.9


              I tried the same with another Catalyst driver (14.30) and also with a larger global work size, say 32x32, but still didn't find any problem.

              If possible, please check it with other cards and let me know your findings.



              • Re: printf doesn't work for first group

                I was wondering about some relevant information regarding your system.


                1. What CPU are you using?
                2. Is your copy of Windows 32-bit or 64-bit? Also, if Windows is 64-bit, is the program 32-bit or 64-bit?
                3. Which compiler is being used to build the software that is using the kernel?
                4. Are there any specific changes being made to the OpenCL state before running the program?