3 Replies Latest reply on Oct 19, 2010 1:06 PM by MicahVillmow

    printf implicit sync?

    Meteorhead
      what does it do?

      Hi!

      My situation is that I have a simulation program that gives different results depending on whether there are printf() calls inside the kernel.

      I had several variables printed out for debugging, and along the way I realised that printf calls add major lag to execution. Once the code was working I removed all printf calls from the kernel, and it started giving bad results.

      Not all, but some kernels return fishy results, which leads me to suspect there is a sync issue somewhere in the code.

      How exactly does printf alter kernel execution?

        • printf implicit sync?
          nou

          IMHO printf() on the GPU is implemented with a buffer into which strings are written during execution. To get the position to write to, each call must execute an atomic_add on a position counter, so printf() can in some way serialize your code.
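To make the buffer-plus-counter idea above concrete, here is a minimal host-side sketch in plain C11. It is only a toy model of the mechanism nou guesses at, not the actual runtime implementation: `device_log`, `log_init`, `log_write`, `log_demo` and `LOG_CAPACITY` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 256 /* hypothetical buffer size, chosen arbitrarily */

/* Hypothetical shared log: one byte buffer plus one atomic write cursor. */
typedef struct {
    char buf[LOG_CAPACITY];
    atomic_size_t cursor;
} device_log;

/* Reset the log so the next write lands at offset 0. */
void log_init(device_log *log)
{
    memset(log->buf, 0, sizeof log->buf);
    atomic_init(&log->cursor, 0);
}

/* Reserve len bytes with one atomic add, then copy msg into them.
   Returns the offset written to, or (size_t)-1 if the buffer is full. */
size_t log_write(device_log *log, const char *msg, size_t len)
{
    /* Every caller goes through this single counter. This is the
       point where otherwise independent work-items would contend. */
    size_t offset = atomic_fetch_add(&log->cursor, len);
    if (offset + len > LOG_CAPACITY)
        return (size_t)-1;
    memcpy(log->buf + offset, msg, len);
    return offset;
}

/* Small usage example: two writes receive disjoint, consecutive slots. */
void log_demo(void)
{
    device_log log;
    log_init(&log);
    size_t first = log_write(&log, "abc", 3);
    size_t second = log_write(&log, "de", 2);
    assert(first == 0);
    assert(second == 3);
    assert(memcmp(log.buf, "abcde", 5) == 0);
}
```

The atomic add only hands each writer a private slot. It orders the reservations, but it is not a barrier, so by itself it would not make other memory accesses in the kernel safe.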

          • printf implicit sync?
            himanshu.gautam

            hi meteorhead,

            AFAIK printf cannot affect kernel execution and hence should not produce fishy results.

            But I guess that while a kernel executes a printf call it needs to write something to stdout (the terminal), which can only be done serially. So printf would result in some contention, as each thread has to execute this command serially, and depending on the context switching mechanisms used by the OS the threads would be run in serial fashion. Anyhow, this should only affect the order in which threads execute, and it will only affect results if the results depend on this order (which should never be taken for granted).

            If the problem persists:

            Post your code here and we can discuss what's causing it to produce wrong output. Also post your system configuration: CPU, GPU, SDK and driver.

            Thanks

            • printf implicit sync?
              MicahVillmow
              Himanshu,
              That is not entirely correct. The runtime breaks up a kernel launch into multiple smaller launches based on how much printf data is used. This can introduce unintended side-effects when attempting to synchronize global memory as there are implicit global barriers in a printf kernel.
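As a rough illustration of how an implicit global barrier could hide a missing synchronization, here is a toy host-side model in plain C. It assumes a kernel with two phases, where each work-item writes its own slot and then reads a neighbour's slot, and it assumes one particular sequential schedule. Real GPU scheduling is not this predictable, and the function names and `N_ITEMS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_ITEMS 4 /* hypothetical number of work-items */

/* Phase 1: work-item i writes its own slot in the shared scratch buffer. */
void phase_write(int *scratch, const int *in, size_t i)
{
    scratch[i] = in[i] * 2;
}

/* Phase 2: work-item i reads the slot written by its neighbour. */
void phase_read(int *out, const int *scratch, size_t i)
{
    out[i] = scratch[(i + 1) % N_ITEMS];
}

/* One launch, no global sync: each item runs both phases back to back,
   so it may read a neighbour slot that has not been written yet. */
void run_single_launch(const int *in, int *out)
{
    int scratch[N_ITEMS] = {0};
    for (size_t i = 0; i < N_ITEMS; ++i) {
        phase_write(scratch, in, i);
        phase_read(out, scratch, i);
    }
}

/* Split launch: the boundary between the two sub-launches behaves like
   a global barrier, so every write finishes before any read starts. */
void run_split_launch(const int *in, int *out)
{
    int scratch[N_ITEMS] = {0};
    for (size_t i = 0; i < N_ITEMS; ++i)
        phase_write(scratch, in, i);
    for (size_t i = 0; i < N_ITEMS; ++i)
        phase_read(out, scratch, i);
}

/* Usage example: the same "kernel" gives different results. */
void launch_demo(void)
{
    const int in[N_ITEMS] = {1, 2, 3, 4};
    int racy[N_ITEMS], synced[N_ITEMS];
    run_single_launch(in, racy);   /* {0, 0, 0, 2} under this schedule */
    run_split_launch(in, synced);  /* {4, 6, 8, 2} */
    assert(racy[0] == 0 && synced[0] == 4);
}
```

In this model the split version is only correct by accident of how the launch is divided. If printf is what causes the split, removing it takes the barrier away and the original race shows up again, which would match the symptoms in the first post.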