IMHO printf() on the GPU is implemented with a buffer into which the strings are written during execution. To get the position to write to, each call has to execute something like an atomic_add on a position counter. So printf() can, in some way, serialize your code.
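To make the idea concrete, here is a minimal CPU-side sketch of that scheme in Python. It is only a model, not any vendor's actual implementation: the class name `PrintfBuffer`, the `reserve`/`emit` methods and the buffer layout are all made up for illustration, and a lock stands in for the hardware atomic_add.

```python
import threading


class PrintfBuffer:
    """Hypothetical model of a device-side printf buffer (not a real API)."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.write_pos = 0               # shared position counter
        self._lock = threading.Lock()    # stands in for the hardware atomic

    def reserve(self, nbytes):
        """Model of atomic_add: return the old position and advance the counter.

        Returns None, without advancing, if the record does not fit.
        """
        with self._lock:
            if self.write_pos + nbytes > len(self.data):
                return None
            old = self.write_pos
            self.write_pos += nbytes
            return old

    def emit(self, text):
        payload = text.encode("ascii")
        start = self.reserve(len(payload))   # the only serialized step
        if start is None:
            return False                     # buffer full: record is dropped
        # Each caller owns a disjoint slice, so the copy itself needs no lock.
        self.data[start:start + len(payload)] = payload
        return True


def run_work_items(n_items, capacity=4096):
    """Let n_items threads 'print' concurrently and return the buffered lines."""
    buf = PrintfBuffer(capacity)
    threads = [
        threading.Thread(target=buf.emit, args=(f"item {i:03d}\n",))
        for i in range(n_items)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bytes(buf.data[:buf.write_pos]).decode("ascii").splitlines()
```

Every record shows up exactly once, but the order of the lines depends on which thread reaches the counter first. The contention is on the counter only, not on the copy into the buffer.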
AFAIK printf cannot affect kernel execution and hence should not produce fishy results.
But I guess that when a kernel executes a printf call, it needs to write something to stdout (the terminal), which can only be done serially. So printf would cause some contention, since each thread has to execute this call serially. Depending on the context-switching mechanisms used by the OS, the threads would then run in a serial fashion. Anyhow, this should only affect the order in which the threads execute, and it will only affect the results if they depend on that order (which should never be taken for granted).
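As a small illustration of a result that does depend on execution order, here is a Python sketch with a floating-point accumulation. The helper `accumulate` and the input values are made up for this example. The point is only that floating-point addition is not associative, so the same values summed in a different order can give a different total.

```python
def accumulate(values, order):
    """Sum values in the given index order, as threads arriving in that order would."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [1e16, 1.0, -1e16]

# (1e16 + 1.0) - 1e16: the 1.0 is lost to rounding before the cancellation.
in_order = accumulate(values, [0, 1, 2])

# (1e16 - 1e16) + 1.0: the large terms cancel first, so the 1.0 survives.
reordered = accumulate(values, [0, 2, 1])

print(in_order, reordered)  # prints "0.0 1.0"
```

If adding a printf changes the order in which threads reach such an accumulation, the output can change even though printf itself did nothing wrong.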
If the problem persists:
Post your code here and we can discuss what is causing it to produce wrong output. Also post your system configuration: CPU, GPU, SDK and driver version.
That is not entirely correct. The runtime breaks up a kernel launch into multiple smaller launches based on how much printf data is used. This can introduce unintended side-effects when attempting to synchronize global memory as there are implicit global barriers in a printf kernel.
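A rough Python model of what such a split could look like. This is not the runtime's actual algorithm: `split_launch`, `run_split`, the fixed `bytes_per_item` budget and the buffer sizes are all assumptions made up for illustration. It only shows why work-items in a later sub-launch see global memory in a different state than they would in a single launch.

```python
def split_launch(global_size, bytes_per_item, buffer_bytes):
    """Hypothetical split: each sub-launch holds as many items as fit the buffer.

    Assumes bytes_per_item > 0. Returns a list of (offset, count) pairs.
    """
    items_per_launch = max(1, buffer_bytes // bytes_per_item)
    return [
        (offset, min(items_per_launch, global_size - offset))
        for offset in range(0, global_size, items_per_launch)
    ]


def run_split(global_size, bytes_per_item, buffer_bytes):
    """For each work-item, record how many results were already visible in
    global memory when its sub-launch started."""
    global_mem = [None] * global_size
    seen = []
    for offset, count in split_launch(global_size, bytes_per_item, buffer_bytes):
        # Snapshot of global memory at the start of this sub-launch.
        visible = sum(v is not None for v in global_mem)
        for gid in range(offset, offset + count):
            seen.append(visible)
        for gid in range(offset, offset + count):
            global_mem[gid] = gid
        # Implicit global barrier: the next sub-launch starts only after
        # every work-item of this one has finished and its writes are visible.
    return seen
```

With a buffer large enough for all 8 items, `run_split(8, 64, 4096)` gives `[0, 0, 0, 0, 0, 0, 0, 0]`. With a 256-byte buffer the launch is split in two and the result is `[0, 0, 0, 0, 4, 4, 4, 4]`: the second half observes the first half's writes as already complete, which is exactly the kind of ordering a kernel without printf would not guarantee.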