While going through the book "Programming Massively Parallel Processors", I found the following statement in chapter 4 confusing:
"... instant in time, one instruction is fetched and executed for all threads in the warp."
I believe that the concept of a warp is not part of either the CUDA or the OpenCL specification, so let me explain it here. Once a block is assigned to a streaming multiprocessor, it is further divided into units of 32 threads called warps.
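To make that partitioning concrete, here is a minimal sketch of how I understand the grouping. It is plain Python arithmetic, not GPU code. The names `WARP_SIZE`, `warps_per_block`, and `warp_and_lane` are my own, and I am assuming a warp size of 32 and that threads are grouped by consecutive linear thread index within the block.

```python
# Assumption: 32 threads per warp, as described above.
WARP_SIZE = 32


def warps_per_block(threads_per_block):
    """Number of warps needed to hold one block.

    Rounds up, so the last warp may be only partly filled.
    """
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE


def warp_and_lane(linear_thread_id):
    """Return (warp index within the block, position of the thread inside that warp)."""
    return divmod(linear_thread_id, WARP_SIZE)


if __name__ == "__main__":
    print(warps_per_block(256))  # 8
    print(warp_and_lane(70))     # (2, 6)
```

So under these assumptions a block of 256 threads would be split into 8 warps, and thread 70 of the block would be the 7th thread (position 6) of warp 2.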
What exactly is meant by "instruction" here? Does one instruction mean the whole kernel function, or one of the individual statements inside the kernel function?
Any light on the matter would be appreciated.