I believe some of you on the forum have gone through the book "Heterogeneous Computing with OpenCL". I am working through chapter 5 of its 1st edition.
I am a bit confused about how a CUDA warp relates to an OpenCL wavefront.
In CUDA, once a block of threads is assigned to a streaming multiprocessor, it is further divided into 32-thread units called warps. So conceptually, a warp contains 32 threads. This is according to the book "Programming Massively Parallel Processors".
In OpenCL, according to the book, "The best example of this is on the GPU, where as many as 64 work items execute in lock step as a single hardware thread on a SIMD unit: On AMD architectures, this is known as a wavefront, and on NVIDIA architectures it is called a warp. The result is SIMD execution"
So according to this book, a warp contains 64 threads.
Which one is right?
Please let me know if I have missed or misunderstood anything.
NVIDIA's current warps are 32 work items in size. The text you quote doesn't say that warps have 64 elements; it only says that current architectures execute up to 64 elements in lock step on a SIMD unit. AMD's wavefronts are 32 or 64 depending on the chip, NVIDIA's are 32, Intel's are something else (8 or 16 maybe, I forget). That is "up to 64". Even CPUs would fit in this category if we hadn't restricted that text to the GPU: an AVX-supporting CPU would have a wavefront of 8 by the same definition.
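To make the "by the same definition" arithmetic concrete, here is a minimal C++ sketch. The helper names simd_lanes and hardware_threads are made up for illustration, and the AVX figure of 8 assumes 32-bit floats in a 256-bit register. The work-group size of 256 is just an example value.

```cpp
#include <cstddef>

// Elements that execute in lock step on one SIMD unit:
// vector register width divided by element width, both in bits.
// (Hypothetical helper, not part of CUDA or OpenCL.)
constexpr std::size_t simd_lanes(std::size_t register_bits,
                                 std::size_t element_bits) {
    return register_bits / element_bits;
}

// Hardware threads (warps or wavefronts) needed to cover a work-group
// of work_items items on a device whose SIMD width is simd_width.
// Rounds up, because a partly filled warp still occupies a whole one.
constexpr std::size_t hardware_threads(std::size_t work_items,
                                       std::size_t simd_width) {
    return (work_items + simd_width - 1) / simd_width;
}

// AVX: 256-bit registers holding 32-bit floats give 8 lanes.
static_assert(simd_lanes(256, 32) == 8, "AVX float lanes");

// The same 256-item work-group maps to 8 warps at width 32
// and to 4 wavefronts at width 64.
static_assert(hardware_threads(256, 32) == 8, "width 32");
static_assert(hardware_threads(256, 64) == 4, "width 64");
```

The point is that the width is a property of the hardware, not of the programming model: the same work-group is simply cut into more or fewer lock-step groups depending on the chip.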