On GCN, the processing elements (vector lanes) of each SIMD are effectively scalar. I don't think vectorization has much effect on GCN devices compared to the earlier VLIW architectures. In fact, vectorization may degrade performance, as mentioned in the optimization guide:
"Notes" under the section "Specific Guidelines for GCN family GPUs":
- Vectorization is no longer needed, nor desirable; in fact, it can affect performance because it requires a greater number of VGPRs for storage. It is recommended not to combine work-items.
- Read coalescing does not work for 64-bit data sizes. This means reads for float2, int2, and double might be slower than expected.
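
To make "combining work-items" concrete, here is a rough sketch in plain C rather than OpenCL C, so treat it as a model and not as real kernel code. The function names are made up for illustration, and `gid` stands in for what `get_global_id(0)` would return in an actual kernel. The scalar form does one element per work-item; the vectorized form does four (a float4's worth), so each work-item has to keep four values live, which is where the extra VGPR usage comes from.

```c
#include <stddef.h>

/* Plain-C model of a kernel body. In OpenCL C these would be __kernel
 * functions and gid would come from get_global_id(0). All names here
 * are hypothetical and only serve the illustration. */

/* Scalar form: one work-item handles exactly one element. */
void scale_scalar_item(size_t gid, const float *in, float *out, float k)
{
    out[gid] = in[gid] * k;
}

/* Vectorized form: one work-item handles four consecutive elements,
 * as a float4 kernel would. Four values are live per work-item. */
void scale_vec4_item(size_t gid, const float *in, float *out, float k)
{
    size_t base = gid * 4;
    for (size_t j = 0; j < 4; ++j)
        out[base + j] = in[base + j] * k;
}

/* Host-side loops standing in for the NDRange launch. */
void run_scalar(const float *in, float *out, size_t n, float k)
{
    for (size_t gid = 0; gid < n; ++gid)       /* n work-items */
        scale_scalar_item(gid, in, out, k);
}

void run_vec4(const float *in, float *out, size_t n, float k)
{
    /* Assumes n is a multiple of 4; only n / 4 work-items are launched. */
    for (size_t gid = 0; gid < n / 4; ++gid)
        scale_vec4_item(gid, in, out, k);
}
```

Both forms compute the same result. Per the guide's note, the scalar form is the one to prefer on GCN, but it is worth profiling your own kernel to confirm.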
Regards,