I have a simple copy kernel that copies from array a to array b. In the pixel shader version I use uint4 elements, so I divide the domain width by 4. The output validates correctly.
How do I properly do this in a compute shader? The domain size is too large. I am getting 15 GB/sec for the copy in the compute shader instead of the 60 GB/sec I get in the pixel shader.
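
The width / 4 arithmetic described above can be sketched on the CPU as follows. This is only an illustration in plain C++, not shader code: `UInt4`, `vec4_count`, `flat_index`, and `copy_uint4` are hypothetical names, and the sketch assumes the scalar width is a multiple of 4 so that each emulated thread moves exactly one 16-byte element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the shader's uint4 type: four 32-bit values.
struct UInt4 {
    std::uint32_t x, y, z, w;
};

// Width of the domain in uint4 elements, given the width in 32-bit scalars.
// Assumption: scalar_width is a multiple of 4.
std::size_t vec4_count(std::size_t scalar_width) {
    return scalar_width / 4;
}

// Flat element index for position (x, y) in a domain that is
// width_vec4 uint4 elements wide.
std::size_t flat_index(std::size_t x, std::size_t y, std::size_t width_vec4) {
    return y * width_vec4 + x;
}

// Emulated copy: each (x, y) iteration plays the role of one thread
// and copies a single uint4 from a to b.
void copy_uint4(const std::vector<UInt4>& a, std::vector<UInt4>& b,
                std::size_t width_vec4, std::size_t height) {
    assert(a.size() == width_vec4 * height);
    assert(b.size() == a.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width_vec4; ++x) {
            const std::size_t i = flat_index(x, y, width_vec4);
            b[i] = a[i];
        }
    }
}
```

With a scalar width of 1024, `vec4_count` gives a domain 256 elements wide, so the number of emulated threads per row drops by a factor of 4 while the bytes copied stay the same.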