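const CALchar* ILKernel =   /* excerpt: the rest of the IL source is not shown */
"sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx\n"   /* r0 = input texel at this thread's window position */
"div r0, r0, cb0.x\n"   /* r0 = r0 / cb0.x (the .x swizzle is applied to all four components) */
"mul o0, r0, cb0\n";    /* o0 = r0 * cb0, component-wise */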
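To make the data flow of those three IL instructions concrete, here is a rough CPU sketch of what a single thread computes. The names `float4`, `sample2d` and `kernel_thread` are hypothetical. It assumes unnormalized sampling, so the window coordinate maps directly to an integer texel index, and it models `cb0` as one 4-component constant. It is an illustration, not how CAL actually executes the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 4-component IL register. */
typedef struct { float x, y, z, w; } float4;

/* sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx
   Assumption: unnormalized 2D addressing, so (x, y) index the
   width x height input array directly. */
static float4 sample2d(const float4 *input, size_t width, size_t height,
                       size_t x, size_t y) {
    assert(x < width && y < height);
    return input[y * width + x];
}

/* One "thread" of the kernel at window position (x, y). */
static float4 kernel_thread(const float4 *input, size_t width, size_t height,
                            size_t x, size_t y, float4 cb0) {
    float4 r0 = sample2d(input, width, height, x, y);

    /* div r0, r0, cb0.x : the scalar cb0.x divides every component */
    r0.x /= cb0.x;
    r0.y /= cb0.x;
    r0.z /= cb0.x;
    r0.w /= cb0.x;

    /* mul o0, r0, cb0 : component-wise multiply by the whole cb0 vector */
    float4 o0 = { r0.x * cb0.x, r0.y * cb0.y, r0.z * cb0.z, r0.w * cb0.w };
    return o0;
}
```

Every thread produces all four components of `o0`, which is exactly what makes the question below puzzling.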
What I can't understand about this code is the following.
There are four components in r0, and therefore four values are written to o0.
But the hellocal example works on a 256x256 buffer of CAL_FORMAT_FLOAT_1,
while the declared memory resource is a 2D FORMAT_FLOAT_4.
There are 256x256 threads, and each thread writes 4 floats.
So why is the final result 256x256 floats instead of 256x256x4 floats?
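The size mismatch in the question is simple arithmetic; this small sketch spells it out. The helper names `surface_floats` and `surface_bytes` are hypothetical, and `ncomp` stands for the number of float components per element (1 for a FLOAT_1 format, 4 for FLOAT_4).

```c
#include <assert.h>
#include <stddef.h>

/* Number of floats in a width x height surface whose elements
   each hold `ncomp` float components. */
static size_t surface_floats(size_t width, size_t height, size_t ncomp) {
    assert(ncomp >= 1 && ncomp <= 4);
    return width * height * ncomp;
}

/* Same surface measured in bytes. */
static size_t surface_bytes(size_t width, size_t height, size_t ncomp) {
    return surface_floats(width, height, ncomp) * sizeof(float);
}
```

For 256x256 this gives 65,536 floats with one component per element versus 262,144 floats with four, a factor of four between what the threads write and what the output holds.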
Yes, you are right. But what I meant is this: there are 256x256 threads, each with a four-component o0. How come the output is 256x256 floats and not 256x256x4 floats? There must be some hidden behavior, probably that only o0.x takes effect, but I can't explain it.
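The guess that "only o0.x is effective" can be stated precisely as a store rule. The sketch below models that hypothesis only: when a 4-component register is written to an output whose elements have `ncomp` components, the first `ncomp` lanes are kept and the rest are dropped. `store_output` and `float4` are hypothetical names, and whether CAL documents this exact rule is an assumption here, not something this thread confirms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a 4-component IL register. */
typedef struct { float x, y, z, w; } float4;

/* HYPOTHESIS, not documented CAL behavior: write register o0 for the
   thread at (x, y) into a width-wide surface whose elements have
   `ncomp` float components. Lanes beyond ncomp are discarded. */
static void store_output(float *surface, size_t width, size_t x, size_t y,
                         float4 o0, size_t ncomp) {
    assert(ncomp >= 1 && ncomp <= 4);
    const float lanes[4] = { o0.x, o0.y, o0.z, o0.w };
    float *dst = surface + (y * width + x) * ncomp;
    for (size_t i = 0; i < ncomp; ++i) {
        dst[i] = lanes[i];   /* keep x, then y, z, w as the format allows */
    }
}
```

Under this model a one-component output receives only `o0.x` per thread, which would account for 256x256 floats; with four components the same threads would fill 256x256x4.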