const CALchar* ILKernel =
"il_ps_2_0\n"
"dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__\n"
"dcl_output_generic o0\n"
"dcl_cb cb0[1]\n"
"dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
"sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx\n"
"div r0, r0, cb0[0].x\n"
"mul o0, r0, cb0[0]\n"
"end\n";
What I can't understand about this code is the following:
r0 has four components, so o0 also holds four values.
But the hellocal example works on a 256x256 resource of CAL_FORMAT_FLOAT_1,
while the resource declared in the kernel is a 2D FORMAT_FLOAT_4 resource.
There are 256x256 threads, and each thread writes 4 floats.
Why is the final result 256x256 floats instead of 256x256x4 floats?
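To make the numbers concrete, here is a small C sketch of the counting I have in mind. The helper names are my own (hypothetical, not CAL calls), and it only does the arithmetic; it does not touch the device or explain what CAL actually stores.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of the CAL API: number of floats in a
   width x height 2D resource with `components` floats per element. */
static size_t resource_float_count(size_t width, size_t height, size_t components)
{
    return width * height * components;
}

/* What I observe: one float per thread, as with CAL_FORMAT_FLOAT_1. */
static size_t observed_floats(void)
{
    return resource_float_count(256, 256, 1);
}

/* What I expected if every thread stored all four components of o0. */
static size_t expected_floats(void)
{
    return resource_float_count(256, 256, 4);
}
```

So the result I get has observed_floats() elements, while my reasoning above predicts expected_floats(), which is four times as many.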
Thanks