A question about the IL code in the hellocal sample

Discussion created by jfkong on Nov 21, 2008
Latest reply on Nov 21, 2008 by jfkong

const CALchar* ILKernel =
"dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__\n"
"dcl_output_generic o0\n"
"dcl_cb cb0[1]\n"
"sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx\n"
"div r0, r0, cb0[0].x\n"
"mul o0, r0, cb0[0]\n"


What I can't understand about the code is this:

r0 has four components, so o0 also holds four values.

But the hellocal example works on 256x256 elements of CAL_FORMAT_FLOAT_1.

The memory resource declared is 2D FORMAT_FLOAT_4.

There are 256x256 threads, and each thread writes 4 floats.

Why is the final result 256x256 floats instead of 256x256x4 floats?
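
To make the two sizes in the question concrete, here is a minimal C sketch of the arithmetic only. The helper name `surface_float_count` is hypothetical and is not part of the CAL API. It just counts floats for a 2D surface with a given number of components per element, and it says nothing about what the hardware actually stores.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only; not a CAL function.
 * Returns how many floats a width x height 2D surface holds when
 * each element carries `components` floats (1 for a FLOAT_1-style
 * format, 4 for a FLOAT_4-style format). */
size_t surface_float_count(size_t width, size_t height, size_t components)
{
    return width * height * components;
}
```

For a 256x256 domain, `surface_float_count(256, 256, 1)` gives 65536 floats, while `surface_float_count(256, 256, 4)` gives 262144 floats. The question is why the observed output matches the first number even though each thread writes a four-component o0.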