

jfkong
Journeyman III

A question about the IL code in the hellocal sample

const CALchar* ILKernel =
"il_ps_2_0\n"
"dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__\n"
"dcl_output_generic o0\n"
"dcl_cb cb0[1]\n"
"dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)\n"
"sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx\n"
"div r0, r0, cb0[0].x\n"
"mul o0, r0, cb0[0]\n"
"end\n";

 

What I can't understand about this code is the following:

r0 has four components, so o0 holds four values as well.

But the hellocal example works on a 256x256 buffer of CAL_FORMAT_FLOAT_1.

The memory resource declared is 2D FORMAT_FLOAT_4.

There are 256x256 threads, and each thread writes 4 floats.

Why is the final result 256x256 floats instead of 256x256x4 floats?

Thanks

2 Replies
bayoumi
Journeyman III

Hi jfkong
Unless I am mistaken, in the SDK 1.2.1 version I have, hellocal.cpp uses float (not float4) for the input/output buffers. It uses only one float4 element for the constant buffer (possibly to carry 4 float constants).
Best Regards
Amr

Hi bayoumi,

Yes, you are right. But what I meant is that there are 256x256 threads, each with an o0 of four components. How come the output is 256x256 floats and not 256x256x4 floats? There must be something hidden that says only o0.x is effective. I just can't explain it.
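To make that guess concrete, here is a minimal C++ sketch of it. This is not CAL code and it does not confirm what the runtime really does. The names run_thread and run_kernel are made up for illustration, and two things are pure assumptions: that sampling a one-component input fills only r0.x (the other lanes are set to 0 here as placeholders), and that a one-component output keeps only o0.x.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Float4 = std::array<float, 4>;

// Hypothetical model of one thread of the IL kernel.
// Assumption: the sampled one-component value lands in r0.x only;
// the remaining lanes are filled with 0 as placeholders.
Float4 run_thread(float sampled_x, const Float4& cb0) {
    Float4 r0 = {sampled_x, 0.0f, 0.0f, 0.0f};
    Float4 o0{};
    for (std::size_t c = 0; c < 4; ++c) {
        r0[c] = r0[c] / cb0[0];  // div r0, r0, cb0[0].x
        o0[c] = r0[c] * cb0[c];  // mul o0, r0, cb0[0]
    }
    return o0;  // four components are computed per thread
}

// Hypothetical store step for a one-component output buffer.
// Assumption (the guess from the post): only o0.x is kept,
// so the result has one float per thread, not four.
std::vector<float> run_kernel(const std::vector<float>& input,
                              const Float4& cb0) {
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = run_thread(input[i], cb0)[0];  // o0.y, o0.z, o0.w are dropped
    }
    return output;
}
```

Under these assumptions, 256x256 threads each compute four values, but the output buffer still ends up with 256x256 floats, because three of the four lanes never reach memory.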
