I'm trying to profile some operations on the SIMD at cycle granularity. I know I should use the timer register (Tmr), and I can read its value. However, I get the same value before and after the instructions I'm trying to profile. Below is my code.
"dcl_input_interp(linear) v0.xy__ \n"
"sample_resource(0)_sampler(0) r5, vWinCoord0.xyxx\n"
"dcl_output_generic o0.xy \n"
"dcl_cb cb0 \n"
"sample_resource(0)_sampler(0) r0, v0.xyxx \n"
"mov r1.xy__, Tmr.xy \n"
"dmul o0, r0, v0.x \n"
"mov r2.xy__, Tmr.xy \n"
"inegate r4.x___, r1.x\n"
"iadd r3.x___, r2.x,r4.x \n"
"mov o0.x, r1.x\n"
"mov o0.y, r2.x\n"
I haven't tried your code, but the output of a pixel shader is normally floating point.
Your integer Tmr values are probably getting lost because you never do an integer-to-float conversion before writing them to o0 — the raw integer bit patterns are being interpreted as floats.
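Something along these lines might work — a minimal sketch, assuming your IL version supports the itof (integer-to-float) instruction; the register choices just follow your snippet:

"inegate r4.x___, r1.x \n"        // negate the first timestamp
"iadd r3.x___, r2.x, r4.x \n"     // delta = t1 - t0, still an integer
"itof r3.x___, r3.x \n"           // convert the cycle delta to float
"mov o0.x, r3.x \n"               // export the delta as a float

If you also want to export the raw timestamps, run each of r1.x and r2.x through itof the same way before moving them into o0.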
For what it's worth, a dmul will take one cycle.
You can inspect the code that will actually execute on the GPU using Stream KernelAnalyzer or GPU ShaderAnalyzer, and read the cycle counts from there. There's no need to write test shaders just to measure instruction latencies.
ddiv is more interesting...