I have AMD IL code that looks like this:
dcl_literal l12,0x80000000,0x80000000,0x80000000,0x80000000
...
mov r45.x,l12.x
...
iadd r34.x,r34.x,r45.x
If the constant for iadd is 0x80000000, calCL simply ignores it: no ADD_INT instruction is generated in the compiled code. When I change the constant to anything else (like 0x7fffffff or 0x80000001), everything is OK.
I started to think that I had missed something in the declaration of the constant (signed/unsigned), but ixor r34.x,r34.x,r45.x works fine, with the correct XOR_INT instruction generated.
Is this a calCL compiler/optimizer bug, or have I missed something, not being very familiar with AMD IL at the moment?
empty_knapsack,
If it is possible, can you email a test shader showing this to streamdeveloper@amd.com and have them forward it to me so I can verify and work on getting it fixed?
Thanks,
If it is possible, can you email a test shader showing this to streamdeveloper@amd.com
Done.
In fact, the shader is simple enough to just post it here:
il_ps_2_0
dcl_input_position_interp(linear_noperspective) vWinCoord0.xy__
dcl_output_generic o0
dcl_output_generic o1
dcl_output_generic o2
dcl_cb cb0[4]
dcl_resource_id(0)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(1)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
dcl_resource_id(2)_type(2d,unnorm)_fmtx(float)_fmty(float)_fmtz(float)_fmtw(float)
sample_resource(0)_sampler(0) r0, vWinCoord0.xyxx
sample_resource(1)_sampler(0) r1, vWinCoord0.xyxx
sample_resource(2)_sampler(0) r2, vWinCoord0.xyxx
dcl_literal l1,0x7fffffff,0x7fffffff,0x7fffffff,0x7fffffff
dcl_literal l2,0x80000000,0x80000000,0x80000000,0x80000000
dcl_literal l3,0x80000001,0x80000001,0x80000001,0x80000001
iadd r10.x,r0.x,l1.x
ixor r10.x,r10.x,l2.x
iadd r11.x,r1.x,l2.x
ixor r11.x,r11.x,l3.x
iadd r12.x,r2.x,l3.x
ixor r12.x,r12.x,l1.x
mov o0,r10
mov o1,r11
mov o2,r12
end
There is no add of 0x80000000 in the compiled output, only 2 adds and 3 xors.
What is the range of values you're sampling? Also, what result is placed into the output stream o0? Is it just the value you sampled, XORed with INT_MIN+1? Finally, do you know the addition isn't taking place because your result isn't what you expected, or did you disassemble the kernel?
Originally posted by: rick.weber Finally, do you know the addition isn't taking place because your result isn't what you expected, or did you disassemble the kernel?
I disassembled the kernel after I got wrong results on several sets of sample data. It's actually easy to see how this kernel gets compiled using Stream KernelAnalyzer: just copy and paste the kernel there. The results (for RV770) will be:
; -------- Disassembly --------------------
00 TEX: ADDR(64) CNT(3) VALID_PIX
0 SAMPLE R1.x___, R0.xyxx, t0, s0 UNNORM(XYZW)
1 SAMPLE R2.x___, R0.xyxx, t2, s0 UNNORM(XYZW)
2 SAMPLE R0.x___, R0.xyxx, t1, s0 UNNORM(XYZW)
01 ALU: ADDR(32) CNT(19)
3 x: ADD_INT ____, R2.x, (0x80000001, -1.401298464e-45f).x
y: ADD_INT ____, (0x7FFFFFFF, 1.#QNANf).y, R1.x
z: XOR_INT R0.z, R0.x, (0x80000001, -1.401298464e-45f).x VEC_201
4 x: XOR_INT R0.x, (0x7FFFFFFF, 1.#QNANf).x, PV3.x
w: XOR_INT R0.w, PV3.y, (0x80000000, 0.0f).y
5 x: MOV R3.x, PV4.x
y: MOV R3.y, R0.y
z: MOV R3.z, R0.y
w: MOV R3.w, R0.y
6 x: MOV R2.x, R0.z
y: MOV R2.y, R0.y
z: MOV R2.z, R0.y
w: MOV R2.w, R0.y
7 x: MOV R1.x, R0.w
y: MOV R1.y, R0.y
z: MOV R1.z, R0.y
w: MOV R1.w, R0.y
02 EXP_DONE: PIX0, R1 BRSTCNT(2)
END_OF_PROGRAM
As you can see, there is no add for 0x80000000. I suspect the reason is that the bit pattern 0x80000000, read as a float, is -0.0f, so CAL CL decided to remove the "unnecessary" addition of "zero", wrongly treating the integer addition as if it were a float one.
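The suspicion above can be checked outside the compiler. The following is a minimal sketch in Python (not AMD IL, and not part of the original thread); the helper names are hypothetical. It shows that the bit pattern 0x80000000 reads as negative zero when interpreted as a 32-bit float, that adding negative zero in float arithmetic leaves a value unchanged, and that the same constant is not a no-op for 32-bit integer addition.

```python
import math
import struct

SIGN_BIT = 0x80000000  # the literal l2 from the shader


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def float_to_bits(value):
    """Return the 32-bit pattern of a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def iadd32(a, b):
    """32-bit integer addition with wraparound (what iadd is expected to do)."""
    return (a + b) & 0xFFFFFFFF


# The constant, viewed as a float, is negative zero.
as_float = bits_to_float(SIGN_BIT)
print("as float:", as_float, "sign:", math.copysign(1.0, as_float))

# Take some register contents (here the bits of 1.5f) and add the constant
# both ways.
x_bits = float_to_bits(1.5)
float_sum_bits = float_to_bits(bits_to_float(x_bits) + as_float)
int_sum_bits = iadd32(x_bits, SIGN_BIT)

print("input bits:      ", hex(x_bits))
print("float add result:", hex(float_sum_bits))  # unchanged, so folding is safe
print("int add result:  ", hex(int_sum_bits))    # top bit flipped, so it is not
```

If an optimizer applied the float identity x + (-0.0) = x to an integer add, it would drop exactly the instruction that is missing from the disassembly. This is only an illustration of the hypothesis, not a confirmed description of what the CAL compiler does internally.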
This has been fixed, and the fix should be in 1.4.
OK, thanks, I'll be waiting for 1.4 then.
Is there a known way of working around this bug? It's really annoying.
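One possible workaround, offered here as an untested assumption rather than a fix confirmed in the thread: in 32-bit wraparound arithmetic, adding 0x80000000 only flips the top bit, so it gives the same result as XORing with 0x80000000, and the original post reports that ixor with this constant compiles to a correct XOR_INT. The Python sketch below (hypothetical helper names, not AMD IL) checks that equivalence on a few sample values. Whether replacing the iadd with an ixor in the IL actually sidesteps the optimizer has not been verified.

```python
SIGN_BIT = 0x80000000
MASK32 = 0xFFFFFFFF


def iadd32(a, b):
    """32-bit integer addition with wraparound."""
    return (a + b) & MASK32


def ixor32(a, b):
    """32-bit bitwise XOR."""
    return (a ^ b) & MASK32


# A few edge cases plus an ordinary float bit pattern (1.5f).
samples = [
    0x00000000,
    0x00000001,
    0x7FFFFFFF,
    0x80000000,
    0x80000001,
    0xFFFFFFFF,
    0x3FC00000,
]

# Collect any input where the two operations disagree.
mismatches = [x for x in samples if iadd32(x, SIGN_BIT) != ixor32(x, SIGN_BIT)]
print("mismatches:", mismatches)
```

The equivalence holds only for this one constant, because a carry out of the top bit is discarded; other constants still need a real add.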
I've just installed the 1.4 SDK and... the bug is still there :/.
It doesn't look good at all...
I've just realized that updating the SDK changes nothing, since all the compiler logic is done by DLLs that are only updated when the Catalyst driver is updated. So as long as Catalyst is still at 9.2, nothing will change.
Also, since Stream doesn't look like a top priority for ATI/AMD, CAL compiler bug fixes alone aren't enough to trigger a Catalyst driver update. So we need to wait for some other major Catalyst driver update to see any change.
Am I right?
The SDK and CAL are no longer directly connected. This was done to make all graphics cards CAL-ready, so that people could develop applications and have them run on machines with Radeons without requiring the users to download the SDK.
The downside to this is that the SDK and CAL move at different speeds. The SDK is updated quarterly, while the driver is updated monthly, but it follows the driver development cycle, which used to be explained here: http://www.phoronix.com/vr.php?view=10083. The basic idea is that it takes three months for a feature or bug fix to go from implementation through testing to release. This bug was fixed last month, so it should be public in the next one or two driver releases.
/sigh
I'd really prefer to have the most up-to-date compiler at all times rather than a single driver distribution, as I'm not using the runtime calcl* calls anyway: all my kernels are precompiled to ELF binaries. But it doesn't look like there's an option.
Anyway, thanks for the reply.