cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

omion
Journeyman III

Bug in compiling ushort kernel for GPU

I'm writing a kernel that is designed to take the most advantage of the SSE2 instruction set, but I was doing a sanity check on the GPU and getting the wrong answers. It looks like some optimization in the compilation is getting the wrong answer.

I traced the problem down and got the smallest kernel that showed it. The code below produced the following Cypress assembly:

; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(3) KCACHE0(CB1:0-15)
0 x: LSHR R1.x, KC0[0].x, (0x00000002, 2.802596929e-45f).x
t: MOV R0.x, (0x00010001, 9.183689746e-41f).y
01 MEM_RAT_CACHELESS_STORE_RAW: RAT(11)[R1].x___, R0, ARRAY_SIZE(4) VPM
END_OF_PROGRAM

Basically all it does is store 0x00010001 into the buffer, which is the wrong value (each ushort should be 0x5555 at the end)


However, if I manually unroll the entire loop it gives the following (correct) assembly:

; -------- Disassembly --------------------
00 ALU: ADDR(32) CNT(3) KCACHE0(CB1:0-15)
0 x: LSHR R1.x, KC0[0].x, (0x00000002, 2.802596929e-45f).x
t: MOV R0.x, (0x55555555, 1.466015469e13f).y
01 MEM_RAT_CACHELESS_STORE_RAW: RAT(11)[R1].x___, R0, ARRAY_SIZE(4) VPM
END_OF_PROGRAM


Compiling to CPU assembly doesn't reduce all the computations but does produce the right answer.

__kernel void __attribute__((reqd_work_group_size(1,1,1))) small_test_1_wu ( __global ushort2 *output ) { ushort2 xor_this = (ushort2)0; ushort2 input = (ushort2)0x4000; ushort2 factor = (ushort2)4; ushort2 hi; uint i; for(i = 0; i < 8; i++) { xor_this ^= ((as_ushort2((factor & (ushort2)1) == (ushort2)1)) & input); hi = as_ushort2(as_short2(input) >> (short)15); input <<= (ushort)1; input ^= hi & (ushort2)0x5555; factor = factor >> (ushort)1; } output[0] = xor_this; }

0 Likes
3 Replies

This is fixed internally. Thanks for reporting it.
0 Likes

Thanks!
0 Likes

Thanks , I had fixed internally

0 Likes