

digrobot
Journeyman III

constant array packing

Hello!

In my kernel I am using a constant array of bytes:

__constant uchar myconst[256]  = {  0x5a, 0xe6, 0x61, 0xf4, 0x31, 0xe3, 0x85 ....................skipped values here...............};


But when I look at the generated IL assembly code, it seems that the 8-bit array values are packed into 128-bit constant registers:

...
dcl_cb cb2[16]    <=here is my array

...

mov r1011, cb2[r1007.x]
cmov_logical r1011.x___, r1008.y, r1011.y, r1011.x  <=unpacking row value from 128-bit string from here
cmov_logical r1011.x___, r1008.z, r1011.z, r1011.x
cmov_logical r1011.x___, r1008.w, r1011.w, r1011.x
iand r1006.x___, r1010.x, l15
iadd r1006, r1006.x, l16
ieq r1008, r1006, l12
ishr r1011, r1011.x, l17
cmov_logical r1011.x___, r1008.y, r1011.y, r1011.x
cmov_logical r1011.x___, r1008.z, r1011.z, r1011.x
cmov_logical r1011.x___, r1008.w, r1011.w, r1011.x
ishl r1011.x___, r1011.x, l18
ushr r1011.x___, r1011.x, l18
mov r65._y__, r1011.x
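
The listing above looks like a select-then-shift unpack. Here is a rough host-side C model of it, as a sketch only. It assumes 16 table bytes per 128-bit register, stored little-endian within four 32-bit components. The values of the literals l12 and l15 to l18 are not shown in the listing, so the shift and mask constants are guesses, and the names `reg128` and `packed_lookup` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit constant register: four 32-bit components. */
typedef struct { uint32_t c[4]; } reg128;

/* Fetch byte i from a table packed 16 bytes per register (assumed layout). */
static uint8_t packed_lookup(const reg128 *cb, unsigned i)
{
    reg128 r = cb[i >> 4];              /* pick the 128-bit register */
    uint32_t word = r.c[(i >> 2) & 3];  /* pick x/y/z/w: the cmov_logical chain */
    /* pick the byte inside the 32-bit word: shift, then mask */
    return (uint8_t)((word >> (8 * (i & 3))) & 0xffu);
}
```

Every lookup pays for the component select and the byte extraction on top of the constant-buffer read, which is the overhead the question is about.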

Is there any way to prevent this behavior and store only one array element per constant register? Maximum speed is my goal.

0 Likes
3 Replies
realhet
Miniboss

Hi,

On pre-GCN cards the register size is 128 bits, so it can't be optimized there. To eliminate the cmovs you can use a 128-bit type for the array. It's a waste of memory, though.

On GCN the register size is 32 bits at minimum and, if I remember correctly, the CAL compiler will optimize out these cmovs. To make sure, check the .isa file for s_cmov_b32 instructions.

0 Likes

Thanks for the advice. But I could not find any scalar 128-bit type in the OpenCL specification. The compiler reports an error that the type "long long" is "nonstandard".
I'm working with pre-GCN cards. In general, I need a fast lookup table. Currently, byte replacement is the bottleneck of my program.

0 Likes

I meant that whatever type you need to access in the array, you should use float4 or int4 as the array element. Maybe you will use only the first float of the accessed float4, but no cmovs will be issued in the ALU.
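
To illustrate the idea, here is a minimal host-side C model, assuming each table entry is widened to a full 128-bit element and only the first component carries the value. In an OpenCL kernel the element type would be the built-in `uint4` (or `int4`/`float4`) in `__constant` memory. The struct `u32x4` and the names `sbox4` and `padded_lookup` are hypothetical stand-ins, and the table is cut down to the first four values from the original post.

```c
#include <stdint.h>

/* Stand-in for OpenCL's uint4: one 128-bit element per table entry. */
typedef struct { uint32_t x, y, z, w; } u32x4;

/* Only .x holds data; y, z, w are padding (16 bytes per entry instead of 1). */
static const u32x4 sbox4[4] = {
    {0x5a, 0, 0, 0}, {0xe6, 0, 0, 0}, {0x61, 0, 0, 0}, {0xf4, 0, 0, 0},
};

/* One indexed read; no component select and no shift/mask afterwards. */
static uint8_t padded_lookup(const u32x4 *table, unsigned i)
{
    return (uint8_t)table[i].x;
}
```

The trade-off is size: a 256-entry table grows from 256 bytes to 256 x 16 = 4096 bytes.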

0 Likes