

landmann
Journeyman III

uchar16 vs. float4

Hi,

Once again I am working on a kind of memory transpose kernel. I realized that when using the uchar16 data type the compiler generates 4 read and 4 write instructions to transfer one element ( dest[idx] = src[idx2] ), whereas declaring the pointers as float4 pointers generates only one read and one write instruction to transfer the same amount of data.

What prevents the compiler from doing the same operation for the uchar16 data type?

Thanks!

Joerg

0 Likes
12 Replies
FrodoTheGiant
Journeyman III

Originally posted by: landmann I realized that when using the uchar16 data type the compiler generates 4 read and 4 write instructions ...


 

May I ask which tool you use to get this information (num of reads/writes) ?

0 Likes

I am using Stream Kernel Analyzer 1.7. Its numbers are sometimes questionable, but I hope that at least the disassembly view is correct.

0 Likes

landmann,

I am not very sure about this, and it would be nice to hear from others.

My feeling is that a processing element cannot process more than one vector element at a time. With float4, four floats can be processed by 4 general-purpose processing elements, but with uchar16 only four uchars are processed at a time. So it should take about 4x the time.

 

0 Likes

If you feel you are input-bound, you could try something like:

as_uchar16(((uint4*)a)[idx]) in place of a[idx].

Jeff

0 Likes

Sure, but my question is "why" should I have to do these nasty tricks at all? My kernel does not even evaluate the memory content; I simply started out using the native data type. Now that I am using float4 it looks much better.

I was looking for an explanation, to check what I did wrong, or, of course, hoping to read "will be fixed in 2.4".

0 Likes

Originally posted by: jeff_golds If you feel you are input-bound, you could try something like:

 

as_uchar16(((uint4*)a)[idx]) in place of a[idx].

Jeff

 

 

If I did something like that, how much overhead would that be?


Or a more general question: How much overhead is type casting?

 

E.g. something like

int a = 13;

float b = (float) a;

0 Likes

Our hardware does not support uchar16 natively, so we emulate it with integers. The largest integer type we support natively is vec4, so the uchar16 gets broken down into vec4 pieces, which is why you see 4x as many loads. This will not be fixed in 2.4.
0 Likes

But when I load a uchar4, why can't it be loaded as an int and then put into four registers?

0 Likes

Originally posted by: nou but when i load uchar4 why it can't load as int and then put into four registers?

 

It does that already, right?  That's why uchar16 takes 4 loads.

Jeff

0 Likes

FrodoTheGiant,
There is a difference between type casting and bit casting.

as_uchar16 is a bitcast, and its overhead is the unpacking of the char types from the uint4.
Typecasting follows the OpenCL conversion rules and in some cases can be fairly expensive. Type casting of pointers has no overhead.

In the case of the as_uchar16 bitcast, you are explicitly doing what the compiler does implicitly. The only difference between the code snippets is that a load through a uint4* is done as a single load, whereas a load of a uchar16 is done as 4 loads. Both approaches require unpacking the data into 32-bit registers.
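
To make the "single load, then unpack" idea concrete, here is a minimal host-side sketch in plain C. It is not OpenCL and not what the compiler actually emits. The names uint4_t, load_uint4 and unpack_uchar16 are hypothetical stand-ins, and the byte-to-lane mapping assumes little-endian ordering. It only illustrates that 16 bytes can be fetched as four 32-bit words and that each byte then has to be extracted into its own 32-bit value before per-byte arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for the OpenCL uint4 vector type. */
typedef struct { uint32_t s[4]; } uint4_t;

/* Models one 128-bit load: element idx is 16 bytes, packed into
 * four 32-bit words (byte 0 in the low bits, little-endian style). */
static uint4_t load_uint4(const uint8_t *src, size_t idx)
{
    const uint8_t *p = src + 16 * idx;
    uint4_t v;
    for (int w = 0; w < 4; ++w) {
        v.s[w] = (uint32_t)p[4 * w]
               | (uint32_t)p[4 * w + 1] << 8
               | (uint32_t)p[4 * w + 2] << 16
               | (uint32_t)p[4 * w + 3] << 24;
    }
    return v;
}

/* Models the unpacking step: each of the 16 bytes ends up in its
 * own 32-bit slot, as it would in a 32-bit register. */
static void unpack_uchar16(uint4_t v, uint32_t out[16])
{
    for (int w = 0; w < 4; ++w)
        for (int b = 0; b < 4; ++b)
            out[4 * w + b] = (v.s[w] >> (8 * b)) & 0xFFu;
}
```

For a kernel that only copies data, as in the original question, the unpacking step is never needed: the four packed words can be stored back as they are.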
0 Likes

Thanks Micah,

 

how expensive exactly is a cast from int to float?

0 Likes

A bitcast is free; a typecast costs 1 instruction per component.
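
The difference between the two kinds of cast can be illustrated with a small host-side sketch in plain C, using the int value 13 from the earlier question. The function names are hypothetical, memcpy stands in for OpenCL's as_float()/as_uint() reinterpretation, and a 32-bit IEEE 754 float is assumed. It says nothing about instruction counts on any particular GPU; it only shows that a typecast changes the bit pattern while a bitcast keeps it.

```c
#include <stdint.h>
#include <string.h>

/* Typecast: converts the value, so 13 becomes 13.0f and the
 * underlying bit pattern changes. */
static float typecast_int_to_float(int32_t a)
{
    return (float)a;
}

/* Bitcast: reinterprets the same 32 bits as a float without any
 * conversion (analogous to as_float() in OpenCL C). */
static float bitcast_int_to_float(int32_t a)
{
    float f;
    memcpy(&f, &a, sizeof f);
    return f;
}

/* Helper to inspect the raw bits of a float (analogous to as_uint()). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Calling float_bits(typecast_int_to_float(13)) yields 0x41500000, the encoding of 13.0f, whereas float_bits(bitcast_int_to_float(13)) is still 13.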
0 Likes