
landmann
Journeyman III

uchar16 vs. float4

Hi,

Once again I am working on a kind of memory-transpose kernel. I noticed that when I use the uchar16 data type, the compiler generates 4 read and 4 write instructions to transfer one element (dest[idx] = src[idx2]), whereas declaring the pointers as float4 generates only one read and one write instruction to transfer the same amount of data.

What prevents the compiler from doing the same operation for the uchar16 data type?

Thanks!

Joerg
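
For readers who want to picture the transfer described above, here is a minimal serial sketch in plain C. It is not the poster's kernel: the `elem16` type, the `transpose16` function, and the row-major index formulas are assumptions made for illustration. The only point it carries over from the post is that each element is an opaque 16-byte block (the size of both uchar16 and float4) and that the copy never inspects the contents.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one 16-byte vector element. uchar16 and
   float4 both occupy 16 bytes, and the copy never looks inside. */
typedef struct { uint8_t bytes[16]; } elem16;

/* Transpose a rows x cols matrix of 16-byte elements (row-major):
   dest[c * rows + r] = src[r * cols + c]. */
static void transpose16(elem16 *dest, const elem16 *src,
                        size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            size_t idx2 = r * cols + c;  /* source index */
            size_t idx  = c * rows + r;  /* destination index */
            dest[idx] = src[idx2];       /* moves one 16-byte element */
        }
    }
}
```

Because the element type is only a 16-byte container here, swapping it for another 16-byte type does not change what is copied, only how a compiler may choose to load and store it.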

12 Replies
FrodoTheGiant
Journeyman III

Originally posted by: landmann
I realized that when using the uchar16 data type the compiler generates 4 read and 4 write instructions ...

May I ask which tool you use to get this information (number of reads/writes)?

landmann
Journeyman III

I am using Stream Kernel Analyzer 1.7. Although its numbers are sometimes questionable, I hope that at least the disassembly view is correct.

himanshu_gautam
Grandmaster

landmann,

I am not very sure about this, and it would be nice to hear from others.

What I feel is that a processing element cannot process more than one vector element at a time. With float4 we can process four floats with 4 general-purpose processing elements, but with uchar16 it will process just four uchars at a time. So it should take about 4x the time.

jeff_golds
Staff

If you feel you are input-bound, you could try something like:

as_uchar16(((uint4*)a)[idx]) in place of a[idx].

Jeff
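
The suggestion above can be modelled on the host in plain C to show what it does to the data. This is only a sketch under stated assumptions: `uchar16_model`, `uint4_model`, `as_uchar16_model`, and `load_via_uint4` are hypothetical stand-ins, not OpenCL built-ins. In a real OpenCL C kernel the pointer cast would likely also need the address-space qualifier, for example `(__global const uint4 *)a`, assuming `a` points into global memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t  s[16]; } uchar16_model;  /* stand-in for uchar16 */
typedef struct { uint32_t s[4];  } uint4_model;    /* stand-in for uint4   */

/* Models as_uchar16(x): the same 16 bytes under a new type,
   with no value conversion. */
static uchar16_model as_uchar16_model(uint4_model x)
{
    uchar16_model out;
    memcpy(&out, &x, sizeof out);
    return out;
}

/* Models as_uchar16(((uint4*)a)[idx]): fetch element idx as four
   32-bit words, then reinterpret those words as 16 bytes. */
static uchar16_model load_via_uint4(const uchar16_model *a, size_t idx)
{
    uint4_model words;
    memcpy(&words, &a[idx], sizeof words);
    return as_uchar16_model(words);
}
```

The value returned by `load_via_uint4(a, idx)` is byte-for-byte identical to `a[idx]`; only the type used for the memory access differs.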

landmann
Journeyman III

Sure, but my question is why I should have to do these nasty tricks at all. My kernel does not even evaluate the memory content; I simply started out using the native data type. Now that I am using float4 it looks much better.

I was looking for an explanation, to check what I did wrong, or, of course, hoping to read "will be fixed in 2.4".

MicahVillmow
Staff

Our hardware does not support uchar16 natively, so we emulate it with integers. The largest integer type we support natively is vec4, so the uchar16 gets broken down into vec4s, which is why you see 4x as many loads. This will not be fixed in 2.4.
nou
Exemplar

But when I load a uchar4, why can't it be loaded as an int and then put into four registers?

jeff_golds
Staff

Originally posted by: nou
But when I load a uchar4, why can't it be loaded as an int and then put into four registers?

It does that already, right? That's why uchar16 takes 4 loads.

Jeff
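
One way to read this reply together with the earlier explanation about emulation is sketched below in plain C. It is a model of the description only, not the compiler's actual output: the function names are hypothetical, and the assumptions are that each uchar ends up in its own 32-bit lane and that byte 0 is the least significant byte of the loaded word.

```c
#include <stdint.h>

/* One emulated load: fetch a 32-bit word and spread its four bytes
   over four 32-bit lanes (one uchar per integer lane). */
static void load_uchar4_as_lanes(const uint32_t *mem, uint32_t lanes[4])
{
    uint32_t word = *mem;
    for (int i = 0; i < 4; ++i)
        lanes[i] = (word >> (8 * i)) & 0xFFu;  /* byte i, lowest byte first */
}

/* A uchar16 spans four 32-bit words, so this model issues four loads,
   each filling one group of four lanes. Returns the number of loads. */
static int load_uchar16_as_lanes(const uint32_t mem[4], uint32_t lanes[16])
{
    int loads = 0;
    for (int w = 0; w < 4; ++w) {
        load_uchar4_as_lanes(&mem[w], &lanes[4 * w]);
        ++loads;
    }
    return loads;
}
```

Under these assumptions a uchar16 fills sixteen lanes, that is, four vec4 registers, and needs four loads, which matches the 4x figure reported in the thread.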

FrodoTheGiant
Journeyman III

Originally posted by: jeff_golds
If you feel you are input-bound, you could try something like:

as_uchar16(((uint4*)a)[idx]) in place of a[idx].

Jeff

If I did something like that, how much overhead would it add?

Or, a more general question: how much overhead does type casting incur?

For example, something like:

int a = 13;
float b = (float) a;
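
The two kinds of "cast" mentioned in this thread are different operations, and a small plain-C sketch may help keep them apart. The helper names are hypothetical, and the code assumes a 32-bit IEEE 754 `float`. A value conversion such as `(float) a` keeps the number and computes a new bit pattern, while a reinterpretation in the style of OpenCL's `as_float()` or `as_uchar16()` keeps the bits and only changes the type. How much the conversion costs on a particular GPU is not stated in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion, as in `float b = (float) a;`:
   the number is preserved, the bit pattern changes. */
static float convert_int_to_float(int32_t a)
{
    return (float)a;
}

/* Bit reinterpretation, modelling as_float():
   the bit pattern is preserved, the value generally changes. */
static float reinterpret_int_as_float(int32_t a)
{
    float f;
    memcpy(&f, &a, sizeof f);
    return f;
}

/* Helper to inspect the raw bits of a float. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For instance, `convert_int_to_float(13)` yields 13.0f, whose bits are 0x41500000, whereas `reinterpret_int_as_float(13)` keeps the bits 0x0000000D and therefore does not represent 13.0f.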
