kbrafford
Adept II

doing global reads as float8

If you have a kernel that operates on a bunch of float4s and your GPU has a 256-bit data path, would it make sense to read the incoming data as float8s and then access each one as two float4s (via a pointer, perhaps)? Would that successfully hide the memory latency of one of the float4 accesses?

Assuming that works, what are the ramifications of that same code being compiled for a CPU context? Will the same code still produce correct results and not suffer any degradation?
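
To make the access pattern concrete, here is a minimal sketch in plain C (not OpenCL C) that emulates it on the host. The names float4_t, float8_t, scale4, scale_as_float4 and scale_as_float8 are hypothetical, and the scaling step is only a placeholder for whatever the kernel really does. In OpenCL C the built-in float4 and float8 types and the .lo/.hi vector components would play the same roles.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for OpenCL's built-in float4/float8 vector types. */
typedef struct { float s[4]; } float4_t;
typedef struct { float4_t lo, hi; } float8_t;

/* Placeholder per-float4 work: multiply every component by k. */
static float4_t scale4(float4_t v, float k) {
    for (int i = 0; i < 4; ++i) {
        v.s[i] *= k;
    }
    return v;
}

/* Baseline: one float4-sized read per iteration; n4 is the float4 count. */
void scale_as_float4(const float *in, float *out, size_t n4, float k) {
    for (size_t i = 0; i < n4; ++i) {
        float4_t v;
        memcpy(&v, in + 4 * i, sizeof v);
        v = scale4(v, k);
        memcpy(out + 4 * i, &v, sizeof v);
    }
}

/* Variant: one float8-sized read per iteration, then each half is
 * handled as a float4; n8 is the float8 count (half the float4 count). */
void scale_as_float8(const float *in, float *out, size_t n8, float k) {
    for (size_t i = 0; i < n8; ++i) {
        float8_t v;
        memcpy(&v, in + 8 * i, sizeof v);
        v.lo = scale4(v.lo, k);
        v.hi = scale4(v.hi, k);
        memcpy(out + 8 * i, &v, sizeof v);
    }
}
```

Both functions write identical output for the same input, provided the float4 count is even, so the wider read by itself should not change results. Whether it hides any latency on a given GPU, or costs anything on a CPU, is a separate question that this sketch does not answer.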

5 Replies
omkaranathan
Adept I

kbrafford, 

The 'OpenCL Performance and Optimization' section of the OpenCL Programming Guide explains memory optimizations in detail. That should answer your query and give you an idea of how to do efficient memory access.

ATI StreamSDK OpenCL Programming Guide

Nice PDF. Btw, why is the constant buffer limited to 16 KB? Are there 4 banks?

bubu,
That is a mistake that was not caught in time for the 2.1 release. In our next release this is expanded to 64 KB, which is what the hardware supports natively.

Btw, can't the max_constant_size limit be forced in code using this:

kernel void mykernel(global int* a,
                     __constant int* b __attribute__((max_constant_size (16384))))

or by this:

kernel void mykernel(global int* a,
                     __constant int b[16384])

?

bubu,
That is something that will be available in our next release.