Raistmer
Adept II

Some performance-related questions about the current OpenCL implementation

1) Is it possible in OpenCL to keep a per-thread (private) array in registers?
It was not possible in Brook+.
That is, if I write inside a kernel:
float4 buf[32];
will these 32 elements be placed in registers, or will they be spilled to global memory?
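To make the case concrete, here is a minimal sketch of the kind of kernel meant. The kernel name `private_array_demo`, its arguments, and the host-only shim at the top (including `host_gid`) are hypothetical and exist only so the same source can be checked as plain C with GCC or Clang. Nothing in the code decides where `buf` ends up; that is exactly what is being asked.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (an assumption for illustration, not part of the kernel):
   lets the kernel body compile as plain C with GCC/Clang. */
#include <stddef.h>
#define __kernel
#define __global
typedef float float4 __attribute__((vector_size(16)));
static size_t host_gid;                 /* stand-in for the work-item id */
#define get_global_id(dim) (host_gid)
#endif

__kernel void private_array_demo(__global const float4 *in,
                                 __global float4 *out)
{
    size_t gid = get_global_id(0);

    /* The per-thread array from the question: 32 x float4 = 512 bytes. */
    float4 buf[32];

    /* Fill the array so that buf[i] == (i + 1) * in[gid]. */
    buf[0] = in[gid];
    for (int i = 1; i < 32; ++i)
        buf[i] = buf[i - 1] + in[gid];

    /* Read back the last element: 32 * in[gid]. */
    out[gid] = buf[31];
}
```

Every index into `buf` here is known once the loop is unrolled; a kernel that indexes the array with a value computed at run time might well be treated differently by the compiler.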

2) If a write to a global memory buffer sits in a branch that is not taken (on a per-wavefront basis), is the write skipped, or will zeros or junk be written anyway?
Also, is such a write skipped on a per-wavefront or a per-thread basis?
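A minimal sketch of the pattern in question, again with hypothetical names (`guarded_write_demo`, `threshold`) and a host-only shim that is there only so the source can be checked as plain C. The store to `out` happens only inside the branch; the question is what the hardware does with that store when a whole wavefront, or only some of its threads, skips the branch.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (an assumption for illustration, not part of the kernel). */
#include <stddef.h>
#define __kernel
#define __global
static size_t host_gid;                 /* stand-in for the work-item id */
#define get_global_id(dim) (host_gid)
#endif

__kernel void guarded_write_demo(__global const float *in,
                                 __global float *out,
                                 float threshold)
{
    size_t gid = get_global_id(0);

    /* Only threads whose input exceeds the threshold take the branch. */
    if (in[gid] > threshold) {
        out[gid] = in[gid];             /* the global write inside the branch */
    }
    /* Threads that skip the branch never assign to out[gid]. */
}
```

If all 64 inputs of a wavefront are below `threshold`, that is the per-wavefront case; a mix of values within one wavefront is the per-thread case.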

3) How many registers can a thread use while still hiding global memory read latency reasonably well? (Or: how many wavefronts per SIMD need to be in flight simultaneously to hide read latency?)
21 Replies