hi..
The following kernel code:
__global float3 *pos = (__global float3*)(bufPnts + ndx * simData->stride);
int gz = (pos->z - simData->min.z) * simData->delta.z ;
Perhaps float3 would be a useful type 😆. Section 6.1.7 (I'm using doc rev 29) refers to indices "1...4", and the AMD compiler complains if you try to define a struct named float3, so it seems like it might be. I'll work around it. Thanks.
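One possible workaround, as a minimal sketch rather than what the poster actually did: skip the float3 cast and read the packed components through a plain float pointer. The helper name grid_z and its flattened parameters are hypothetical stand-ins for the simData fields above, and it assumes bufPnts is a byte pointer, the stride is in bytes and a multiple of sizeof(float), and each point starts with x, y, z as three consecutive floats.

```c
/* Sketch only. The shim lets the same helper build as host C for testing;
   inside an OpenCL program __OPENCL_VERSION__ is defined and __global is
   a real address space qualifier. */
#ifndef __OPENCL_VERSION__
#define __global
#endif

/* Hypothetical helper (not from the original post): grid cell index along z
   for point number ndx. Assumes stride is in bytes, is a multiple of
   sizeof(float), and that each point begins with packed floats x, y, z. */
int grid_z(__global const char *bufPnts, int ndx, int stride,
           float min_z, float delta_z)
{
    __global const float *p = (__global const float *)(bufPnts + ndx * stride);
    /* p[0] = x, p[1] = y, p[2] = z */
    return (int)((p[2] - min_z) * delta_z);
}
```

Padding each point to 16 bytes and casting to float4 instead would be another option, at the cost of one unused float per point.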
OK, powers of two are certainly nice.
Out of curiosity, do you know when it makes sense to use a float4 vs., say, float[4]? Probably a FAQ.
Originally posted by: david_aiken
OK, powers of two are certainly nice.
Out of curiosity, do you know when it makes sense to use a float4 vs., say, float[4]? Probably a FAQ.
float4 and the other vector types differ from normal arrays because they can use SSE instructions if present. If your CPU supports SSE, float4 will make the calculations faster. I'm not sure whether it is of use (speed-wise) on the GPU as well.
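To make the difference concrete, here is a rough sketch of the same multiply-add written both ways. The function names madd_vec and madd_arr are made up for illustration, and the typedef is only a host-side stand-in (a GCC/Clang vector extension) so the snippet also builds outside OpenCL, where float4 is built in. Whether the vector form actually maps to SSE or runs faster depends on the device and compiler.

```c
#ifndef __OPENCL_VERSION__
/* Host-only stand-in so the sketch builds with GCC or Clang;
   OpenCL C provides float4 as a built-in vector type. */
typedef float float4 __attribute__((vector_size(16)));
#endif

/* Vector form: one expression, applied to all four components at once. */
float4 madd_vec(float4 a, float4 b, float4 c)
{
    return a * b + c;
}

/* Array form: the same arithmetic written one component at a time. */
void madd_arr(const float a[4], const float b[4], const float c[4],
              float out[4])
{
    int i;
    for (i = 0; i < 4; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

With float4 the compiler is told up front that the four lanes are independent; with float[4] it has to work that out from the loop.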
Yes, it is. I ran two kernels: one has float parameters and the other float4. The first kernel reaches 200 GFLOPS and the second 800 GFLOPS.