Any suggestions for optimizing?

Discussion created by Raistmer on Aug 11, 2009
Brook+ kernel listed below

This kernel should fold the initial 1D array into a 2D array: each 1D subarray holds the initial array folded at one period, and any two subarrays differ from each other only in the folding stride.
Any suggestions for further optimization of this kernel?

kernel void GPU_fetch_array_kernel74t(int sub_buffer_size, float src[], float freq[], out float4 dest<>)
{
    int j = instance().y;
    int threadID = instance().x;
    int k = 0;
    int l = 0;
    float4 acc = float4(0.f, 0.f, 0.f, 0.f);
    float f = freq[threadID];
    double period = (double)sub_buffer_size / (double)f;
    int n_per = (int)f;
    for (k = 0; k < n_per; k++) {
        l = (int)(period * (double)k + 0.5);
        l += (4 * j); //R: index to data array computed
        acc.x += src[l];
        acc.y += src[l + 1];
        acc.z += src[l + 2];
        acc.w += src[l + 3];
    }
    dest = acc;
}
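
For checking any optimized variant against known-good output, a plain C version of the same fold might be useful. This is only a sketch under some assumptions: the fold is meant to sum the samples taken from each period, the output is stored row-major with one row per freq[] entry, and the names fold_reference, n_freq and out_len are made up for illustration (they are not part of the kernel). It works on scalars, so element i of a row corresponds to component i % 4 of the float4 at j = i / 4.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the fold (hypothetical helper, not part of the kernel).
 * For every frequency t and every output position i:
 *   out[t * out_len + i] = sum over k of src[round(k * period) + i]
 * where period = sub_buffer_size / freq[t] and k runs over (int)freq[t] periods.
 * The caller must make sure src is long enough for the largest index,
 * i.e. round((n_per - 1) * period) + out_len - 1. */
void fold_reference(const float *src, int sub_buffer_size,
                    const float *freq, int n_freq,
                    int out_len, float *out)
{
    for (int t = 0; t < n_freq; t++) {
        float f = freq[t];
        assert(f >= 1.0f);                 /* assumed: at least one full period */
        double period = (double)sub_buffer_size / (double)f;
        int n_per = (int)f;                /* number of whole periods to fold */
        for (int i = 0; i < out_len; i++) {
            float acc = 0.0f;
            for (int k = 0; k < n_per; k++) {
                /* same rounding as the kernel: start of period k, then offset i */
                int l = (int)(period * (double)k + 0.5) + i;
                acc += src[l];
            }
            out[(size_t)t * (size_t)out_len + (size_t)i] = acc;
        }
    }
}
```

As a small sanity check: with src = 0, 1, ..., 11, sub_buffer_size = 12, out_len = 4 and a single frequency of 3, the period is 4 samples, so the three periods start at indices 0, 4 and 8 and the row comes out as 12, 15, 18, 21.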