Hi,
I'd like to use AMD's APP FFT without all the overhead (lib/dll). So I thought I would simply take the generated OpenCL code from the demo program, which can be dumped to a file.
Question: what is this cb_t union used for? In particular, what is the cb[1].u field in this line:
uint last_row = 128 * cb[1].u;
How should I initialize this cb buffer? The "f" and "i" members do not seem to be used at all.
...
typedef union { float f; uint u; int i; } cb_t;

__attribute__((reqd_work_group_size(64,1,1)))
__kernel void fft_fwd(
    __constant cb_t *cb __attribute__((max_constant_size(32))),
    __global float2 *gcomplx_in,
    __global float2 *gcomplx_out)
{
    __local float lds[1024];
    __global float2 * gpc;
    uint tbase;
    uint i0, j0;
    uint gid = get_global_id(2);
    gid = (gid * get_global_size(1)) + get_global_id(1);
    gid = (gid * get_global_size(0)) + get_global_id(0);
    uint me = gid & 15U;       // work item within vector
    uint row_index = gid >> 4; // vector index
    uint last_row = 128 * cb[1].u;
    uint offset = min (row_index, last_row-1);
    offset = ((offset >> 7) << 15) + ((offset & 127) << 0);
    gcomplx_in = gcomplx_in + offset;
    offset = ((row_index >> 7) << 15) + ((row_index & 127) << 8);
    gcomplx_out = gcomplx_out + offset;
    ...
If you reverse-engineer the kernel code, you can see that cb[1].u is the number of 1-D FFTs the kernel should execute, i.e. the batch size.
Just curious: how did you capture the generated code?
Easy: just pass the command-line flag "-d" to the provided sample program clAmdFft.Client.exe.
How many times, and with which parameters, do I have to call this kernel for a 32K-point FFT data array?
__kernel void fft_fwd(
    __constant cb_t *cb __attribute__((max_constant_size(32))),
    __global float2 *gcomplx_in,
    __global float2 *gcomplx_out)
{
    ...
}
FrodoTheGiant,
It is not a good idea to use the generated kernel files for FFT, because "different kernels are generated for different inputs. The kernel is dependent on dimensionality, size of the buffer, device, etc."
Try changing the transform size in the sample and check whether you get the same kernel.
Die in Sente,
You can also dump the generated kernels by setting CLFFT_DUMP_PROGRAMS in the debugFlags field of clAmdFftSetupData.