Archives Discussions

FrodoTheGiant · ‎12-17-2010

Hi,

I'd like to use AMD's APP FFT without the overhead of the library (lib/dll). So I thought I would simply take the generated OpenCL code from the demo program, which can be dumped to a file.

Question: What is the cb_t structure for, and in particular the cb[1].u field in this line?

uint last_row = 128 * cb[1].u;

How do I initialize this cb structure? The "f" and "i" members do not seem to be used at all.

...

typedef union { float f; uint u; int i; } cb_t;

__attribute__((reqd_work_group_size(64,1,1)))
__kernel void fft_fwd(
    __constant cb_t *cb __attribute__((max_constant_size(32))),
    __global float2 *gcomplx_in,
    __global float2 *gcomplx_out)
{
    __local float lds[1024];
    __global float2 * gpc;
    uint tbase;
    uint i0, j0;

    uint gid = get_global_id(2);
    gid = (gid * get_global_size(1)) + get_global_id(1);
    gid = (gid * get_global_size(0)) + get_global_id(0);

    uint me = gid & 15U;          // work item within vector
    uint row_index = gid >> 4;    // vector index
    uint last_row = 128 * cb[1].u;

    uint offset = min (row_index, last_row-1);
    offset = ((offset >> 7) << 15) + ((offset & 127) << 0);
    gcomplx_in = gcomplx_in + offset;

    offset = ((row_index >> 7) << 15) + ((row_index & 127) << 8);
    gcomplx_out = gcomplx_out + offset;

...

Die_in_Sente · ‎12-20-2010

If you reverse-engineer the kernel code, you can see that cb[1].u is the number of 1-D FFTs the kernel should execute, i.e. the batch size.
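
One way to check this reading is to mirror the index arithmetic from the posted kernel on the host and see how cb[1].u affects the addresses. The following is only a sketch in plain C: the struct fft_index_t and the function fft_index are hypothetical names introduced for illustration, and the code reproduces just the lines quoted above, not the rest of the generated kernel.

```c
#include <stdint.h>

/* Same layout as the cb_t union in the posted kernel. */
typedef union { float f; uint32_t u; int32_t i; } cb_t;

/* Hypothetical helper type: the values the kernel derives from gid. */
typedef struct {
    uint32_t me;         /* work item within its 16-wide vector (gid & 15) */
    uint32_t row_index;  /* vector index (gid >> 4) */
    uint32_t in_offset;  /* float2 elements added to gcomplx_in */
    uint32_t out_offset; /* float2 elements added to gcomplx_out */
} fft_index_t;

/* Hypothetical helper: repeats the quoted kernel lines for one flat gid.
   Assumes cb[1].u >= 1; with 0 the unsigned last_row - 1 would wrap. */
static fft_index_t fft_index(uint32_t gid, const cb_t *cb)
{
    fft_index_t r;
    uint32_t last_row = 128U * cb[1].u;   /* 128 rows per batch entry */
    uint32_t offset;

    r.me = gid & 15U;
    r.row_index = gid >> 4;

    /* Rows past the last valid one are clamped for the input read. */
    offset = r.row_index < last_row - 1U ? r.row_index : last_row - 1U;
    r.in_offset = ((offset >> 7) << 15) + ((offset & 127U) << 0);

    /* The output offset uses the unclamped row index. */
    r.out_offset = ((r.row_index >> 7) << 15) + ((r.row_index & 127U) << 8);
    return r;
}
```

With cb[1].u set to 1, gid 35 gives me = 3, row_index = 2, an input offset of 2 and an output offset of 512. Every group of 128 rows advances both pointers by 1 << 15 = 32768 float2 elements, which is consistent with cb[1].u counting whole 32K transforms. What the other cb entries hold cannot be told from the quoted lines.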

Die_in_Sente · ‎12-20-2010

Just curious, how did you capture the generated code?

FrodoTheGiant · ‎12-21-2010

Easy. Just use the command-line flag "-d" with the provided sample program clAmdFft.Client.exe.

FrodoTheGiant · ‎12-22-2010

How often, and with which parameters, do I have to call this kernel for a 32K FFT data array?

__kernel void fft_fwd(
__constant cb_t *cb __attribute__((max_constant_size(32))),
__global float2 *gcomplx_in,
__global float2 *gcomplx_out)
{

...

}

nareshsankapelly · ‎03-07-2011

FrodoTheGiant,

It is not a good idea to reuse generated kernel files for FFT, because different kernels are generated for different inputs. The kernel depends on the dimensionality, the size of the buffer, the device, etc.

Try changing the size in the sample and check whether you are getting the same kernel or not.

nareshsankapelly · ‎03-07-2011

Die in Sente,

You can dump the generated kernels by setting CLFFT_DUMP_PROGRAMS in debugFlags in clAmdFftSetupData.

AMD APP FFT - generated code