FrodoTheGiant
Journeyman III

AMD APP FFT - generated code

Hi,

I'd like to use AMD's APP FFT without the overhead of the library (lib/DLL). My idea is simply to take the generated OpenCL code, which the demo program can dump to a file, and use it directly.

Question: What is the purpose of the cb_t structure, and in particular of the cb[1].u field in this line?

uint last_row = 128 * cb[1].u;

How do I have to initialize the cb structure? The "f" and "i" members do not seem to be used at all.

...

typedef union { float f; uint u; int i; } cb_t;

__attribute__((reqd_work_group_size(64,1,1)))
__kernel void fft_fwd(
    __constant cb_t *cb __attribute__((max_constant_size(32))),
    __global float2 *gcomplx_in,
    __global float2 *gcomplx_out)
{
    __local float lds[1024];
    __global float2 * gpc;
    uint tbase;
    uint i0, j0;

    uint gid = get_global_id(2);
    gid = (gid * get_global_size(1)) + get_global_id(1);
    gid = (gid * get_global_size(0)) + get_global_id(0);

    uint me = gid & 15U;          // work item within vector
    uint row_index = gid >> 4;    // vector index
    uint last_row = 128 * cb[1].u;

    uint offset = min (row_index, last_row-1);
    offset = ((offset >> 7) << 15) + ((offset & 127) << 0);
    gcomplx_in = gcomplx_in + offset;

    offset = ((row_index >> 7) << 15) + ((row_index & 127) << 8);
    gcomplx_out = gcomplx_out + offset;

...

6 Replies
Die_in_Sente
Journeyman III

If you reverse-engineer the kernel code, you can see that cb[1].u is the number of 1-D FFTs the kernel should execute, i.e. the batch size.
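As a rough illustration of what that means on the host side, here is a minimal C sketch that mirrors cb_t and stores a batch count in cb[1].u. The names init_cb, kernel_last_row and CB_SLOTS are made up for this example. The slot count of 8 follows from max_constant_size(32) divided by the 4-byte union. What the other slots hold (if anything) cannot be seen in the posted fragment, so zeroing them is only an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Host-side mirror of the kernel's cb_t: three views of the same 4 bytes. */
typedef union {
    float    f;
    uint32_t u;
    int32_t  i;
} cb_t;

/* max_constant_size(32) in the kernel signature: 32 bytes / 4 bytes = 8 slots. */
#define CB_SLOTS 8

/* Hypothetical helper: zero every slot, then store the batch size in cb[1].u.
   Zeroing the remaining slots is an assumption; their meaning is not visible
   in the posted kernel fragment. */
void init_cb(cb_t cb[CB_SLOTS], uint32_t batch_size)
{
    memset(cb, 0, CB_SLOTS * sizeof(cb_t));
    cb[1].u = batch_size;
}

/* Same expression as "uint last_row = 128 * cb[1].u;" in the kernel. */
uint32_t kernel_last_row(const cb_t cb[CB_SLOTS])
{
    return 128u * cb[1].u;
}
```

The filled array would then be copied into an OpenCL buffer and bound to the kernel's first argument. With a batch size of 1, last_row in the kernel evaluates to 128.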

Die_in_Sente
Journeyman III

Just curious, how did you capture the generated code?

0 Likes

Easy. Just pass the command-line flag "-d" to the provided sample program clAmdFft.Client.exe.

0 Likes

How often, and with which parameters, do I have to call this kernel for a 32K FFT data array?

__kernel void fft_fwd(
    __constant cb_t *cb __attribute__((max_constant_size(32))),
    __global float2 *gcomplx_in,
    __global float2 *gcomplx_out)
{

...

}
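One way to approach this is to trace the index arithmetic of the posted fragment on the host. The following C sketch reproduces only the lines visible in the fragment; trace_index and fft_index_t are names invented for this example, and cb1_u stands for the value of cb[1].u. It says nothing about the elided kernel body or about any further passes the library may launch.

```c
#include <stdint.h>

/* Values derived for one work-item; offsets are in float2 elements. */
typedef struct {
    uint32_t me;         /* gid & 15: work item within vector */
    uint32_t row_index;  /* gid >> 4: vector index */
    uint32_t in_offset;  /* added to gcomplx_in */
    uint32_t out_offset; /* added to gcomplx_out */
} fft_index_t;

/* Hypothetical helper that replays the fragment's arithmetic for a
   flattened global id. cb1_u == 0 makes last_row - 1 wrap around, just as
   the unsigned arithmetic in the kernel would. */
fft_index_t trace_index(uint32_t gid, uint32_t cb1_u)
{
    fft_index_t r;
    r.me = gid & 15u;
    r.row_index = gid >> 4;

    uint32_t last_row = 128u * cb1_u;
    /* min(row_index, last_row - 1): rows past the end re-read the last row */
    uint32_t offset = r.row_index < last_row - 1u ? r.row_index : last_row - 1u;
    r.in_offset = ((offset >> 7) << 15) + ((offset & 127u) << 0);

    r.out_offset = ((r.row_index >> 7) << 15) + ((r.row_index & 127u) << 8);
    return r;
}
```

Replaying a few ids shows that 16 work-items share a row, that rows 0 to 127 write at a stride of 256 elements, and that row 128 starts at output offset 32768. That suggests 128 rows (2048 work-items) per 32K transform and one such block per unit of cb[1].u, though this is an inference from the fragment, not something the thread confirms.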

0 Likes

FrodoTheGiant,

It is not a good idea to use generated kernel files for FFT. The reason is that different kernels are generated for different inputs: the kernel depends on the dimensionality, the size of the buffer, the device, etc.

Try changing the size in the sample and check whether you are getting the same kernel or not.

0 Likes

Die in Sente,

You can dump the generated kernels by setting CLFFT_DUMP_PROGRAMS in the debugFlags field of clAmdFftSetupData.


 
