FrodoTheGiant

AMD APP FFT - generated code

Discussion created by FrodoTheGiant on Dec 17, 2010
Latest reply on Mar 7, 2011 by nareshsankapelly

Hi,

I'd like to use AMD's APP FFT without all the overhead (lib/dll). So I thought I could simply take the generated OpenCL code, which can be extracted to a file, from the demo program.

Question: What's the use of this cb_t structure? Especially of this cb[1].u field in this line here:

uint last_row = 128 * cb[1].u;

How should I initialize this cb structure? It seems "f" and "i" are not used at all.

...
typedef union { float f; uint u; int i; } cb_t;

__attribute__((reqd_work_group_size(64,1,1)))
__kernel void fft_fwd(
    __constant cb_t *cb __attribute__((max_constant_size(32))),
    __global float2 *gcomplx_in,
    __global float2 *gcomplx_out)
{
    __local float lds[1024];
    __global float2 * gpc;
    uint tbase;
    uint i0, j0;

    uint gid = get_global_id(2);
    gid = (gid * get_global_size(1)) + get_global_id(1);
    gid = (gid * get_global_size(0)) + get_global_id(0);

    uint me = gid & 15U;        // work item within vector
    uint row_index = gid >> 4;  // vector index
    uint last_row = 128 * cb[1].u;

    uint offset = min (row_index, last_row-1);
    offset = ((offset >> 7) << 15) + ((offset & 127) << 0);
    gcomplx_in = gcomplx_in + offset;

    offset = ((row_index >> 7) << 15) + ((row_index & 127) << 8);
    gcomplx_out = gcomplx_out + offset;
    ...
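The indexing in the excerpt can be reproduced on the host to see what effect cb[1].u has. Below is a minimal sketch in plain C, not AMD's host code: it mirrors the kernel's arithmetic line by line. The function names input_offset and output_offset are hypothetical, and reading cb[1].u as "number of 128-row blocks of valid input" is only an assumption drawn from the line last_row = 128 * cb[1].u. What cb[0] holds is not visible in the excerpt, so it is left at zero here.

```c
#include <stdint.h>

/* Host-side mirror of the kernel's union; uint32_t/int32_t stand in
   for OpenCL's 32-bit uint/int. */
typedef union { float f; uint32_t u; int32_t i; } cb_t;

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Offset (in float2 elements) added to gcomplx_in for a flattened
   global id, following the kernel excerpt. Rows at or beyond last_row
   are clamped to the last valid row (last_row - 1). */
uint32_t input_offset(uint32_t gid, const cb_t *cb)
{
    uint32_t row_index = gid >> 4;           /* 16 work items per row */
    uint32_t last_row  = 128u * cb[1].u;     /* assumption: cb[1].u counts 128-row blocks */
    uint32_t offset    = min_u32(row_index, last_row - 1u);
    return ((offset >> 7) << 15) + ((offset & 127u) << 0);
}

/* Offset (in float2 elements) added to gcomplx_out; it does not
   depend on cb at all in the excerpt. */
uint32_t output_offset(uint32_t gid)
{
    uint32_t row_index = gid >> 4;
    return ((row_index >> 7) << 15) + ((row_index & 127u) << 8);
}
```

With cb[1].u = 1, a work item in row 200 reads from the same input offset as row 127, because the row index is clamped before the offset is computed. With cb[1].u = 2 it reads from its own row. Only the .u member is read in this excerpt; whether .f and .i are used elsewhere in the generated kernel cannot be told from the part shown.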
