

Raistmer
Adept II

Stream with elements of user-defined type - how?

When will these errors disappear?

1)
struct gpu_ap_signal{
    int time_series[64];
    int time_series_len;
    int peak_bin;
    float peak_power;
    int scale;
    double period;
    int ffa_scale;
    int n_client_bins;
};//R: parts of ap_signal needed in GPU code
struct gpu_ap_signals{
    int num_of_signals;
    struct gpu_ap_signal signal[30];
};//R: this type will be used as element of output stream in kernel call

kernel void GPU_FFA_kernel(float data[],int n_bins,float min_freq,out struct gpu_ap_signals s<>){
............

ERROR--1: Stream element type not supported
Statement: out struct gpu_ap_signals s<>

2)
float gpu_temp[4096];
ERROR--3: Problem with Array variable declaration: Local Array not supported yet
Statement: float gpu_temp[4096]

No global variables at all, no local arrays, no structures of any real complexity...

3)
ERROR--7: Problem with call expression in kernel: kernel can't call a non-kernel
Statement: int_log2(per_int) in max_coadd = int_log2(per_int)

So, no callable functions? Is it at least possible to use macros?
Ceq
Journeyman III

1. That structure is quite complex. The current version does not allow arrays inside structures, and I also think it doesn't support mixing float and double members. There is a very simple working example in "BROOK\samples\legacy\tests\struct".

A workaround is to pass the structure members as individual kernel parameters. AMD people say that even if you use structures, the compiler transforms them into simple parameters (however, if you need structures anyway, I think the Brook+ compiler does this faster than you would by packing and unpacking the data for the GPU yourself).

Note that if your kernel requires too many input/output streams, the compiler will split it into several passes and it will be slower.
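The workaround in point 1 can be sketched on the host side. This is plain C, not Brook+ code, and only a minimal illustration: the names signal_columns, signal_record, pack_signal and unpack_signal are hypothetical, and only three scalar members of the original struct are shown. The idea is one flat array per member, so that each array could back its own stream of a basic type.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SIGNALS 30  /* matches signal[30] in the original struct */

/* One flat array per scalar member; index i in every array refers to signal i.
   Each array holds a single basic type, so it could back its own stream. */
struct signal_columns {
    int   peak_bin[MAX_SIGNALS];
    float peak_power[MAX_SIGNALS];
    int   scale[MAX_SIGNALS];
};

/* Host-side record with only the members used in this sketch (hypothetical). */
struct signal_record {
    int   peak_bin;
    float peak_power;
    int   scale;
};

/* Scatter one record into the per-member arrays. */
static void pack_signal(struct signal_columns *cols, size_t i,
                        const struct signal_record *r)
{
    assert(i < MAX_SIGNALS);
    cols->peak_bin[i]   = r->peak_bin;
    cols->peak_power[i] = r->peak_power;
    cols->scale[i]      = r->scale;
}

/* Gather the per-member arrays back into one record. */
static struct signal_record unpack_signal(const struct signal_columns *cols,
                                          size_t i)
{
    struct signal_record r;
    assert(i < MAX_SIGNALS);
    r.peak_bin   = cols->peak_bin[i];
    r.peak_power = cols->peak_power[i];
    r.scale      = cols->scale[i];
    return r;
}
```

Every extra member adds another stream, so this layout runs into the multi-pass concern from the note above once the member count grows.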

2. That's right, I also think not having local arrays is a big restriction. It would be useful even if the compiler just unrolled arrays into simple GPU register operations (by the way, I think even if local arrays were supported, 4096 elements would be rather large to fit in the registers of a single thread).

3. Well, that isn't really a problem; it's just that you can't call CPU code from GPU kernels. You can still use functions and macros:

- If you want to use those functions within GPU code, define them as kernels; this will work as long as you don't use recursive calls. For example, "kernel int next(int i) { return i + 1; }" can be called normally from other kernels.

- If you want to use macros, pass the "-pp" flag to the BRCC compiler to enable the preprocessor.
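As a rough illustration of both options, here is a plain C sketch. It is not Brook+ code. The body of int_log2 is an assumption, since the original function is not shown in the thread; it is written without recursion so that it would meet the restriction mentioned above if it were declared as a kernel. The macro MIN_OF is likewise hypothetical. Whether every operator used here is accepted inside a Brook+ kernel has not been checked.

```c
#include <assert.h>

/* Floor of log2(x) for x >= 1, using a loop instead of recursion.
   Assumed behaviour: the thread does not show the real int_log2. */
static int int_log2(int x)
{
    int result = 0;
    assert(x >= 1);
    while (x > 1) {
        x /= 2;      /* halve until only the leading bit is left */
        ++result;
    }
    return result;
}

/* Hypothetical macro helper; expanded by the preprocessor, so no call is made. */
#define MIN_OF(a, b) ((a) < (b) ? (a) : (b))
```

With this definition, int_log2(8) and int_log2(9) both give 3, which is the usual floor convention.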

 


1)
The fact that this structure example exists only in the "legacy" part of the samples is very alarming. The new sample set contains stream declarations only with basic data types (like float and float4)... A regression?

I need all of this data as the result of the kernel's work.
Is it possible to use 3D streams?
Or is it possible to use a kernel like this:

kernel void k(float s1[][], float s2[][], float s3[][], ..., float sN[][],
out float o1[][], ..., out float oM[][], out float<>)
?
(that is, both 2D scatter streams and simple 1D streams as outputs, and many gather streams as inputs)

2) Actually, I wasn't thinking of an array of registers. I just need a way to allocate an array in GPU memory from within the kernel. It will surely be slower than an array of registers, but the number of registers is very limited... Is it possible to access GPU memory inside a kernel in some way other than declaring an input stream for it?
I need some pretty big temporary buffers inside the kernel (that is, an amount of memory larger than the register set, with read/write access).

Splitting this complex kernel into many simple kernels is not an option from a performance point of view. I have already gone that way: the kernel setup overhead is too prohibitive for the many-simple-kernels approach to be useful 😞

As mentioned by Ceq, arrays inside a struct are not supported.

You can have multiple gather streams, but you can't have more than one scatter stream or more than 8 regular output streams.


Originally posted by: gaurav.garg

As mentioned by Ceq, arrays inside a struct are not supported.

You can have multiple gather streams, but you can't have more than one scatter stream or more than 8 regular output streams.


Is it possible to access GPU memory from a kernel other than through a stream?

No, it is not possible.


What about a kernel like this:
kernel void k(float a[][], out float b[][], out float c<>){...}

Called as:

Stream buf(2,size2);
Stream result(1,size);
k(buf,buf,result);

with aliasing enabled in the runtime?

Will the a/b streams act as a randomly accessible 2D buffer in GPU memory, provided some precautions are taken to avoid race conditions (for example, each thread using only one index in the first dimension)?

The kernel should work fine, but aliasing is not possible.
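The one-row-per-thread precaution from the question can be modelled in plain C. This is a CPU model under stated assumptions, not Brook+ code: ROWS, COLS, process_row and run_all_rows are hypothetical names, the per-row work (a running sum) is only a placeholder, and separate input and output buffers are used because aliasing is ruled out above.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4   /* hypothetical number of threads */
#define COLS 8   /* hypothetical scratch elements per thread */

/* Models one kernel invocation: "thread" row reads only in[row][*]
   and writes only out[row][*], so no two threads touch the same element. */
static void process_row(float in[ROWS][COLS], float out[ROWS][COLS], size_t row)
{
    float running = 0.0f;
    size_t c;
    assert(row < ROWS);
    for (c = 0; c < COLS; ++c) {
        running += in[row][c];
        out[row][c] = running;   /* placeholder work: running sum of the row */
    }
}

/* Models the whole launch; the order of rows does not matter
   because the rows are independent of each other. */
static void run_all_rows(float in[ROWS][COLS], float out[ROWS][COLS])
{
    size_t r;
    for (r = 0; r < ROWS; ++r) {
        process_row(in, out, r);
    }
}
```

Because each row is read and written by exactly one simulated thread, the result is the same whichever order the rows are processed in, which is the property the precaution is meant to guarantee.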


Originally posted by: gaurav.garg

The kernel should work fine, but aliasing is not possible.

Even with the corresponding environment variable set?
Is 2D-stream aliasing prohibited by the compiler?