I have a question about the maximum number of kernel inputs and outputs. The user guide says the maximum is 128 inputs and 8 outputs, but my program needs 9 outputs. The kernel is defined like this:
float nx, float ny, float nz4,
float dx, float dy, float dz,
float K_Gx_a, float K_Gx_b, float K_Ex_a, float K_Ex_b, float K_Ex_c, float K_Ex_d,
float K_Gy_a, float K_Gy_b, float K_Ey_a, float K_Ey_b, float K_Ey_c, float K_Ey_d,
float K_Gz_a, float K_Gz_b, float K_Ez_a, float K_Ez_b, float K_Ez_c, float K_Ez_d,
float4 HX, float4 HY, float4 HZ,
float4 FX0<>, float4 FY0<>, float4 FZ0<>,
float4 GX0<>, float4 GY0<>, float4 GZ0<>,
float4 EX0<>, float4 EY0<>, float4 EZ0<>,
float4 IND, float K_A, float K_B,
int PT_SOURCE_EX, int PT_SOURCE_EY, int PT_SOURCE_EZ, int N_COORD_PTSOURCE,
out float4 EX<>, out float4 EY<>, out float4 EZ<>,
out float4 FX<>, out float4 FY<>, out float4 FZ<>, out float4 GX<>, out float4 GY<>, out float4 GZ<>;
Is there any way to solve this problem? Is there any global memory that could be used, so that FX..., GX... (those 6 outputs) do not have to be outputs but can just be global variables? Or any other tips?
My card is an AMD HD4850, 635/1986 MHz.
THX for your help!
A kernel runs fine even if it has more than 8 output streams, though with some performance overhead.
The Brook compiler generates multipass code, and the runtime handles it properly if a kernel has more than 8 output streams.
Example: if a kernel has 10 outputs, the compiler generates two kernels, the first with 8 outputs and the second with 2. The kernel code is copied as-is into both kernels, which is an overhead if your kernel does a lot of computation.
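To make the split concrete for the 9-output kernel in the question, here is a rough Python sketch of the arithmetic only. It is not the compiler's actual algorithm: it assumes outputs are grouped in declaration order, filling each pass up to the 8-output limit, which matches the 10 = 8 + 2 example above. The helper name `split_outputs` is made up for illustration.

```python
# Illustrative only: mimics how outputs might be grouped into passes.
# Assumption: passes are filled in declaration order, up to the limit.
MAX_OUTPUTS_PER_PASS = 8  # output limit quoted from the user guide


def split_outputs(outputs, limit=MAX_OUTPUTS_PER_PASS):
    """Group output stream names into passes of at most `limit` each."""
    return [outputs[i:i + limit] for i in range(0, len(outputs), limit)]


# The nine output streams from the kernel in the question.
outputs = ["EX", "EY", "EZ", "FX", "FY", "FZ", "GX", "GY", "GZ"]
passes = split_outputs(outputs)

for n, group in enumerate(passes, start=1):
    print(f"pass {n}: {len(group)} output(s): {', '.join(group)}")
```

Under that assumption the 9-output kernel becomes two passes, and since the kernel body is copied into each pass, the computation is done roughly twice.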