computing different kernels in parallel (Brook+)

Discussion created by rrr on Oct 28, 2008
Latest reply on Oct 29, 2008 by Methylene
Is it possible to submit a batch of different kernels to the GPU so that they run concurrently on different SIMD engines?


My problems have quite small matrix and vector sizes (10 to 100), so a single kernel invocation would often keep only one SIMD engine busy. Does the Brook+ runtime detect "independent" kernel invocations and submit further kernel invocations to the GPU without waiting for the already started kernels to complete?


kernel void sum1(double v1<>, double v2<>, out double result<>)
{
    result = v1 + v2;
}

kernel void sum2(double v1<>, double v2<>, out double result<>)
{
    result = v1 + v2;
}

int main(int argc, char** argv)
{
    double a1<10>;
    double a2<10>;
    double a3<10>;
    double ret1<10>;
    double ret2<10>;
    double ret3<10>;
    sum1(a1, a2, ret1);
    sum2(a1, a3, ret2); // started in parallel to the sum1 invocation?
    sum1(a2, a3, ret3); // started in parallel to both previous invocations?
    return 0;
}

If Brook+ can parallelize such simple kernel invocations (in my example it would be feasible, because all output streams are disjoint), how does Brook+ detect that the output streams are independent? E.g., could Brook+ still parallelize if ret1..ret3 were replaced by domain operators on a matrix that select disjoint domains?
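To make the question about disjoint domains concrete, here is a minimal C++ sketch of the kind of overlap test a runtime could apply to two rectangular sub-domains of the same matrix. The `Domain` struct and the `disjoint` function are hypothetical names used only for illustration; they are not part of Brook+, and whether Brook+ performs any check of this kind is exactly what is being asked.

```cpp
#include <cassert>

// Hypothetical description of a rectangular sub-domain of a 2D stream,
// given as half-open ranges [x0, x1) by [y0, y1).
struct Domain {
    int x0, y0, x1, y1;
};

// Two rectangles are disjoint when they are separated along at least one axis.
// If this returns true, writes to the two domains cannot touch the same element.
bool disjoint(const Domain& a, const Domain& b) {
    assert(a.x0 <= a.x1 && a.y0 <= a.y1);
    assert(b.x0 <= b.x1 && b.y0 <= b.y1);
    return a.x1 <= b.x0 || b.x1 <= a.x0 ||
           a.y1 <= b.y0 || b.y1 <= a.y0;
}
```

For example, rows 0 and 1 of a 10-column matrix, written as `Domain{0, 0, 10, 1}` and `Domain{0, 1, 10, 2}`, are reported as disjoint, while two domains that share row 1 are not.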

If Brook+ cannot parallelize the kernel invocations, could I alternatively use CPU threads in the Brook+ (CPU) program to feed the kernel invocations to the GPU in parallel?
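The CPU-thread alternative could look roughly like the following C++ sketch, which issues the three independent sums from three host threads. `submit_sum` is a hypothetical stand-in that computes the sum on the CPU; in a real program it would be replaced by the Brook+ kernel call. The sketch makes no claim that the Brook+ runtime is thread-safe or that it would actually overlap the kernels on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a kernel invocation such as sum1(v1, v2, result).
// It runs on the CPU here; a real program would call the Brook+ kernel instead.
void submit_sum(const std::vector<double>& v1,
                const std::vector<double>& v2,
                std::vector<double>& result) {
    assert(v1.size() == v2.size() && v1.size() == result.size());
    for (std::size_t i = 0; i < v1.size(); ++i) {
        result[i] = v1[i] + v2[i];
    }
}

// Issues three independent invocations, one per CPU thread, then waits for all.
// Inputs are only read and every output vector is distinct, so no locking is needed.
void submit_all(const std::vector<double>& a1,
                const std::vector<double>& a2,
                const std::vector<double>& a3,
                std::vector<double>& ret1,
                std::vector<double>& ret2,
                std::vector<double>& ret3) {
    std::thread t1(submit_sum, std::cref(a1), std::cref(a2), std::ref(ret1));
    std::thread t2(submit_sum, std::cref(a1), std::cref(a3), std::ref(ret2));
    std::thread t3(submit_sum, std::cref(a2), std::cref(a3), std::ref(ret3));
    t1.join();
    t2.join();
    t3.join();
}
```

The threads may share input vectors because they only read them; the approach relies on the outputs being distinct, which mirrors the disjoint output streams in the example above.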

best regards