4 Replies Latest reply on Feb 5, 2014 8:52 AM by shunyo

    Nested for loops GPU crashing



      I have a set of vectors and I need to compute the scalar triple product for every combination of three vectors from the set. I wrote a very simple kernel that runs over a 3-dimensional index space:

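      __kernel void compute_triple_prod(__global float4* pl, __global float* res)
      {
        // one work-item per (i, j, k) triple of vectors
        int i = get_global_id(0);
        int j = get_global_id(1);
        int k = get_global_id(2);
        int gs = get_global_size(0);
        // flatten (i, j, k) into a linear index; assumes all three dimensions have size gs
        int idx = i + j * gs + k * gs * gs;
        // dot_prod and cross_prod are helper functions not shown here
        res[idx] = dot_prod(pl[i],cross_prod(pl[j],pl[k]));
      }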

      For N vectors the kernel should produce N^3 outputs. I am trying to run it on a set of 50 vectors (125,000 outputs), but the code crashes. I am using an ATI FirePro V4800. Also, what are some ways to optimize the code?
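
      For reference, here is a minimal CPU sketch in plain C of what I expect the kernel to compute, written as the equivalent nested for loops. The names vec4, cross3, dot3 and triple_prod_ref are made up for this illustration (vec4 stands in for float4, and cross3/dot3 are my assumption of what cross_prod/dot_prod do); it is only meant as a way to check the indexing and the results, not as the actual host code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float4; the w component is unused. */
typedef struct { float x, y, z, w; } vec4;

/* Cross product of the xyz parts (assumed behaviour of cross_prod). */
static vec4 cross3(vec4 a, vec4 b) {
    vec4 c = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x,
               0.0f };
    return c;
}

/* Dot product of the xyz parts (assumed behaviour of dot_prod). */
static float dot3(vec4 a, vec4 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Fills res[i + j*n + k*n*n] with pl[i] . (pl[j] x pl[k]).
   The caller must provide room for n*n*n floats in res. */
void triple_prod_ref(const vec4 *pl, float *res, size_t n) {
    assert(pl != NULL && res != NULL);
    for (size_t k = 0; k < n; ++k)
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < n; ++i)
                res[i + j * n + k * n * n] = dot3(pl[i], cross3(pl[j], pl[k]));
}
```

      With n = 50 the output buffer needs 50 * 50 * 50 = 125,000 floats, and the global work size would have to be 50 in each of the three dimensions for the kernel's idx to match this layout.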