
shunyo
Journeyman III

Nested for loops GPU crashing

Hi,

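I have a set of vectors and need to compute the scalar triple product for every combination (i, j, k) of them. I wrote a very simple kernel over a 3-dimensional index space:

// dot_prod and cross_prod (posted further down) must be declared or defined
// before the kernel uses them.
float dot_prod(float4 u, float4 v);
float4 cross_prod(float4 u, float4 v);

__kernel void compute_triple_prod(__global float4* pl, __global float* res)
{
  int i = get_global_id(0);
  int j = get_global_id(1);
  int k = get_global_id(2);
  int gs = get_global_size(0);  // N; assumes a global work size of {N, N, N}

  // Flatten (i, j, k) into an index into res, which must hold N^3 floats
  int idx = i + j * gs + k * gs * gs;

  // Scalar triple product pl[i] . (pl[j] x pl[k])
  res[idx] = dot_prod(pl[i], cross_prod(pl[j], pl[k]));
}
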
The kernel should produce N^3 outputs, one per (i, j, k) combination. I am trying to run it on a set of 50 vectors (125,000 outputs), but the code crashes. I am using an ATI FirePro V4800. Also, what are some ways to optimize the code?

ravkum
Staff

Hi,

Could you share your implementation of dot_prod and cross_prod?

Regards,

Ravi


Sorry for the delay in replying. The implementations are given below:

float dot_prod(float4 u, float4 v)
{
  // 3-component dot product; the w component is ignored
  float prod;
  prod = u.x * v.x + u.y * v.y + u.z * v.z;
  return prod;
}

float4 cross_prod(float4 u, float4 v)
{
  float4 prod;
  prod.x = u.y * v.z - u.z * v.y;
  prod.y = u.z * v.x - u.x * v.z;
  prod.z = u.x * v.y - u.y * v.x;
  prod.w = 0.0f;  // otherwise w is returned uninitialized
  return prod;
}


Thanks, Shunyo. Have you tried copying pl[i], pl[j] and pl[k] into private memory instead of passing them directly to the dot and cross functions? If not, please try that as an optimization.

Also, how are the pl and res buffers created? Do they reside in host or device memory?


Thanks for the suggestions. It worked out.

Shunyo
