Hi,
I have a set of vectors and I need to compute the scalar triple product for every combination of three of them. I wrote a very simple kernel that runs over a 3-dimensional range:
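__kernel void compute_triple_prod(__global float4* pl, __global float* res)
{
    int i = get_global_id(0);
    int j = get_global_id(1);
    int k = get_global_id(2);
    int gs = get_global_size(0);
    // Flat output index; assumes all three global dimensions have the same size
    int idx = i + j * gs + k * gs * gs;
    // Scalar triple product pl[i] . (pl[j] x pl[k])
    res[idx] = dot_prod(pl[i], cross_prod(pl[j], pl[k]));
}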
The code should produce N^3 outputs. I am trying to run it for a set of 50 vectors, but the code crashes. I am using an ATI FirePro V4800. Also, what are some ways to optimize the code?
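For anyone checking the sizes involved, the output layout implied by the kernel can be reproduced on the host in plain C. This is only a minimal sketch: the helper names flat_index and result_bytes are hypothetical and not part of the original post. It assumes the kernel is launched with the same global size N in all three dimensions, since the kernel uses get_global_size(0) for both strides.

```c
#include <stddef.h>

/* Hypothetical helper: flat offset used by the kernel,
   idx = i + j * n + k * n * n, for a cubic n x n x n range. */
size_t flat_index(size_t i, size_t j, size_t k, size_t n)
{
    return i + j * n + k * n * n;
}

/* Hypothetical helper: number of bytes the res buffer must hold
   when there are n input vectors (one float per (i, j, k) triple). */
size_t result_bytes(size_t n)
{
    return n * n * n * sizeof(float);
}
```

For N = 50 this gives 50^3 = 125000 floats, with the last valid index being flat_index(49, 49, 49, 50) = 124999, so res needs room for at least that many elements.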
Hi,
Could you share your implementation of dot_prod and cross_prod?
Regards,
Ravi
Sorry for the delay in replying. The implementations are given below:
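// 3-component dot product; the w component is ignored
float dot_prod(float4 u, float4 v)
{
    return u.x * v.x + u.y * v.y + u.z * v.z;
}
// 3-component cross product stored in a float4
float4 cross_prod(float4 u, float4 v)
{
    float4 prod;
    prod.x = u.y * v.z - u.z * v.y;
    prod.y = u.z * v.x - u.x * v.z;
    prod.z = u.x * v.y - u.y * v.x;
    prod.w = 0.0f; // avoid returning an uninitialized component
    return prod;
}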
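One way to validate the kernel output is to compare it against a CPU reference. The following is a rough sketch in plain C, not the code used in this thread: the vec4 struct and the names dot3, cross3 and triple_products_ref are made up for illustration, and the output layout is assumed to match the kernel (idx = i + j * n + k * n * n).

```c
#include <stddef.h>

/* Hypothetical host-side stand-in for OpenCL's float4. */
typedef struct { float x, y, z, w; } vec4;

/* 3-component dot product; w is ignored, as in dot_prod above. */
static float dot3(vec4 u, vec4 v)
{
    return u.x * v.x + u.y * v.y + u.z * v.z;
}

/* 3-component cross product; w is set to zero. */
static vec4 cross3(vec4 u, vec4 v)
{
    vec4 p;
    p.x = u.y * v.z - u.z * v.y;
    p.y = u.z * v.x - u.x * v.z;
    p.z = u.x * v.y - u.y * v.x;
    p.w = 0.0f;
    return p;
}

/* Fills res (n * n * n floats) with pl[i] . (pl[j] x pl[k]),
   stored at i + j * n + k * n * n. */
void triple_products_ref(const vec4 *pl, size_t n, float *res)
{
    for (size_t k = 0; k < n; ++k)
        for (size_t j = 0; j < n; ++j)
            for (size_t i = 0; i < n; ++i)
                res[i + j * n + k * n * n] = dot3(pl[i], cross3(pl[j], pl[k]));
}
```

With the three unit vectors as input, the entry for (i, j, k) = (0, 1, 2) should be 1 and the entry for (0, 2, 1) should be -1, which makes a quick sanity check before comparing against the 50-vector GPU result.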
Thanks Shunyo. Have you tried bringing the pl, pl
Also, how are the pl and res buffers created? Do they reside on the host or on the device?
Thanks for the suggestions. It worked out. Thanks.
Shunyo