This is a follow-up to http://devgurus.amd.com/thread/167699
I am using a kernel that looks like this:
typedef struct {float x,y,z;} packed_float3;
float3 to_float3(constant packed_float3 * arg)
{ return (float3)(arg->x,arg->y,arg->z); }
void to_packed_float3(float3 arg, global packed_float3* dst)
{ dst->x = arg.x; dst->y = arg.y; dst->z = arg.z; }
kernel void main(constant packed_float3 * vertices,
constant packed_float3 * normals,
constant float * distances,
global packed_float3 * result)
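{
    // one work item per vertex; get_global_id() returns size_t
    size_t gid = get_global_id(0);
    // result = vertex + distance * normal
    float3 v = to_float3(&vertices[gid]);
    float3 n = to_float3(&normals[gid]);
    to_packed_float3(v + distances[gid] * n, &result[gid]);
}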
I enqueue it with a global range of (N, NULL, NULL) and a NULL local range. On the CPU any value of N works, but on the GPU clEnqueueNDRangeKernel() fails with CL_OUT_OF_RESOURCES as soon as N approaches 5000.
I can't make CodeXL work on this project, but that is another issue.
Thank you
OK, I found the answer: constant memory is limited to 128 KB on my card, and vertices, normals and distances are all declared in the constant address space.