Kernel cut short if executed on the GPU

A kernel is interrupted if executed on the GPU but not if executed on the CPU

Hi there!

I have a kernel that executes a simple but long for loop.

If I request execution on the GPU, the last printf output I get is

"Run: 131070"

whereas on the CPU it runs to completion at k=200000.

I'm on Linux with Stream-2.3, ati-drivers-10.12 and kernel 2.6.36.

The hardware is a Radeon 5770.



I should probably mention that I don't run this code in parallel in any way.

The work_size is set to 1.

/*enable the printf extension by AMD*/
#pragma OPENCL EXTENSION cl_amd_printf : enable

enum //various constants. we store them in an enum
{
    Random_A = 471,
    Random_B = 1586,
    Random_C = 6988,
    Random_D = 9689,
    Random_M = 16383,
};

typedef struct
{
    int random_nd;
    int state[Random_M+1];
} Prng;

typedef struct
{
    float tau;
    int spin;
} Vertex;

typedef struct
{
    float phase;
    uint order;
} CPP_Configuration;

typedef struct
{
    global Prng* prng;
} State;

typedef struct
{
    Prng prng;
    CPP_Configuration config;
} CPP_state;

int rnd(State* state)
{
    global Prng* p = state->prng;
    ++(p->random_nd);
    return (p->state[p->random_nd & Random_M] =
                p->state[(p->random_nd - Random_A) & Random_M] ^
                p->state[(p->random_nd - Random_B) & Random_M] ^
                p->state[(p->random_nd - Random_C) & Random_M] ^
                p->state[(p->random_nd - Random_D) & Random_M]);
}

float rndfloat(State* state)
{
    return convert_float(rnd(state))/convert_float(2147483648U);
}
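For reference, rnd() appears to be a four-tap lagged-XOR generator (the lags 471, 1586, 6988 and 9689 match those of Ziff's four-tap shift-register generator). Since the same loop completes on the CPU, one way to sanity-check the generator without going through GPU printf is to run an identical copy on the host. The sketch below is plain C, not OpenCL C: rnd() takes the Prng directly instead of a State wrapper, it assumes two's-complement int, and prng_seed() is a hypothetical helper, because the post does not show how the state array is seeded.

```c
#include <assert.h>
#include <stdint.h>

/* Same constants as the kernel's enum: four lags plus a mask for a
   16384-entry circular state buffer. */
enum {
    Random_A = 471,
    Random_B = 1586,
    Random_C = 6988,
    Random_D = 9689,
    Random_M = 16383,
};

typedef struct {
    int random_nd;           /* running position in the circular buffer */
    int state[Random_M + 1]; /* 16384 words of generator state */
} Prng;

/* Hypothetical seeding helper (not in the original post): fills the state
   with xorshift32 output. The seed must be non-zero. */
static void prng_seed(Prng *p, uint32_t seed)
{
    assert(seed != 0);
    for (int i = 0; i <= Random_M; ++i) {
        seed ^= seed << 13;
        seed ^= seed >> 17;
        seed ^= seed << 5;
        p->state[i] = (int)seed;
    }
    p->random_nd = 0;
}

/* Host copy of the kernel's rnd(): XOR the words Random_A, _B, _C and _D
   positions back and store the result over the oldest slot. */
static int rnd(Prng *p)
{
    ++p->random_nd;
    return p->state[p->random_nd & Random_M] =
               p->state[(p->random_nd - Random_A) & Random_M] ^
               p->state[(p->random_nd - Random_B) & Random_M] ^
               p->state[(p->random_nd - Random_C) & Random_M] ^
               p->state[(p->random_nd - Random_D) & Random_M];
}

/* Host copy of rndfloat(): scales the signed 32-bit output into [-1, 1]. */
static float rndfloat(Prng *p)
{
    return (float)rnd(p) / 2147483648.0f;
}
```

A host program can seed two Prng instances with the same value and compare their streams, or compare the host stream against values the kernel writes to a global buffer instead of printing them.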

2 Replies
Journeyman III

I found the issue.

It seems AMD's printf code can't queue up more than 131072 printf calls on the GPU...
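If the limit really is on the number of printf calls per run (131072 is the figure observed in this thread, not a documented value), a simple workaround is to print only every s-th iteration so the total stays under budget. A minimal sketch in plain C of choosing that stride on the host; print_stride(), prints_for() and OBSERVED_PRINTF_LIMIT are hypothetical names, and the budget passed in should leave room for any other printf calls the kernel makes.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling observed in this thread (an assumption, not a documented value). */
#define OBSERVED_PRINTF_LIMIT 131072u

/* Number of printf calls when printing only on iterations k with
   k % stride == 0, for 0 <= k < iterations: ceil(iterations / stride). */
static uint64_t prints_for(uint64_t iterations, uint64_t stride)
{
    assert(stride > 0);
    return (iterations + stride - 1) / stride;
}

/* Hypothetical helper: smallest stride that keeps prints_for() <= budget. */
static uint64_t print_stride(uint64_t iterations, uint64_t budget)
{
    assert(budget > 0);
    if (iterations == 0)
        return 1;                              /* nothing to print; avoid stride 0 */
    return (iterations + budget - 1) / budget; /* ceil(iterations / budget) */
}
```

For the 200000-iteration loop in the question, print_stride(200000, OBSERVED_PRINTF_LIMIT) gives 2, i.e. 100000 printf calls. The stride can be passed to the kernel as an argument and used as `if (k % stride == 0) printf(...)`. Writing the values to a global buffer and printing them on the host avoids printf entirely.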


This should be documented somewhere people can find it.


sifff, I will file a request to have the printf constraints on the GPU documented.