Inside the kernel "add_to_grid" in "Kernel.cl", the following code is incorrect:
for (uint i = get_local_id(0); i < TIMESTEPS * CHANNELS; i += get_local_size(0)) { ... shared_visR[0] = (float4) (visXX.x, visXY.x, visYX.x, visYY.x);
It performs an out-of-bounds array access, which is undefined behaviour. The C99 standard (ISO/IEC 9899:TC3), Annex J.2, lists the following among the undefined behaviours:
"An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6)."
Because the behaviour is undefined, an optimizing compiler is allowed to assume the access never happens, so it can generate code that is fast but gives wrong results.
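One possible way to avoid this class of problem is to split a flat loop index into a row and a column, so that each subscript stays inside its own dimension. The sketch below is plain C, not the kernel's actual code. It assumes a two-dimensional array of size [TIMESTEPS][CHANNELS]; the sizes 4 and 5 (taken from the standard's example) and the names store_flat and demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen to match the a[4][5] example in Annex J.2. */
#define TIMESTEPS 4
#define CHANNELS  5

/* Store `value` at flat position i, where 0 <= i < TIMESTEPS * CHANNELS,
 * without letting either subscript leave its own dimension. */
static void store_flat(float grid[TIMESTEPS][CHANNELS], size_t i, float value)
{
    assert(i < (size_t)TIMESTEPS * CHANNELS);
    /* i / CHANNELS < TIMESTEPS and i % CHANNELS < CHANNELS, so both are in range. */
    grid[i / CHANNELS][i % CHANNELS] = value;
}

/* Fill the array with its flat index, then read back the element that the
 * out-of-range expression a[1][7] appears to name: flat index 1 * 5 + 7 = 12,
 * which is row 2, column 2. */
static float demo(void)
{
    float grid[TIMESTEPS][CHANNELS];
    for (size_t i = 0; i < (size_t)TIMESTEPS * CHANNELS; ++i)
        store_flat(grid, i, (float)i);
    return grid[2][2];
}
```

With these sizes, demo() returns 12, the value the out-of-range a[1][7] seems to refer to, but it is reached through the valid subscripts [2][2]. Whether the same split applies to the kernel depends on how shared_visR is actually declared.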
Regards,