Hi,
For R9 390 card with 15.20 kernel (latest one)
Given kernel :
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable
#define VALUE_TYPE float
#define COMPAT_VALUE_TYPE int
void my_atomic_add(__global VALUE_TYPE * loc, const VALUE_TYPE f)
{
VALUE_TYPE old = *loc;
VALUE_TYPE sum = old + f;
while(atomic_cmpxchg( (__global COMPAT_VALUE_TYPE*)loc, *((COMPAT_VALUE_TYPE*)&old), *((COMPAT_VALUE_TYPE*)&sum) ) != *((COMPAT_VALUE_TYPE*)&old))
{ old = *loc; sum = old + f; }
}
__kernel void test_atomic_lock(__global float* output )
{
int idx = get_global_id(0);
my_atomic_add(output, 1);
}
l
launched with local/global size 16.
When compiled in 1.2 compability mode it finishes and produces correct result (16).
When compiled in 2.0 compatibility mode (build options -cl-std=CL2.0 -D CL_VERSION_2_0) it hangs the card.
built in R7 (spectre) correctly executed in both mode using 14.12 driver and showed same behaviour (1.2 succeeds and 2.0 fails).
It seems that 15.xx driver version introduced some fatal upgrade to the conditional loop handling in 2.0 compatibility mode.