Archives Discussions

vladimir_1 · ‎07-10-2015

Hi,

For R9 390 card with 15.20 kernel (latest one)

Given kernel :

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

#define VALUE_TYPE float

#define COMPAT_VALUE_TYPE int

void my_atomic_add(__global VALUE_TYPE * loc, const VALUE_TYPE f)

{

VALUE_TYPE old = *loc;

VALUE_TYPE sum = old + f;

while(atomic_cmpxchg( (__global COMPAT_VALUE_TYPE*)loc, *((COMPAT_VALUE_TYPE*)&old), *((COMPAT_VALUE_TYPE*)&sum) ) != *((COMPAT_VALUE_TYPE*)&old))

{ old = *loc; sum = old + f; }

}

__kernel void test_atomic_lock(__global float* output )

{

int idx = get_global_id(0);

my_atomic_add(output, 1);

}

l

launched with local/global size 16.

When compiled in 1.2 compability mode it finishes and produces correct result (16).

When compiled in 2.0 compatibility mode (build options -cl-std=CL2.0 -D CL_VERSION_2_0) it hangs the card.

built in R7 (spectre) correctly executed in both mode using 14.12 driver and showed same behaviour (1.2 succeeds and 2.0 fails).

It seems that 15.xx driver version introduced some fatal upgrade to the conditional loop handling in 2.0 compatibility mode.

vladimir_1 · ‎07-10-2015

Sorry for double posting, indeed 2.0 mode brings in an optimizer "enhancement"

Following kernel works:

void my_atomic_add(__global VALUE_TYPE * loc, const VALUE_TYPE f)

{

VALUE_TYPE old = *loc;

VALUE_TYPE sum = old + f;

volatile bool test =true;

while(test)

{

test = atomic_cmpxchg( (__global COMPAT_VALUE_TYPE*)loc, *((COMPAT_VALUE_TYPE*)&old), * ((COMPAT_VALUE_TYPE*)&sum) ) != *((COMPAT_VALUE_TYPE*)&old);

if (test)

{

old = *loc; sum = old + f;

}

Notice the volatile before bool =(

Conclusion: optimizer should re-evaluate atomic operations in the while loop conditions.

OpenCL 1.2 vs OpenCL 2.0 loops