Miscompile?
The code attached to this message fails for me on APP 2.5 with Catalyst 11.7 on Linux.
On execution of version 1, the memory object `out` is filled with zeros. If I change the code to the one in version 2, the same problem occurs.
I know version 2 is not optimal, as all threads will access the same memory address; functionally, however, it _should_ work correctly. Even putting the value into `__constant` memory does not change anything (I assume that's what the first version implicitly does). Both versions work with APP 2.4, though. It also works with APP 2.5 if I simply replace `my_complex` with `double2`. This, however, is not a real solution in the more complex cases where I originally ran into this issue. Another way to work around the problem is to define `my_float` as single precision. It does not make any difference whether I use `cl_amd_fp64` or `cl_khr_fp64`. Another way to get the second variant working is to use the function in the workaround section to load the value from the pointer.
Is this a regression in APP, or am I doing something stupid here? I really wanted to avoid typedef'ing `double2` as `my_complex`, as otherwise its multiplication operator could accidentally be invoked.
CODE Version 1:

    #pragma OPENCL EXTENSION cl_amd_fp64 : enable

    #define FIELDSIZE 10024

    typedef double my_float;
    typedef struct { my_float re; my_float im; } my_complex;

    __kernel void fillComplex(__global my_complex * out, const my_complex value)
    {
        for(size_t i = get_global_id(0); i < FIELDSIZE; i += get_global_size(0)) {
            out[i] = value;
        }
    }

CODE Version 2:

    #pragma OPENCL EXTENSION cl_amd_fp64 : enable

    #define FIELDSIZE 10024

    typedef double my_float;
    typedef struct { my_float re; my_float im; } my_complex;

    __kernel void fillComplex(__global my_complex * out, __global my_complex * value_p)
    {
        const my_complex value = *value_p;
        for(size_t i = get_global_id(0); i < FIELDSIZE; i += get_global_size(0)) {
            out[i] = value;
        }
    }

CODE Workaround:

    my_complex complexLoadHack(__global const my_complex * p)
    {
        union { double2 v; my_complex c; } tmp;
        tmp.v = *((__global const double2*) p);
        return tmp.c;
    }
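To make the last workaround concrete, this is roughly how the helper slots into version 2. It is only a sketch: the kernel name `fillComplexHack` is made up for this example, the `const` on `value_p` is an addition, and it assumes `value_p` points to the start of a buffer so that the `double2` load is suitably aligned.

```c
#pragma OPENCL EXTENSION cl_amd_fp64 : enable

#define FIELDSIZE 10024

typedef double my_float;
typedef struct { my_float re; my_float im; } my_complex;

/* Workaround helper: read the 16 bytes as a double2, then reinterpret
 * them as my_complex through a union. */
my_complex complexLoadHack(__global const my_complex * p)
{
    union { double2 v; my_complex c; } tmp;
    /* Assumption: p is 16-byte aligned, as required for a double2 load. */
    tmp.v = *((__global const double2 *) p);
    return tmp.c;
}

/* Version 2 with only the struct load replaced by the helper.
 * The kernel name is hypothetical. */
__kernel void fillComplexHack(__global my_complex * out,
                              __global const my_complex * value_p)
{
    const my_complex value = complexLoadHack(value_p);
    for (size_t i = get_global_id(0); i < FIELDSIZE; i += get_global_size(0)) {
        out[i] = value;
    }
}
```

The only difference from version 2 is how `value` is loaded; the store to `out[i]` still uses the `my_complex` struct, which suggests the problem lies in the struct load rather than in the store.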