I want to remove an unnecessary write to global memory (threshold_ptr_plus_flags) at the beginning of my kernel by keeping the value in a private, per-work-item variable instead (plus_flag). However, while my modified code appears to run properly on both the CPU and the GPU with "-cl-opt-disable", it breaks on the CPU when optimizations are enabled.
This is my kernel:
    __kernel void setup_plus_minus_A(
        const int index,
        __local char* const threshold_ptr_plus_tmp_flags,
        __global char* const threshold_ptr_plus_flags,
        __constant const float* const candidates_value
    )
    {
        char plus_flag;
        const int id = get_global_id(0);
        const int local_id = get_local_id(0);

        if (id > 0) {
            if (id < index && candidates_value[id - 1] == candidates_value[id]) {
                threshold_ptr_plus_flags[id] = 0;
                plus_flag = 0;
            } else {
                threshold_ptr_plus_flags[id] = 1;
                plus_flag = 1;
            }
        } else {
            threshold_ptr_plus_flags[0] = 0;
            plus_flag = 0;
        }

        barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);

        if (threshold_ptr_plus_flags[id] != plus_flag) {
            printf((__constant char *)"AP Id: %d diff: %d vs %d\n", id, (int)threshold_ptr_plus_flags[id], (int)plus_flag);
        }

        threshold_ptr_plus_tmp_flags[local_id] = plus_flag; //threshold_ptr_plus_flags[id]; <- Here I would like to use plus_flag instead of threshold_ptr...

        /* ...not shown here: prefix sum on threshold_ptr_plus_tmp_flags... */

        /* Write results back to global memory */
        threshold_ptr_plus_flags[id] = threshold_ptr_plus_tmp_flags[...];
    }
With optimizations enabled on the CPU, the above kernel gives me incorrect results for some threads and prints the following:
Id: 738 diff: 1 vs 0
...
Id: 768 diff: 0 vs 1
...
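For reference, here is a minimal host-side sketch in plain C of the flag that each work-item should compute before the barrier, which could be used to check which of the two printed values is the wrong one. The function name reference_plus_flags and the global_size parameter are hypothetical and not part of the kernel above, and the sketch assumes candidates_value holds at least min(index, global_size) elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side reference, not part of the original kernel.
 * For each global id it reproduces the branch logic that sets plus_flag:
 *   id == 0                                   -> 0
 *   id < index and value[id-1] == value[id]   -> 0
 *   otherwise                                 -> 1
 * Assumption: candidates_value has at least min(index, global_size) elements. */
void reference_plus_flags(int index,
                          const float *candidates_value,
                          char *flags,
                          size_t global_size)
{
    assert(candidates_value != NULL && flags != NULL);

    for (size_t id = 0; id < global_size; ++id) {
        if (id == 0) {
            flags[id] = 0;
        } else if ((int)id < index &&
                   candidates_value[id - 1] == candidates_value[id]) {
            /* Short-circuit: candidates_value[id] is only read when id < index. */
            flags[id] = 0;
        } else {
            flags[id] = 1;
        }
    }
}
```

Comparing this output with the buffer read back from the device (taken before the prefix sum overwrites it) should show whether the global write or the private variable holds the unexpected value.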
Do you have any idea what I'm doing wrong?
Thank you for your help.