maximmoroz

Is a barrier allowed inside a loop?

Discussion created by maximmoroz on Jul 7, 2011
Latest reply on Jul 9, 2011 by maximmoroz

I have a kernel. The kernel produces slightly different results on each execution, which bothers me greatly :) I have already spent an enormous amount of time trying to figure out the problem, but with no success yet.

The kernel has an outer loop with a fixed iteration count. There is a barrier(CLK_LOCAL_MEM_FENCE) inside the loop. A wild idea came to my mind: is a barrier allowed inside a loop? Honestly, I don't remember seeing other code with a barrier inside a loop.

Below is a stripped version of the kernel. Any ideas are welcome.

__kernel __attribute__((reqd_work_group_size(16, 16, 1))) void ConvolutionRegister(
    const __global float * restrict input,
    __global float * restrict output,
    const __global float * restrict weights,
    const __global int * restrict weights_offsets,
    const __global float * restrict biases
    )
{
    __local float input_buffer[IN_SIZE];
    __local float weight_buffer[W_SIZE];

    float sum = 0.0F;

    for(uint input_feature_map_id = 0; input_feature_map_id < INPUT_FEATURE_MAP_COUNT; input_feature_map_id++)
    {
        const int weights_offset = weights_offsets[input_feature_map_id];

        // fill local weight_buffer and input_buffer
        // ...
        // end of fill local weight_buffer and input_buffer

        barrier(CLK_LOCAL_MEM_FENCE);

        // update weighted sum
        // ...
        // end of update weighted sum
    }

    // write result to output feature map
    output[some_index] = sum;
}
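
For reference, a minimal sketch of the "barrier inside a loop" pattern, not taken from the kernel above. The OpenCL specification allows a barrier inside a loop as long as every work-item of the work-group executes it on every iteration. A separate hazard in this pattern is that a work-item which finishes reading the local buffer early may start refilling it for the next iteration while other work-items are still reading; a second barrier at the end of the loop body rules that out. Whether this applies to the stripped kernel cannot be told from the posted code. The names accumulate_tiles, TILE, tile and tile_count are hypothetical, and the sketch assumes a single one-dimensional work-group of TILE work-items. The #ifndef __OPENCL_VERSION__ part is only a set of host-side stand-ins so the same text also compiles as plain C with one emulated work-item; it does not model real barrier behaviour.

```c
#ifndef __OPENCL_VERSION__
/* Host-side stand-ins (assumption: plain C build, ONE emulated work-item). */
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 0
#define barrier(flags) ((void)0)   /* no-op: nothing to synchronize on the host */
#define TILE 1                     /* work-group of one work-item */
static size_t get_local_id(unsigned dim) { (void)dim; return 0; }
#endif

#ifndef TILE
#define TILE 256                   /* hypothetical work-group size on the device */
#endif

/* Hypothetical kernel: every work-item sums all elements of all tiles. */
__kernel void accumulate_tiles(__global const float *input,
                               __global float *output,
                               unsigned tile_count)
{
    __local float tile[TILE];
    const size_t lid = get_local_id(0);
    float sum = 0.0f;

    for (unsigned t = 0; t < tile_count; ++t) {
        /* Each work-item loads one element of the current tile. */
        tile[lid] = input[t * TILE + lid];

        /* Barrier 1: the whole tile is written before anyone reads it. */
        barrier(CLK_LOCAL_MEM_FENCE);

        /* Every work-item reads the entire tile. */
        for (unsigned i = 0; i < TILE; ++i)
            sum += tile[i];

        /* Barrier 2: nobody overwrites the tile for iteration t + 1
           while another work-item is still reading iteration t. */
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    /* One result per work-item (single work-group assumed). */
    output[lid] = sum;
}
```

Both barriers sit in uniform control flow: the loop bound tile_count is the same for every work-item, so all of them hit each barrier the same number of times. Dropping barrier 2 leaves the code legal but racy, which can show up as results that differ slightly from run to run.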
