I am working on OpenCL implementations of various fast Kronecker transforms (Walsh-Hadamard, arithmetic, Reed-Muller, ...). I had no problems writing kernels for the transforms that work on integer arrays (Walsh-Hadamard and arithmetic), but I have run into a problem with my implementation of the Reed-Muller transform and would appreciate some help and advice:

This transform should operate on an array of bool values by XORing certain pairs of those values, as you can see in the attached code. When I declare the kernel's first parameter as a __global bool array, the transform returns incorrect values. When I declare it as a __global char array instead, I get the correct result but a 4x decrease in performance. My guess is that there is some problem with __global bool arrays in OpenCL, but I'm not sure exactly what.

Thank you in advance for your time and help!

// kernel code
__kernel void FastReedMullerTransform(__global bool *numarray, __const unsigned int step)
{
    unsigned int tid = get_global_id(0);

    // index of the first element of the pair handled by this work-item
    const unsigned int pair = tid%step + 2*step*(tid/step);
    // index of its partner, one step further along
    const unsigned int match = pair + step;

    const bool u = numarray[pair];
    const bool v = numarray[pair + step];

    numarray[match] = u ^ v;
}

// reference C++ implementation
template<class T>
void CPUFastReedMullerTransform(T *inputArray, const unsigned int n)
{
    // for each pass of the algorithm
    for(cl_uint step = 1; step < n; step <<= 1)
    {
        cl_uint jump = step << 1;
        for(cl_uint pair = 0; pair < n; pair += jump)
        {
            cl_uint t1 = pair;
            cl_uint t2 = pair + step;
            for(cl_uint i = 0; i < step; ++i)
            {
                bool u = inputArray[t1];
                bool v = inputArray[t2];
                inputArray[t2] = u ^ v;
                ++t1, ++t2;
            }
        }
    }
}
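To make the pairing concrete, here is a small host-side C++ sketch of the index arithmetic the kernel uses. The helper names `butterflyIndices` and `passIndices` are hypothetical (they are not part of the code above), and the sketch assumes each pass is launched with n/2 work-items, which the posted code does not state explicitly.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: mirrors the kernel's index arithmetic on the host.
// Returns the (pair, match) indices touched by work-item `tid` in the pass
// with stride `step`.
std::pair<unsigned, unsigned> butterflyIndices(unsigned tid, unsigned step)
{
    assert(step > 0);
    const unsigned pair = tid % step + 2 * step * (tid / step);
    return {pair, pair + step};
}

// Hypothetical helper: collects the index pairs of one pass over n elements,
// assuming n/2 work-items are launched for that pass.
std::vector<std::pair<unsigned, unsigned>> passIndices(unsigned n, unsigned step)
{
    std::vector<std::pair<unsigned, unsigned>> out;
    for (unsigned tid = 0; tid < n / 2; ++tid)
        out.push_back(butterflyIndices(tid, step));
    return out;
}
```

For n = 8 and step = 2, work-items 0 to 3 map to the index pairs (0,2), (1,3), (4,6) and (5,7), so every element is read or written by exactly one work-item per pass.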

See section 6.8.k of the spec.

"Arguments to __kernel functions in a program cannot be declared with the built-in scalar types bool, half, size_t, ptrdiff_t, intptr_t, and uintptr_t. The"
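In practice this means the buffer should use a type whose size is fixed on both host and device. The following is a minimal sketch, not a tested drop-in replacement: it assumes the values are stored as 0/1 bytes, declares the kernel argument as `uchar` (matching `cl_uchar` on the host), and adds a host-side emulation of the passes so the logic can be checked without a device. The names `kReedMullerKernelSrc`, `Byte`, `emulatePass` and `emulateTransform` are made up for this example, and each pass is assumed to run with n/2 work-items.

```cpp
#include <cassert>
#include <vector>

// Kernel source with the buffer declared as uchar instead of bool.
// uchar is one byte on both host (cl_uchar) and device.
static const char* kReedMullerKernelSrc = R"CLC(
__kernel void FastReedMullerTransform(__global uchar *numarray,
                                      const unsigned int step)
{
    const unsigned int tid   = get_global_id(0);
    const unsigned int pair  = tid % step + 2 * step * (tid / step);
    const unsigned int match = pair + step;
    numarray[match] = numarray[pair] ^ numarray[match];
}
)CLC";

using Byte = unsigned char; // host-side stand-in for cl_uchar

// Emulates one kernel launch on the host: one loop iteration per work-item.
void emulatePass(std::vector<Byte>& data, unsigned step)
{
    const unsigned n = static_cast<unsigned>(data.size());
    for (unsigned tid = 0; tid < n / 2; ++tid) {
        const unsigned pair  = tid % step + 2 * step * (tid / step);
        const unsigned match = pair + step;
        data[match] = static_cast<Byte>(data[pair] ^ data[match]);
    }
}

// Runs all passes (step = 1, 2, 4, ...); the size must be a power of two.
void emulateTransform(std::vector<Byte>& data)
{
    const unsigned n = static_cast<unsigned>(data.size());
    assert(n != 0 && (n & (n - 1)) == 0);
    for (unsigned step = 1; step < n; step <<= 1)
        emulatePass(data, step);
}
```

On the host side the buffer would then be created with `n * sizeof(cl_uchar)` bytes and filled from a `cl_uchar` array rather than a `bool` array, so that host and device agree on the element size.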