If yes, it is very serious...
Hello,
after posting yesterday about my problems with __local uint4 vector variables, I tried to understand the problem better: it seems that I can read these variables without problems, but I can't modify them correctly in the "usual" way (see below for what I mean).
Consider this simple kernel (ignore the parameters that aren't used inside the kernel...):
/**
 * Kernel: uint4 vector test;
 */
__kernel void intersect(__global uint4 *inputVD,
                        __global uint *inputPrefixCk,
                        __global uint *outputCounts,
                        __local uint4 *tmpBuffer,
                        const int k,
                        const int dimRow)
{
    // Simple mask
    const uint4 MASK = (uint4) (1);

    // Recover global ID of the work item;
    const int gid = get_global_id(0);

    // Test: modify __local vector variable with vstore4;
    vstore4(4,0,tmpBuffer+gid); // OK here....

    // Test: modify __local vector variable with an operator (in this case "+");
    tmpBuffer[gid] += MASK; // NOT ok here: cannot modify tmpBuffer value correctly! Only "x" and "w" components are updated correctly...

    // Final output of the work-item;
    outputCounts[gid] = tmpBuffer[gid].x + tmpBuffer[gid].y + tmpBuffer[gid].z + tmpBuffer[gid].w;
}
As you can see from the source comments, when I modify a __local uint4 vector location with "vstore4" everything works, but if I try a trivial operation (in the example, a sum with another uint4 vector variable) only the "x" and "w" components (the first and the last) are updated.
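To make the expected result concrete, here is a minimal reference sketch in plain C of what I would expect each work-item to compute. It assumes that vstore4(4, ...) leaves all four components equal to 4; the type uint4_ref and the functions add4 and expected_count are names made up for this illustration and are not part of OpenCL.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint4 (illustration only). */
typedef struct { uint32_t x, y, z, w; } uint4_ref;

/* Component-wise add: what "tmpBuffer[gid] += MASK" is expected to do. */
static uint4_ref add4(uint4_ref a, uint4_ref b) {
    uint4_ref r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Expected value of outputCounts[gid] for a single work-item. */
static uint32_t expected_count(void) {
    uint4_ref tmp  = { 4, 4, 4, 4 };  /* ASSUMPTION: result of vstore4(4, ...) */
    uint4_ref mask = { 1, 1, 1, 1 };  /* MASK = (uint4)(1) */
    tmp = add4(tmp, mask);
    return tmp.x + tmp.y + tmp.z + tmp.w;
}
```

Under that assumption every entry of outputCounts should be 20, so any other value means at least one component was not incremented.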
Is this normal? What is the explanation for this? A compiler bug, maybe? The only workaround I've found to update a __local vector variable correctly is to use vstore4. In fact, if I replace the non-working line with:
    // Test: store the result of the sum with vstore4 instead of using a simple sum...
    vstore4(tmpBuffer[gid] + MASK,0,tmpBuffer+gid); // Again ok here....
then everything works fine. So it really seems there is a problem when writing a vector "implicitly" (that is, without using vstore4 explicitly), because the "y" and "z" components either aren't updated or take strange values...