Hello everyone,
I have a problem using the half type. Below is my test kernel:
__kernel void test_half(
    __global half* a,
    __global half* b,   // output
    __global float* c)  // output
{
    int gid = get_global_id(0);
    half8 tmpA = vload8(0, (__global half *)a + gid);
    half8 tmpB = (half8)((half)1.0, (half)1.0, (half)1.0, (half)1.0, (half)1.0, (half)1.0, (half)1.0, (half)1.0);
    half8 tmpC = (half8)(1);
    tmpB = tmpA + tmpA;
    //tmpA = tmpB * tmpA;
    vstore8(tmpB, 0, (__global half *)b + gid);
}
On the host side, I launch just 1 thread, and a = {1, 2, 3, 4, 5, 6, 7, 8}.
When I print out b, I get 0, 0, 0, 0, 0, 0, 0, 0 instead of the doubled values of a.
Can someone explain why?
Thanks.
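One thing that may be worth checking in a setup like this is how the host stores a. The kernel reads 16-bit half values, so the buffer has to hold half bit patterns, not float or int values. Below is a minimal host-side sketch in C of such a conversion. The helpers float_to_half_bits and pack_half_buffer are hypothetical names, not part of the OpenCL API, and the conversion is deliberately simplified: it truncates the mantissa, flushes values too small for a normal half to zero, and maps overflow, infinity and NaN to infinity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API call): convert a float to an
   IEEE 754 binary16 bit pattern. Simplified on purpose: no rounding,
   no subnormals, and NaN is mapped to infinity. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);                    /* reinterpret the float bits */

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    int32_t  exp  = (int32_t)((x >> 23) & 0xFFu) - 127 + 15;  /* rebias exponent */
    uint16_t mant = (uint16_t)((x >> 13) & 0x03FFu);          /* top 10 mantissa bits */

    if (exp <= 0)                                /* zero or too small: flush to zero */
        return sign;
    if (exp >= 31)                               /* too large, inf or NaN: infinity */
        return (uint16_t)(sign | 0x7C00u);
    return (uint16_t)(sign | (uint16_t)(exp << 10) | mant);
}

/* Hypothetical helper: fill dst with the half bit patterns of src. */
static void pack_half_buffer(const float *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = float_to_half_bits(src[i]);
}
```

Under these assumptions, the packed uint16_t array (8 elements, 2 bytes each) would be what gets copied into the buffer bound to a, and b would be read back as 16-bit values and converted the other way before printing. It may also be worth confirming that the device reports the cl_khr_fp16 extension, since arithmetic on half values depends on it.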