Hi,
I started playing with darktable OpenCL code. I noticed this code [1]:
void rgb_to_lch (float *rgb, float *lch) {
  ...
}
kernel void highlights (...) {
  ...
  float4 pixel = read_imagef(in, sampleri, (int2)(x, y));
  float4 inc, lchi, lchc, lch;
  ...
  rgb_to_lch((float *)&pixel, (float *)&lchi);
  ...
}
I found that the float4 to (float *) cast doesn't work, at least on my system (RHEL 6.2, OpenCL 1.2 AMD-APP (938.1), HD 7950). Is this expected behaviour? Could it be that it works on NVIDIA?
[1] https://github.com/darktable-org/darktable/blob/master/data/kernels/basic.cl
What do you mean by "it doesn't work"?
It compiles on my desktop, though (Win7, AMD2.7, OpenCL 1.2, HD 6990).
By "it doesn't work" I mean it compiles, but the results are incorrect.
void rgb_to_lch (float *rgb, float *lch) {
  int xPos = (int)get_global_id(0);
  int yPos = (int)get_global_id(1);
  if((xPos == 0) && (yPos == 0))
    printf("%f-%f\n", *rgb, *lch);
}
kernel void highlights (int in, ...) {
  ...
  float4 pixel = (float4) in;
  float4 lchi = (float4) 0;
  ...
  rgb_to_lch((float *)&pixel, (float *)&lchi);
  ...
}
This gives me the results I expect.
Hope this is helpful to you.
Try this:
void
rgb_to_lch (float *rgb, float *lch)
{
  /* scale the first three components; the fourth is left untouched */
  for (int c=0; c<3; c++) lch[c] = rgb[c] * 1.7;
}
kernel void
test1 (global float *in, global float *out)
{
  const int x = get_global_id(0);
  float4 input = vload4(x, in);
  float4 res = (float4)0;
  rgb_to_lch((float *)&input, (float *)&res);
  vstore4(res, x, out);
}
If you run it with input data 1.0, 2.0, 3.0, 4.0, ... then the output is:
0 0 0 0 6.8 6.8 6.8 0 13.6 13.6 13.6 0 20.4 20.4 20.4 0 ...
Clearly the cast doesn't work the way I, at least, expected. So the question is: is this *supposed* to work, which would make this a compiler bug, or is that cast illegal?
While this is not illegal, taking the address of a vector isn't recommended, for performance reasons. This code doesn't fail on our internal builds, so it has most likely been fixed already.
Thanks. I will try it with the new drivers/APP SDK when they appear, but I think I'll fix the code to use vload/vstore anyway.
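For reference, here is a minimal sketch of what such a vload/vstore fix might look like for the test1 kernel above. It is not darktable's actual code: the helper name scale_rgb and the __OPENCL_VERSION__ guard are my own choices for illustration, and I use 1.7f instead of 1.7 to stay in single precision. The helper works on plain private float arrays, and the kernel unpacks the float4 with vstore4 and repacks the result with vload4, so the address of a vector is never taken. The guard only hides the kernel from a host C compiler, so the helper can also be checked on the CPU.

```c
/* Hypothetical helper (name invented for this sketch): same arithmetic as
   the test rgb_to_lch above, but on plain float arrays instead of vectors
   reinterpreted through a pointer cast. */
void scale_rgb(const float *rgb, float *out)
{
  for (int c = 0; c < 3; c++) out[c] = rgb[c] * 1.7f;
  out[3] = 0.0f; /* fourth component is zero, as in the original test */
}

#ifdef __OPENCL_VERSION__ /* defined by OpenCL C compilers, not by host C compilers */
kernel void test1(global const float *in, global float *out)
{
  const int x = get_global_id(0);
  float px[4], res[4]; /* private scratch arrays, one pixel each */

  vstore4(vload4(x, in), 0, px);   /* unpack the pixel into a private array */
  scale_rgb(px, res);
  vstore4(vload4(0, res), x, out); /* repack the result and write it out */
}
#endif
```

With 1.0, 2.0, 3.0, 4.0 as the first pixel, the helper yields roughly 1.7, 3.4, 5.1, 0, which is what the cast-based version was expected to produce. Passing the float4 by value and returning a float4 would be another way to avoid the cast.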