I tried two kinds of kernels for projection: one uses nearest-neighbour interpolation, the other cubic interpolation. Both kernels read the source image through an image2D.
The kernel is below.
CB is a struct of parameters.
There is no error at compile time or at run time, but the kernel is obviously not executed; even the CB parameter is not transferred.
The strange thing is that if I try the other kernel, using nearest-neighbour interpolation (commented out in the code), it runs correctly.
Can anyone help me fix this problem?
thanks.
__kernel void median(__global CB* cb,
                     __read_only image2d_t SourceRgbaTex,
                     sampler_t RowSampler,
                     __global unsigned int* uiDest,
                     unsigned int uiWidth, unsigned int uiHeight)
{
    int gx=get_global_id(0);
    int gy=get_global_id(1);
    // float4 f4Sum = (float4)0.0f;
    if(gx < uiWidth && gy<uiHeight)
    {
        /*
         * transformation
         */
        float k = 1.0f / (cb->a * gcol + cb->b * grow + 1.0f);
        float x = (cb->c * gcol + cb->d * grow + cb->e) * k; //col
        float y = (cb->f * gcol + cb->g * grow + cb->h) * k; //row
        int pix = (int)x;
        int scn = (int)y;
        /*
         * pixel sampling from src image
         */
        unsigned int r, g, b;
        if ( pix < 0 || uiWidth <= pix || scn < 0 || uiHeight <= scn)
        {
            // out of source image, use padding instead
            r = PADDING;
            g = PADDING;
            b = PADDING;
        }
        else
        {
            /*
            // nearest neighborhood
            int x1=select(ceil(x),floor(x),fabs(ceil(x)-x)-fabs(floor(x)-x)>0);
            int y1=select(ceil(y),floor(y),fabs(ceil(y)-y)-fabs(floor(y)-y)>0);
            int2 pos = {x1, y1};
            uint4 p = read_imageui(SourceRgbaTex, RowSampler, pos);
            r=p.x;
            g=p.y;
            b=p.z;
            // a=p.w;
            */
            //bicubic interpolation
            float abyRed[4][4], abyGreen[4][4], abyBlue[4][4];
            int i,j;
            for (i = 0; i < 4; i++)
            {
                for (j = 0; j < 4; j++)
                {
                    int2 pos= {scn + i - 2,pix + j - 2};
                    float4 p0 = convert_float4(read_imageui(SourceRgbaTex, RowSampler,pos));
                    abyBlue[i][j] = p0.z;
                    abyGreen[i][j] = p0.y;
                    abyRed[i][j] = p0.x;
                    /*
                    uint4 p0 = read_imageui(SourceRgbaTex, RowSampler,(int2)(scn + i - 2,pix + j - 2));
                    abyBlue[i][j] = 0;
                    abyGreen[i][j] =0;
                    abyRed[i][j] = 0;*/
                }
            }
            float afu[4], afv[4];
            float fv = y - scn;//[0,1)
            float fu = x - pix;
            afu[0] = Sinxx(1.0f + fu);
            afu[1] = Sinxx(fu);
            afu[2] = Sinxx(1.0f - fu);
            afu[3] = Sinxx(2.0f - fu);
            afv[0] = Sinxx(1.0f + fv);
            afv[1] = Sinxx(fv);
            afv[2] = Sinxx(1.0f - fv);
            afv[3] = Sinxx(2.0f - fv);
            float afRed[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
            float afGreen[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
            float afBlue[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
            for (i = 0; i < 4; i++)
            {
                for (j = 0; j < 4; j++)
                {
                    afRed[i] += afv[j] * abyRed[j][i];
                    afGreen[i] += afv[j] * abyGreen[j][i];
                    afBlue[i] += afv[j] * abyBlue[j][i];
                }
            }
            r = (T)(BOUND((afu[0] * afRed[0] + afu[1] * afRed[1] + afu[2] * afRed[2] + afu[3] * afRed[3]), 0, 255));
            g = (T)(BOUND((afu[0] * afGreen[0] + afu[1] * afGreen[1] + afu[2] * afGreen[2] + afu[3] * afGreen[3]), 0, 255));
            b = (T)(BOUND((afu[0] * afBlue[0] + afu[1] * afBlue[1] + afu[2] * afBlue[2] + afu[3] * afBlue[3]), 0, 255));
        };
        uint4 w = { r, g, b, 0 };
        //uint4 p = read_imageui(SourceRgbaTex, RowSampler, (int2)(gx,gy));
        uiDest[grow * get_global_size(0)+ get_global_id(0)] = rgbaInt4ToUint(w);
    }
};
You might be suffering from the same kind of problem as seen in this thread:
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=145087&enterthread=y
In short, don't pass samplers to your kernel as arguments; define them in the OpenCL source instead.
Thanks for your reply!
I changed the code to this:
__constant sampler_t RowSampler=CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
__kernel
void median(__global CB* cb,
__read_only image2d_t SourceRgbaTex,__global unsigned int* uiDest,
unsigned int uiWidth, unsigned int uiHeight)
But unfortunately the problem is still there.
Try moving cb to the end of the kernel arguments list. I think there have been problems in the past with argument order.
Also look carefully at the definition of the structure to ensure that the alignment of the components of CB is correct. I've seen people having trouble with alignment in structures before. Tricky subject. I can't tell if CB has more than the components a to h. They all appear to be floats, so really it's a question of whether there are any other components in CB that aren't 32 bits in size.
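To make the alignment point concrete, here is a minimal host-side sketch in C. It assumes CB holds only the eight floats a to h; the real definition of CB is not posted in this thread, so the struct below is hypothetical, and cb_layout_ok is just an illustrative helper name.

```c
#include <stddef.h>

/* Hypothetical host-side mirror of CB: eight 32-bit floats, a to h.
   The real CB is not shown in the thread, so this layout is an assumption. */
typedef struct {
    float a, b, c, d, e, f, g, h;
} CB;

/* Returns 1 when the struct is tightly packed, i.e. the compiler added no
   padding and h sits right after seven floats; returns 0 otherwise. */
static int cb_layout_ok(void)
{
    return sizeof(CB) == 8 * sizeof(float)
        && offsetof(CB, a) == 0
        && offsetof(CB, h) == 7 * sizeof(float);
}
```

Calling cb_layout_ok() once on the host before creating the buffer is a cheap sanity check. If CB ever gains a member that is not 32 bits wide, the check will most likely start failing, and that is the point where host and device layouts need to be compared by hand.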
Instead of using the structure you can try to make the components of it distinct arguments. e.g. make a to h distinct. Boring, I know.
The name "CB" makes me think of "constant buffer", a concept from D3D10. Are you trying to replicate that here? Might be simpler to just make this a __constant.
Also instead of a to h, if you define cb as __constant float *cb then you could simply use cb[0] to cb[7].
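For illustration, this is roughly what the transformation part looks like with a flat array in place of the struct. It is a plain C sketch for the host (handy as a CPU reference), not the kernel itself; project is a hypothetical helper name, and it assumes cb[0] to cb[7] hold a to h in that order. It does not guard against a zero denominator.

```c
/* Applies the projective transform from the kernel to a destination pixel
   (col, row). Coefficients come from a flat array: cb[0..7] stand for a..h.
   The source position is written to *x (column) and *y (row). */
static void project(const float cb[8], float col, float row,
                    float *x, float *y)
{
    float k = 1.0f / (cb[0] * col + cb[1] * row + 1.0f);
    *x = (cb[2] * col + cb[3] * row + cb[4]) * k;
    *y = (cb[5] * col + cb[6] * row + cb[7]) * k;
}
```

With a = b = 0, c = g = 1 and everything else 0, this is the identity mapping, which makes a convenient first test before trying the real coefficients.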
Thanks for your reply.
I don't think CB is the problem, because when I use the nearest-neighbour algorithm, which is commented out in the attached code, it runs OK. That's strange.
I have run into another problem, maybe a similar one.
I will post it separately.
------------------------------------------------
It is now posted here:
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=146796&enterthread=y
Thanks again!
Josling,
Could you please also give us the runtime (host) code and your system information (OS, CPU, GPU, SDK version, driver)?
oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.0 CUDA 3.2.1, SDK Revision = 7027912, NumDevs = 1, Device = Quadro FX 1700
System Info:
Local Time/Date = 13:45:56, 02/04/2011
CPU Name: Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz
# of CPU processors: 2
Linux version 2.6.31.14-0.6-default (geeko@buildhost) (gcc version 4.4.1 [gcc-4_4-branch revision 150839] (SUSE Linux) ) #1 SMP 2010-12-10 11:18:32 +0100
This is not the right place to ask about Nvidia OpenCL/device issues. Please ask on the Nvidia forum, where you will get a quicker reply.
Actually, I did ask there, but the Nvidia forum does not seem to be as active as ATI's.
Also, I am running on the Nvidia platform but using the ATI SDK.
Run it on an AMD platform, and if you still get this error, then ask here.