Hello, I have written two kernel programs listed as follows:
------------------------------------------------------------------------------------------------
(1) __kernel void initialize_variables(int n, __global float* input, __global float* output){}
(2) __kernel void initialize_variables(__global float* input, __global float* output, int n){}
------------------------------------------------------------------------------------------------
The first kernel does not work properly (it cannot find the correct address of input/output), while the second one does. The only difference is the order of the parameters.
My testbed is Intel920 and AMD APP v2.4.
Btw, both kernels can work properly on GPUs (HD5870).
Could anybody tell me the reason? Thanks.
Can you please post your kernel? I am able to run the matrix transpose SDK sample on both CPU and GPU after shifting the arguments.
Also mention your system info: GPU, CPU, SDK, driver, OS.
My kernels are like this:
__kernel void order_kernel_1(__global float* src, __global float* dst, int n){
    int id = get_global_id(0);
    dst[id] = src[id];
}
__kernel void order_kernel_2(int n, __global float* src, __global float* dst){
    int id = get_global_id(0);
    dst[id] = src[id];
}
The second kernel does not work properly, even though I never use the parameter 'n'.
The testbed configurations are like this:
GPU: HD5870
CPU: Intel920
SDK: APP v2.4
OS: Ubuntu 10.04
Driver: 8.85.6
Do you change the indices in clSetKernelArg()?
Yes, I invoke the kernel like this:
void init_var(int num_eles, cl_mem src, cl_mem dst) throw(string){
    // first kernel: order_kernel_1(src, dst, n)
    std::cout<<"first kernel"<<std::endl;
    int kernel_id = 0;
    int kernel_idx = 0;
    _clSetArgs(kernel_id, kernel_idx++, src);
    _clSetArgs(kernel_id, kernel_idx++, dst);
    _clSetArgs(kernel_id, kernel_idx++, &num_eles, sizeof(int));
    int work_group = 256;
    int work_items = num_eles;
    _clInvokeKernel(kernel_id, work_items, work_group);

    // second kernel: order_kernel_2(n, src, dst)
    std::cout<<"second kernel"<<std::endl;
    kernel_id = 1;
    kernel_idx = 0;
    _clSetArgs(kernel_id, kernel_idx++, &num_eles, sizeof(int));
    _clSetArgs(kernel_id, kernel_idx++, src);
    _clSetArgs(kernel_id, kernel_idx++, dst);
    work_group = 256;
    work_items = num_eles;
    _clInvokeKernel(kernel_id, work_items, work_group);
}
As per this explanation, you are trying to run the same kernel with different argument patterns (but ultimately it is the same kernel, which expects one fixed signature). IMHO it is better to post the actual code than to describe it through such abstractions.
Hi, I don't know where to paste the code, so I uploaded the source code to Google Code:
http://code.google.com/p/easy-opencl/downloads/list, named 'order.tar.gz'
Please check it.
Originally posted by: haibo031031 Yes, I invoke the kernel like this:
void init_var(int num_eles, cl_mem src, cl_mem dst) throw(string){ //first kernel std::cout<<"first kernel"<
I didn't test it, but glancing at your code, I see kernel_idx++ might be the problem. If the variable is initialized to 0, you're actually setting args 1, 2, 3. You should set 0, 1, 2 instead.
It is var++, rather than ++var, so there is no problem here.
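The point about var++ versus ++var can be checked without an OpenCL runtime. The following is a minimal sketch: record_arg_index is a hypothetical stand-in for the _clSetArgs wrapper from the thread (not part of any OpenCL API) and only records the index it receives. It assumes nothing about what the real wrapper does internally.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the thread's _clSetArgs wrapper: it only records
// which argument index it was handed, so no OpenCL runtime is required.
void record_arg_index(std::vector<int>& seen, int arg_index) {
    seen.push_back(arg_index);
}

// Same calling pattern as init_var: start at 0 and post-increment per call.
std::vector<int> indices_with_post_increment() {
    std::vector<int> seen;
    int kernel_idx = 0;
    record_arg_index(seen, kernel_idx++);  // passes 0; kernel_idx is now 1
    record_arg_index(seen, kernel_idx++);  // passes 1; kernel_idx is now 2
    record_arg_index(seen, kernel_idx++);  // passes 2; kernel_idx is now 3
    return seen;
}

// For contrast: pre-increment changes the variable before the call sees it.
std::vector<int> indices_with_pre_increment() {
    std::vector<int> seen;
    int kernel_idx = 0;
    record_arg_index(seen, ++kernel_idx);  // passes 1
    record_arg_index(seen, ++kernel_idx);  // passes 2
    record_arg_index(seen, ++kernel_idx);  // passes 3
    return seen;
}

// Self-check: post-increment yields 0,1,2; pre-increment yields 1,2,3.
void check_indices() {
    assert((indices_with_post_increment() == std::vector<int>{0, 1, 2}));
    assert((indices_with_pre_increment() == std::vector<int>{1, 2, 3}));
}
```

Calling check_indices() from main passes silently, which supports the reply above: with kernel_idx++ the wrapper receives indices 0, 1, 2, so the indexing in init_var is not the cause of the failure.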
Can you have two kernels sharing a name differing only in signatures?