Dear Community,
I am trying to write an OpenCL kernel that counts the matching elements between two arrays. However, I do not get the expected output.
The kernel looks like this:
__kernel void matchVectors(
    __global const uchar *a,
    __global const uchar *b,
    volatile __global uchar *c)
{
    const int i = get_global_id(0);
    if ( ( a[i] ^ b[i] ) == 0 )
    {
        atomic_inc(c, 1);
    }
}
a (input) - string, 10 bytes (chars) long
b (input) - string, 10 bytes (chars) long
c - single byte (the counter that should indicate how many indexes match between a[i] and b[i])
And below are some parts of the main.c (non-OpenCL) code:
#define ELEMENT_TYPE uint8_t
#define SIZE 10
ELEMENT_TYPE c = 0;
cl_mem aMemObj = clCreateBuffer(context, CL_MEM_READ_ONLY, SIZE * sizeof(ELEMENT_TYPE), NULL, &ret);
cl_mem bMemObj = clCreateBuffer(context, CL_MEM_READ_ONLY, SIZE * sizeof(ELEMENT_TYPE), NULL, &ret);
cl_mem cMemObj = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(ELEMENT_TYPE), NULL, &ret);
// Copy lists to memory buffers
ret = clEnqueueWriteBuffer(commandQueue, aMemObj, CL_TRUE, 0, SIZE * sizeof(ELEMENT_TYPE), a, 0, NULL, NULL);
ret = clEnqueueWriteBuffer(commandQueue, bMemObj, CL_TRUE, 0, SIZE * sizeof(ELEMENT_TYPE), b, 0, NULL, NULL);
// Execute kernel
ret = clEnqueueNDRangeKernel(commandQueue, kernel, clDimensions, NULL, &globalItemSize, &localItemSize, 0, NULL, NULL);
// Get the result
ret = clEnqueueReadBuffer(commandQueue, cMemObj, CL_TRUE, 0, sizeof(c), (void *)&c, 0, NULL, NULL);
// Write result
printf("Result: c = %u\n", c);
Could anyone advise what is wrong?
Can you please try the modified main.cpp file attached here to see if it is working for you?
Thanks.
*elements between two arrays
/For some reason, I am not able to edit my initial post due to an "HTML error", sorry for the mistake!/
As described in the OpenCL reference documentation, atomic_inc does the following:
"Read the 32-bit value (referred to as old) stored at location pointed by p. Compute (old + 1) and store result at location pointed by p. The function returns old."
So, please try using "int" or "unsigned int" as the type of the argument passed to the atomic function.
Thanks.
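To make the suggestion above concrete, here is a minimal sketch of how the kernel might look with a 32-bit counter. It is an assumption on my part and not the code from the attached file: the only changes from the posted kernel are the type of c (uint instead of uchar) and the call atomic_inc(c), which takes a single pointer argument. The kernel is kept in a C string (the name kMatchKernelSrc is hypothetical) next to a small CPU reference function (count_matches_ref, also hypothetical) that can be used to check the GPU result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Kernel sketch with a 32-bit counter. Untested assumption, not the
 * attached fix: atomic_inc works on a 32-bit int/uint location and
 * takes exactly one argument (the pointer). */
static const char kMatchKernelSrc[] =
    "__kernel void matchVectors(\n"
    "    __global const uchar *a,\n"
    "    __global const uchar *b,\n"
    "    volatile __global uint *c)\n"
    "{\n"
    "    const int i = get_global_id(0);\n"
    "    if ( ( a[i] ^ b[i] ) == 0 )\n"
    "    {\n"
    "        atomic_inc(c);\n"
    "    }\n"
    "}\n";

/* CPU reference: counts the positions where a[i] equals b[i].
 * Useful for comparing against the value read back from the device. */
static uint32_t count_matches_ref(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t count = 0;
    for (size_t i = 0; i < n; ++i)
    {
        if ((a[i] ^ b[i]) == 0)
        {
            ++count;
        }
    }
    return count;
}
```

With this change the host side would presumably need to match: a counter buffer of sizeof(cl_uint) bytes rather than one byte, a host variable of the same width for the read-back, and a counter that is set to zero on the device before the kernel runs, since the posted excerpt does not show the counter buffer being written.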
Hi @dipak ,
I tried what you suggested, both int and unsigned int. But unfortunately, the behaviour is still the same.
May I send you the code, if possible (as a private message)?
Thanks and regards.
Sure, you can provide the reproducible code. Also, please provide the following information:
1) setup details such as OS, GPU, driver version, etc.
2) clinfo output
Thanks.
Hi @dipak
I am neither able to attach the archive here in the post, nor can I see a button in your profile to send you a private message.
Could you please advise how to proceed?
Thanks.
As you joined recently, I think some of these community features are not enabled yet.
To share the repro and other information, you can upload them to any file hosting site and provide us the download link.
Thanks.
Please try with the below link:
https://drive.google.com/file/d/15lcgxb6ECSrlrVmNrR7-obGQpc1YbQn_/view?usp=sharing
Thanks for sharing the download link. I will check the files and get back to you.
Yes, amazing... now it works perfectly!
Thank you very much, @dipak ! I really appreciate your time and effort on my issue!