I am trying to implement a compact operation in OpenCL. I also want to get the number of keys that fall in a particular bin, so I am using atomic_inc. But I read that atomic_inc on global memory is very expensive and that local memory can be used instead. Is there a better way to use local memory to update the key counts?
__kernel void compact(__global int* inCode, __global int* isValid, __global int* Scan, __global int* outCode, __global int* numPoints, __global int* index)
int ig = get_global_id(0);
int m = isValid[ig];
int j = Scan[ig];
int code = inCode[ig];
if(m == true)
int num = numPoints[j-1];
Actually, you are not using local memory in the above code. If you declare a variable without an address space qualifier such as __global or __local, it is a private variable, visible only to the work-item that declares it.
Please refer to the OpenCL 1.2 specification, section 6.5, Address Space Qualifiers.
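To make the distinction concrete, here is a minimal sketch of the three address spaces in one kernel. The kernel name address_spaces and the scratch argument are made up for illustration and are not part of the code in the question; the sketch assumes the host sizes scratch to one int per work-item by passing NULL with a byte size to clSetKernelArg.

```opencl
// Hypothetical kernel: it only illustrates the address space qualifiers.
__kernel void address_spaces(__global const int* in,   // __global: visible to every work-item
                             __global int* out,
                             __local int* scratch)     // __local: shared by one work-group
{
    int ig = get_global_id(0);   // no qualifier: private to this work-item
    int il = get_local_id(0);    // private as well

    scratch[il] = in[ig];                 // stage one value per work-item in local memory
    barrier(CLK_LOCAL_MEM_FENCE);         // wait until the whole group has written

    // Read the value staged by the next work-item in the group (wrapping around).
    out[ig] = scratch[(il + 1) % get_local_size(0)];
}
```

Reading a neighbour's slot is only safe after the barrier, which is the same pattern any local counter array needs before it is copied back to global memory.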
Yes, I am not using any local variables because I am not sure how to introduce them in this code. Every work-item that runs the kernel performs an atomic_inc to update the value. Can you guide me on the correct way of introducing local variables in the above code?
So what I need to do is keep the unique keys, along with the frequency of each key and the starting index of each unique key in the inCode array.
inCode - the input keys
isValid - a boolean flag which decides if the key is unique (set on the first occurrence of each key)
Scan - the exclusive prefix sum of isValid, which gives the final position of each unique key
outCode - output of unique keys
numPoints - frequency of each unique key
index - output of the starting index of each unique key in inCode
So I am checking for a true value in isValid and, when it is set, writing the key into outCode and its position into index. The frequency of the key is incremented every time the key occurs.
Hopefully this helps.
I understand what you are trying to do, but I cannot really follow it without an example. It would be better if you gave an example and explained it. I am not able to guess what the Scan array contains.
I am trying to code an octree on the GPU. Suppose the keys, the valid flags and the Scan array are:
#Keys #isValid #Scan
1000 1 0
1000 0 1
1001 1 1
1001 0 2
1001 0 2
1002 1 2
1002 0 3
1003 1 3
1004 1 4
1004 0 5
1004 0 5
1004 0 5
So, by using the compact kernel, only the unique keys are kept, at the positions given by the Scan array, along with the starting index and the frequency:
#uniqueKeys #numPoints #index
1000 2 0
1001 3 2
1002 2 5
1003 1 7
1004 4 8
Himanshu Gautam, does this help?
For this Scan data set the above code won't work, because when j = 0 you will end up accessing numPoints[-1], which is out of bounds.
Is your code working correctly?
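One possible way around the out-of-bounds index, going only by the example tables, is to pick the output slot from isValid: a valid key uses Scan[ig] directly, and a repeated key uses Scan[ig] - 1. The sketch below still uses global atomics. The kernel name compact_global, the variable slot and the extra argument n (number of keys) are my own additions, and it assumes the keys are sorted, the first key is flagged valid, and the host has zeroed numPoints.

```opencl
// Sketch only: compact_global, slot and n are hypothetical names.
__kernel void compact_global(__global const int* inCode,
                             __global const int* isValid,
                             __global const int* Scan,
                             __global int* outCode,
                             __global int* numPoints,   // assumed zeroed by the host
                             __global int* index,
                             const int n)               // assumed number of keys
{
    const int ig = get_global_id(0);
    if (ig >= n)
        return;                          // padding work-items do nothing

    const int valid = isValid[ig];
    // First occurrence: Scan[ig] is the slot. Repeat: Scan[ig] already counts
    // the first occurrence of this key, so the slot is one less.
    const int slot = valid ? Scan[ig] : Scan[ig] - 1;

    if (valid) {
        outCode[slot] = inCode[ig];      // keep the unique key
        index[slot] = ig;                // starting index in inCode
    }
    atomic_inc(&numPoints[slot]);        // every occurrence counts once
}
```

With the example data, the first key gets slot 0 and the repeated 1004 keys (Scan = 5) get slot 4, so numPoints[-1] is never touched.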
One suggestion I would like to give: make numPoints a local array and write each work-group's values to global memory at the end of the kernel. Put a barrier before the update of global memory so that all work-items have updated their values in the local array. Also, to create a local array you must know its size, so you should pass it as a build option, or you can even hardcode it. Everything else can stay global because you access it only once.
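A rough sketch of that suggestion could look like the kernel below. It is not a tested implementation, and several things are assumptions on my part: the names compact_local, localCount, baseSlot and n, the macro WG_SIZE (passed as a build option such as -D WG_SIZE=64 and required to equal the work-group size used at launch), sorted keys with the first key flagged valid, and numPoints zeroed by the host. Because Scan never decreases, a work-group of WG_SIZE consecutive keys touches at most WG_SIZE consecutive slots, so a local array of that size is enough.

```opencl
#ifndef WG_SIZE
#define WG_SIZE 64                       // assumed work-group size, normally a build option
#endif

// Sketch only: all names except the six original arguments are hypothetical.
__kernel void compact_local(__global const int* inCode,
                            __global const int* isValid,
                            __global const int* Scan,
                            __global int* outCode,
                            __global int* numPoints,   // assumed zeroed by the host
                            __global int* index,
                            const int n)               // assumed number of keys
{
    __local int localCount[WG_SIZE];     // per-group counters, one per possible slot
    __local int baseSlot;                // slot of the first key handled by this group

    const int ig = get_global_id(0);
    const int il = get_local_id(0);

    localCount[il] = 0;
    if (il == 0)
        baseSlot = (ig < n) ? (isValid[ig] ? Scan[ig] : Scan[ig] - 1) : 0;
    barrier(CLK_LOCAL_MEM_FENCE);        // counters cleared, baseSlot visible

    if (ig < n) {
        const int valid = isValid[ig];
        const int slot = valid ? Scan[ig] : Scan[ig] - 1;
        if (valid) {
            outCode[slot] = inCode[ig];
            index[slot] = ig;
        }
        atomic_inc(&localCount[slot - baseSlot]);   // cheap atomic on local memory
    }
    barrier(CLK_LOCAL_MEM_FENCE);        // all local counts are final

    // One global update per slot the group actually touched. It must stay
    // atomic because a run of equal keys can span two work-groups.
    if (localCount[il] > 0)
        atomic_add(&numPoints[baseSlot + il], localCount[il]);
}
```

Both barriers sit outside the ig < n branch so that every work-item in the group reaches them. For the 12-key example with a work-group size of 4, the three groups would issue 2, 3 and 1 global atomic updates, instead of one per key.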