Hi everyone
I'm a somewhat noob in programming and completely new to OpenCL !
I wrote some OpenCL code inside a C# Project ( using some online guide ) . its a selection sort but the problem is it won't sort at all ! it returns the number without any order
I really don't know what I did wrong so help me out here
this is the OpenCL code :
__kernel void ParallelSelection(__global int * in1)
{
int i = get_global_id(0); // current thread
int n = get_global_size(0); // input size
int temp;
for (int j = i + 1; j < n; j++)
{
if (in1
< in1) {
temp = in1
; in1
= in1; in1 = temp;
}
}
}
and this is the C# code :
* vecSum is the string containing OpenCL code
//Initializes OpenCL Platforms and Devices and sets everything up
OpenCLTemplate.CLCalc.InitCL();
//Compiles the source codes. The source is a string array because the user may want
//to split the source into many strings.
OpenCLTemplate.CLCalc.Program.Compile(new string[] { vecSum });
//Gets host access to the OpenCL floatVectorSum kernel
OpenCLTemplate.CLCalc.Program.Kernel VectorSum = new OpenCLTemplate.CLCalc.Program.Kernel("ParallelSelection");
//////////////////////
//numbers
int n = 9;
//Create vectors with 2000 numbers
int[] v1 = new int[9]{3,14,4,6,-45,4,23,5,1};
////////////////////////
//Creates vectors v1 in the device memory
OpenCLTemplate.CLCalc.Program.Variable varV1 = new OpenCLTemplate.CLCalc.Program.Variable(v1);
////////////////////////
//Arguments of VectorSum kernel
OpenCLTemplate.CLCalc.Program.Variable[] args = new OpenCLTemplate.CLCalc.Program.Variable[] { varV1 };
//How many workers will there be? We need "n", one for each element
int[] workers = new int[1] { n };
//Execute the kernel
VectorSum.Execute(args, workers);
////////////////////////////
//Read device memory varV1 to host memory v1
varV1.ReadFromDeviceTo(v1);
for (int i = 0; i < n; i++)
{
listBox1.Items.Add(v1.ToString());
}
__________________________________
EDIT : Thanks everyone I finally getting to know the SIMD model , for other people like myself interested in a simple sorting algorithm ( parallel of course ) here's the parallel selection sort OpenCL kernel :
__kernel void ParallelSelection(__global int * in1,__global int * out)
{
int i = get_global_id(0); // current thread
int n = get_global_size(0); // input size
int ikey = in1;
int pos = 0;
for (int j = 0 ; j < n; j++)
{
int jkey = in1
; bool smaller = (jkey < ikey);
pos+= (smaller)?1:0;
}
out[pos]=ikey;
}
Message was edited by: Behrooz Moradi
Solved! Go to Solution.
bz7-
At least one problem is that there is nothing to synchronize the work items. One work item could be writing a value in1 while another one is reading it. Understand that in the SIMD model all the work items are executing in parallel. There's no way of knowing which work items will finish first and so that code is inherently unpredictable. I think it's also probably wrong to make work id i (AKA thread i ) responsible for setting all the values from i up to n. You've got multiple work items all trying to change the same entries. As long as you have a work item for every single item in the array to be sorted, why not just let a work item set the value of a single item? Searching "selection sort kernel opencl" turned up a tutorial with some kernel source that probably work. I'd try one of those.
bz7-
At least one problem is that there is nothing to synchronize the work items. One work item could be writing a value in1 while another one is reading it. Understand that in the SIMD model all the work items are executing in parallel. There's no way of knowing which work items will finish first and so that code is inherently unpredictable. I think it's also probably wrong to make work id i (AKA thread i ) responsible for setting all the values from i up to n. You've got multiple work items all trying to change the same entries. As long as you have a work item for every single item in the array to be sorted, why not just let a work item set the value of a single item? Searching "selection sort kernel opencl" turned up a tutorial with some kernel source that probably work. I'd try one of those.
yes, the above code is the serial implementation of selection sort. Try checking out parallel implementation. APP SDK Samples contain a RadixSort Example FYI. Also I am not sure if selection sort can be parallelized.
Your OpenCL code is ridden with Race Conditions -- which have resulted from incomplete understanding of how to decompose your problem parallel.
Since all workitems proceed from I+1 to N, replacing values in that array here and there --- The workitems are interfering with each other's values and the net result is just confusion.
Technical term for this is "race condition"
As Himanshu mentioned before, Radix Sort is the most parallel friendly one and works well on GPU.
- Bruhaspati