I'm using OpenCL in my M.Tech thesis, and I need to write an OpenCL program for the sum of an array. I'm new to OpenCL. Can anybody help me with this topic?
There is a reduction algorithm in the SDK examples. Look at that.
Hi, thanks for replying. I have installed an AMD GPU in my system, and also the AMD APP SDK.
I have seen the reduction example, but I don't know how to use it in my code.
I have kernel code for the sum of an array. I'm having a problem with writing the host code.
__kernel void reduce(__constant float *input, __local float *sums, __global float *output)
{
    // Each work-item accumulates a strided partial sum over the 1024 inputs
    float sum = 0;
    for (uint x = get_global_id(0); x < 1024; x += get_global_size(0))
        sum += input[x];
    // Store the sum in local memory
    sums[get_local_id(0)] = sum;
    // Make sure the local memory is safe for reading
    barrier(CLK_LOCAL_MEM_FENCE);
    if (get_local_id(0) == 0)
    {
        // Work-item 0 already holds sums[0] in sum, so start from index 1
        for (uint i = 1; i < get_local_size(0); ++i)
            sum += sums[i];
        output[get_group_id(0)] = sum;
    }
}
Can anyone provide host code for this? That would be a great help.
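Before writing the host side, it may help to see what the kernel leaves for the host to do. The following is a rough CPU emulation in plain C, not OpenCL host code: it makes no OpenCL API calls, and the function names and the launch sizes in the usage note are assumptions chosen for illustration. It shows that `output` receives one partial sum per work-group, so the host still has to read back `global_size / local_size` floats and add them up.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* input length hard-coded in the kernel */

/* Emulates on the CPU what one launch of the kernel writes to `output`:
 * one partial sum per work-group. global_size and local_size stand for the
 * launch sizes the host would choose; local_size must divide global_size. */
void emulate_reduce_kernel(const float *input, float *output,
                           size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    size_t num_groups = global_size / local_size;
    for (size_t group = 0; group < num_groups; ++group) {
        float group_sum = 0.0f;
        for (size_t lid = 0; lid < local_size; ++lid) {
            size_t gid = group * local_size + lid;
            /* Strided loop, same as in the kernel. */
            float sum = 0.0f;
            for (size_t x = gid; x < N; x += global_size)
                sum += input[x];
            /* In the kernel, work-item 0 does this after the barrier. */
            group_sum += sum;
        }
        output[group] = group_sum;
    }
}

/* The step left to the host: add up the per-group partial sums. */
float finish_on_host(const float *output, size_t num_groups)
{
    float total = 0.0f;
    for (size_t g = 0; g < num_groups; ++g)
        total += output[g];
    return total;
}
```

With, say, a global size of 256 and a local size of 64, `output` needs room for 4 floats and the `sums` argument needs 64 floats of local memory. In real host code the local buffer is passed with clSetKernelArg using a size of `local_size * sizeof(float)` and a NULL pointer.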
Install AMD APP SDK and check out the Reduction sample.
After installation, check the "samples/opencl/cl/app/Reduction" sample.
Note that APP SDK samples can work on CPU as well.
So, for your initial learning, you can live without GPU and its drivers.
But, at some point, you will need the GPU - for measuring performance.
Reduction is a term used to describe how a binary operator that has the associative property can be applied to an array of data items to generate one output, i.e. the array is reduced to a single output value.
Example
"sum = A + B + C + D ....... "
+ is the binary operator with the associative property.
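To make the definition concrete, here is a minimal sketch in plain C (no OpenCL involved). The names `binary_op`, `reduce_serial` and `reduce_halves` are made up for this illustration. It reduces the same array in two different groupings; associativity is what allows the second grouping, and that regrouping is what a parallel reduction relies on.

```c
#include <assert.h>
#include <stddef.h>

/* A binary operator over floats, e.g. addition or maximum. */
typedef float (*binary_op)(float, float);

float add_op(float a, float b) { return a + b; }
float max_op(float a, float b) { return a > b ? a : b; }

/* Reduce data[0..n-1] to one value by applying op left to right.
 * identity is the neutral element of op (0 for +). */
float reduce_serial(const float *data, size_t n, binary_op op, float identity)
{
    float acc = identity;
    for (size_t i = 0; i < n; ++i)
        acc = op(acc, data[i]);
    return acc;
}

/* Same reduction, grouped differently: reduce each half, then combine.
 * The two halves do not depend on each other, so they could run in parallel.
 * identity is returned only when n == 0. */
float reduce_halves(const float *data, size_t n, binary_op op, float identity)
{
    assert(data != NULL || n == 0);
    if (n == 0) return identity;
    if (n == 1) return data[0];
    size_t mid = n / 2;
    float left = reduce_halves(data, mid, op, identity);
    float right = reduce_halves(data + mid, n - mid, op, identity);
    return op(left, right);
}
```

One caveat: floating-point addition is only approximately associative, so for general float data the two groupings can differ in the last bits. For small whole numbers they agree exactly.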
The code above should work, but IMHO the reduction sample should perform better, as reduction within a group is being done by a single thread here.
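To illustrate the difference, here is a rough sketch in plain C that emulates the two ways of summing a group's local array on the CPU. It is not the SDK sample's code; the tree version is a commonly used pattern, and the function names are made up for this example. The serial version needs local_size - 1 dependent additions by one work-item, while the tree version needs only log2(local_size) passes when the work-items of a pass run in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Serial version, as in the posted kernel: one work-item performs
 * local_size - 1 additions while the rest of the group idles. */
float group_sum_serial(const float *sums, size_t local_size)
{
    float sum = sums[0];
    for (size_t i = 1; i < local_size; ++i)
        sum += sums[i];
    return sum;
}

/* Tree version: each pass halves the number of active work-items.
 * The inner loop stands in for work-items running in parallel; in a kernel
 * each pass would end with barrier(CLK_LOCAL_MEM_FENCE).
 * Overwrites sums; local_size must be a power of two.
 * The number of passes is returned through *passes. */
float group_sum_tree(float *sums, size_t local_size, unsigned *passes)
{
    assert(local_size > 0 && (local_size & (local_size - 1)) == 0);
    *passes = 0;
    for (size_t stride = local_size / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            sums[lid] += sums[lid + stride];
        ++*passes;
    }
    return sums[0];
}
```

For a group of 8 values the serial version does 7 additions one after another, while the tree version finishes in 3 passes.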
For host code, I suggest you either use Reduction.cpp or do some homework and ask specific questions. To start, look at the HelloWorld sample and refer to the OpenCL 1.2 spec for descriptions of the various APIs.
Hi Ankit, since you are still new to OpenCL and trying to figure everything out, it may make sense for you to try Bolt. The latest version of the code is available here: http://developer.amd.com/tools/heterogeneous-computing/amd-accelerated-parallel-processing-app-sdk/b... It's like STL for OpenCL. It's only released for Windows right now, and you don't specify which platform you are using. However, if you are using Windows, this would probably be the easiest way for you to get a high-performance reduction working. There is a sample for calculating the standard deviation of an array of numbers, which uses reduction.
Kent
Thanks for replying. Can I use this Bolt library on Linux, or is it only for Windows? Also, are there instructions for using it?
Currently it is only for Windows. They are working on porting it to Linux.