cancel
Showing results for 
Search instead for 
Did you mean: 

Archives Discussions

jackpien
Journeyman III

Open source OpenCL for CPU

Just downloaded the new ATI Stream SDK v2 beta and playing with the OpenCL samples.  Great job AMD!

Any chance AMD will be open sourcing the OpenCL implementation for the CPU?  It would be a very productive step to furthering AMD/ATI's stream compute efforts by leveraging the open source community to further the OpenCL development.

Thanks

Jack

 

0 Likes
32 Replies
riza_guntur
Journeyman III

Thanks for the info, the approval email has just arrived when the time you posted. Download in progress.

0 Likes

LoL, same restrictions as with Brook+. Are you sure we need to move to OpenCL then? 😉
"
. Currently, it is not possible to define local memory variables inside OpenCL-C functions or
kernels. For example:
__kernel void example(__global float4* ptr)
{
__local float4 temp[256];
...
}
. Currently, arrays are not supported inside structs.
"
0 Likes

Originally posted by: Raistmer LoL, same restrictions as with Brook+. Are you sure we need to move to OpenCL then? " . Currently, it is not possible to define local memory variables inside OpenCL-C functions or kernels. For example: __kernel void example(__global float4* ptr) { __local float4 temp[256]; ... } . Currently, arrays are not supported inside structs. "


Nope.

I could say: Brook+ is easier. I think if Brook+ could have multithread CPU backend, it will be cool.

Do we need TWO languages pointing to same direction and same usage?

0 Likes

Originally posted by: riza.guntur I think if Brook+ could have multithread CPU backend, it will be cool.

 

 

True!

The original BrookGPU (v0.5) already has an OpenMP backend.

0 Likes

Is this OpenCL release built on top of CAL/IL much like Brook+??

0 Likes

The CPU component isn't but the GPU component will be (built on top of CAL).

0 Likes

I have a problem compiling Hellocl.cpp program given in samples on 64-bit linux machine.

 I am attaching the error file.

gcc HelloCL.cpp -I ./ -L /home/mahendra/ati-stream-sdk-v2.0-beta2-lnx64/lib/x86_64/libGLEW.a -L /home/mahendra/ati-stream-sdk-v2.0-beta2-lnx64/lib/x86_64/libSDKUtil.a -L /home/mahendra/ati-stream-sdk-v2.0-beta2-lnx64/lib/x86_64/libOpenCL.so /tmp/ccLZxpay.o(.text+0x90): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x91): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xa2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xd2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x102): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1d0): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1d1): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1e2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x212): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x242): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x2f0): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x2f1): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x302): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x3d2): In function `main': : undefined reference to `streamsdk::SDKFile::open(char const*)' /tmp/ccLZxpay.o(.text+0x400): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x401): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x412): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x4d2): In function `main': : undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::data() const' /tmp/ccLZxpay.o(.text+0x522): In function `main': : undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::size() const' /tmp/ccLZxpay.o(.text+0x652): In function `main': : undefined reference to `operator new(unsigned long)' /tmp/ccLZxpay.o(.text+0x742): In function `main': : undefined reference to `operator delete(void*)' /tmp/ccLZxpay.o(.text+0x7a0): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x7a1): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x7b2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x7e2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x812): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x950): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x951): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x962): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x992): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x9c2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xad0): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xad1): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xae2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xb12): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0xb42): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xc30): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xc31): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xc42): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xc72): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0xca2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xe00): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xe01): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xe12): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xe42): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0xe72): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1080): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1081): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1092): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x10c2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x10f2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1240): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1241): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1252): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1282): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x12b2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1342): In function `main': : undefined reference to `operator delete(void*)' /tmp/ccLZxpay.o(.text+0x17a2): In function `__static_initialization_and_destruction_0(int, int)': : undefined reference to `std::ios_base::Init::Init[in-charge]()' /tmp/ccLZxpay.o(.text+0x18d2): In function `__tcf_1': : undefined reference to `std::ios_base::Init::~Init [in-charge]()' /tmp/ccLZxpay.o(.IA_64.unwind_info+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNK2cl7Context7getInfoILi4225EEENS_6detail12param_traitsINS2_15cl_context_infoEXT_EE10param_typeEPi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNSt6vectorIN2cl6DeviceESaIS1_EED1Ev+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNSt6vectorISt4pairIPKcmESaIS3_EEC1EmRKS3_RKS4_+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNSt6vectorISt4pairIPKcmESaIS3_EED1Ev+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNK2cl7Context7getInfoISt6vectorINS_6DeviceESaIS3_EEEEijPT_+0x50): In function `int cl::Context::getInfo<std::vector<cl::Device, std::allocator<cl::Device> > >(unsigned, std::vector<cl::Device, std::allocator<cl::Device> >*) const': : undefined reference to `clGetContextInfo' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl7ContextC1EmPlPFvPKcPKvmPvES6_Pi+0xf2): In function `cl::Context::Context[in-charge](unsigned long, long*, void (*)(char const*, void const*, unsigned long, void*), void*, int*)': : undefined reference to `clCreateContextFromType' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN2cl7ContextC1EmPlPFvPKcPKvmPvES6_Pi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl7ProgramC1ERKNS_7ContextERKSt6vectorISt4pairIPKcmESaIS8_EEPi+0x332): In function `cl::Program::Program[in-charge](cl::Context const&, std::vector<std::pair<char const*, unsigned long>, std::allocator<std::pair<char const*, unsigned long> > > const&, int*)': : undefined reference to `clCreateProgramWithSource' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN2cl7ProgramC1ERKNS_7ContextERKSt6vectorISt4pairIPKcmESaIS8_EEPi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNK2cl7Program5buildERKSt6vectorINS_6DeviceESaIS2_EEPKcPFvP11_cl_programPvESB_+0xe2): In function `cl::Program::build(std::vector<cl::Device, std::allocator<cl::Device> > const&, char const*, void (*)(_cl_program*, void*), void*) const': : undefined reference to `clBuildProgram' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6KernelC1ERKNS_7ProgramEPKcPi+0xd2): In function `cl::Kernel::Kernel[in-charge](cl::Program const&, char const*, int*)': : undefined reference to `clCreateKernel' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN2cl6KernelC1ERKNS_7ProgramEPKcPi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl12CommandQueueC1ERKNS_7ContextERKNS_6DeviceEmPi+0xf2): In function `cl::CommandQueue::CommandQueue[in-charge](cl::Context const&, cl::Device const&, unsigned long, int*)': : undefined reference to `clCreateCommandQueue' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN2cl12CommandQueueC1ERKNS_7ContextERKNS_6DeviceEmPi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNK2cl12CommandQueue20enqueueNDRangeKernelERKNS_6KernelERKNS_7NDRangeES6_S6_PKSt6vectorINS_5EventESaIS8_EEPS8_+0x392): In function `cl::CommandQueue::enqueueNDRangeKernel(cl::Kernel const&, cl::NDRange const&, cl::NDRange const&, cl::NDRange const&, std::vector<cl::Event, std::allocator<cl::Event> > const*, cl::Event*) const': : undefined reference to `clEnqueueNDRangeKernel' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNK2cl12CommandQueue6finishEv+0x32): In function `cl::CommandQueue::finish() const': : undefined reference to `clFinish' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileC1Ev+0x32): In function `streamsdk::SDKFile::SDKFile[in-charge]()': : undefined reference to `std::allocator<char>::allocator[in-charge]()' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileC1Ev+0x62): In function `streamsdk::SDKFile::SDKFile[in-charge]()': : undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string[in-charge](char const*, std::allocator<char> const&)' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileC1Ev+0x92): In function `streamsdk::SDKFile::SDKFile[in-charge]()': : undefined reference to `std::allocator<char>::~allocator [in-charge]()' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileC1Ev+0xd2): In function `streamsdk::SDKFile::SDKFile[in-charge]()': : undefined reference to `std::allocator<char>::~allocator [in-charge]()' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN9streamsdk7SDKFileC1Ev+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileD1Ev+0x22): In function `streamsdk::SDKFile::~SDKFile [in-charge]()': : undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string [in-charge]()' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt14__simple_allocISt4pairIPKcmESt24__default_alloc_templateILb1ELi0EEE8allocateEm+0x72): In function `std::__simple_alloc<std::pair<char const*, unsigned long>, std::__default_alloc_template<(bool)1, (int)0> >::allocate(unsigned long)': : undefined reference to `std::__default_alloc_template<(bool)1, (int)0>::allocate(unsigned long)' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt14__simple_allocISt4pairIPKcmESt24__default_alloc_templateILb1ELi0EEE10deallocateEPS3_m+0x82): In function `std::__simple_alloc<std::pair<char const*, unsigned long>, std::__default_alloc_template<(bool)1, (int)0> >::deallocate(std::pair<char const*, unsigned long>*, unsigned long)': : undefined reference to `std::__default_alloc_template<(bool)1, (int)0>::deallocate(void*, unsigned long)' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6detail16ReferenceHandlerIP11_cl_contextE7releaseES3_+0x22): In function `cl::detail::ReferenceHandler<_cl_context*>::release(_cl_context*)': : undefined reference to `clReleaseContext' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6detail16ReferenceHandlerIP17_cl_command_queueE7releaseES3_+0x22): In function `cl::detail::ReferenceHandler<_cl_command_queue*>::release(_cl_command_queue*)': : undefined reference to `clReleaseCommandQueue' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6detail16ReferenceHandlerIP11_cl_programE7releaseES3_+0x22): In function `cl::detail::ReferenceHandler<_cl_program*>::release(_cl_program*)': : undefined reference to `clReleaseProgram' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6detail16ReferenceHandlerIP10_cl_kernelE7releaseES3_+0x22): In function `cl::detail::ReferenceHandler<_cl_kernel*>::release(_cl_kernel*)': : undefined reference to `clReleaseKernel' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt14__simple_allocIN2cl6DeviceESt24__default_alloc_templateILb1ELi0EEE10deallocateEPS1_m+0x82): In function `std::__simple_alloc<cl::Device, std::__default_alloc_template<(bool)1, (int)0> >::deallocate(cl::Device*, unsigned long)': : undefined reference to `std::__default_alloc_template<(bool)1, (int)0>::deallocate(void*, unsigned long)' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt6vectorIN2cl6DeviceESaIS1_EE20_M_allocate_and_copyIPS1_EES5_mT_S6_+0xe2): In function `cl::Device* std::vector<cl::Device, std::allocator<cl::Device> >::_M_allocate_and_copy<cl::Device*>(unsigned long, cl::Device*, cl::Device*)': : undefined reference to `__cxa_begin_catch' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt6vectorIN2cl6DeviceESaIS1_EE20_M_allocate_and_copyIPS1_EES5_mT_S6_+0x132): In function `cl::Device* std::vector<cl::Device, std::allocator<cl::Device> >::_M_allocate_and_copy<cl::Device*>(unsigned long, cl::Device*, cl::Device*)': : undefined reference to `__cxa_rethrow' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt6vectorIN2cl6DeviceESaIS1_EE20_M_allocate_and_copyIPS1_EES5_mT_S6_+0x142): In function `cl::Device* std::vector<cl::Device, std::allocator<cl::Device> >::_M_allocate_and_copy<cl::Device*>(unsigned long, cl::Device*, cl::Device*)': : undefined reference to `__cxa_end_catch' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNSt6vectorIN2cl6DeviceESaIS1_EE20_M_allocate_and_copyIPS1_EES5_mT_S6_+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt14__simple_allocIN2cl6DeviceESt24__default_alloc_templateILb1ELi0EEE8allocateEm+0x72): In function `std::__simple_alloc<cl::Device, std::__default_alloc_template<(bool)1, (int)0> >::allocate(unsigned long)': : undefined reference to `std::__default_alloc_template<(bool)1, (int)0>::allocate(unsigned long)' collect2: ld returned 1 exit status

0 Likes

The error you are getting is due to the incorrect way of giving linker options,

There are makefiles provided with samples for easy compilation. In case you want to use commandline, the correct way is

gcc HelloCL.cpp -I ./ -L /home/mahendra/ati-stream-sdk-v2.0-beta2-lnx64/lib/x86_64/ -lGLEW -lSDKUtil -lOpenCL

0 Likes

thank you for quick response.

I tried the same. but invain.

 

even tried using make file, it reported this error

/ati-stream-sdk-v2.0-beta2-lnx64/samples/opencl/cpp_cl/app/HelloCL> make
../../../../../make/openclsdkdefs.mk:51: *** Unknown CPU.  Stop.

0 Likes

What is the error you are getting in the changed commandline?

Which CPU and OS are you using?

Could you provide the configuration of the system in which you are trying to run the sample?

0 Likes

I am running on SuSe Linux 2.6.5-7.267-sn2

suse linux enterprise server 9

version = 9

patch level = 3

Processor - ia64

No of processors = 14

0 Likes

could you run uname -a command and send details displayed on command window?

0 Likes

uname -a

 

Linux ADNPS2 2.6.5-7.267-sn2 #1 SMP Wed Jun 21 10:50:51 UTC 2006 ia64 ia64 ia64 GNU/Linux

0 Likes

The current SDK has support for x86 processors only, which explains why you are getting 'Unknown CPU' error during make.

0 Likes

Thank you very much sir.

I tried on x86 machine and it is successfull running your executable.

when compiling make file it prompts the error

 

mkdir -p ../../../../../samples/opencl/bin/debug/x86/ DCT.o DCT Building ../../../../../samples/opencl/bin/debug/x86/DCT.o g++ -m32 -msse2 -Wfloat-equal -Wpointer-arith -g3 -ffor-scope -I ../../../../../include -o ../../../../../samples/opencl/bin/debug/x86/DCT.o -c DCT.cpp make: g++: Command not found make: *** [DCT.o] Error 127

0 Likes

Seems you dont have g++ installed in your system

You need to have gcc version 4.3 or later installed in your system to sucessfully compile the samples coming with OpenCL sdk.

0 Likes

Great!

Thank you for your kind response.

I got g++ 4.3 installed and its working fine.

 

with regards

   usha regadi

0 Likes

Great!

Thank You for your kind response.

It's working fine after installing g++4.3.

 

0 Likes

Hello sir,

           when i running a kernel I get this error.

Error: clEnqueueNDRangekernel failed

Can You help me out.

Through .cpp I am calling the kernel

 

0 Likes

Could you please post your source code? 

0 Likes

the above issue is solved.

 

but when i am passing the data by reference in a function from kernel.

I made changes to the DCT code given by you.

it is promting for the CLBuildProgram  and clSetArg  failed. I am attaching the code.

DCT.cpp ///////////////////////////////////////////////////////////////////////////////////////////////////// #include "DCT.hpp" using namespace amd::DCT; int DCT::setupDCT() { FILE *fp; char Imgfile[100]; cl_uint inputSizeBytes; strcpy(Imgfile,"/home/usha/C2A_7126_8bit.Img"); /* allocate and init memory used by host */ inputSizeBytes = width * height * sizeof(cl_uchar); input = (cl_uchar *) malloc(inputSizeBytes); if(input==NULL) { sampleCommon->error("Failed to allocate host memory. (input)"); return 0; } cl_uint outputSizeBytes = width * height * sizeof(cl_uchar); output = (cl_uchar *)malloc(outputSizeBytes); if(output==NULL) { sampleCommon->error("Failed to allocate host memory. (output)"); return 0; } /* random initialisation of input */ /* Read the data to input buffer from a memory mapped file */ fp = fopen(Imgfile,"rb"); fseek(fp,2048,SEEK_SET); fread(&input[0],sizeof(cl_uchar),width*height,fp); fclose(fp); return 1; } int DCT::setupCL(void) { cl_int status = 0; size_t deviceListSize; context = clCreateContextFromType( 0, CL_DEVICE_TYPE_CPU, NULL, NULL, &status); if(!sampleCommon->checkVal(status, CL_SUCCESS, "clCreateContextFromType failed.")) return 0; /* First, get the size of device list data */ status = clGetContextInfo( context, CL_CONTEXT_DEVICES, 0, NULL, &deviceListSize); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clGetContextInfo failed.")) return 0; /* Now allocate memory for device list based on the size we got earlier */ devices = (cl_device_id *)malloc(deviceListSize); if(devices==NULL) { sampleCommon->error("Failed to allocate memory (devices)."); return 0; } /* Now, get the device list data */ status = clGetContextInfo( context, CL_CONTEXT_DEVICES, deviceListSize, devices, NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clGetGetContextInfo failed.")) return 0; { /* The block is to move the declaration of prop closer to its use */ cl_command_queue_properties prop = 0; if(timing) prop |= CL_QUEUE_PROFILING_ENABLE; commandQueue = clCreateCommandQueue( context, devices[0], prop, &status); if(!sampleCommon->checkVal( status, 0, "clCreateCommandQueue failed.")) return 0; } inputBuffer = clCreateBuffer( context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uchar) * width * height, input, &status); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clCreateBuffer failed. (inputBuffer)")) return 0; outputBuffer = clCreateBuffer( context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uchar) * width * height, output, &status); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clCreateBuffer failed. (outputBuffer)")) return 0; /* create a CL program using the kernel source */ streamsdk::SDKFile kernelFile; kernelFile.open("DCT_Kernels.cl"); const char * source = kernelFile.source().c_str(); size_t sourceSize[] = { strlen(source) }; program = clCreateProgramWithSource( context, 1, &source, sourceSize, &status); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clCreateProgramWithSource failed.")) return 0; /* create a cl program executable for all the devices specified */ status = clBuildProgram(program, 1, devices, NULL, NULL, NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clBuildProgram failed.")) return 0; /* get a kernel object handle for a kernel with the given name */ kernel = clCreateKernel(program, "DCT", &status); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clCreateKernel failed.")) return 0; return 1; } int DCT::runCLKernels(void) { cl_int status; cl_event events[2]; size_t globalThreads[2] = {width, height}; size_t localThreads[2] = {blockWidth, blockWidth}; long long kernelsStartTime; long long kernelsEndTime; /*** Set appropriate arguments to the kernel ***/ /** * 1st argument to the kernel which stores the output of the DCT */ status = clSetKernelArg( kernel, 0, sizeof(cl_mem), (void *)&outputBuffer); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (outputBuffer)")) return 0; /** * 2nd argument to the kernel , the input matrix */ status = clSetKernelArg( kernel, 1, sizeof(cl_mem), (void *)&inputBuffer); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (inputBuffer)")) return 0; /** /** * 3rd argument to the kernel , local memory which stores intermediate values */ status = clSetKernelArg( kernel, 2, width * height * sizeof(cl_uchar), NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (Intermediate local buffer)")) return 0; /** * 4th argument to the kernel , width of the input image */ status = clSetKernelArg( kernel, 3, sizeof(cl_uint), (void *)&width); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (width)")) return 0; /** * 5th argument to the kernel , heigth of the input image */ status = clSetKernelArg( kernel, 4, sizeof(cl_uint), (void *)&height); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (ImageHeight)")) return 0; /** * 6th argument to the kernel , heigth of the input image */ status = clSetKernelArg( kernel, 5, sizeof(cl_uint), (void *)&blockWidth); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (blockWidth)")) return 0; /** * Enqueue a kernel run call. */ status = clEnqueueNDRangeKernel( commandQueue, kernel, 2, NULL, globalThreads, localThreads, 0, NULL, &events[0]); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clEnqueueNDRangeKernel failed.")) return 0; /** * wait for the kernel call to finish execution */ status = clWaitForEvents(1, &events[0]); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clWaitForEvents failed.")) return 0; clReleaseEvent(events[0]); if(timing) { status = clGetEventProfilingInfo( events[0], CL_PROFILING_COMMAND_START, sizeof(long long), &kernelsStartTime, NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clGetEventProfilingInfo failed.")) return 0; } if(timing) { status = clGetEventProfilingInfo( events[0], CL_PROFILING_COMMAND_END, sizeof(long long), &kernelsEndTime, NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clGetEventProfilingInfo failed.")) return 0; /* Compute total time (also convert from nanoseconds to seconds) */ totalTime = (double)(kernelsEndTime - kernelsStartTime)/1e9; } /* Enqueue readBuffer*/ status = clEnqueueReadBuffer( commandQueue, outputBuffer, CL_TRUE, 0, width * height * sizeof(cl_uchar), output, 0, NULL, &events[1]); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clEnqueueReadBuffer failed.")) return 0; /* Wait for the read buffer to finish execution */ status = clWaitForEvents(1, &events[1]); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clWaitForEvents failed.")) return 0; clReleaseEvent(events[1]); for(int indx=0;indx<height;indx++) for(int indy=0;indy<width;indy++) printf(" %d \t ",output[indx*width+indy]); return 1; } #if 0 cl_uint DCT::getIdx(cl_uint blockIdx, cl_uint blockIdy, cl_uint localIdx, cl_uint localIdy, cl_uint blockWidth, cl_uint globalWidth) { cl_uint globalIdx = blockIdx * blockWidth + localIdx; cl_uint globalIdy = blockIdy * blockWidth + localIdy; return (globalIdy * globalWidth + globalIdx); } #endif cl_uint DCT::getIdx(cl_uchar *buffer, cl_uint blockIdx, cl_uint blockIdy, cl_uint localIdx, cl_uint localIdy, cl_uint blockWidth, cl_uint globalWidth) { cl_uint globalIdx = blockIdx * blockWidth + localIdx; cl_uint globalIdy = blockIdy * blockWidth + localIdy; return (globalIdy * globalWidth + globalIdx); } int DCT::initialize() { // Call base class Initialize to get default configuration if(!this->SDKSample::initialize()) return 0; return 1; } int DCT::setup() { /* Make sure the width is a multiple of blockWidth 8 here */ if(width%blockWidth != 0) width = (width/blockWidth + 1)*blockWidth; /* Make sure the height is a multiple of blockWidth 8 here */ if(height%blockWidth !=0) height = (height/blockWidth + 1)*blockWidth; if(!setupDCT()) return 0; if(!setupCL()) return 0; return 1; } int DCT::run() { /* Arguments are set and execution call is enqueued on command buffer */ if(!runCLKernels()) return 0; if(!quiet) { sampleCommon->printArray<cl_uchar>("Output", output, width,1); } return 1; } int DCT::verifyResults() { return -1; } void DCT::printStats() { this->SDKSample::printStats(); } int DCT::cleanup() { /* Releases OpenCL resources (Context, Memory etc.) */ cl_int status; status = clReleaseKernel(kernel); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseKernel failed.")) return 0; status = clReleaseProgram(program); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseProgram failed.")) return 0; status = clReleaseMemObject(inputBuffer); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseMemObject failed.")) return 0; status = clReleaseMemObject(outputBuffer); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseMemObject failed.")) return 0; status = clReleaseCommandQueue(commandQueue); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseCommandQueue failed.")) return 0; status = clReleaseContext(context); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseContext failed.")) return 0; /* release program resources (input memory etc.) */ if(input) free(input); if(output) free(output); //if(verificationOutput) //free(verificationOutput); if(devices) free(devices); return 1; } int main(int argc, char * argv[]) { DCT clDCT("OpenCL DCT8x8"); clDCT.initialize(); if(!clDCT.parseCommandLine(argc, argv)) return 0; clDCT.setup(); clDCT.run(); clDCT.verifyResults(); clDCT.cleanup(); clDCT.printStats(); return 0; } /////////////////////// End of DCT.cpp///////////////////////// /////////////////////////////////////////// DCT.hpp ////////////////////////////////////////////////////////////////////////////// #ifndef DCT_H_ #define DCT_H_ #include <CL/cl.h> #include <stdio.h> #include <stdlib.h> #include <assert.h> #include <string.h> #include <SDKUtil/SDKCommon.hpp> #include <SDKUtil/SDKApplication.hpp> #include <SDKUtil/SDKCommandArgs.hpp> #include <SDKUtil/SDKFile.hpp> #if !defined(M_PI) #define M_PI (3.14159265358979323846f) #endif namespace amd { namespace DCT { const cl_float a = cos(M_PI/16)/2; const cl_float b = cos(M_PI/8 )/2; const cl_float c = cos(3*M_PI/16)/2; const cl_float d = cos(5*M_PI/16)/2; const cl_float e = cos(3*M_PI/8)/2; const cl_float f = cos(7*M_PI/16)/2; const cl_float g = 1.0f/sqrt(8.0f); /** * DCT8x8 mask that is used to calculate Discrete Cosine Transform * of an 8x8 matrix */ cl_float dct8x8[64] = { g, a, b, c, g, d, e, f, g, c, e, -f, -g, -a, -b, -d, g, d, -e, -a, -g, f, b, c, g, f, -b, -d, g, c, -e, -a, g, -f, -b, d, g, -c, -e, a, g, -d, -e, a, -g, -f, b, -c, g, -c, e, f, -g, a, -b, d, g, -a, b, -c, g, -d, e, f }; /** * DCT * Class implements OpenCL Discrete Cosine Transform * Derived from SDKSample base class */ class DCT : public SDKSample { cl_uint seed; /**< Seed value for random number generation */ cl_double totalKernelTime; /**< Time for kernel execution */ cl_double totalProgramTime; /**< Time for program execution */ cl_double referenceKernelTime; /**< Time for reference implementation */ cl_uchar *input; /**< Input array */ cl_uchar *output; /**< Input array */ cl_uint blockWidth; /**< width of the blockSize */ cl_uint blockSize; /**< size fo the block */ cl_bool inverse; /**< flag for inverse DCT */ cl_float *verificationOutput; /**< Input array for reference implementation */ cl_context context; /**< CL context */ cl_device_id *devices; /**< CL device list */ cl_mem inputBuffer; /**< CL memory buffer */ cl_mem outputBuffer; /**< CL memory buffer */ cl_mem dctBuffer; /**< CL memory buffer */ cl_command_queue commandQueue; /**< CL command queue */ cl_program program; /**< CL program */ cl_kernel kernel; /**< CL kernel */ public: /** * Constructor * Initialize member variables * @param name name of sample (string) */ DCT(std::string name) : SDKSample(name) { seed = 123; input = NULL; verificationOutput = NULL; blockWidth = 8; blockSize = blockWidth * blockWidth; inverse = false; } /** * Constructor * Initialize member variables * @param name name of sample (const char*) */ DCT(const char* name) : SDKSample(name) { seed = 123; input = NULL; verificationOutput = NULL; blockWidth = 8; blockSize = blockWidth * blockWidth; inverse = false; } /** * Allocate and initialize host memory array with random values * @return 1 on success and 0 on failure */ int setupDCT(); /** * OpenCL related initialisations. * Set up Context, Device list, Command Queue, Memory buffers * Build CL kernel program executable * @return 1 on success and 0 on failure */ int setupCL(); /** * Set values for kernels' arguments, enqueue calls to the kernels * on to the command queue, wait till end of kernel execution. * Get kernel start and end time if timing is enabled * @return 1 on success and 0 on failure */ int runCLKernels(); /** * Given the blockindices and localIndicies this * function calculate the global index * @param blockIdx index of the block horizontally * @param blockIdy index of the block vertically * @param localidx index of the element relative to the block horizontally * @param localIdy index of the element relative to the block vertically * @param blockWidth width of each blcok which is 8 * @param globalWidth Width of the input matrix */ //cl_uint getIdx(cl_uchar *imgdata, cl_uint blockIdy, cl_uint blockIdx, cl_float stag_value, cl_uint index); cl_uint getIdx(cl_uchar *imgdata, cl_uint blockIdx, cl_uint blockIdy, cl_uint localIdx, cl_uint localIdy, cl_uint blockWidth, cl_uint globalWidth); //uint StaggerCorrection(unsigned char *,int ,int ,float ,int); /** * Reference CPU implementation of Discrete Cosine Transform * for performance comparison * @param output output of the DCT8x8 transform * @param input input array * @param dct8x8 8x8 consine function base used to calculate DCT8x8 * @param width width of the input matrix * @param height height of the input matrix * @param numBlocksX number of blocks horizontally * @param numBlocksY number of blocks vertically * @param inverse flag to perform inverse DCT */ #if 0 void DCTCPUReference( cl_float * output, const cl_float * input , const cl_float * dct8x8 , const cl_uint width, const cl_uint height, const cl_uint numBlocksX, const cl_uint numBlocksY, const cl_bool inverse); #endif /** * Override from SDKSample. Print sample stats. */ void printStats(); /** * Override from SDKSample. Initialize * command line parser, add custom options */ int initialize(); /** * Override from SDKSample, adjust width and height * of execution domain, perform all sample setup */ int setup(); /** * Override from SDKSample * Run OpenCL DCT */ int run(); /** * Override from SDKSample * Cleanup memory allocations */ int cleanup(); /** * Override from SDKSample * Verify against reference implementation */ int verifyResults(); }; } //namespace DCT }//namespace amd #endif /////////////////////////////////// END of DCT.hpp ///////////////////////////////////// ///////////////////////////////////DCT_kernel.cl///////////////////////////////////////////////////////// /** * Given the blockindices and localIndicies this * function calculate the global index * @param blockIdx index of the block horizontally * @param blockIdy index of the block vertically * @param localidx index of the element relative to the block horizontally * @param localIdy index of the element relative to the block vertically * @param blockWidth width of each blcok which is 8 * @param globalWidth Width of the input matrix */ uint getIdx(uchar *imgdata, uint blockIdx, uint blockIdy, uint localIdx, uint localIdy, uint blockWidth, uint globalWidth) { uint globalIdx = blockIdx * blockWidth + localIdx; uint globalIdy = blockIdy * blockWidth + localIdy; return (globalIdy * globalWidth + globalIdx); } /** * Perform Discrete Cosine Transform for block of size 8x8 * in the input matrix * @param output output of the DCT8x8 transform * @param input input array * @param dct8x8 8x8 consine function base used to calculate DCT8x8 * @param inter local memory which stores intermediate result * @param width width of the input matrix * @param blockWidth width of each block, 8 here * @param inverse flag to perform inverse DCT */ __kernel void DCT(__global uchar * output, __global uchar * input, __local uchar * inter, const uint width, const uint height, const uint blockWidth) { /* get global indices of the element */ uint globalIdx = get_global_id(0); uint globalIdy = get_global_id(1); /* get indices of the block to which the element belongs to */ uint groupIdx = get_group_id(0); uint groupIdy = get_group_id(1); /* get indices relative to the block */ uint i = get_local_id(0); uint j = get_local_id(1); uint idx = globalIdy * width + globalIdx; /* initialise the accumulator */ float acc = 0.0f; for(uint m=0; m < height; m++) for(uint k=0; k < blockWidth; k++) inter[m*blockWidth+k] = input[m*blockWidth+k]; for(uint k=0; k < blockWidth; k++) { getIdx(&inter[0], groupIdx, groupIdy, j, k, blockWidth, width); } /* AT * X */ #if 0 for(uint k=0; k < blockWidth; k++) { uint index1 = (inverse)? i*blockWidth + k : k * blockWidth + i; uint index2 = getIdx(groupIdx, groupIdy, j, k, blockWidth, width); acc += input[index2]; } inter[idx] = acc; #endif for(uint j=0; j < groupIdy; j++) for(uint k=0; k < groupIdx; k++) output[j*groupIdx+k] = input[j*groupIdx+k]; /* * Make sure all the values of inter that belong to a block * are calculated before proceeding further */ barrier(CLK_LOCAL_MEM_FENCE); /* again initalising the accumulator */ acc = 0.0f; /* (AT * X) * A */ for(uint k=0; k < blockWidth; k++) { //uint index1 = getIdx(groupIdx, groupIdy, k, i, blockWidth, width); //uint index2 = (inverse)? j*blockWidth + k : k* blockWidth + j; //acc += inter[index1] * dct8x8[index2]; } output[idx] = acc; }

0 Likes

In case of clBuildProgram Failures, you should check the build log, here is a sample code to show the build program log.

 

 

 

 

 

/* create a cl program executable for all the devices specified */ status = clBuildProgram(program, 1, devices, NULL, NULL, NULL); //error checking code if(!sampleCommon->checkVal(status,CL_SUCCESS,"clBuildProgram failed.")) { //print kernel compilation error char programLog[1024]; status = clGetProgramBuildInfo(program, devices[0], CL_PROGRAM_BUILD_LOG, 1024, programLog, 0); std::cout<<programLog<<std::endl; return 0; }

0 Likes

i added the code given by you.

On compiling it has given this error

error: stray'\240\ in program

f(!sampleCommon->checkVal(status,CL_SUCCESS,"clBuildProgram failed.")) { //print kernel compilation error char programLog[1024]; size_t len; status = clGetProgramBuildInfo(program,devices[0],CL_PROGRAM_BUILD_LOG,1024,programLog,0); return 0; }

0 Likes

your kernel is in utf8? convert it to one byte code page.

EDIT: i have non-us keyboard. when i press right alt + space it write space. but when i compile code with gcc then i get error of stray \304 and \204

0 Likes

Stray characters mean illegal (encoding of) characters that might be used in the text you are copy-pasting from.

Try writing the code instead of copy-paste directly.

0 Likes

hi,

while executing the MatrixMultiplication it is prompting an error

Expired by 30 sep 2009 00.00.00

could you explain me

 

Regards

0 Likes

beta2 expired today.

Install beta3

 

see at following URL for more details.

http://forums.amd.com/forum/messageview.cfm?catid=328&threadid=119330&forumid=9

0 Likes

uninstalled ATI beta2 and other related ATI installation .

 

still not able to run beta3

error: clBuild error

expired by 30 sep 2009

regards

0 Likes

Make sure that the system PATH variable points to correct .dlls and the application is not getting old libraries at runtime.

0 Likes

The restrictions you mention are no longer present and there is a disconnect in the release notes. __local within a kernel is supported as are arrays within structs. We are working on getting the release notes updated to reflect this.

There are actually a number of offer key differences between Brook and OpenCL but I think two of the most important ones are:

     - OpenCL supports a full scatter/gather mode

     - OpenCL is intended for CPU and GPU

0 Likes

Great!
Very nice to read something what will give more optimism
With OpenCL supported it would be possible to run app on different GPUs (nVidia and ATI), very good advantage too.
0 Likes

Actually, local arrays in brook+ would correspond to __private memory space in openCL, not __local.  __ local space allows you to share data between different work items inside a work group, more like LDS in brook+.

0 Likes