32 Replies Latest reply on Oct 5, 2009 7:10 AM by omkaranathan

    Open source OpenCL for CPU

    jackpien

      Just downloaded the new ATI Stream SDK v2 beta and playing with the OpenCL samples.  Great job AMD!

      Any chance AMD will be open sourcing the OpenCL implementation for the CPU?  It would be a very productive step to furthering AMD/ATI's stream compute efforts by leveraging the open source community to further the OpenCL development.

      Thanks

      Jack

       

        • Open source OpenCL for CPU
          riza.guntur

          Thanks for the info, the approval email has just arrived when the time you posted. Download in progress.

            • Open source OpenCL for CPU
              Raistmer
              LoL, same restrictions as with Brook+. Are you sure we need to move to OpenCL then? ;)
              "
              . Currently, it is not possible to define local memory variables inside OpenCL-C functions or
              kernels. For example:
              __kernel void example(__global float4* ptr)
              {
              __local float4 temp[256];
              ...
              }
              . Currently, arrays are not supported inside structs.
              "
                • Open source OpenCL for CPU
                  riza.guntur

                   

                  Originally posted by: Raistmer LoL, same restrictions as with Brook+. Are you sure we need to move to OpenCL then? " . Currently, it is not possible to define local memory variables inside OpenCL-C functions or kernels. For example: __kernel void example(__global float4* ptr) { __local float4 temp[256]; ... } . Currently, arrays are not supported inside structs. "


                  Nope.

                  I could say: Brook+ is easier. I think if Brook+ could have multithread CPU backend, it will be cool.

                  Do we need TWO languages pointing to same direction and same usage?

                    • Open source OpenCL for CPU
                      sorcerer

                       

                      Originally posted by: riza.guntur I think if Brook+ could have multithread CPU backend, it will be cool.

                       

                       

                      True!

                      The original BrookGPU (v0.5) already has an OpenMP backend.

                        • Open source OpenCL for CPU
                          ryta1203

                          Is this OpenCL release built on top of CAL/IL much like Brook+??

                            • Open source OpenCL for CPU
                              michael.chu

                              The CPU component isn't but the GPU component will be (built on top of CAL).

                                • Open source OpenCL for CPU
                                  usha.regadi

                                  I have a problem compiling Hellocl.cpp program given in samples on 64-bit linux machine.

                                   I am attaching the error file.

                                  gcc HelloCL.cpp -I ./ -L /home/mahendra/ati-stream-sdk-v2.0-beta2-lnx64/lib/x86_64/libGLEW.a -L /home/mahendra/ati-stream-sdk-v2.0-beta2-lnx64/lib/x86_64/libSDKUtil.a -L /home/mahendra/ati-stream-sdk-v2.0-beta2-lnx64/lib/x86_64/libOpenCL.so /tmp/ccLZxpay.o(.text+0x90): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x91): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xa2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xd2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x102): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1d0): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1d1): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1e2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x212): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x242): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x2f0): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x2f1): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x302): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x3d2): In function `main': : undefined reference to `streamsdk::SDKFile::open(char const*)' /tmp/ccLZxpay.o(.text+0x400): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x401): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x412): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x4d2): In function `main': : undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::data() const' /tmp/ccLZxpay.o(.text+0x522): In function `main': : undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::size() const' /tmp/ccLZxpay.o(.text+0x652): In function `main': : undefined reference to `operator new(unsigned long)' /tmp/ccLZxpay.o(.text+0x742): In function `main': : undefined reference to `operator delete(void*)' /tmp/ccLZxpay.o(.text+0x7a0): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x7a1): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x7b2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x7e2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x812): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x950): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x951): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x962): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x992): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x9c2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xad0): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xad1): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xae2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xb12): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0xb42): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xc30): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xc31): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xc42): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xc72): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0xca2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xe00): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xe01): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0xe12): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0xe42): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0xe72): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1080): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1081): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1092): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x10c2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x10f2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1240): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1241): In function `main': : undefined reference to `std::cerr' /tmp/ccLZxpay.o(.text+0x1252): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1282): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >::operator<<(int)' /tmp/ccLZxpay.o(.text+0x12b2): In function `main': : undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)' /tmp/ccLZxpay.o(.text+0x1342): In function `main': : undefined reference to `operator delete(void*)' /tmp/ccLZxpay.o(.text+0x17a2): In function `__static_initialization_and_destruction_0(int, int)': : undefined reference to `std::ios_base::Init::Init[in-charge]()' /tmp/ccLZxpay.o(.text+0x18d2): In function `__tcf_1': : undefined reference to `std::ios_base::Init::~Init [in-charge]()' /tmp/ccLZxpay.o(.IA_64.unwind_info+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNK2cl7Context7getInfoILi4225EEENS_6detail12param_traitsINS2_15cl_context_infoEXT_EE10param_typeEPi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNSt6vectorIN2cl6DeviceESaIS1_EED1Ev+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNSt6vectorISt4pairIPKcmESaIS3_EEC1EmRKS3_RKS4_+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNSt6vectorISt4pairIPKcmESaIS3_EED1Ev+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNK2cl7Context7getInfoISt6vectorINS_6DeviceESaIS3_EEEEijPT_+0x50): In function `int cl::Context::getInfo<std::vector<cl::Device, std::allocator<cl::Device> > >(unsigned, std::vector<cl::Device, std::allocator<cl::Device> >*) const': : undefined reference to `clGetContextInfo' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl7ContextC1EmPlPFvPKcPKvmPvES6_Pi+0xf2): In function `cl::Context::Context[in-charge](unsigned long, long*, void (*)(char const*, void const*, unsigned long, void*), void*, int*)': : undefined reference to `clCreateContextFromType' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN2cl7ContextC1EmPlPFvPKcPKvmPvES6_Pi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl7ProgramC1ERKNS_7ContextERKSt6vectorISt4pairIPKcmESaIS8_EEPi+0x332): In function `cl::Program::Program[in-charge](cl::Context const&, std::vector<std::pair<char const*, unsigned long>, std::allocator<std::pair<char const*, unsigned long> > > const&, int*)': : undefined reference to `clCreateProgramWithSource' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN2cl7ProgramC1ERKNS_7ContextERKSt6vectorISt4pairIPKcmESaIS8_EEPi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNK2cl7Program5buildERKSt6vectorINS_6DeviceESaIS2_EEPKcPFvP11_cl_programPvESB_+0xe2): In function `cl::Program::build(std::vector<cl::Device, std::allocator<cl::Device> > const&, char const*, void (*)(_cl_program*, void*), void*) const': : undefined reference to `clBuildProgram' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6KernelC1ERKNS_7ProgramEPKcPi+0xd2): In function `cl::Kernel::Kernel[in-charge](cl::Program const&, char const*, int*)': : undefined reference to `clCreateKernel' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN2cl6KernelC1ERKNS_7ProgramEPKcPi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl12CommandQueueC1ERKNS_7ContextERKNS_6DeviceEmPi+0xf2): In function `cl::CommandQueue::CommandQueue[in-charge](cl::Context const&, cl::Device const&, unsigned long, int*)': : undefined reference to `clCreateCommandQueue' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN2cl12CommandQueueC1ERKNS_7ContextERKNS_6DeviceEmPi+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNK2cl12CommandQueue20enqueueNDRangeKernelERKNS_6KernelERKNS_7NDRangeES6_S6_PKSt6vectorINS_5EventESaIS8_EEPS8_+0x392): In function `cl::CommandQueue::enqueueNDRangeKernel(cl::Kernel const&, cl::NDRange const&, cl::NDRange const&, cl::NDRange const&, std::vector<cl::Event, std::allocator<cl::Event> > const*, cl::Event*) const': : undefined reference to `clEnqueueNDRangeKernel' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNK2cl12CommandQueue6finishEv+0x32): In function `cl::CommandQueue::finish() const': : undefined reference to `clFinish' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileC1Ev+0x32): In function `streamsdk::SDKFile::SDKFile[in-charge]()': : undefined reference to `std::allocator<char>::allocator[in-charge]()' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileC1Ev+0x62): In function `streamsdk::SDKFile::SDKFile[in-charge]()': : undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string[in-charge](char const*, std::allocator<char> const&)' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileC1Ev+0x92): In function `streamsdk::SDKFile::SDKFile[in-charge]()': : undefined reference to `std::allocator<char>::~allocator [in-charge]()' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileC1Ev+0xd2): In function `streamsdk::SDKFile::SDKFile[in-charge]()': : undefined reference to `std::allocator<char>::~allocator [in-charge]()' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZN9streamsdk7SDKFileC1Ev+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN9streamsdk7SDKFileD1Ev+0x22): In function `streamsdk::SDKFile::~SDKFile [in-charge]()': : undefined reference to `std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string [in-charge]()' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt14__simple_allocISt4pairIPKcmESt24__default_alloc_templateILb1ELi0EEE8allocateEm+0x72): In function `std::__simple_alloc<std::pair<char const*, unsigned long>, std::__default_alloc_template<(bool)1, (int)0> >::allocate(unsigned long)': : undefined reference to `std::__default_alloc_template<(bool)1, (int)0>::allocate(unsigned long)' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt14__simple_allocISt4pairIPKcmESt24__default_alloc_templateILb1ELi0EEE10deallocateEPS3_m+0x82): In function `std::__simple_alloc<std::pair<char const*, unsigned long>, std::__default_alloc_template<(bool)1, (int)0> >::deallocate(std::pair<char const*, unsigned long>*, unsigned long)': : undefined reference to `std::__default_alloc_template<(bool)1, (int)0>::deallocate(void*, unsigned long)' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6detail16ReferenceHandlerIP11_cl_contextE7releaseES3_+0x22): In function `cl::detail::ReferenceHandler<_cl_context*>::release(_cl_context*)': : undefined reference to `clReleaseContext' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6detail16ReferenceHandlerIP17_cl_command_queueE7releaseES3_+0x22): In function `cl::detail::ReferenceHandler<_cl_command_queue*>::release(_cl_command_queue*)': : undefined reference to `clReleaseCommandQueue' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6detail16ReferenceHandlerIP11_cl_programE7releaseES3_+0x22): In function `cl::detail::ReferenceHandler<_cl_program*>::release(_cl_program*)': : undefined reference to `clReleaseProgram' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZN2cl6detail16ReferenceHandlerIP10_cl_kernelE7releaseES3_+0x22): In function `cl::detail::ReferenceHandler<_cl_kernel*>::release(_cl_kernel*)': : undefined reference to `clReleaseKernel' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt14__simple_allocIN2cl6DeviceESt24__default_alloc_templateILb1ELi0EEE10deallocateEPS1_m+0x82): In function `std::__simple_alloc<cl::Device, std::__default_alloc_template<(bool)1, (int)0> >::deallocate(cl::Device*, unsigned long)': : undefined reference to `std::__default_alloc_template<(bool)1, (int)0>::deallocate(void*, unsigned long)' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt6vectorIN2cl6DeviceESaIS1_EE20_M_allocate_and_copyIPS1_EES5_mT_S6_+0xe2): In function `cl::Device* std::vector<cl::Device, std::allocator<cl::Device> >::_M_allocate_and_copy<cl::Device*>(unsigned long, cl::Device*, cl::Device*)': : undefined reference to `__cxa_begin_catch' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt6vectorIN2cl6DeviceESaIS1_EE20_M_allocate_and_copyIPS1_EES5_mT_S6_+0x132): In function `cl::Device* std::vector<cl::Device, std::allocator<cl::Device> >::_M_allocate_and_copy<cl::Device*>(unsigned long, cl::Device*, cl::Device*)': : undefined reference to `__cxa_rethrow' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt6vectorIN2cl6DeviceESaIS1_EE20_M_allocate_and_copyIPS1_EES5_mT_S6_+0x142): In function `cl::Device* std::vector<cl::Device, std::allocator<cl::Device> >::_M_allocate_and_copy<cl::Device*>(unsigned long, cl::Device*, cl::Device*)': : undefined reference to `__cxa_end_catch' /tmp/ccLZxpay.o(.gnu.linkonce.ia64unwi._ZNSt6vectorIN2cl6DeviceESaIS1_EE20_M_allocate_and_copyIPS1_EES5_mT_S6_+0x18): undefined reference to `__gxx_personality_v0' /tmp/ccLZxpay.o(.gnu.linkonce.t._ZNSt14__simple_allocIN2cl6DeviceESt24__default_alloc_templateILb1ELi0EEE8allocateEm+0x72): In function `std::__simple_alloc<cl::Device, std::__default_alloc_template<(bool)1, (int)0> >::allocate(unsigned long)': : undefined reference to `std::__default_alloc_template<(bool)1, (int)0>::allocate(unsigned long)' collect2: ld returned 1 exit status

                                    • Open source OpenCL for CPU
                                      omkaranathan

                                      The error you are getting is due to the incorrect way of giving linker options,

                                      There are makefiles provided with samples for easy compilation. In case you want to use commandline, the correct way is

                                      gcc HelloCL.cpp -I ./ -L /home/mahendra/ati-stream-sdk-v2.0-beta2-lnx64/lib/x86_64/ -lGLEW -lSDKUtil -lOpenCL

                                        • Open source OpenCL for CPU
                                          usha.regadi

                                          thank you for quick response.

                                          I tried the same. but invain.

                                           

                                          even tried using make file, it reported this error

                                          /ati-stream-sdk-v2.0-beta2-lnx64/samples/opencl/cpp_cl/app/HelloCL> make
                                          ../../../../../make/openclsdkdefs.mk:51: *** Unknown CPU.  Stop.

                                            • Open source OpenCL for CPU
                                              omkaranathan

                                              What is the error you are getting in the changed commandline?

                                              Which CPU and OS are you using?

                                              Could you provide the configuration of the system in which you are trying to run the sample?

                                                • Open source OpenCL for CPU
                                                  usha.regadi

                                                  I am running on SuSe Linux 2.6.5-7.267-sn2

                                                  suse linux enterprise server 9

                                                  version = 9

                                                  patch level = 3

                                                  Processor - ia64

                                                  No of processors = 14

                                                    • Open source OpenCL for CPU
                                                      genaganna

                                                      could you run uname -a command and send details displayed on command window?

                                                        • Open source OpenCL for CPU
                                                          usha.regadi

                                                          uname -a

                                                           

                                                          Linux ADNPS2 2.6.5-7.267-sn2 #1 SMP Wed Jun 21 10:50:51 UTC 2006 ia64 ia64 ia64 GNU/Linux

                                                            • Open source OpenCL for CPU
                                                              omkaranathan

                                                              The current SDK has support for x86 processors only, which explains why you are getting 'Unknown CPU' error during make.

                                                                • Open source OpenCL for CPU
                                                                  usha.regadi

                                                                  Thank you very much sir.

                                                                  I tried on x86 machine and it is successfull running your executable.

                                                                  when compiling make file it prompts the error

                                                                   

                                                                  mkdir -p ../../../../../samples/opencl/bin/debug/x86/ DCT.o DCT Building ../../../../../samples/opencl/bin/debug/x86/DCT.o g++ -m32 -msse2 -Wfloat-equal -Wpointer-arith -g3 -ffor-scope -I ../../../../../include -o ../../../../../samples/opencl/bin/debug/x86/DCT.o -c DCT.cpp make: g++: Command not found make: *** [DCT.o] Error 127

                                                                    • Open source OpenCL for CPU
                                                                      omkaranathan

                                                                      Seems you dont have g++ installed in your system

                                                                      You need to have gcc version 4.3 or later installed in your system to sucessfully compile the samples coming with OpenCL sdk.

                                                                        • Open source OpenCL for CPU
                                                                          usha.regadi

                                                                          Great!

                                                                          Thank you for your kind response.

                                                                          I got g++ 4.3 installed and its working fine.

                                                                           

                                                                          with regards

                                                                             usha regadi

                                                                          • Open source OpenCL for CPU
                                                                            usha.regadi

                                                                            Great!

                                                                            Thank You for your kind response.

                                                                            It's working fine after installing g++4.3.

                                                                             

                                                                              • Open source OpenCL for CPU
                                                                                usha.regadi

                                                                                Hello sir,

                                                                                           when i running a kernel I get this error.

                                                                                Error: clEnqueueNDRangekernel failed

                                                                                Can You help me out.

                                                                                Through .cpp I am calling the kernel

                                                                                 

                                                                                  • Open source OpenCL for CPU
                                                                                    omkaranathan

                                                                                    Could you please post your source code? 

                                                                                      • Open source OpenCL for CPU
                                                                                        usha.regadi

                                                                                        the above issue is solved.

                                                                                         

                                                                                        but when i am passing the data by reference in a function from kernel.

                                                                                        I made changes to the DCT code given by you.

                                                                                        it is promting for the CLBuildProgram  and clSetArg  failed. I am attaching the code.

                                                                                        DCT.cpp ///////////////////////////////////////////////////////////////////////////////////////////////////// #include "DCT.hpp" using namespace amd::DCT; int DCT::setupDCT() { FILE *fp; char Imgfile[100]; cl_uint inputSizeBytes; strcpy(Imgfile,"/home/usha/C2A_7126_8bit.Img"); /* allocate and init memory used by host */ inputSizeBytes = width * height * sizeof(cl_uchar); input = (cl_uchar *) malloc(inputSizeBytes); if(input==NULL) { sampleCommon->error("Failed to allocate host memory. (input)"); return 0; } cl_uint outputSizeBytes = width * height * sizeof(cl_uchar); output = (cl_uchar *)malloc(outputSizeBytes); if(output==NULL) { sampleCommon->error("Failed to allocate host memory. (output)"); return 0; } /* random initialisation of input */ /* Read the data to input buffer from a memory mapped file */ fp = fopen(Imgfile,"rb"); fseek(fp,2048,SEEK_SET); fread(&input[0],sizeof(cl_uchar),width*height,fp); fclose(fp); return 1; } int DCT::setupCL(void) { cl_int status = 0; size_t deviceListSize; context = clCreateContextFromType( 0, CL_DEVICE_TYPE_CPU, NULL, NULL, &status); if(!sampleCommon->checkVal(status, CL_SUCCESS, "clCreateContextFromType failed.")) return 0; /* First, get the size of device list data */ status = clGetContextInfo( context, CL_CONTEXT_DEVICES, 0, NULL, &deviceListSize); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clGetContextInfo failed.")) return 0; /* Now allocate memory for device list based on the size we got earlier */ devices = (cl_device_id *)malloc(deviceListSize); if(devices==NULL) { sampleCommon->error("Failed to allocate memory (devices)."); return 0; } /* Now, get the device list data */ status = clGetContextInfo( context, CL_CONTEXT_DEVICES, deviceListSize, devices, NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clGetGetContextInfo failed.")) return 0; { /* The block is to move the declaration of prop closer to its use */ cl_command_queue_properties prop = 0; if(timing) prop |= CL_QUEUE_PROFILING_ENABLE; commandQueue = clCreateCommandQueue( context, devices[0], prop, &status); if(!sampleCommon->checkVal( status, 0, "clCreateCommandQueue failed.")) return 0; } inputBuffer = clCreateBuffer( context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uchar) * width * height, input, &status); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clCreateBuffer failed. (inputBuffer)")) return 0; outputBuffer = clCreateBuffer( context, CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, sizeof(cl_uchar) * width * height, output, &status); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clCreateBuffer failed. (outputBuffer)")) return 0; /* create a CL program using the kernel source */ streamsdk::SDKFile kernelFile; kernelFile.open("DCT_Kernels.cl"); const char * source = kernelFile.source().c_str(); size_t sourceSize[] = { strlen(source) }; program = clCreateProgramWithSource( context, 1, &source, sourceSize, &status); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clCreateProgramWithSource failed.")) return 0; /* create a cl program executable for all the devices specified */ status = clBuildProgram(program, 1, devices, NULL, NULL, NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clBuildProgram failed.")) return 0; /* get a kernel object handle for a kernel with the given name */ kernel = clCreateKernel(program, "DCT", &status); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clCreateKernel failed.")) return 0; return 1; } int DCT::runCLKernels(void) { cl_int status; cl_event events[2]; size_t globalThreads[2] = {width, height}; size_t localThreads[2] = {blockWidth, blockWidth}; long long kernelsStartTime; long long kernelsEndTime; /*** Set appropriate arguments to the kernel ***/ /** * 1st argument to the kernel which stores the output of the DCT */ status = clSetKernelArg( kernel, 0, sizeof(cl_mem), (void *)&outputBuffer); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (outputBuffer)")) return 0; /** * 2nd argument to the kernel , the input matrix */ status = clSetKernelArg( kernel, 1, sizeof(cl_mem), (void *)&inputBuffer); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (inputBuffer)")) return 0; /** /** * 3rd argument to the kernel , local memory which stores intermediate values */ status = clSetKernelArg( kernel, 2, width * height * sizeof(cl_uchar), NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (Intermediate local buffer)")) return 0; /** * 4th argument to the kernel , width of the input image */ status = clSetKernelArg( kernel, 3, sizeof(cl_uint), (void *)&width); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (width)")) return 0; /** * 5th argument to the kernel , heigth of the input image */ status = clSetKernelArg( kernel, 4, sizeof(cl_uint), (void *)&height); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (ImageHeight)")) return 0; /** * 6th argument to the kernel , heigth of the input image */ status = clSetKernelArg( kernel, 5, sizeof(cl_uint), (void *)&blockWidth); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clSetKernelArg failed. (blockWidth)")) return 0; /** * Enqueue a kernel run call. */ status = clEnqueueNDRangeKernel( commandQueue, kernel, 2, NULL, globalThreads, localThreads, 0, NULL, &events[0]); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clEnqueueNDRangeKernel failed.")) return 0; /** * wait for the kernel call to finish execution */ status = clWaitForEvents(1, &events[0]); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clWaitForEvents failed.")) return 0; clReleaseEvent(events[0]); if(timing) { status = clGetEventProfilingInfo( events[0], CL_PROFILING_COMMAND_START, sizeof(long long), &kernelsStartTime, NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clGetEventProfilingInfo failed.")) return 0; } if(timing) { status = clGetEventProfilingInfo( events[0], CL_PROFILING_COMMAND_END, sizeof(long long), &kernelsEndTime, NULL); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clGetEventProfilingInfo failed.")) return 0; /* Compute total time (also convert from nanoseconds to seconds) */ totalTime = (double)(kernelsEndTime - kernelsStartTime)/1e9; } /* Enqueue readBuffer*/ status = clEnqueueReadBuffer( commandQueue, outputBuffer, CL_TRUE, 0, width * height * sizeof(cl_uchar), output, 0, NULL, &events[1]); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clEnqueueReadBuffer failed.")) return 0; /* Wait for the read buffer to finish execution */ status = clWaitForEvents(1, &events[1]); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clWaitForEvents failed.")) return 0; clReleaseEvent(events[1]); for(int indx=0;indx<height;indx++) for(int indy=0;indy<width;indy++) printf(" %d \t ",output[indx*width+indy]); return 1; } #if 0 cl_uint DCT::getIdx(cl_uint blockIdx, cl_uint blockIdy, cl_uint localIdx, cl_uint localIdy, cl_uint blockWidth, cl_uint globalWidth) { cl_uint globalIdx = blockIdx * blockWidth + localIdx; cl_uint globalIdy = blockIdy * blockWidth + localIdy; return (globalIdy * globalWidth + globalIdx); } #endif cl_uint DCT::getIdx(cl_uchar *buffer, cl_uint blockIdx, cl_uint blockIdy, cl_uint localIdx, cl_uint localIdy, cl_uint blockWidth, cl_uint globalWidth) { cl_uint globalIdx = blockIdx * blockWidth + localIdx; cl_uint globalIdy = blockIdy * blockWidth + localIdy; return (globalIdy * globalWidth + globalIdx); } int DCT::initialize() { // Call base class Initialize to get default configuration if(!this->SDKSample::initialize()) return 0; return 1; } int DCT::setup() { /* Make sure the width is a multiple of blockWidth 8 here */ if(width%blockWidth != 0) width = (width/blockWidth + 1)*blockWidth; /* Make sure the height is a multiple of blockWidth 8 here */ if(height%blockWidth !=0) height = (height/blockWidth + 1)*blockWidth; if(!setupDCT()) return 0; if(!setupCL()) return 0; return 1; } int DCT::run() { /* Arguments are set and execution call is enqueued on command buffer */ if(!runCLKernels()) return 0; if(!quiet) { sampleCommon->printArray<cl_uchar>("Output", output, width,1); } return 1; } int DCT::verifyResults() { return -1; } void DCT::printStats() { this->SDKSample::printStats(); } int DCT::cleanup() { /* Releases OpenCL resources (Context, Memory etc.) */ cl_int status; status = clReleaseKernel(kernel); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseKernel failed.")) return 0; status = clReleaseProgram(program); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseProgram failed.")) return 0; status = clReleaseMemObject(inputBuffer); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseMemObject failed.")) return 0; status = clReleaseMemObject(outputBuffer); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseMemObject failed.")) return 0; status = clReleaseCommandQueue(commandQueue); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseCommandQueue failed.")) return 0; status = clReleaseContext(context); if(!sampleCommon->checkVal( status, CL_SUCCESS, "clReleaseContext failed.")) return 0; /* release program resources (input memory etc.) */ if(input) free(input); if(output) free(output); //if(verificationOutput) //free(verificationOutput); if(devices) free(devices); return 1; } int main(int argc, char * argv[]) { DCT clDCT("OpenCL DCT8x8"); clDCT.initialize(); if(!clDCT.parseCommandLine(argc, argv)) return 0; clDCT.setup(); clDCT.run(); clDCT.verifyResults(); clDCT.cleanup(); clDCT.printStats(); return 0; } /////////////////////// End of DCT.cpp///////////////////////// /////////////////////////////////////////// DCT.hpp ////////////////////////////////////////////////////////////////////////////// #ifndef DCT_H_ #define DCT_H_ #include <CL/cl.h> #include <stdio.h> #include <stdlib.h> #include <assert.h> #include <string.h> #include <SDKUtil/SDKCommon.hpp> #include <SDKUtil/SDKApplication.hpp> #include <SDKUtil/SDKCommandArgs.hpp> #include <SDKUtil/SDKFile.hpp> #if !defined(M_PI) #define M_PI (3.14159265358979323846f) #endif namespace amd { namespace DCT { const cl_float a = cos(M_PI/16)/2; const cl_float b = cos(M_PI/8 )/2; const cl_float c = cos(3*M_PI/16)/2; const cl_float d = cos(5*M_PI/16)/2; const cl_float e = cos(3*M_PI/8)/2; const cl_float f = cos(7*M_PI/16)/2; const cl_float g = 1.0f/sqrt(8.0f); /** * DCT8x8 mask that is used to calculate Discrete Cosine Transform * of an 8x8 matrix */ cl_float dct8x8[64] = { g, a, b, c, g, d, e, f, g, c, e, -f, -g, -a, -b, -d, g, d, -e, -a, -g, f, b, c, g, f, -b, -d, g, c, -e, -a, g, -f, -b, d, g, -c, -e, a, g, -d, -e, a, -g, -f, b, -c, g, -c, e, f, -g, a, -b, d, g, -a, b, -c, g, -d, e, f }; /** * DCT * Class implements OpenCL Discrete Cosine Transform * Derived from SDKSample base class */ class DCT : public SDKSample { cl_uint seed; /**< Seed value for random number generation */ cl_double totalKernelTime; /**< Time for kernel execution */ cl_double totalProgramTime; /**< Time for program execution */ cl_double referenceKernelTime; /**< Time for reference implementation */ cl_uchar *input; /**< Input array */ cl_uchar *output; /**< Input array */ cl_uint blockWidth; /**< width of the blockSize */ cl_uint blockSize; /**< size fo the block */ cl_bool inverse; /**< flag for inverse DCT */ cl_float *verificationOutput; /**< Input array for reference implementation */ cl_context context; /**< CL context */ cl_device_id *devices; /**< CL device list */ cl_mem inputBuffer; /**< CL memory buffer */ cl_mem outputBuffer; /**< CL memory buffer */ cl_mem dctBuffer; /**< CL memory buffer */ cl_command_queue commandQueue; /**< CL command queue */ cl_program program; /**< CL program */ cl_kernel kernel; /**< CL kernel */ public: /** * Constructor * Initialize member variables * @param name name of sample (string) */ DCT(std::string name) : SDKSample(name) { seed = 123; input = NULL; verificationOutput = NULL; blockWidth = 8; blockSize = blockWidth * blockWidth; inverse = false; } /** * Constructor * Initialize member variables * @param name name of sample (const char*) */ DCT(const char* name) : SDKSample(name) { seed = 123; input = NULL; verificationOutput = NULL; blockWidth = 8; blockSize = blockWidth * blockWidth; inverse = false; } /** * Allocate and initialize host memory array with random values * @return 1 on success and 0 on failure */ int setupDCT(); /** * OpenCL related initialisations. * Set up Context, Device list, Command Queue, Memory buffers * Build CL kernel program executable * @return 1 on success and 0 on failure */ int setupCL(); /** * Set values for kernels' arguments, enqueue calls to the kernels * on to the command queue, wait till end of kernel execution. * Get kernel start and end time if timing is enabled * @return 1 on success and 0 on failure */ int runCLKernels(); /** * Given the blockindices and localIndicies this * function calculate the global index * @param blockIdx index of the block horizontally * @param blockIdy index of the block vertically * @param localidx index of the element relative to the block horizontally * @param localIdy index of the element relative to the block vertically * @param blockWidth width of each blcok which is 8 * @param globalWidth Width of the input matrix */ //cl_uint getIdx(cl_uchar *imgdata, cl_uint blockIdy, cl_uint blockIdx, cl_float stag_value, cl_uint index); cl_uint getIdx(cl_uchar *imgdata, cl_uint blockIdx, cl_uint blockIdy, cl_uint localIdx, cl_uint localIdy, cl_uint blockWidth, cl_uint globalWidth); //uint StaggerCorrection(unsigned char *,int ,int ,float ,int); /** * Reference CPU implementation of Discrete Cosine Transform * for performance comparison * @param output output of the DCT8x8 transform * @param input input array * @param dct8x8 8x8 consine function base used to calculate DCT8x8 * @param width width of the input matrix * @param height height of the input matrix * @param numBlocksX number of blocks horizontally * @param numBlocksY number of blocks vertically * @param inverse flag to perform inverse DCT */ #if 0 void DCTCPUReference( cl_float * output, const cl_float * input , const cl_float * dct8x8 , const cl_uint width, const cl_uint height, const cl_uint numBlocksX, const cl_uint numBlocksY, const cl_bool inverse); #endif /** * Override from SDKSample. Print sample stats. */ void printStats(); /** * Override from SDKSample. Initialize * command line parser, add custom options */ int initialize(); /** * Override from SDKSample, adjust width and height * of execution domain, perform all sample setup */ int setup(); /** * Override from SDKSample * Run OpenCL DCT */ int run(); /** * Override from SDKSample * Cleanup memory allocations */ int cleanup(); /** * Override from SDKSample * Verify against reference implementation */ int verifyResults(); }; } //namespace DCT }//namespace amd #endif /////////////////////////////////// END of DCT.hpp ///////////////////////////////////// ///////////////////////////////////DCT_kernel.cl///////////////////////////////////////////////////////// /** * Given the blockindices and localIndicies this * function calculate the global index * @param blockIdx index of the block horizontally * @param blockIdy index of the block vertically * @param localidx index of the element relative to the block horizontally * @param localIdy index of the element relative to the block vertically * @param blockWidth width of each blcok which is 8 * @param globalWidth Width of the input matrix */ uint getIdx(uchar *imgdata, uint blockIdx, uint blockIdy, uint localIdx, uint localIdy, uint blockWidth, uint globalWidth) { uint globalIdx = blockIdx * blockWidth + localIdx; uint globalIdy = blockIdy * blockWidth + localIdy; return (globalIdy * globalWidth + globalIdx); } /** * Perform Discrete Cosine Transform for block of size 8x8 * in the input matrix * @param output output of the DCT8x8 transform * @param input input array * @param dct8x8 8x8 consine function base used to calculate DCT8x8 * @param inter local memory which stores intermediate result * @param width width of the input matrix * @param blockWidth width of each block, 8 here * @param inverse flag to perform inverse DCT */ __kernel void DCT(__global uchar * output, __global uchar * input, __local uchar * inter, const uint width, const uint height, const uint blockWidth) { /* get global indices of the element */ uint globalIdx = get_global_id(0); uint globalIdy = get_global_id(1); /* get indices of the block to which the element belongs to */ uint groupIdx = get_group_id(0); uint groupIdy = get_group_id(1); /* get indices relative to the block */ uint i = get_local_id(0); uint j = get_local_id(1); uint idx = globalIdy * width + globalIdx; /* initialise the accumulator */ float acc = 0.0f; for(uint m=0; m < height; m++) for(uint k=0; k < blockWidth; k++) inter[m*blockWidth+k] = input[m*blockWidth+k]; for(uint k=0; k < blockWidth; k++) { getIdx(&inter[0], groupIdx, groupIdy, j, k, blockWidth, width); } /* AT * X */ #if 0 for(uint k=0; k < blockWidth; k++) { uint index1 = (inverse)? i*blockWidth + k : k * blockWidth + i; uint index2 = getIdx(groupIdx, groupIdy, j, k, blockWidth, width); acc += input[index2]; } inter[idx] = acc; #endif for(uint j=0; j < groupIdy; j++) for(uint k=0; k < groupIdx; k++) output[j*groupIdx+k] = input[j*groupIdx+k]; /* * Make sure all the values of inter that belong to a block * are calculated before proceeding further */ barrier(CLK_LOCAL_MEM_FENCE); /* again initalising the accumulator */ acc = 0.0f; /* (AT * X) * A */ for(uint k=0; k < blockWidth; k++) { //uint index1 = getIdx(groupIdx, groupIdy, k, i, blockWidth, width); //uint index2 = (inverse)? j*blockWidth + k : k* blockWidth + j; //acc += inter[index1] * dct8x8[index2]; } output[idx] = acc; }

                                                      • Open source OpenCL for CPU
                                                        brg

                                                        The restrictions you mention are no longer present and there is a disconnect in the release notes. __local within a kernel is supported as are arrays within structs. We are working on getting the release notes updated to reflect this.

                                                        There are actually a number of offer key differences between Brook and OpenCL but I think two of the most important ones are:

                                                             - OpenCL supports a full scatter/gather mode

                                                             - OpenCL is intended for CPU and GPU

                                                        • Open source OpenCL for CPU
                                                          hagen

                                                          Actually, local arrays in brook+ would correspond to __private memory space in openCL, not __local.  __ local space allows you to share data between different work items inside a work group, more like LDS in brook+.