26 Replies Latest reply on Jul 26, 2010 12:46 PM by lsolano

    OpenCL.lib and multivendor implementation

    bubu
      OpenCL.lib and multivendor implementation

       

      Hello,

       

      I'm writing an application that is going to use OpenCL.

      You currently have a beta OpenCL SDK available that requires linking with OpenCL.lib (which indirectly also requires an OpenCL.dll). The question is... imagine I want to use the ATI, NVIDIA and ClearSpeed OpenCL implementations in the same .EXE. How would OpenCL.lib find the proper OpenCL.dll? I bet there will be a conflict...

      I think the solution is to change slightly how you deploy OpenCL.lib.

      1. Rename OpenCL.dll to ATI_OpenCL_1_0_0.dll ( include the company's name + version to avoid DLL hell ).

       

      And another question...

      How can I enumerate all the OpenCL devices in a machine? Can I use your SDK to enumerate and use, for example, an NVIDIA graphics card? I need a way to do that... like DX10 does with IDXGIFactory/IDXGIAdapter/IDXGIOutput. A good separation between interface and implementation is vital.

       

      thx

       

        • OpenCL.lib and multivendor implementation
          nou

          I think it will be the same as with OpenGL: the application will not ship with its own OpenCL.dll. The application will dynamically load the DLL and call the appropriate functions.

          To enumerate all devices: don't forget that a context may contain only devices from the same platform.

          cl_uint platform_count;
          clGetPlatformIDs(0, NULL, &platform_count);               // get count
          cl_platform_id *ids = new cl_platform_id[platform_count]; // allocate array of platform IDs
          clGetPlatformIDs(platform_count, ids, NULL);              // get platforms

          // for every platform ids[i]:
          cl_uint num_dev;
          clGetDeviceIDs(ids[i], CL_DEVICE_TYPE_ALL, 0, NULL, &num_dev);
          cl_device_id *devices = new cl_device_id[num_dev];
          clGetDeviceIDs(ids[i], CL_DEVICE_TYPE_ALL, num_dev, devices, NULL);

          • OpenCL.lib and multivendor implementation
            omkaranathan

             

            Originally posted by: bubu 

            How can I enumerate all the OpenCL devices in a machine? Can I use your SDK to enumerate and use, for example, a NVIDIA graphics card?

             

            Only those devices supported by the DLL can be enumerated. It's not possible to enumerate and use an NVIDIA graphics card with the current SDK.

              • OpenCL.lib and multivendor implementation
                bubu

                 

                Only those devices supported by the DLL can be enumerated. It's not possible to enumerate and use an NVIDIA graphics card with the current SDK.

                 

                So do I need to load the ATI OpenCL.dll manually and get all the procs myself, as I did with OpenGL?

                I'm going to need to create 18 different versions of my .EXE application... one linked with the ATI OpenCL SDK... another for NVIDIA... another for Intel... another for ClearSpeed... another for RapidMind... another for XXXX... a nightmare...

                 

                 

                 

                  • OpenCL.lib and multivendor implementation
                    thatguymike

                    Khronos is working on an ICD model much like OpenGL so that you will only link against a standard library which the vendors will then plugin to.

                      • OpenCL.lib and multivendor implementation
                        bubu

                         

                        Originally posted by: thatguymike Khronos is working on an ICD model much like OpenGL so that you will only link against a standard library which the vendors will then plugin to.

                         

                         

                        Do you have more info on that, please? A roadmap?

                        I don't especially like the OpenGL model, though...

                         

                        Ideally I would want a C++ class to enumerate all the OpenCL implementations present on the system. Something like this:

                         

                        class IOpenCLDevice
                        {
                           virtual DevCaps GetDeviceCaps () = 0;
                           virtual void CompileKernel (...) = 0;
                           virtual void RunKernel (...) = 0;
                           virtual void SetKernelArgument (...) = 0;
                        };

                        class OpenCLDeviceEnumerator
                        {
                           std::vector<IOpenCLDevice*> EnumDevices ()
                           {
                              //OS-specific INLINED code provided by Khronos
                              //1. List all the DLLs in the Windows\OpenCL folder
                              //2. Create an OpenCL device struct
                              //3. Get the OpenCL procs using GetProcAddress
                              //4. Add the device to the list
                           }
                        };

                         

                        The OpenCLDeviceEnumerator::EnumDevices () could find all the DLLs present in Windows\OpenCL, get all the DLL procs (via GetProcAddress) and create a list of OpenCL devices. Each OpenCL implementation could deploy a strongly-named DLL (vendor ID + version to prevent DLL hell) into that folder (for example ATI_OpenCL_1_0_0.dll).

                         

                        BUT this code should be provided by Khronos... not done by me manually...

                         

                        I'm very interested in this, because compiling 18 different .EXEs loses the benefit of OpenCL (code once, use on multiple platforms).

                         

                         

                          • OpenCL.lib and multivendor implementation
                            nou

                            The model to enumerate all devices from all vendors in the system is the platform layer. But that assumes some standard implementation of clGetPlatformIDs() that will return all platforms on the system: AMD/ATI CPU/GPU, NVIDIA GPU, etc. That is impossible right now.

                            I don't think it will necessarily be required to recompile the application to work with other implementations, but the application will be limited to working with only one implementation at a time.

                              • OpenCL.lib and multivendor implementation
                                brg

                                Even though Khronos has not finalised the ICD, it is still possible to program to it, and the attached program shows how the platform layer can be used to enumerate all the platforms installed on a system and their corresponding devices. With current OpenCL implementations it will return only a single platform, but once the ICD is installed you will see all installed implementations.

                                The output I get running it with the AMD OpenCL SDK beta on a Phenom II is:

                                 bgaster@bgaster-shuttle:~/tree/dist/linux/release/examples/info$ ./info
                                For test only: Expires on Wed Sep 30 00:00:00 2009
                                Number of platforms:     1
                                  Plaform Profile:     FULL_PROFILE
                                  Plaform Version:     OpenCL 1.0 ATI-Stream-v2.0-beta2
                                  Plaform Name:      ATI Stream
                                  Plaform Vendor:     Advanced Micro Devices


                                  Plaform Name:      ATI Stream
                                Number of devices:     1
                                  Device Type:      CL_DEVICE_TYPE_CPU
                                  Device ID:      4098
                                  Max compute units:     4
                                  Max work items dimensions:    3
                                    Max work items[0]:     1024
                                    Max work items[1]:     1024
                                    Max work items[2]:     1024
                                  Max work group size:     1024
                                  Preferred vector width char:    16
                                  Preferred vector width short:    8
                                  Preferred vector width int:    4
                                  Preferred vector width long:    2
                                  Preferred vector width float:    4
                                  Preferred vector width double:   0
                                  Max clock frequency:     3000Mhz
                                  Address bits:      64
                                  Max memeory allocation:    1073741824
                                  Image support:     No
                                  Max size of kernel argument:    4096
                                  Alignment (bits) of base address:   1024
                                  Minimum alignment (bytes) for any datatype:  128
                                  Single precision floating point capability
                                    Denorms:      Yes
                                    Quiet NaNs:      Yes
                                    Round to nearest even:    Yes
                                    Round to zero:     No
                                    Round to +ve and infinity:    No
                                    IEEE754-2008 fused multiply-add:   No
                                  Cache type:      Read/Write
                                  Cache line size:     64
                                  Cache size:      65536
                                  Global memory size:     3221225472
                                  Constant buffer size:     65536
                                  Max number of constant args:    8
                                  Local memory type:     Global
                                  Local memory size:     32768
                                  Profiling timer resolution:    1
                                  Device endianess:     Little
                                  Available:      Yes
                                  Compiler available:     Yes
                                  Execution capabilities:    
                                    Execute OpenCL kernels:    Yes
                                    Execute native function:    No
                                  Queue properties:    
                                    Out-of-Order:     No
                                    Profiling :      Yes
                                  Platform ID:      0
                                  Name:       AMD Processor model unknown
                                  Vendor:      AuthenticAMD
                                  Driver version:     1.0
                                  Profile:      FULL_PROFILE
                                  Version:      OpenCL 1.0
                                  Extensions:      cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store

                                //
                                // Copyright (c) 2008 Advanced Micro Devices, Inc. All rights reserved.
                                //
                                #include <iostream>
                                #include <cstdlib>

                                #define __NO_STD_VECTOR
                                #define __NO_STD_STRING
                                #include <CL/cl.hpp>

                                inline void checkErr(cl_int err, const char * name)
                                {
                                    if (err != CL_SUCCESS) {
                                        std::cerr << "ERROR: " << name << " (" << err << ")" << std::endl;
                                        exit(EXIT_FAILURE);
                                    }
                                }

                                int main(void)
                                {
                                    cl_int err;

                                    // Plaform info
                                    cl::vector<cl::Platform> platforms;
                                    err = cl::Platform::get(&platforms);
                                    checkErr(err && (platforms.size() == 0 ? -1 : CL_SUCCESS), "cl::Platform::get()");

                                    // Iteratate over platforms
                                    std::cout << "Number of platforms:\t\t\t\t " << platforms.size() << std::endl;
                                    for (cl::vector<cl::Platform>::iterator i = platforms.begin(); i != platforms.end(); ++i) {
                                        std::cout << " Plaform Profile:\t\t\t\t " << (*i).getInfo<CL_PLATFORM_PROFILE>().c_str() << std::endl;
                                        std::cout << " Plaform Version:\t\t\t\t " << (*i).getInfo<CL_PLATFORM_VERSION>().c_str() << std::endl;
                                        std::cout << " Plaform Name:\t\t\t\t\t " << (*i).getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;
                                        std::cout << " Plaform Vendor:\t\t\t\t " << (*i).getInfo<CL_PLATFORM_VENDOR>().c_str() << std::endl;
                                        if ((*i).getInfo<CL_PLATFORM_EXTENSIONS>().size() > 0) {
                                            std::cout << " Plaform Extensions:\t\t\t " << (*i).getInfo<CL_PLATFORM_EXTENSIONS>().c_str() << std::endl;
                                        }
                                    }
                                    std::cout << std::endl << std::endl;

                                    // Now Iteratate over each platform and its devices
                                    for (cl::vector<cl::Platform>::iterator p = platforms.begin(); p != platforms.end(); ++p) {
                                        std::cout << " Plaform Name:\t\t\t\t\t " << (*p).getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;
                                        cl::vector<cl::Device> devices;
                                        (*p).getDevices(CL_DEVICE_TYPE_ALL, &devices);
                                        std::cout << "Number of devices:\t\t\t\t " << devices.size() << std::endl;
                                        for (cl::vector<cl::Device>::iterator i = devices.begin(); i != devices.end(); ++i) {
                                            std::cout << " Device Type:\t\t\t\t\t ";
                                            cl_device_type dtype = (*i).getInfo<CL_DEVICE_TYPE>();
                                            switch (dtype) {
                                            case CL_DEVICE_TYPE_ACCELERATOR: std::cout << "CL_DEVICE_TYPE_ACCRLERATOR" << std::endl; break;
                                            case CL_DEVICE_TYPE_CPU:         std::cout << "CL_DEVICE_TYPE_CPU" << std::endl; break;
                                            case CL_DEVICE_TYPE_DEFAULT:     std::cout << "CL_DEVICE_TYPE_DEFAULT" << std::endl; break;
                                            case CL_DEVICE_TYPE_GPU:         std::cout << "CL_DEVICE_TYPE_GPU" << std::endl; break;
                                            }
                                            std::cout << " Device ID:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_VENDOR_ID>() << std::endl;
                                            std::cout << " Max compute units:\t\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << std::endl;
                                            std::cout << " Max work items dimensions:\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS>() << std::endl;
                                            cl::vector< ::size_t> witems = (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_SIZES>();
                                            for (int x = 0; x < (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS>(); x++) {
                                                std::cout << " Max work items[" << x << "]:\t\t\t\t " << witems[x] << std::endl;
                                            }
                                            std::cout << " Max work group size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_WORK_GROUP_SIZE>() << std::endl;
                                            std::cout << " Preferred vector width char:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR>() << std::endl;
                                            std::cout << " Preferred vector width short:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT>() << std::endl;
                                            std::cout << " Preferred vector width int:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT>() << std::endl;
                                            std::cout << " Preferred vector width long:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG>() << std::endl;
                                            std::cout << " Preferred vector width float:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT>() << std::endl;
                                            std::cout << " Preferred vector width double:\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE>() << std::endl;
                                            std::cout << " Max clock frequency:\t\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_CLOCK_FREQUENCY>() << "Mhz" << std::endl;
                                            std::cout << " Address bits:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_ADDRESS_BITS>() << std::endl;
                                            std::cout << " Max memeory allocation:\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>() << std::endl;
                                            std::cout << " Image support:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_IMAGE_SUPPORT>() ? "Yes" : "No") << std::endl;
                                            if ((*i).getInfo<CL_DEVICE_IMAGE_SUPPORT>()) {
                                                std::cout << " Max number of images read arguments:\t " << (*i).getInfo<CL_DEVICE_MAX_READ_IMAGE_ARGS>() << std::endl;
                                                std::cout << " Max number of images write arguments:\t " << (*i).getInfo<CL_DEVICE_MAX_WRITE_IMAGE_ARGS>() << std::endl;
                                                std::cout << " Max image 2D width:\t\t\t " << (*i).getInfo<CL_DEVICE_IMAGE2D_MAX_WIDTH>() << std::endl;
                                                std::cout << " Max image 2D height:\t\t\t " << (*i).getInfo<CL_DEVICE_IMAGE2D_MAX_HEIGHT>() << std::endl;
                                                std::cout << " Max image 3D width:\t\t\t " << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_WIDTH>() << std::endl;
                                                std::cout << " Max image 3D height:\t " << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_HEIGHT>() << std::endl;
                                                std::cout << " Max image 3D depth:\t\t\t " << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_DEPTH>() << std::endl;
                                                std::cout << " Max samplers within kernel:\t\t " << (*i).getInfo<CL_DEVICE_MAX_SAMPLERS>() << std::endl;
                                            }
                                            std::cout << " Max size of kernel argument:\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_PARAMETER_SIZE>() << std::endl;
                                            std::cout << " Alignment (bits) of base address:\t\t " << (*i).getInfo<CL_DEVICE_MEM_BASE_ADDR_ALIGN>() << std::endl;
                                            std::cout << " Minimum alignment (bytes) for any datatype:\t " << (*i).getInfo<CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE>() << std::endl;
                                            std::cout << " Single precision floating point capability" << std::endl;
                                            std::cout << " Denorms:\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_DENORM ? "Yes" : "No") << std::endl;
                                            std::cout << " Quiet NaNs:\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_INF_NAN ? "Yes" : "No") << std::endl;
                                            std::cout << " Round to nearest even:\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_NEAREST ? "Yes" : "No") << std::endl;
                                            std::cout << " Round to zero:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_ZERO ? "Yes" : "No") << std::endl;
                                            std::cout << " Round to +ve and infinity:\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_INF ? "Yes" : "No") << std::endl;
                                            std::cout << " IEEE754-2008 fused multiply-add:\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_FMA ? "Yes" : "No") << std::endl;
                                            std::cout << " Cache type:\t\t\t\t\t ";
                                            switch ((*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHE_TYPE>()) {
                                            case CL_NONE:             std::cout << "None" << std::endl; break;
                                            case CL_READ_ONLY_CACHE:  std::cout << "Read only" << std::endl; break;
                                            case CL_READ_WRITE_CACHE: std::cout << "Read/Write" << std::endl; break;
                                            }
                                            std::cout << " Cache line size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE>() << std::endl;
                                            std::cout << " Cache size:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHE_SIZE>() << std::endl;
                                            std::cout << " Global memory size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>() << std::endl;
                                            std::cout << " Constant buffer size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE>() << std::endl;
                                            std::cout << " Max number of constant args:\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_CONSTANT_ARGS>() << std::endl;
                                            std::cout << " Local memory type:\t\t\t\t ";
                                            switch ((*i).getInfo<CL_DEVICE_LOCAL_MEM_TYPE>()) {
                                            case CL_LOCAL:  std::cout << "Scratchpad" << std::endl; break;
                                            case CL_GLOBAL: std::cout << "Global" << std::endl; break;
                                            }
                                            std::cout << " Local memory size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_LOCAL_MEM_SIZE>() << std::endl;
                                            std::cout << " Profiling timer resolution:\t\t\t " << (*i).getInfo<CL_DEVICE_PROFILING_TIMER_RESOLUTION>() << std::endl;
                                            std::cout << " Device endianess:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_ENDIAN_LITTLE>() ? "Little" : "Big") << std::endl;
                                            std::cout << " Available:\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_AVAILABLE>() ? "Yes" : "No") << std::endl;
                                            std::cout << " Compiler available:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_COMPILER_AVAILABLE>() ? "Yes" : "No") << std::endl;
                                            std::cout << " Execution capabilities:\t\t\t\t " << std::endl;
                                            std::cout << " Execute OpenCL kernels:\t\t\t " << ((*i).getInfo<CL_DEVICE_EXECUTION_CAPABILITIES>() & CL_EXEC_KERNEL ? "Yes" : "No") << std::endl;
                                            std::cout << " Execute native function:\t\t\t " << ((*i).getInfo<CL_DEVICE_EXECUTION_CAPABILITIES>() & CL_EXEC_NATIVE_KERNEL ? "Yes" : "No") << std::endl;
                                            std::cout << " Queue properties:\t\t\t\t " << std::endl;
                                            std::cout << " Out-of-Order:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_QUEUE_PROPERTIES>() & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE ? "Yes" : "No") << std::endl;
                                            std::cout << " Profiling :\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_QUEUE_PROPERTIES>() & CL_QUEUE_PROFILING_ENABLE ? "Yes" : "No") << std::endl;
                                            std::cout << " Platform ID:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_PLATFORM>() << std::endl;
                                            std::cout << " Name:\t\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_NAME>().c_str() << std::endl;
                                            std::cout << " Vendor:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_VENDOR>().c_str() << std::endl;
                                            std::cout << " Driver version:\t\t\t\t " << (*i).getInfo<CL_DRIVER_VERSION>().c_str() << std::endl;
                                            std::cout << " Profile:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_PROFILE>().c_str() << std::endl;
                                            std::cout << " Version:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_VERSION>().c_str() << std::endl;
                                            std::cout << " Extensions:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_EXTENSIONS>().c_str() << std::endl;
                                        }
                                        std::cout << std::endl << std::endl;
                                    }
                                    return EXIT_SUCCESS;
                                }

                                  • OpenCL.lib and multivendor implementation
                                    bubu

                                    The (*l_itDevice).getInfo<CL_DEVICE_NAME>().c_str() gives me these errors:

                                     

                                    error C2770: invalid explicit template argument(s) for 'detail::Param_traits::Param_type cl::Device::getInfo(cl_int *) const'
                                    1>        G:\ATIStream\include\CL/cl.hpp(1206) : see declaration of 'cl::Device::getInfo'
                                    1>error C2780: 'cl_int cl::Device::getInfo(cl_device_info,T *) const' : expects 2 arguments - 0 provided
                                    1>        G:\ATIStream\include\CL/cl.hpp(1175) : see declaration of 'cl::Device::getInfo'

                                     

                                    Also, when I call the following:

                                    cl::vector<cl::Device>::iterator l_itDevice;

                                    for (  l_itDevice=l_stlDevices.begin(); .... )

                                     

                                    I get this:

                                    G:\ATIStream\include\CL/cl.hpp(688) : error C2352: 'cl::vector::iterator::begin' : illegal call of non-static member function
                                    1>        with
                                    1>        [
                                    1>            T=cl::Device
                                    1>        ]
                                    1>        G:\ATIStream\include\CL/cl.hpp(635) : see declaration of 'cl::vector::iterator::begin'
                                    1>        with
                                    1>        [
                                    1>            T=cl::Device
                                    1>        ]
                                    1>        G:\ATIStream\include\CL/cl.hpp(687) : while compiling class template member function 'cl::vector::iterator cl::vector::begin(void)'
                                    1>        with
                                    1>        [
                                    1>            T=cl::Device
                                    1>        ]
                                    1>        G:\ATIStream\include\CL/cl.hpp(1348) : see reference to class template instantiation 'cl::vector' being compiled
                                    1>        with
                                    1>        [
                                    1>            T=cl::Device
                                    1>        ]

                                     

                                     

                              • OpenCL.lib and multivendor implementation
                                ruysch

                                 

                                Originally posted by: thatguymike Khronos is working on an ICD model much like OpenGL so that you will only link against a standard library which the vendors will then plugin to.

                                 

                                 

                                Any news on the roadmap? Multivendor support is sort of crucial for widespread usage of OpenCL.

                                  • OpenCL.lib and multivendor implementation
                                    nou

                                    Both NVIDIA and AMD now support the ICD model, so an application will run on both platforms without any change.

                                    What does not work for now is enumerating two or more platforms.

                                      • OpenCL.lib and multivendor implementation
                                        Spooky_

                                        Hello there. I am currently trying to get a test application to run on both ATi and NVidia systems. However, I am not sure which OpenCL.lib I have to use?

                                        Does it matter if I use the one from the NVidia SDK or the one from the ATi SDK? The NVidia SDK OpenCL.lib seems to be much older (dated 25.08.2009) and I get unresolved external symbols (as if the lib wasn't even there) if I try to compile the AMD OpenCL tutorial.

                                          • OpenCL.lib and multivendor implementation
                                            omkaranathan

                                             

                                            Hello there. I am currently trying to get a test application to run on both ATi and NVidia systems. However, I am not sure which OpenCL.lib I have to use?

                                            Use the library matching the system you are using. You won't be able to run your program on an ATI card if you use the NVIDIA implementation, and vice versa.

                                            Does it matter if I use the one from the NVidia SDK or the one from the ATi SDK? The NVidia SDK OpenCL.lib seems to be much older (dated 25.08.2009) and I get unresolved external symbols (as if the lib wasn't even there) if I try to compile the AMD OpenCL tutorial.

                                            You should be able to run the OpenCL program with any implementation (you just have to recompile the program). Make sure that your program is picking up the correct libraries.

                                              • OpenCL.lib and multivendor implementation
                                                Spooky_

                                                I thought the whole point of the ICD model was that the GPU vendors plug in to the function calls with their OpenCL DLLs? Just like in OpenGL.

                                                 

                                                I am currently running a test application both on ATi and NVidia system with the OpenCL.lib from the ATi Stream SDK. It runs without any errors with CL_DEVICE_TYPE_GPU. However with CL_DEVICE_TYPE_CPU I get a context creation error on NVidia systems.

                                                  • OpenCL.lib and multivendor implementation
                                                    omkaranathan

                                                     

                                                    I am currently running a test application both on ATi and NVidia system with the OpenCL.lib from the ATi Stream SDK. It runs without any errors with CL_DEVICE_TYPE_GPU. However with CL_DEVICE_TYPE_CPU I get a context creation error on NVidia systems.

                                                     

                                                    Nvidia implementation does not support CPUs. Do you have the ATIStream SDK installed in your Nvidia system?

                                                      • OpenCL.lib and multivendor implementation
                                                        Spooky_

                                                        Ah ok. So using the ATi SDK OpenCL.lib should work fine for any purpose, since the implementation is dynamically loaded via the vendor's DLL anyway?

                                                         

                                                        Yes, I installed the Stream SDK (only the SDK without the other 2 things) on an NVidia system in order to use its OpenCL.lib

                                                          • OpenCL.lib and multivendor implementation
                                                            omkaranathan

                                                            The names of both DLLs are the same. You have to ensure that the correct .dll (the Stream SDK .dll) is in PATH.

                                                              • OpenCL.lib and multivendor implementation
                                                                Spooky_

                                                                The vendor specific OpenCL.dll from the driver is stored in \windows\system32 (for 32bit applications). That's the one that is used by the application.

                                                                The SDK specific .dlls will only be used through the PATH variable, if no .dll is found in the working directory or the system directory.

                                                                When testing the application with dependency walker, it shows that the correct OpenCL.dll is used (the one from the NVidia driver in system32).

                                                                Btw. the SDK specific DLLs are not named the same. The ATi one is named atiocl.dll for example.

                                                                 

                                                                As I said, the application runs fine on the NVidia system, even though I used the ATi OpenCL.lib. I was simply wondering if I would still encounter problems if I only use the OpenCL.lib from ATi, regardless of where the executable is run. Or if I still have to use different build targets with different OpenCL.libs, since there is in fact no standardized library available yet.

                                                                  • OpenCL.lib and multivendor implementation
                                                                    omkaranathan

                                                                     

                                                                    When testing the application with dependency walker, it shows that the correct OpenCL.dll is used (the one from the NVidia driver in system32).

                                                                    If you want to run the program on the CPU, you have to use the OpenCL.dll which comes with the Stream SDK.

                                                                     

                                                                    Btw. the SDK specific DLLs are not named the same. The ATi one is named atiocl.dll for example.

                                                                     

                                                                     



                                                                    The .dll file names are the same. The Stream SDK contains an OpenCL.dll which is installed in the system32 folder.

                                                                     

                                                                     

                                                                      • OpenCL.lib and multivendor implementation
                                                                        Spooky_

                                                                         

                                                                        Originally posted by: omkaranathan

                                                                        If you want to run the program on the CPU, you have to use the OpenCL.dll which comes with the Stream SDK.



                                                                        Ah, right. I was only wondering about that; I don't really need it anyway. If the NVidia implementation does not support it (yet), it's fine.

                                                                         

                                                                         

                                                                         

                                                                        Originally posted by: omkaranathan

                                                                        The .dll file names are the same. The StreamSDK contains an OpenCL.dll which is installed in the system32 folder.



                                                                        Are you sure about that? As far as I know, the OpenCL.dll in the system32 folder is installed by the video driver. The StreamSDK has atiocl.dll in its bin directory, which is added to the PATH variable.

                                                                         

                                                                        // woops, quoting errors

                                                                          • OpenCL.lib and multivendor implementation
                                                                            Spooky_

                                                                            I just checked: the OpenCL.dll in system32 is definitely the one from the NVidia driver, not the one from the StreamSDK.

                                                                            http://dl.dropbox.com/u/2309215/opencl.png

                                                                            • OpenCL.lib and multivendor implementation
                                                                              omkaranathan

                                                                               

                                                                              Are you sure about that? As far as I know, the OpenCL.dll in the system32 folder is installed by the video driver. The StreamSDK has atiocl.dll in its bin directory, which is added to the PATH variable.

                                                                               

                                                                               



                                                                              Yes, both have an OpenCL.dll; the StreamSDK has atiocl.dll too. Anyway, you don't need it on your Nvidia system if you are not going to use the CPU implementation.
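                                                                              Rather than assuming a particular vendor DLL is installed, a host program can probe at runtime whether any platform actually exposes a CPU device. A minimal C sketch (assumes only an ICD-aware OpenCL.dll is on the path; error handling trimmed):

```c
/* Sketch: probe each installed OpenCL platform for a CPU device.
   Assumes an ICD loader (OpenCL.dll / libOpenCL.so) is available. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint nplat = 0;
    clGetPlatformIDs(0, NULL, &nplat);          /* count platforms */

    cl_platform_id *plats = malloc(nplat * sizeof *plats);
    clGetPlatformIDs(nplat, plats, NULL);

    for (cl_uint i = 0; i < nplat; ++i) {
        char name[256];
        clGetPlatformInfo(plats[i], CL_PLATFORM_NAME,
                          sizeof name, name, NULL);

        cl_uint ncpu = 0;
        cl_int err = clGetDeviceIDs(plats[i], CL_DEVICE_TYPE_CPU,
                                    0, NULL, &ncpu);
        /* CL_DEVICE_NOT_FOUND just means no CPU device on this platform */
        printf("%s: %u CPU device(s)\n", name,
               (err == CL_SUCCESS) ? ncpu : 0);
    }
    free(plats);
    return 0;
}
```

                                                                              If no platform reports a CPU device, the program can simply fall back to GPU-only operation.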

                                                                                • OpenCL.lib and multivendor implementation
                                                                                  _Big_Mac_

                                                                                  With NVIDIA's newest dev drivers (197.13) and the AMD SDK 2.0.1, we can finally see both platforms.

                                                                                  CLInfo.exe says:


                                                                                  Number of platforms: 2
                                                                                  Plaform Profile: FULL_PROFILE
                                                                                  Plaform Version: OpenCL 1.0 CUDA 3.0.1
                                                                                  Plaform Name: NVIDIA CUDA
                                                                                  Plaform Vendor: NVIDIA Corporation
                                                                                  Plaform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
                                                                                  Plaform Profile: FULL_PROFILE
                                                                                  Plaform Version: OpenCL 1.0 ATI-Stream-v2.0.1
                                                                                  Plaform Name: ATI Stream
                                                                                  Plaform Vendor: Advanced Micro Devices, Inc.
                                                                                  Plaform Extensions: cl_khr_icd


                                                                                  Plaform Name: NVIDIA CUDA
                                                                                  Number of devices: 1
                                                                                  Device Type: CL_DEVICE_TYPE_GPU
                                                                                  Device ID: 4318
                                                                                  Max compute units: 16
                                                                                  Max work items dimensions: 3
                                                                                  Max work items[0]: 512
                                                                                  Max work items[1]: 512
                                                                                  Max work items[2]: 64
                                                                                  Max work group size: 512
                                                                                  Preferred vector width char: 1
                                                                                  Preferred vector width short: 1
                                                                                  Preferred vector width int: 1
                                                                                  Preferred vector width long: 1
                                                                                  Preferred vector width float: 1
                                                                                  Preferred vector width double: 0
                                                                                  Max clock frequency: 1625Mhz
                                                                                  Address bits: 32
                                                                                  Max memeory allocation: 134217728
                                                                                  Image support: Yes
                                                                                  Max number of images read arguments: 128
                                                                                  Max number of images write arguments: 8
                                                                                  Max image 2D width: 8192
                                                                                  Max image 2D height: 8192
                                                                                  Max image 3D width: 2048
                                                                                  Max image 3D height: 2048
                                                                                  Max image 3D depth: 2048
                                                                                  Max samplers within kernel: 16
                                                                                  Max size of kernel argument: 4352
                                                                                  Alignment (bits) of base address: 256
                                                                                  Minimum alignment (bytes) for any datatype: 16
                                                                                  Single precision floating point capability
                                                                                  Denorms: No
                                                                                  Quiet NaNs: Yes
                                                                                  Round to nearest even: Yes
                                                                                  Round to zero: Yes
                                                                                  Round to +ve and infinity: Yes
                                                                                  IEEE754-2008 fused multiply-add: Yes
                                                                                  Cache type: None
                                                                                  Cache line size: 0
                                                                                  Cache size: 0
                                                                                  Global memory size: 519634944
                                                                                  Constant buffer size: 65536
                                                                                  Max number of constant args: 9
                                                                                  Local memory type: Scratchpad
                                                                                  Local memory size: 16384
                                                                                  Profiling timer resolution: 1000
                                                                                  Device endianess: Little
                                                                                  Available: Yes
                                                                                  Compiler available: Yes
                                                                                  Execution capabilities:
                                                                                  Execute OpenCL kernels: Yes
                                                                                  Execute native function: No
                                                                                  Queue properties:
                                                                                  Out-of-Order: Yes
                                                                                  Profiling : Yes
                                                                                  Platform ID: 000000000242FEF0
                                                                                  Name: GeForce 8800 GTS 512
                                                                                  Vendor: NVIDIA Corporation
                                                                                  Driver version: 197.13
                                                                                  Profile: FULL_PROFILE
                                                                                  Version: OpenCL 1.0 CUDA
                                                                                  Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics


                                                                                  Plaform Name: ATI Stream
                                                                                  Number of devices: 1
                                                                                  Device Type: CL_DEVICE_TYPE_CPU
                                                                                  Device ID: 4098
                                                                                  Max compute units: 2
                                                                                  Max work items dimensions: 3
                                                                                  Max work items[0]: 1024
                                                                                  Max work items[1]: 1024
                                                                                  Max work items[2]: 1024
                                                                                  Max work group size: 1024
                                                                                  Preferred vector width char: 16
                                                                                  Preferred vector width short: 8
                                                                                  Preferred vector width int: 4
                                                                                  Preferred vector width long: 2
                                                                                  Preferred vector width float: 4
                                                                                  Preferred vector width double: 0
                                                                                  Max clock frequency: 2500Mhz
                                                                                  Address bits: 64
                                                                                  Max memeory allocation: 1073741824
                                                                                  Image support: No
                                                                                  Max size of kernel argument: 4096
                                                                                  Alignment (bits) of base address: 32768
                                                                                  Minimum alignment (bytes) for any datatype: 128
                                                                                  Single precision floating point capability
                                                                                  Denorms: Yes
                                                                                  Quiet NaNs: Yes
                                                                                  Round to nearest even: Yes
                                                                                  Round to zero: No
                                                                                  Round to +ve and infinity: No
                                                                                  IEEE754-2008 fused multiply-add: No
                                                                                  Cache type: Read/Write
                                                                                  Cache line size: 64
                                                                                  Cache size: 65536
                                                                                  Global memory size: 3221225472
                                                                                  Constant buffer size: 65536
                                                                                  Max number of constant args: 8
                                                                                  Local memory type: Global
                                                                                  Local memory size: 32768
                                                                                  Profiling timer resolution: 1
                                                                                  Device endianess: Little
                                                                                  Available: Yes
                                                                                  Compiler available: Yes
                                                                                  Execution capabilities:
                                                                                  Execute OpenCL kernels: Yes
                                                                                  Execute native function: No
                                                                                  Queue properties:
                                                                                  Out-of-Order: No
                                                                                  Profiling : Yes
                                                                                  Platform ID: 000000000364F598
                                                                                  Name: Pentium(R) Dual-Core CPU
                                                                                  E5200 @ 2.50GHz
                                                                                  Vendor: GenuineIntel
                                                                                  Driver version: 1.0
                                                                                  Profile: FULL_PROFILE
                                                                                  Version: OpenCL 1.0 ATI-Stream-v2.0.1
                                                                                  Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store

                                                                                   This doesn't require having any OpenCL.dll in your project's folder; the ICDs set everything up via the PATH and it just works.
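                                                                                   The kind of enumeration CLInfo performs above can be reproduced with a few calls. A hedged C sketch (the ICD loader dispatches each call to the right vendor implementation; names and buffer sizes here are illustrative):

```c
/* Sketch: list every platform and device the ICD loader can see,
   similar in spirit to the CLInfo output above. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint nplat = 0;
    if (clGetPlatformIDs(0, NULL, &nplat) != CL_SUCCESS || nplat == 0) {
        fprintf(stderr, "No OpenCL platforms installed\n");
        return 1;
    }
    cl_platform_id *plats = malloc(nplat * sizeof *plats);
    clGetPlatformIDs(nplat, plats, NULL);

    for (cl_uint i = 0; i < nplat; ++i) {
        char name[256], version[256];
        clGetPlatformInfo(plats[i], CL_PLATFORM_NAME,
                          sizeof name, name, NULL);
        clGetPlatformInfo(plats[i], CL_PLATFORM_VERSION,
                          sizeof version, version, NULL);
        printf("Platform %u: %s (%s)\n", i, name, version);

        cl_uint ndev = 0;
        if (clGetDeviceIDs(plats[i], CL_DEVICE_TYPE_ALL,
                           0, NULL, &ndev) != CL_SUCCESS)
            continue;                      /* platform has no devices */
        cl_device_id *devs = malloc(ndev * sizeof *devs);
        clGetDeviceIDs(plats[i], CL_DEVICE_TYPE_ALL, ndev, devs, NULL);

        for (cl_uint j = 0; j < ndev; ++j) {
            char dname[256];
            clGetDeviceInfo(devs[j], CL_DEVICE_NAME,
                            sizeof dname, dname, NULL);
            printf("  Device %u: %s\n", j, dname);
        }
        free(devs);
    }
    free(plats);
    return 0;
}
```

                                                                                   Remember the caveat from earlier in the thread: a cl_context may only contain devices from a single platform, so devices found under different platforms must go into separate contexts.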