26 Replies Latest reply on Jul 26, 2010 12:46 PM by lsolano

    OpenCL.lib and multivendor implementation

    bubu
      OpenCL.lib and multivendor implementation

       

      Hello,

       

      I'm writing an application that is going to use OpenCL.

      You currently have a beta OpenCL SDK available that requires linking with OpenCL.lib (which indirectly also requires an OpenCL.dll). The question is... imagine I want to use the ATI, NVIDIA and ClearSpeed OpenCL implementations in the same .EXE. How would OpenCL.lib find the proper OpenCL.dll? I bet there will be a conflict...

      I think the solution is to change slightly how you deploy OpenCL.lib.

      1. Rename OpenCL.dll to ATI_OpenCL_1_0_0.dll ( include the company's name + version to avoid DLL hell ).

       

      And another question...

      How can I enumerate all the OpenCL devices in a machine? Can I use your SDK to enumerate and use, for example, an NVIDIA graphics card? I need a way to do that... like DX10 does with IDXGIFactory/IDXGIAdapter/IDXGIOutput. A good separation between interface and implementation is vital.

       

      thx

       

        • OpenCL.lib and multivendor implementation
          nou

          I think it will be the same as with OpenGL: the application will not ship with its own OpenCL.dll. The application will dynamically load the DLL and call the appropriate functions.

          To enumerate all devices: don't forget that a context may contain only devices from the same platform.

          cl_uint platform_count;
          clGetPlatformIDs(0, NULL, &platform_count);               // get count
          cl_platform_id *ids = new cl_platform_id[platform_count]; // allocate array of platform IDs
          clGetPlatformIDs(platform_count, ids, NULL);              // get platforms

          // for every platform ids[i]:
          cl_uint num_dev;
          clGetDeviceIDs(ids[i], CL_DEVICE_TYPE_ALL, 0, NULL, &num_dev);
          cl_device_id *devices = new cl_device_id[num_dev];
          clGetDeviceIDs(ids[i], CL_DEVICE_TYPE_ALL, num_dev, devices, NULL);

          • OpenCL.lib and multivendor implementation
            omkaranathan

             

            Originally posted by: bubu 

            How can I enumerate all the OpenCL devices in a machine? Can I use your SDK to enumerate and use, for example, a NVIDIA graphics card?

             

            Only those devices supported by the DLL can be enumerated. It's not possible to enumerate and use an NVIDIA graphics card with the current SDK.

              • OpenCL.lib and multivendor implementation
                bubu

                 

                Only those devices supported by the DLL can be enumerated. It's not possible to enumerate and use an NVIDIA graphics card with the current SDK.

                 

                So do I need to load the ATI OpenCL.dll manually and get all the procs myself, as I did with OpenGL?

                I'm going to need to create 18 different versions of my .EXE application... one linked with the ATI OpenCL SDK... another for NVIDIA... another for Intel... another for ClearSpeed... another for RapidMind... another for XXXX... a nightmare...

                 

                 

                 

                  • OpenCL.lib and multivendor implementation
                    thatguymike

                    Khronos is working on an ICD model much like OpenGL so that you will only link against a standard library which the vendors will then plugin to.

                      • OpenCL.lib and multivendor implementation
                        bubu

                         

                        Originally posted by: thatguymike Khronos is working on an ICD model much like OpenGL so that you will only link against a standard library which the vendors will then plugin to.

                         

                         

                        Do you have more info on that, please? A roadmap?

                        I don't especially like the OpenGL model, though...

                         

                        Ideally I would want a C++ class to enumerate all the OpenCL implementations present on the system. Something like this:

                         

                        class IOpenCLDevice
                        {
                           virtual DevCaps GetDeviceCaps () = 0;
                           virtual void CompileKernel (...) = 0;
                           virtual void RunKernel (...) = 0;
                           virtual void SetKernelArgument (...) = 0;
                        };

                        class OpenCLDeviceEnumerator
                        {
                           std::vector<IOpenCLDevice*> EnumDevices ()
                           {
                              //OS-specific INLINED code provided by Khronos
                              //1. List all the DLLs in the Windows\OpenCL folder
                              //2. Create an OpenCL device struct
                              //3. Get the OpenCL procs using GetProcAddress
                              //4. Add the device to the list
                           }
                        };

                         

                        The OpenCLDeviceEnumerator::EnumDevices () could find all the DLLs present in Windows\OpenCL, get all the DLL procs (via GetProcAddress) and create a list of OpenCL devices. Each OpenCL implementation could deploy a strongly-named DLL (vendor ID + version to prevent DLL hell) into that folder (for example ATI_OpenCL_1_0_0.dll).

                         

                        BUT this code should be provided by Khronos... not done by me manually...

                         

                        I'm very interested in this, because compiling 18 different .EXEs loses the benefit of OpenCL (code once, use on multiple platforms).

                         

                         

                          • OpenCL.lib and multivendor implementation
                            nou

                            The model to enumerate all devices from all vendors in the system is the platform layer. But that assumes some standard implementation of clGetPlatformIDs() that will return all platforms on the system: AMD/ATI CPU/GPU, NVIDIA GPU, etc. That is impossible right now.

                            I don't think it will necessarily be required to recompile the application to work with other implementations, but the application will be limited to working with only one implementation at a time.

                              • OpenCL.lib and multivendor implementation
                                brg

                                Even though Khronos has not finalised the ICD, it is still possible to program to it, and the attached program shows how the platform layer can be used to enumerate all the platforms installed on a system and their corresponding devices. With current OpenCL implementations it will return only a single platform, but once the ICD is installed you will see all installed implementations.

                                The output I get running it with the AMD OpenCL SDK beta on a Phenom II is:

                                 bgaster@bgaster-shuttle:~/tree/dist/linux/release/examples/info$ ./info
                                For test only: Expires on Wed Sep 30 00:00:00 2009
                                Number of platforms:     1
                                  Plaform Profile:     FULL_PROFILE
                                  Plaform Version:     OpenCL 1.0 ATI-Stream-v2.0-beta2
                                  Plaform Name:      ATI Stream
                                  Plaform Vendor:     Advanced Micro Devices


                                  Plaform Name:      ATI Stream
                                Number of devices:     1
                                  Device Type:      CL_DEVICE_TYPE_CPU
                                  Device ID:      4098
                                  Max compute units:     4
                                  Max work items dimensions:    3
                                    Max work items[0]:     1024
                                    Max work items[1]:     1024
                                    Max work items[2]:     1024
                                  Max work group size:     1024
                                  Preferred vector width char:    16
                                  Preferred vector width short:    8
                                  Preferred vector width int:    4
                                  Preferred vector width long:    2
                                  Preferred vector width float:    4
                                  Preferred vector width double:   0
                                  Max clock frequency:     3000Mhz
                                  Address bits:      64
                                  Max memeory allocation:    1073741824
                                  Image support:     No
                                  Max size of kernel argument:    4096
                                  Alignment (bits) of base address:   1024
                                  Minimum alignment (bytes) for any datatype:  128
                                  Single precision floating point capability
                                    Denorms:      Yes
                                    Quiet NaNs:      Yes
                                    Round to nearest even:    Yes
                                    Round to zero:     No
                                    Round to +ve and infinity:    No
                                    IEEE754-2008 fused multiply-add:   No
                                  Cache type:      Read/Write
                                  Cache line size:     64
                                  Cache size:      65536
                                  Global memory size:     3221225472
                                  Constant buffer size:     65536
                                  Max number of constant args:    8
                                  Local memory type:     Global
                                  Local memory size:     32768
                                  Profiling timer resolution:    1
                                  Device endianess:     Little
                                  Available:      Yes
                                  Compiler available:     Yes
                                  Execution capabilities:    
                                    Execute OpenCL kernels:    Yes
                                    Execute native function:    No
                                  Queue properties:    
                                    Out-of-Order:     No
                                    Profiling :      Yes
                                  Platform ID:      0
                                  Name:       AMD Processor model unknown
                                  Vendor:      AuthenticAMD
                                  Driver version:     1.0
                                  Profile:      FULL_PROFILE
                                  Version:      OpenCL 1.0
                                  Extensions:      cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store

                                //
                                // Copyright (c) 2008 Advanced Micro Devices, Inc. All rights reserved.
                                //
                                #include <iostream>
                                #include <cstdlib>

                                #define __NO_STD_VECTOR
                                #define __NO_STD_STRING
                                #include <CL/cl.hpp>

                                inline void checkErr(cl_int err, const char * name)
                                {
                                    if (err != CL_SUCCESS) {
                                        std::cerr << "ERROR: " << name << " (" << err << ")" << std::endl;
                                        exit(EXIT_FAILURE);
                                    }
                                }

                                int main(void)
                                {
                                    cl_int err;

                                    // Plaform info
                                    cl::vector<cl::Platform> platforms;
                                    err = cl::Platform::get(&platforms);
                                    checkErr(err && (platforms.size() == 0 ? -1 : CL_SUCCESS), "cl::Platform::get()");

                                    // Iteratate over platforms
                                    std::cout << "Number of platforms:\t\t\t\t " << platforms.size() << std::endl;
                                    for (cl::vector<cl::Platform>::iterator i = platforms.begin(); i != platforms.end(); ++i) {
                                        std::cout << " Plaform Profile:\t\t\t\t " << (*i).getInfo<CL_PLATFORM_PROFILE>().c_str() << std::endl;
                                        std::cout << " Plaform Version:\t\t\t\t " << (*i).getInfo<CL_PLATFORM_VERSION>().c_str() << std::endl;
                                        std::cout << " Plaform Name:\t\t\t\t\t " << (*i).getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;
                                        std::cout << " Plaform Vendor:\t\t\t\t " << (*i).getInfo<CL_PLATFORM_VENDOR>().c_str() << std::endl;
                                        if ((*i).getInfo<CL_PLATFORM_EXTENSIONS>().size() > 0) {
                                            std::cout << " Plaform Extensions:\t\t\t " << (*i).getInfo<CL_PLATFORM_EXTENSIONS>().c_str() << std::endl;
                                        }
                                    }
                                    std::cout << std::endl << std::endl;

                                    // Now Iteratate over each platform and its devices
                                    for (cl::vector<cl::Platform>::iterator p = platforms.begin(); p != platforms.end(); ++p) {
                                        std::cout << " Plaform Name:\t\t\t\t\t " << (*p).getInfo<CL_PLATFORM_NAME>().c_str() << std::endl;
                                        cl::vector<cl::Device> devices;
                                        (*p).getDevices(CL_DEVICE_TYPE_ALL, &devices);
                                        std::cout << "Number of devices:\t\t\t\t " << devices.size() << std::endl;
                                        for (cl::vector<cl::Device>::iterator i = devices.begin(); i != devices.end(); ++i) {
                                            std::cout << " Device Type:\t\t\t\t\t ";
                                            cl_device_type dtype = (*i).getInfo<CL_DEVICE_TYPE>();
                                            switch (dtype) {
                                            case CL_DEVICE_TYPE_ACCELERATOR: std::cout << "CL_DEVICE_TYPE_ACCRLERATOR" << std::endl; break;
                                            case CL_DEVICE_TYPE_CPU:         std::cout << "CL_DEVICE_TYPE_CPU" << std::endl; break;
                                            case CL_DEVICE_TYPE_DEFAULT:     std::cout << "CL_DEVICE_TYPE_DEFAULT" << std::endl; break;
                                            case CL_DEVICE_TYPE_GPU:         std::cout << "CL_DEVICE_TYPE_GPU" << std::endl; break;
                                            }
                                            std::cout << " Device ID:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_VENDOR_ID>() << std::endl;
                                            std::cout << " Max compute units:\t\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>() << std::endl;
                                            std::cout << " Max work items dimensions:\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS>() << std::endl;
                                            cl::vector< ::size_t> witems = (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_SIZES>();
                                            for (int x = 0; x < (*i).getInfo<CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS>(); x++) {
                                                std::cout << " Max work items[" << x << "]:\t\t\t\t " << witems[x] << std::endl;
                                            }
                                            std::cout << " Max work group size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_WORK_GROUP_SIZE>() << std::endl;
                                            std::cout << " Preferred vector width char:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR>() << std::endl;
                                            std::cout << " Preferred vector width short:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT>() << std::endl;
                                            std::cout << " Preferred vector width int:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT>() << std::endl;
                                            std::cout << " Preferred vector width long:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG>() << std::endl;
                                            std::cout << " Preferred vector width float:\t\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT>() << std::endl;
                                            std::cout << " Preferred vector width double:\t\t " << (*i).getInfo<CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE>() << std::endl;
                                            std::cout << " Max clock frequency:\t\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_CLOCK_FREQUENCY>() << "Mhz" << std::endl;
                                            std::cout << " Address bits:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_ADDRESS_BITS>() << std::endl;
                                            std::cout << " Max memeory allocation:\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_MEM_ALLOC_SIZE>() << std::endl;
                                            std::cout << " Image support:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_IMAGE_SUPPORT>() ? "Yes" : "No") << std::endl;
                                            if ((*i).getInfo<CL_DEVICE_IMAGE_SUPPORT>()) {
                                                std::cout << " Max number of images read arguments:\t " << (*i).getInfo<CL_DEVICE_MAX_READ_IMAGE_ARGS>() << std::endl;
                                                std::cout << " Max number of images write arguments:\t " << (*i).getInfo<CL_DEVICE_MAX_WRITE_IMAGE_ARGS>() << std::endl;
                                                std::cout << " Max image 2D width:\t\t\t " << (*i).getInfo<CL_DEVICE_IMAGE2D_MAX_WIDTH>() << std::endl;
                                                std::cout << " Max image 2D height:\t\t\t " << (*i).getInfo<CL_DEVICE_IMAGE2D_MAX_HEIGHT>() << std::endl;
                                                std::cout << " Max image 3D width:\t\t\t " << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_WIDTH>() << std::endl;
                                                std::cout << " Max image 3D height:\t " << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_HEIGHT>() << std::endl;
                                                std::cout << " Max image 3D depth:\t\t\t " << (*i).getInfo<CL_DEVICE_IMAGE3D_MAX_DEPTH>() << std::endl;
                                                std::cout << " Max samplers within kernel:\t\t " << (*i).getInfo<CL_DEVICE_MAX_SAMPLERS>() << std::endl;
                                            }
                                            std::cout << " Max size of kernel argument:\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_PARAMETER_SIZE>() << std::endl;
                                            std::cout << " Alignment (bits) of base address:\t\t " << (*i).getInfo<CL_DEVICE_MEM_BASE_ADDR_ALIGN>() << std::endl;
                                            std::cout << " Minimum alignment (bytes) for any datatype:\t " << (*i).getInfo<CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE>() << std::endl;
                                            std::cout << " Single precision floating point capability" << std::endl;
                                            std::cout << " Denorms:\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_DENORM ? "Yes" : "No") << std::endl;
                                            std::cout << " Quiet NaNs:\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_INF_NAN ? "Yes" : "No") << std::endl;
                                            std::cout << " Round to nearest even:\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_NEAREST ? "Yes" : "No") << std::endl;
                                            std::cout << " Round to zero:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_ZERO ? "Yes" : "No") << std::endl;
                                            std::cout << " Round to +ve and infinity:\t\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_ROUND_TO_INF ? "Yes" : "No") << std::endl;
                                            std::cout << " IEEE754-2008 fused multiply-add:\t\t " << ((*i).getInfo<CL_DEVICE_SINGLE_FP_CONFIG>() & CL_FP_FMA ? "Yes" : "No") << std::endl;
                                            std::cout << " Cache type:\t\t\t\t\t ";
                                            switch ((*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHE_TYPE>()) {
                                            case CL_NONE:             std::cout << "None" << std::endl; break;
                                            case CL_READ_ONLY_CACHE:  std::cout << "Read only" << std::endl; break;
                                            case CL_READ_WRITE_CACHE: std::cout << "Read/Write" << std::endl; break;
                                            }
                                            std::cout << " Cache line size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE>() << std::endl;
                                            std::cout << " Cache size:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_CACHE_SIZE>() << std::endl;
                                            std::cout << " Global memory size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>() << std::endl;
                                            std::cout << " Constant buffer size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE>() << std::endl;
                                            std::cout << " Max number of constant args:\t\t\t " << (*i).getInfo<CL_DEVICE_MAX_CONSTANT_ARGS>() << std::endl;
                                            std::cout << " Local memory type:\t\t\t\t ";
                                            switch ((*i).getInfo<CL_DEVICE_LOCAL_MEM_TYPE>()) {
                                            case CL_LOCAL:  std::cout << "Scratchpad" << std::endl; break;
                                            case CL_GLOBAL: std::cout << "Global" << std::endl; break;
                                            }
                                            std::cout << " Local memory size:\t\t\t\t " << (*i).getInfo<CL_DEVICE_LOCAL_MEM_SIZE>() << std::endl;
                                            std::cout << " Profiling timer resolution:\t\t\t " << (*i).getInfo<CL_DEVICE_PROFILING_TIMER_RESOLUTION>() << std::endl;
                                            std::cout << " Device endianess:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_ENDIAN_LITTLE>() ? "Little" : "Big") << std::endl;
                                            std::cout << " Available:\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_AVAILABLE>() ? "Yes" : "No") << std::endl;
                                            std::cout << " Compiler available:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_COMPILER_AVAILABLE>() ? "Yes" : "No") << std::endl;
                                            std::cout << " Execution capabilities:\t\t\t\t " << std::endl;
                                            std::cout << " Execute OpenCL kernels:\t\t\t " << ((*i).getInfo<CL_DEVICE_EXECUTION_CAPABILITIES>() & CL_EXEC_KERNEL ? "Yes" : "No") << std::endl;
                                            std::cout << " Execute native function:\t\t\t " << ((*i).getInfo<CL_DEVICE_EXECUTION_CAPABILITIES>() & CL_EXEC_NATIVE_KERNEL ? "Yes" : "No") << std::endl;
                                            std::cout << " Queue properties:\t\t\t\t " << std::endl;
                                            std::cout << " Out-of-Order:\t\t\t\t " << ((*i).getInfo<CL_DEVICE_QUEUE_PROPERTIES>() & CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE ? "Yes" : "No") << std::endl;
                                            std::cout << " Profiling :\t\t\t\t\t " << ((*i).getInfo<CL_DEVICE_QUEUE_PROPERTIES>() & CL_QUEUE_PROFILING_ENABLE ? "Yes" : "No") << std::endl;
                                            std::cout << " Platform ID:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_PLATFORM>() << std::endl;
                                            std::cout << " Name:\t\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_NAME>().c_str() << std::endl;
                                            std::cout << " Vendor:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_VENDOR>().c_str() << std::endl;
                                            std::cout << " Driver version:\t\t\t\t " << (*i).getInfo<CL_DRIVER_VERSION>().c_str() << std::endl;
                                            std::cout << " Profile:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_PROFILE>().c_str() << std::endl;
                                            std::cout << " Version:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_VERSION>().c_str() << std::endl;
                                            std::cout << " Extensions:\t\t\t\t\t " << (*i).getInfo<CL_DEVICE_EXTENSIONS>().c_str() << std::endl;
                                        }
                                        std::cout << std::endl << std::endl;
                                    }
                                    return EXIT_SUCCESS;
                                }

                                  • OpenCL.lib and multivendor implementation
                                    bubu

                                    The (*l_itDevice).getInfo<CL_DEVICE_NAME>().c_str() gives me these errors:

                                     

                                    error C2770: invalid explicit template argument(s) for 'detail::Param_traits::Param_type cl::Device::getInfo(cl_int *) const'
                                    1>        G:\ATIStream\include\CL/cl.hpp(1206) : see declaration of 'cl::Device::getInfo'
                                    1>error C2780: 'cl_int cl::Device::getInfo(cl_device_info,T *) const' : expects 2 arguments - 0 provided
                                    1>        G:\ATIStream\include\CL/cl.hpp(1175) : see declaration of 'cl::Device::getInfo'

                                     

                                    Also, when I call the following:

                                    cl::vector<cl::Device>::iterator l_itDevice;

                                    for (  l_itDevice=l_stlDevices.begin(); .... )

                                     

                                    I get this:

                                    G:\ATIStream\include\CL/cl.hpp(688) : error C2352: 'cl::vector::iterator::begin' : illegal call of non-static member function
                                    1>        with
                                    1>        [
                                    1>            T=cl::Device
                                    1>        ]
                                    1>        G:\ATIStream\include\CL/cl.hpp(635) : see declaration of 'cl::vector::iterator::begin'
                                    1>        with
                                    1>        [
                                    1>            T=cl::Device
                                    1>        ]
                                    1>        G:\ATIStream\include\CL/cl.hpp(687) : while compiling class template member function 'cl::vector::iterator cl::vector::begin(void)'
                                    1>        with
                                    1>        [
                                    1>            T=cl::Device
                                    1>        ]
                                    1>        G:\ATIStream\include\CL/cl.hpp(1348) : see reference to class template instantiation 'cl::vector' being compiled
                                    1>        with
                                    1>        [
                                    1>            T=cl::Device
                                    1>        ]

                                     

                                     

                              • OpenCL.lib and multivendor implementation
                                ruysch

                                 

                                Originally posted by: thatguymike Khronos is working on an ICD model much like OpenGL so that you will only link against a standard library which the vendors will then plugin to.

                                 

                                 

                                Any news on the roadmap? Multivendor support is sort of crucial for widespread usage of OpenCL.

                                  • OpenCL.lib and multivendor implementation
                                    nou

                                    Both NVIDIA and AMD now support the ICD model, so an application will run on both platforms without any change.

                                    What does not work for now is enumerating two or more platforms.

                                      • OpenCL.lib and multivendor implementation
                                        Spooky_

                                        Hello there. I am currently trying to get a test application to run on both ATi and NVidia systems. However, I am not sure which OpenCL.lib I have to use?

                                        Does it matter if I use the one from the NVidia SDK or the one from the ATi SDK? The NVidia SDK OpenCL.lib seems to be much older (dated 25.08.2009) and I get unresolved external symbols (as if the lib wasn't even there) if I try to compile the AMD OpenCL tutorial.

                                          • OpenCL.lib and multivendor implementation
                                            omkaranathan

                                             

                                            Hello there. I am currently trying to get a test application to run on both ATi and NVidia systems. However, I am not sure which OpenCL.lib I have to use?

                                            Use the library matching the system you are using. You won't be able to run your program on an ATI card if you use the NVIDIA implementation, and vice versa.

                                            Does it matter if I use the one from the NVidia SDK or the one from the ATi SDK? The NVidia SDK OpenCL.lib seems to be much older (dated 25.08.2009) and I get unresolved external symbols (as if the lib wasn't even there) if I try to compile the AMD OpenCL tutorial.

                                            You should be able to run the OpenCL program with any implementation (you just have to recompile the program). Make sure that your program is picking up the correct libraries.

                                              • OpenCL.lib and multivendor implementation
                                                Spooky_

                                                I thought the whole point of the ICD model was that the GPU vendors plug in to the function calls with their OpenCL DLLs? Just like in OpenGL.

                                                 

                                                I am currently running a test application both on ATi and NVidia system with the OpenCL.lib from the ATi Stream SDK. It runs without any errors with CL_DEVICE_TYPE_GPU. However with CL_DEVICE_TYPE_CPU I get a context creation error on NVidia systems.

                                                  • OpenCL.lib and multivendor implementation
                                                    omkaranathan

                                                     

                                                    I am currently running a test application both on ATi and NVidia system with the OpenCL.lib from the ATi Stream SDK. It runs without any errors with CL_DEVICE_TYPE_GPU. However with CL_DEVICE_TYPE_CPU I get a context creation error on NVidia systems.

                                                     

                                                    Nvidia implementation does not support CPUs. Do you have the ATIStream SDK installed in your Nvidia system?

                                                      • OpenCL.lib and multivendor implementation
                                                        Spooky_

                                                        Ah ok. So using the ATi SDK OpenCL.lib should work fine for any purpose, since the implementation is dynamically loaded via the vendor's DLL anyway?

                                                         

                                                        Yes, I installed the Stream SDK (only the SDK without the other 2 things) on an NVidia system in order to use its OpenCL.lib

                                                          • OpenCL.lib and multivendor implementation
                                                            omkaranathan

                                                            The names of both DLLs are the same. You have to ensure that the correct .dll (the Stream SDK .dll) is in PATH.

                                                              • OpenCL.lib and multivendor implementation
                                                                Spooky_

                                                                The vendor specific OpenCL.dll from the driver is stored in \windows\system32 (for 32bit applications). That's the one that is used by the application.

                                                                The SDK specific .dlls will only be used through the PATH variable, if no .dll is found in the working directory or the system directory.

                                                                When testing the application with dependency walker, it shows that the correct OpenCL.dll is used (the one from the NVidia driver in system32).

                                                                Btw. the SDK specific DLLs are not named the same. The ATi one is named atiocl.dll for example.

                                                                 

                                                                As I said, the application runs fine on the NVidia system, even though I used the ATi OpenCL.lib. I was simply wondering if I would still encounter problems if I only use the OpenCL.lib from ATi, regardless of where the executable is run. Or if I still have to use different build targets with different OpenCL.libs, since there is in fact no standardized library available yet.

                                                                  • OpenCL.lib and multivendor implementation
                                                                    omkaranathan

                                                                     

                                                                    When testing the application with dependency walker, it shows that the correct OpenCL.dll is used (the one from the NVidia driver in system32).

                                                                    If you want to run the program on the CPU, you have to use the OpenCL.dll which comes with the Stream SDK.

                                                                     

                                                                    Btw. the SDK specific DLLs are not named the same. The ATi one is named atiocl.dll for example.

                                                                     

                                                                     



                                                                    The .dll file names are the same. The Stream SDK contains an OpenCL.dll which is installed in the system32 folder.

                                                                     

                                                                     

                                                                      • OpenCL.lib and multivendor implementation
                                                                        Spooky_

                                                                         

                                                                        Originally posted by: omkaranathan

                                                                        If you want to run the program on the CPU, you have to use the OpenCL.dll which comes with the Stream SDK.



                                                                        Ah, right. I was only wondering about that; I don't really need it anyway. If the NVidia implementation does not support it (yet), it's fine.

                                                                         

                                                                         

                                                                         

                                                                        Originally posted by: omkaranathan

                                                                        The .dll file names are the same. The StreamSDK contains an OpenCL.dll which is installed in the system32 folder.



                                                                        Are you sure about that? As far as I know, the OpenCL.dll in the system32 folder is installed by the video driver. The StreamSDK has atiocl.dll in its bin directory, which is added to the PATH variable.

                                                                         

                                                                        // woops, quoting errors

                                                                          • OpenCL.lib and multivendor implementation
                                                                            Spooky_

                                                                            I just checked: the OpenCL.dll in system32 is definitely the one from the NVidia driver, not the one from the StreamSDK.

                                                                            http://dl.dropbox.com/u/2309215/opencl.png

                                                                            • OpenCL.lib and multivendor implementation
                                                                              omkaranathan

                                                                               

                                                                              Are you sure about that? As far as I know, the OpenCL.dll in the system32 folder is installed by the video driver. The StreamSDK has atiocl.dll in its bin directory, which is added to the PATH variable.

                                                                               

                                                                               



                                                                              Yes, both have an OpenCL.dll; the StreamSDK has atiocl.dll too. Anyway, you don't need it on your Nvidia system if you are not going to use the CPU implementation.
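                                                                              Rather than assuming a particular vendor DLL is installed, a host program can probe at runtime whether any platform actually exposes a CPU device. A minimal C sketch (assumes only an ICD-aware OpenCL.dll is on the path; error handling trimmed):

```c
/* Sketch: probe each installed OpenCL platform for a CPU device.
   Assumes an ICD loader (OpenCL.dll / libOpenCL.so) is available. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint nplat = 0;
    clGetPlatformIDs(0, NULL, &nplat);          /* count platforms */

    cl_platform_id *plats = malloc(nplat * sizeof *plats);
    clGetPlatformIDs(nplat, plats, NULL);

    for (cl_uint i = 0; i < nplat; ++i) {
        char name[256];
        clGetPlatformInfo(plats[i], CL_PLATFORM_NAME,
                          sizeof name, name, NULL);

        cl_uint ncpu = 0;
        cl_int err = clGetDeviceIDs(plats[i], CL_DEVICE_TYPE_CPU,
                                    0, NULL, &ncpu);
        /* CL_DEVICE_NOT_FOUND just means no CPU device on this platform */
        printf("%s: %u CPU device(s)\n", name,
               (err == CL_SUCCESS) ? ncpu : 0);
    }
    free(plats);
    return 0;
}
```

                                                                              If no platform reports a CPU device, the program can simply fall back to GPU-only operation.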

                                                                                • OpenCL.lib and multivendor implementation
                                                                                  _Big_Mac_

                                                                                  With NVIDIA's newest dev drivers (197.13) and the AMD SDK 2.0.1, we can finally see both platforms.

                                                                                  CLInfo.exe says:


                                                                                  Number of platforms: 2
                                                                                  Plaform Profile: FULL_PROFILE
                                                                                  Plaform Version: OpenCL 1.0 CUDA 3.0.1
                                                                                  Plaform Name: NVIDIA CUDA
                                                                                  Plaform Vendor: NVIDIA Corporation
                                                                                  Plaform Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
                                                                                  Plaform Profile: FULL_PROFILE
                                                                                  Plaform Version: OpenCL 1.0 ATI-Stream-v2.0.1
                                                                                  Plaform Name: ATI Stream
                                                                                  Plaform Vendor: Advanced Micro Devices, Inc.
                                                                                  Plaform Extensions: cl_khr_icd


                                                                                  Plaform Name: NVIDIA CUDA
                                                                                  Number of devices: 1
                                                                                  Device Type: CL_DEVICE_TYPE_GPU
                                                                                  Device ID: 4318
                                                                                  Max compute units: 16
                                                                                  Max work items dimensions: 3
                                                                                  Max work items[0]: 512
                                                                                  Max work items[1]: 512
                                                                                  Max work items[2]: 64
                                                                                  Max work group size: 512
                                                                                  Preferred vector width char: 1
                                                                                  Preferred vector width short: 1
                                                                                  Preferred vector width int: 1
                                                                                  Preferred vector width long: 1
                                                                                  Preferred vector width float: 1
                                                                                  Preferred vector width double: 0
                                                                                  Max clock frequency: 1625Mhz
                                                                                  Address bits: 32
                                                                                  Max memeory allocation: 134217728
                                                                                  Image support: Yes
                                                                                  Max number of images read arguments: 128
                                                                                  Max number of images write arguments: 8
                                                                                  Max image 2D width: 8192
                                                                                  Max image 2D height: 8192
                                                                                  Max image 3D width: 2048
                                                                                  Max image 3D height: 2048
                                                                                  Max image 3D depth: 2048
                                                                                  Max samplers within kernel: 16
                                                                                  Max size of kernel argument: 4352
                                                                                  Alignment (bits) of base address: 256
                                                                                  Minimum alignment (bytes) for any datatype: 16
                                                                                  Single precision floating point capability
                                                                                  Denorms: No
                                                                                  Quiet NaNs: Yes
                                                                                  Round to nearest even: Yes
                                                                                  Round to zero: Yes
                                                                                  Round to +ve and infinity: Yes
                                                                                  IEEE754-2008 fused multiply-add: Yes
                                                                                  Cache type: None
                                                                                  Cache line size: 0
                                                                                  Cache size: 0
                                                                                  Global memory size: 519634944
                                                                                  Constant buffer size: 65536
                                                                                  Max number of constant args: 9
                                                                                  Local memory type: Scratchpad
                                                                                  Local memory size: 16384
                                                                                  Profiling timer resolution: 1000
                                                                                  Device endianess: Little
                                                                                  Available: Yes
                                                                                  Compiler available: Yes
                                                                                  Execution capabilities:
                                                                                  Execute OpenCL kernels: Yes
                                                                                  Execute native function: No
                                                                                  Queue properties:
                                                                                  Out-of-Order: Yes
                                                                                  Profiling : Yes
                                                                                  Platform ID: 000000000242FEF0
                                                                                  Name: GeForce 8800 GTS 512
                                                                                  Vendor: NVIDIA Corporation
                                                                                  Driver version: 197.13
                                                                                  Profile: FULL_PROFILE
                                                                                  Version: OpenCL 1.0 CUDA
                                                                                  Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics


                                                                                  Plaform Name: ATI Stream
                                                                                  Number of devices: 1
                                                                                  Device Type: CL_DEVICE_TYPE_CPU
                                                                                  Device ID: 4098
                                                                                  Max compute units: 2
                                                                                  Max work items dimensions: 3
                                                                                  Max work items[0]: 1024
                                                                                  Max work items[1]: 1024
                                                                                  Max work items[2]: 1024
                                                                                  Max work group size: 1024
                                                                                  Preferred vector width char: 16
                                                                                  Preferred vector width short: 8
                                                                                  Preferred vector width int: 4
                                                                                  Preferred vector width long: 2
                                                                                  Preferred vector width float: 4
                                                                                  Preferred vector width double: 0
                                                                                  Max clock frequency: 2500Mhz
                                                                                  Address bits: 64
                                                                                  Max memeory allocation: 1073741824
                                                                                  Image support: No
                                                                                  Max size of kernel argument: 4096
                                                                                  Alignment (bits) of base address: 32768
                                                                                  Minimum alignment (bytes) for any datatype: 128
                                                                                  Single precision floating point capability
                                                                                  Denorms: Yes
                                                                                  Quiet NaNs: Yes
                                                                                  Round to nearest even: Yes
                                                                                  Round to zero: No
                                                                                  Round to +ve and infinity: No
                                                                                  IEEE754-2008 fused multiply-add: No
                                                                                  Cache type: Read/Write
                                                                                  Cache line size: 64
                                                                                  Cache size: 65536
                                                                                  Global memory size: 3221225472
                                                                                  Constant buffer size: 65536
                                                                                  Max number of constant args: 8
                                                                                  Local memory type: Global
                                                                                  Local memory size: 32768
                                                                                  Profiling timer resolution: 1
                                                                                  Device endianess: Little
                                                                                  Available: Yes
                                                                                  Compiler available: Yes
                                                                                  Execution capabilities:
                                                                                  Execute OpenCL kernels: Yes
                                                                                  Execute native function: No
                                                                                  Queue properties:
                                                                                  Out-of-Order: No
                                                                                  Profiling : Yes
                                                                                  Platform ID: 000000000364F598
                                                                                  Name: Pentium(R) Dual-Core CPU
                                                                                  E5200 @ 2.50GHz
                                                                                  Vendor: GenuineIntel
                                                                                  Driver version: 1.0
                                                                                  Profile: FULL_PROFILE
                                                                                  Version: OpenCL 1.0 ATI-Stream-v2.0.1
                                                                                  Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store

                                                                                   This doesn't require having any OpenCL.dll in your project's folder; the ICDs set everything up via the PATH and it just works.
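                                                                                   The kind of enumeration CLInfo performs above can be reproduced with a few calls. A hedged C sketch (the ICD loader dispatches each call to the right vendor implementation; names and buffer sizes here are illustrative):

```c
/* Sketch: list every platform and device the ICD loader can see,
   similar in spirit to the CLInfo output above. */
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

int main(void)
{
    cl_uint nplat = 0;
    if (clGetPlatformIDs(0, NULL, &nplat) != CL_SUCCESS || nplat == 0) {
        fprintf(stderr, "No OpenCL platforms installed\n");
        return 1;
    }
    cl_platform_id *plats = malloc(nplat * sizeof *plats);
    clGetPlatformIDs(nplat, plats, NULL);

    for (cl_uint i = 0; i < nplat; ++i) {
        char name[256], version[256];
        clGetPlatformInfo(plats[i], CL_PLATFORM_NAME,
                          sizeof name, name, NULL);
        clGetPlatformInfo(plats[i], CL_PLATFORM_VERSION,
                          sizeof version, version, NULL);
        printf("Platform %u: %s (%s)\n", i, name, version);

        cl_uint ndev = 0;
        if (clGetDeviceIDs(plats[i], CL_DEVICE_TYPE_ALL,
                           0, NULL, &ndev) != CL_SUCCESS)
            continue;                      /* platform has no devices */
        cl_device_id *devs = malloc(ndev * sizeof *devs);
        clGetDeviceIDs(plats[i], CL_DEVICE_TYPE_ALL, ndev, devs, NULL);

        for (cl_uint j = 0; j < ndev; ++j) {
            char dname[256];
            clGetDeviceInfo(devs[j], CL_DEVICE_NAME,
                            sizeof dname, dname, NULL);
            printf("  Device %u: %s\n", j, dname);
        }
        free(devs);
    }
    free(plats);
    return 0;
}
```

                                                                                   Remember the caveat from earlier in the thread: a cl_context may only contain devices from a single platform, so devices found under different platforms must go into separate contexts.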