Is there any way to query the capabilities of an OpenCL compute unit, such as the number of processing elements contained?
For example, I have 2 OpenCL devices, with the following properties:
Type = GPU, Frequency = 550 MHz, Compute Units = 5, Max work item size = (256, 256, 256), Max work group size = 256
Type = CPU, Frequency = 1600 MHz, Compute Units = 8, Max work item size = (1024, 1024, 1024), Max work group size = 1024
Without taking the properties of the compute units into account, the CPU would seem to be the better device to use, but that ignores the vast difference between the number of processing elements in the CPU (4 single-precision?) and in the GPU (80 single-precision?).
Another question: when you have a GPU + CPU combined (Fusion?), would this be seen by OpenCL as one device, with compute units of varying types/numbers of processing elements, or as 2 separate devices?
Another question: when you have a GPU + CPU combined (Fusion?), then would this be seen by OpenCL as one device, ...
I was just wondering how to do this. I really want to add this feature to one of my OpenCL programs, but don't know where to start. I could potentially use a loop over all available devices, building command queues and kernels for each device, launching each kernel, and monitoring until all of them complete. But this sounds a bit complicated. Any shortcuts? Will using CL_DEVICE_TYPE_ALL when creating the context take care of all of this?
Any examples are appreciated.
(btw: I am not using ATI's binding, because of some portability issues)
Originally posted by: MicahVillmow FangQ, If you create a program that works with using a CPU and discrete GPU, it should work in the same manner, but with different performance characteristics, on future integrated parts.
I am not sure I understand your comment. Yes, my code works for the CPU and the GPU separately (the host code chooses a device based on the user's command-line input, then builds the program and launches the kernel for that device). But now I want to run the kernel on CPU+GPU (i.e. multiple devices) simultaneously. Must I use a loop structure to repeat my host code for each device? Or are there shortcuts in OpenCL to achieve the same?
Originally posted by: MicahVillmow FangQ, There currently is no load balancing of kernel code across multiple devices. So you would need to manage this yourself.
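As a rough illustration of what "manage this yourself" could look like, here is a minimal sketch in plain C that splits a 1D global work range across several devices in proportion to a per-device weight. The helper `split_range` and the weights are hypothetical and not part of OpenCL; the sketch assumes the total work size is a multiple of the work-group size, and it leaves out the OpenCL calls themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): divide `total` work-items among
 * `n` devices in proportion to weight[i]. Every share except the last is
 * rounded down to a multiple of `group` (the work-group size); the last
 * device takes whatever remains, so the shares always sum to `total`.
 * Assumption: `total` is itself a multiple of `group`. */
void split_range(size_t total, size_t group, const double *weight,
                 size_t n, size_t *offset, size_t *count)
{
    double sum = 0.0;
    size_t next = 0; /* first work-item not yet assigned */
    size_t i;

    assert(group > 0 && n > 0);
    for (i = 0; i < n; ++i)
        sum += weight[i];
    assert(sum > 0.0);

    for (i = 0; i < n; ++i) {
        size_t share;
        if (i + 1 == n) {
            share = total - next; /* last device gets the remainder */
        } else {
            share = (size_t)((double)total * weight[i] / sum);
            share -= share % group; /* round down to whole work-groups */
            if (share > total - next)
                share = total - next;
        }
        offset[i] = next;
        count[i] = share;
        next += share;
    }
}
```

With weights {1.0, 3.0} (made-up numbers for a CPU and a GPU), 1048576 work-items, and a work-group size of 256, the first device gets 262144 items starting at offset 0 and the second gets 786432 items starting at offset 262144. Each (offset, count) pair could then be handed to that device's own command queue, for example as the global work offset and global work size of clEnqueueNDRangeKernel; note that a non-NULL global work offset needs OpenCL 1.1 or later, so on 1.0 the offset would have to be passed as a kernel argument instead.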
Thanks for your quick response. At this point, I am not getting into load balancing yet. I was just asking if there is an easy way to launch a kernel on multiple devices (CPU+GPU1+GPU2+...) simultaneously.
(I think the OP's question is somewhat related to load-balancing though, sorry for hijacking the thread)
Originally posted by: email@example.com Is there any way to query the capabilities of an OpenCL compute unit, such as the number of processing elements contained?