6 Replies Latest reply on Oct 20, 2010 1:52 PM by d.a.a.

    Querying compute unit capabilities

    danbartlett@ntlworld.com

      Is there any way to query the capabilities of an OpenCL compute unit, such as the number of processing elements contained?

      For example, I have 2 OpenCL devices, with the following properties:

      Type = GPU, Frequency = 550 MHz, Compute units = 5, Max work item size = (256, 256, 256), Max work group size = 256

      Type = CPU, Frequency = 1600 MHz, Compute units = 8, Max work item size = (1024, 1024, 1024), Max work group size = 1024

      Judging by these figures alone, the CPU would seem to be the better device to use, but that ignores the vast difference in the number of processing elements between the CPU (4 single-precision?) and the GPU (80 single-precision?).
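
To make the comparison above concrete, here is a rough back-of-the-envelope sketch. It is not an OpenCL query: as far as I know, core OpenCL 1.x exposes `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY` through `clGetDeviceInfo`, but has no standard query for processing elements per compute unit. The sketch assumes that the guessed counts (4 and 80) are per-compute-unit figures and that every processing element retires one single-precision operation per clock. Both are assumptions, and the function name `relative_throughput` is hypothetical.

```python
# Idealized peak-throughput comparison for the two devices listed above.
# ASSUMPTION: 4 and 80 are processing elements *per compute unit*, and each
# element completes one single-precision operation per clock cycle.

def relative_throughput(freq_mhz, compute_units, pes_per_unit):
    """Return an idealized figure in operations per microsecond."""
    return freq_mhz * compute_units * pes_per_unit

gpu = relative_throughput(freq_mhz=550, compute_units=5, pes_per_unit=80)
cpu = relative_throughput(freq_mhz=1600, compute_units=8, pes_per_unit=4)

print(f"GPU estimate: {gpu}")
print(f"CPU estimate: {cpu}")
print(f"GPU/CPU ratio: {gpu / cpu:.2f}")
```

Under these assumptions the GPU comes out ahead despite its lower clock and fewer compute units. Real performance also depends on memory bandwidth, divergence, and transfer costs, none of which this estimate models.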

      Another question: when a GPU and a CPU are combined in one part (Fusion?), would OpenCL see this as one device, with compute units of varying types and numbers of processing elements, or as 2 separate devices?

        • FangQ

          Another question:  when you have a GPU + CPU combined (Fusion?), then would this be seen by OpenCL as one device, ...


          I was just wondering how to do this. I really want to add this feature to one of my OpenCL programs, but don't know where to start. I could potentially use a loop over all available devices, building command queues and kernels for each device, launching each kernel, and monitoring until all complete. But this sounds a bit complicated. Any shortcuts? Will using CL_DEVICE_TYPE_ALL when creating the context take care of all of this?

          Any examples would be appreciated.

          (btw: I am not using ATI's binding, because of some portability issues)

          • MicahVillmow
            FangQ,
            If you create a program that works with a CPU and a discrete GPU, it should work in the same manner, but with different performance characteristics, on future integrated parts.
              • FangQ

                Originally posted by: MicahVillmow FangQ, If you create a program that works with a CPU and a discrete GPU, it should work in the same manner, but with different performance characteristics, on future integrated parts.


                I am not sure I understand your comment. Yes, my code works for the CPU and the GPU separately (the host code chooses a device based on the user's command-line input, then builds the program and launches the kernel for that device). But now I want to run the kernel on CPU+GPU (i.e. multiple devices) simultaneously. Must I use a loop structure to repeat my host code for each device? Or are there shortcuts in OpenCL to achieve the same thing?

              • MicahVillmow
                FangQ,
                There currently is no load balancing of kernel code across multiple devices. So you would need to manage this yourself.
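
As one illustration of what "managing this yourself" could look like, the sketch below statically splits a 1D global work size across devices in proportion to a per-device weight (for example, an estimated throughput). This is only a host-side planning helper, not an OpenCL API: the name `split_work`, the weights, and the work-group multiple of 256 are all hypothetical. Each resulting chunk would still have to be enqueued on that device's own command queue, since a command queue is tied to a single device even when the context contains several.

```python
# Static, proportional split of a 1D index range across devices.
# ASSUMPTION: weights are made-up relative speeds; 256 is an example
# work-group size that every chunk should be a multiple of.

def split_work(global_size, weights, multiple=1):
    """Return one (offset, count) pair per device.

    Counts are proportional to the weights, rounded down to a multiple of
    `multiple`; the last device takes whatever remains.
    """
    total = sum(weights)
    chunks = []
    offset = 0
    for i, weight in enumerate(weights):
        if i == len(weights) - 1:
            count = global_size - offset  # remainder goes to the last device
        else:
            count = (global_size * weight // total) // multiple * multiple
        chunks.append((offset, count))
        offset += count
    return chunks

# Example: a faster GPU (weight 220000) and a slower CPU (weight 51200).
chunks = split_work(1 << 20, [220000, 51200], multiple=256)
for device_index, (offset, count) in enumerate(chunks):
    print(f"device {device_index}: offset={offset}, count={count}")
```

A static split like this does not adapt at run time; if the weights are wrong, one device simply finishes early and idles.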
                  • FangQ

                    Originally posted by: MicahVillmow FangQ, There currently is no load balancing of kernel code across multiple devices. So you would need to manage this yourself.


                    Thanks for your quick response. At this point, I have not gotten to load balancing yet. I was just asking if there is an easy way to launch a kernel on multiple devices (CPU+GPU1+GPU2+...) simultaneously.

                    (I think the OP's question is somewhat related to load-balancing though, sorry for hijacking the thread)

                  • d.a.a.

                    Originally posted by: danbartlett@ntlworld.com Is there any way to query the capabilities of an OpenCL compute unit, such as the number of processing elements contained?


                    +1