OpenCL

ssuarezbe
Journeyman III

OpenCL does not detect the number of APU CL_DEVICE_MAX_COMPUTE_UNITS correctly

Hi.

I'm just a novice in the OpenCL world, and before I start coding real programs I wanted to write some basic code to detect the available devices in my computer.

To do that I wrote this code:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
#include <cstdlib>
#include <iostream>

// Get device type as string
const char* getDevTypeString(cl_device_type type)
{
    switch(type)
    {
    case CL_DEVICE_TYPE_CPU:
        return "CPU";
    case CL_DEVICE_TYPE_GPU:
        return "GPU";
    case CL_DEVICE_TYPE_ACCELERATOR:
        return "ACCELERATOR";
    default:
        return "DEFAULT";
    }
}

int main()
{
    /* Host/device data structures */
    cl_platform_id platform;
    cl_device_id *devices;
    cl_uint num_devices, addr_data, comp_units;
    cl_int err;
    cl_device_type deviceType;
    char name_data[48];
    char ext_data[4096];

    /* Identify a platform */
    err = clGetPlatformIDs(1, &platform, NULL);
    if(err < 0)
    {
        std::cerr<<"Couldn't find any platforms "<<err;
        exit(1);
    }

    /* Determine number of connected devices */
    err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, (cl_uint)0, NULL, &num_devices);
    if(err < 0)
    {
        std::cerr<<"Couldn't find any devices "<<err;
        exit(1);
    }

    /* Access connected devices */
    devices = (cl_device_id*)malloc(sizeof(cl_device_id) * num_devices);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, num_devices, devices, NULL);

    /* Obtain data for each connected device */
    for (cl_uint i = 0; i < num_devices; i++) {
        /* Device properties list:
           http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clGetDeviceInfo.html
        */
        /* Query devices[i], not the array pointer itself */
        err = clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name_data), name_data, NULL);
        if(err < 0) {
            std::cerr<<"Couldn't read device name";
            exit(1);
        }
        clGetDeviceInfo(devices[i], CL_DEVICE_ADDRESS_BITS, sizeof(addr_data), &addr_data, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_EXTENSIONS, sizeof(ext_data), ext_data, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(comp_units), &comp_units, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_TYPE, sizeof(deviceType), &deviceType, NULL);

        std::cout<<"NAME: "<<name_data<<"\nType: "<<getDevTypeString(deviceType)
                <<"\nADDRESS_WIDTH:"<<addr_data<<"\nCompute Units:"<<comp_units<<std::endl;
        std::cout<<"--------------------------------------------------------------------------------"<<std::endl;
    }

    free(devices);
    return 0;
}

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The problem is that the program returns:

NAME: Devastator

Type: GPU

ADDRESS_WIDTH:32

Compute Units:4

--------------------------------------------------------------------------------

NAME: AMD A8-5600K APU with Radeon(tm) HD Graphics  

Type: CPU

ADDRESS_WIDTH:64

Compute Units:4

--------------------------------------------------------------------------------

This seems wrong to me, because the number of "Compute Units" for the GPU should be 256.

I have already updated the driver to Catalyst 13.04 and installed the AMD SDK 2.8.1.

My operating system is Ubuntu 12.04.

Did I do something wrong? Please help!!


Accepted Solutions
nou
Exemplar

At any given time your GPU is running 4 wavefronts, one per compute unit. A wavefront is the GPU equivalent of a CPU thread. Each wavefront consists of 4x16 = 64 work-items that are executed in waves of 16. So a kernel started with 256 work-items (4 wavefronts of 64) is the minimum size that will utilize the GPU 100%.

Each compute unit executes one instruction on 16 work-items at the same time, so it takes 4 cycles to execute that instruction over the 64 work-items of a wavefront. Each instruction is of VLIW4 type, which means it can perform four operations at the same time, for example adding two float4 vectors at once.
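
To make those numbers concrete, here is a minimal sketch of the arithmetic in plain C++ (no OpenCL calls). Both function names are made up for illustration, and the values used (4 compute units, 64 work-items per wavefront, 16 lanes) are the ones quoted in this thread, not values queried from the driver.

```cpp
#include <cstddef>

// Smallest number of work-items that gives every compute unit one full
// wavefront. Hypothetical helper, not part of the OpenCL API.
constexpr std::size_t minWorkItemsToFillGpu(std::size_t computeUnits,
                                            std::size_t wavefrontSize)
{
    return computeUnits * wavefrontSize;
}

// Cycles needed to issue one instruction for a whole wavefront when only
// `lanes` work-items are processed per cycle (rounded up).
// Hypothetical helper; lanes must be non-zero.
constexpr std::size_t cyclesPerWavefrontInstruction(std::size_t wavefrontSize,
                                                    std::size_t lanes)
{
    return (wavefrontSize + lanes - 1) / lanes;
}
```

With the figures from this thread, minWorkItemsToFillGpu(4, 64) gives 256 and cyclesPerWavefrontInstruction(64, 16) gives 4.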

nou
Exemplar

No, the four compute units are correct. Each compute unit contains 16 stream cores, each of which is 4 units wide.
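
As a quick sanity check of that breakdown, a small sketch in plain C++. The struct and its field names are hypothetical (they are not OpenCL types), and the numbers are simply the ones stated in this reply.

```cpp
// Hypothetical description of the GPU layout discussed here; not an OpenCL type.
struct GpuLayout {
    unsigned computeUnits;        // what CL_DEVICE_MAX_COMPUTE_UNITS reports
    unsigned streamCoresPerUnit;  // stream cores inside one compute unit
    unsigned unitsPerStreamCore;  // width of one stream core
};

// Total number of processing units, i.e. the 256 figure from the question.
constexpr unsigned totalProcessingUnits(const GpuLayout& g)
{
    return g.computeUnits * g.streamCoresPerUnit * g.unitsPerStreamCore;
}
```

For GpuLayout{4, 16, 4} this gives 4 * 16 * 4 = 256, which is where the number expected in the question comes from.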

ssuarezbe
Journeyman III

Then, if I want to create n threads (parallel kernels), is the maximum value 4? How can I use all 256 stream cores?

I mean, if I start 256 kernels, will each one use its own stream processor?

wayne_static
Adept II

"Compute unit" is an OpenCL term that refers to the physical streaming multiprocessors in a device. Your A8-5600K APU has 4 of them, so there is nothing wrong with the output. You can confirm this yourself by looking at the specs on the AMD product page. In the GCN architecture, each compute unit consists of 64 processing elements (PEs), and your A8-5600K APU has 256 PEs in total, which implies 256 / 64 = 4 compute units.

nou
Exemplar

IIRC, the A8-5600K APU has a VLIW4-architecture GPU.

wayne_static
Adept II

The work-items of your kernel are mapped onto the processing elements in the compute units, so once execution starts, all 256 of them will be utilized.
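
In practice that means you do not start 256 separate kernels. You enqueue one kernel with a global work size, and the runtime distributes the work-items over the compute units. A common host-side step is rounding the problem size up to a multiple of the work-group size. Below is a minimal sketch in plain C++; the helper name is hypothetical, and a work-group size of 64 is only an assumption chosen to match the wavefront size mentioned in this thread.

```cpp
#include <cstddef>

// Round `workItems` up to the next multiple of `localSize`, so the result can
// be used as a global work size together with that local work size.
// Hypothetical helper, not part of the OpenCL API; localSize must be non-zero.
constexpr std::size_t roundUpGlobalSize(std::size_t workItems,
                                        std::size_t localSize)
{
    return ((workItems + localSize - 1) / localSize) * localSize;
}
```

The result would typically be passed as the global_work_size argument of clEnqueueNDRangeKernel, with localSize as local_work_size. Because of the rounding, the kernel should compare get_global_id(0) against the real problem size before touching memory.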
