evk8888
Journeyman III

device fission using clCreateSubDevicesEXT (command queue creation fails)

Hi,

In my experiments I am trying to partition a CPU into 2 sub-devices with 2 cores each. For the first sub-device the command queue is created and I can enqueue tasks into it, but when I try to create a command queue for devices_id[1], it fails ("failed to create command queue").

Globally declared:

static clCreateSubDevicesEXT_fn pfn_clCreateSubDevicesEXT = NULL;

Then:

pfn_clCreateSubDevicesEXT = (clCreateSubDevicesEXT_fn)clGetExtensionFunctionAddress("clCreateSubDevicesEXT");

const cl_context_properties part_props[] = { CL_DEVICE_PARTITION_BY_COUNTS_EXT, 2, 2, CL_PARTITION_BY_COUNTS_LIST_END_EXT, CL_PROPERTIES_LIST_END_EXT };

err = pfn_clCreateSubDevicesEXT(device_id, (const cl_device_partition_property_ext *)part_props, 2, devices_id, &part_count);
printf("PARTITION SUCCESSSSSSS????? ERRR VALUE! (%d)\n", err);

for (int i = 0; i < 2; i++) {
    err = clGetDeviceInfo(devices_id[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(val), &val, NULL);
    printf("Device partition %d: max compute units = %lld\n", i, val);
}

cl_context_properties context_props[3];
context_props[0] = (cl_context_properties)CL_CONTEXT_PLATFORM;  // indicates that next element is platform
context_props[1] = (cl_context_properties)platforms[0];         // platform is of type cl_platform_id
context_props[2] = (cl_context_properties)0;                    // last element must be 0

context = clCreateContext(context_props, 1, devices_id, NULL, NULL, &err);
if (!context)
{
    printf("Error: Failed to create a compute context!\n");
}

commands = clCreateCommandQueue(context, devices_id[1], NULL, &err);
if (!commands)
{
    printf("Error: Failed to create a command queue commands!\n");
}

Can anyone help me out with this?

Thanks!

0 Likes
harryjiang
Adept I

Hi evk8888,

         context = clCreateContext(context_props, 1, devices_id, NULL, NULL, &err);

This may be because you pass 1 for num_devices when creating the context, so only devices_id[0] belongs to it. Please try 2 instead of 1.
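
The snippet in the question prints a fixed message and discards the value written to `err`. According to the OpenCL specification, clCreateCommandQueue reports CL_INVALID_DEVICE when the device is not associated with the context, so printing the code would most likely have pointed at the cause. A minimal sketch of such a check follows. The helper names `cl_error_name` and `report_cl_failure` are hypothetical, and the numeric values are copied from CL/cl.h only so the sketch compiles without the OpenCL headers; real code would use the named constants.

```c
#include <assert.h>
#include <stdio.h>

/* Map a few OpenCL status codes to their names. The numbers are the values
 * defined in CL/cl.h, written out so this compiles without the headers. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -33: return "CL_INVALID_DEVICE";
    case -34: return "CL_INVALID_CONTEXT";
    case -35: return "CL_INVALID_QUEUE_PROPERTIES";
    default:  return "unknown OpenCL error";
    }
}

/* Hypothetical helper: print which call failed and why.
 * Returns 1 if err signals a failure, 0 otherwise. */
int report_cl_failure(const char *call, int err)
{
    assert(call != NULL);
    if (err == 0)
        return 0;
    fprintf(stderr, "%s failed: %s (%d)\n", call, cl_error_name(err), err);
    return 1;
}
```

Calling `report_cl_failure("clCreateCommandQueue", err)` right after the queue creation would then print the status name instead of a generic message.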

0 Likes

Hello Ming,

Thanks a lot for your reply. The command queues are getting created now. I have one more question.

I have made 4 sub-devices with 4 cores each, along with their corresponding queues, but the CPU utilization still stays at 1 core. Do you know why this is happening? Is there any way to find out where I am going wrong?

The code compiles and gives me the correct result, but the utilization does not go beyond 1 core.

Looking forward to your reply.

Thanks a lot!

0 Likes

Hi,

I made changes so that the utilization is now correct. But I have a multi-threaded program where each thread calls the kernel as follows:

err = clEnqueueTask((myProc)->queues[_clId], kernel, 0, NULL, &k5);
if (err != CL_SUCCESS)
{
    printf("Error: Failed to execute kernel!%d\n", err);
}
clFlush((myProc)->queues[_clId]);
err = clWaitForEvents(1, &k5);

If clWaitForEvents is called right after the enqueue, then even with 4 sub-devices only 1 core of the CPU is used. If the wait is removed, more cores are used, but I need the wait to keep this kernel synchronized with the other kernels of the application. Is there another way to do this?

Also, I am using OpenCL 1.1. Does clCreateSubDevices work with OpenCL 1.2? If so, I can install it and try the regular call instead of the EXT function. Do you have any suggestions?

Thank you!

Looking forward to your reply!

0 Likes

clEnqueueTask is typically used for single-threaded tasks, and the task occupies a whole compute unit for that single thread. How did you measure the utilization of the CPU cores?

Harry

0 Likes

Hi,

I checked it with top in the terminal. It seems that all the threads share the same core for execution: top shows 100% CPU and no more (in thread mode, top shows the 4 threads sharing it).

Thanks.

0 Likes

I used 4 sub-devices, each with one core, and created 4 command queues, one per sub-device. With 4 threads, I can see in Task Manager that they run on 4 cores.

context = clCreateContext(context_props, 4, devices_id_part, NULL, NULL, &status);
if (!context)
{
    printf("Error: Failed to create a compute context!\n");
}

commands1 = clCreateCommandQueue(context, devices_id_part[0], NULL, &status);
commands2 = clCreateCommandQueue(context, devices_id_part[1], NULL, &status);
commands3 = clCreateCommandQueue(context, devices_id_part[2], NULL, &status);
commands4 = clCreateCommandQueue(context, devices_id_part[3], NULL, &status);

Create four threads, and in each thread:

status = clEnqueueTask(command_queue, kernel, 0, NULL, clevent);
assert(status == CL_SUCCESS);
status = clFlush(command_queue);
assert(status == CL_SUCCESS);
status = clWaitForEvents(1, clevent);
assert(status == CL_SUCCESS);

I used the code just as you described, and it works fine. It seems that clWaitForEvents does not affect the core usage.

Thanks a lot for your reply. I think I am doing something wrong somewhere. I will debug the code and try to find the mistake. Anyway, thanks a lot for your help.

One more query:

I am trying to use the AMD benchmarks in the SDK, but most of them use clEnqueueNDRangeKernel for execution, and the kernels are coded accordingly.

Is it possible to enqueue these kernels with clEnqueueTask, keep the kernel code as it is, and still get correct results?

Or do I need to change the kernels for single-core execution, for example by removing things like get_global_id(0)?

Thanks.

0 Likes

Yes, you are right. If you use clEnqueueTask instead of clEnqueueNDRangeKernel, get_global_id() no longer distinguishes work-items: the task runs as a single work-item, so it always returns 0. You have to change the kernel accordingly.
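
To illustrate the kind of change involved, here is a minimal sketch with two hypothetical kernels, `scale_ndrange` and `scale_task`, that are not taken from the AMD SDK samples. The OpenCL specification defines clEnqueueTask as equivalent to clEnqueueNDRangeKernel with a global and local size of 1, so the NDRange version would touch only element 0 when run as a task. The task version loops over the buffer itself and takes the length as an extra argument. The two `#define` lines, the `get_global_id` stand-in, and `run_ndrange` are assumptions added only so the kernel text can be compiled and checked as plain C on the host; they do not belong in a real .cl file.

```c
#include <assert.h>
#include <stddef.h>

/* Host-only shims so the kernel text below compiles as plain C. */
#define __kernel
#define __global

size_t emulated_gid;  /* set by the host-side loop in run_ndrange() */
size_t get_global_id(unsigned dim) { (void)dim; return emulated_gid; }

/* NDRange style: one work-item per element, launched with global size n. */
__kernel void scale_ndrange(__global float *data, float factor)
{
    size_t i = get_global_id(0);
    data[i] *= factor;
}

/* Task style: a single work-item walks the whole buffer, so the length
 * has to be passed in as an extra kernel argument. */
__kernel void scale_task(__global float *data, float factor, unsigned n)
{
    for (unsigned i = 0; i < n; ++i)
        data[i] *= factor;
}

/* Host-side stand-in for an NDRange launch with global size n. */
void run_ndrange(float *data, float factor, unsigned n)
{
    assert(data != NULL);
    for (emulated_gid = 0; emulated_gid < n; ++emulated_gid)
        scale_ndrange(data, factor);
}
```

Both forms produce the same buffer contents; the difference is only in who iterates over the elements, the runtime or the kernel.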

0 Likes

Thanks. I changed the code for single-core execution and it works correctly.

Also, is there any way to set the number of threads created by the OpenCL runtime? Is there an environment variable for it?

Thanks.

0 Likes

With clEnqueueNDRangeKernel you can: the number of threads depends on the global work size.

clEnqueueTask only works with a global work size of 1, so OpenCL cannot split it across multiple threads. But you can create more host threads and manage the split yourself.
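
When the split is managed by host threads, each thread needs to know which part of the data its task should process. The following is a minimal sketch of that bookkeeping, assuming a contiguous split of `n` elements over `parts` host threads (one per sub-device queue). The `chunk` struct and `chunk_for` function are hypothetical names, not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of elements owned by one host thread. */
struct chunk { size_t begin, end; };

/* Split n elements into `parts` nearly equal contiguous chunks.
 * The first n % parts chunks get one extra element; `idx` selects the chunk. */
struct chunk chunk_for(size_t n, size_t parts, size_t idx)
{
    assert(parts > 0 && idx < parts);
    size_t base = n / parts;
    size_t extra = n % parts;
    struct chunk c;
    c.begin = idx * base + (idx < extra ? idx : extra);
    c.end = c.begin + base + (idx < extra ? 1 : 0);
    return c;
}
```

Host thread `idx` could then pass `begin` and `end - begin` as kernel arguments to a task-style kernel before enqueuing it on its own queue.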

0 Likes