zypo
Journeyman III

Device queues

I don't mean to hijack your thread, but I'm having problems w/ clCreateCommandQueueWithProperties.  Anytime I pass ANY properties through, the function returns CL_INVALID_QUEUE_PROPERTIES.  I would really like to turn on out of order queues.


I am using a Radeon 290.

I'm using the 3.0-0 Beta SDK

I've tried all the newest drivers including beta.

I am also running into kernel errors that aren't supposed to occur in OpenCL 2.0. For example, the compiler doesn't know what the ndrange_1D(...) function is. This might be related.

Any help would be greatly appreciated!

0 Likes
15 Replies
dipak
Big Boss

Please ensure that you are passing the right arguments to the clCreateCommandQueueWithProperties API. To check the usage, you can look at the OpenCL 2.0 samples in APP SDK 3.0 Beta or refer to the clCreateCommandQueueWithProperties reference page.

For example,


// A host command queue
cl_queue_properties props[] = {CL_QUEUE_PROPERTIES, CL_QUEUE_PROFILING_ENABLE, 0};
hostCommandQueue = clCreateCommandQueueWithProperties(context, deviceId, props, &status);

// A device command queue
cl_queue_properties prop[] = {CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT, CL_QUEUE_SIZE, maxQueueSize, 0};
deviceCommandQueue = clCreateCommandQueueWithProperties(context, deviceId, prop, &status);
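The property list in both calls is a flat array of name/value pairs closed by a single 0, so the bitfield always follows the CL_QUEUE_PROPERTIES name rather than standing alone. A malformed list is one thing worth ruling out when the call rejects the properties. Here is a minimal sketch of that layout in plain C. The typedef and the numeric constants are stand-ins mirroring CL/cl.h so the snippet compiles without the SDK, and build_device_queue_props is a hypothetical helper, not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins mirroring CL/cl.h; real host code should include the header
   instead of redefining these. */
typedef uint64_t cl_queue_properties;
#define CL_QUEUE_PROPERTIES                     0x1093
#define CL_QUEUE_SIZE                           0x1094
#define CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE  (1 << 0)
#define CL_QUEUE_ON_DEVICE                      (1 << 2)
#define CL_QUEUE_ON_DEVICE_DEFAULT              (1 << 3)

/* Hypothetical helper: writes a device-queue property list into out[0..4]
   and returns the number of entries written (including the terminator). */
size_t build_device_queue_props(cl_queue_properties out[5], uint32_t queue_size)
{
    assert(out != NULL);
    size_t n = 0;
    out[n++] = CL_QUEUE_PROPERTIES;              /* name  */
    out[n++] = CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
             | CL_QUEUE_ON_DEVICE
             | CL_QUEUE_ON_DEVICE_DEFAULT;       /* value */
    out[n++] = CL_QUEUE_SIZE;                    /* name  */
    out[n++] = queue_size;                       /* value */
    out[n++] = 0;                                /* list terminator */
    return n;
}
```

The filled array is what would be passed as the third argument of clCreateCommandQueueWithProperties.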




Now, coming to your kernel-related problem. As per the clBuildProgram API, applications must specify the -cl-std=CL2.0 option if they want to compile or build their programs with OpenCL C 2.0. Otherwise, i.e. if the -cl-std build option is not specified, the highest OpenCL C 1.x language version supported by each device is used when compiling the program for that device. So, please ensure that you have used that flag correctly.
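One way to make sure the flag is never forgotten is to build the options string in one place. This is only a sketch: with_cl20_std is a hypothetical helper name, and it leaves the string alone if some -cl-std= option is already present.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies user_opts into out and appends
   "-cl-std=CL2.0" unless a -cl-std= option is already there.
   Returns 0 on success, -1 if out is too small. */
int with_cl20_std(const char *user_opts, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    if (user_opts == NULL)
        user_opts = "";

    int n;
    if (strstr(user_opts, "-cl-std=") != NULL)
        n = snprintf(out, out_size, "%s", user_opts);
    else if (user_opts[0] == '\0')
        n = snprintf(out, out_size, "-cl-std=CL2.0");
    else
        n = snprintf(out, out_size, "%s -cl-std=CL2.0", user_opts);

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* The result goes in as the options argument, for example:
   clBuildProgram(program, 1, &deviceId, out, NULL, NULL); */
```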

If you still face the problem, please provide more details such as the OS, the exact driver version, which type of command queue you are creating, etc. Sample code would be very helpful for us.

Regards,

0 Likes
zypo
Journeyman III

Thanks Dipak... both of your recommendations worked getting things started.  I have another issue involving enqueue_kernel.

There are 2 blocks of code below. They are supposed to do the same thing, but they don't. The iterative method works, whereas the parallel enqueued method doesn't fully work (I can still see that SOME of the kernel instances are running via flickers on the screen, but different kernel instances show up each frame).

I'm thinking it may have to do with a limitation on the device queue, but I'm a noob so what do I know.

Maybe there is an OpenCLism that I don't know about that may help get these kernels working.

In my C++, I am enqueuing via the hostCommandQueue, but in order to prevent a kernel crash, I also create a default device queue (via C++) that just sits there doing nothing.

Here is my kernel code:

//Parallel method that doesn't work.
enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP, ndrange_1D(numInstances),
  ^{
      int gid = get_global_id(0);
      runKernel(gid, var1, var2, var3);
  });

//Iterative method that works just like it should
for (int index = 0; index < numInstances; index++)
    runKernel(index, var1, var2, var3);

Thanks!

0 Likes

It is difficult to follow from your post what your code is trying to do. It would be helpful if you could clarify it.

0 Likes
zypo
Journeyman III

This is my kernel. I know it is not optimized and doesn't really do anything, but it illustrates my problem in a small amount of code. After the kernel runs, I display the buffer on the screen. It should be solid white (as it is with the iterative method), but when I use the device enqueue, I get a bunch of random white lines, which is not supposed to happen.

Thanks!

void plotPoint(uchar* graph, int bufferX, int bufferY, unsigned int bufferWidth, unsigned int bufferHeight)
{
  //convert those coordinates to a mem location and write color to screen
  //valid coordinates are 0..bufferWidth-1 and 0..bufferHeight-1, so reject >= as well
  if (bufferX < 0 || bufferY < 0 || bufferX >= bufferWidth || bufferY >= bufferHeight)
    return;

  long graphIndex = (bufferY * bufferWidth + bufferX) * 4;
  for (int index = 0; index < 4; index++)
    graph[graphIndex+index] = 255;
}

//graph is a buffer of size 4*bufferWidth*bufferHeight
//There are bufferWidth*bufferHeight instances running.
__kernel void testKernel(__global uchar *graph, uint bufferWidth, uint bufferHeight)
{
  // thread index and total
  int gid = get_global_id(0);
  int x = gid % bufferWidth;
  int y = gid / bufferWidth;

  if (x == 0) //only allow 1/bufferWidth kernels to run.
  {
    //this doesn't work
    enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP, ndrange_1D(bufferWidth),
    ^{
      int gid = get_global_id(0);
      plotPoint(graph, gid, y, bufferWidth, bufferHeight);
    });

    //this does
    //for (int index = 0; index < bufferWidth; index++)
    // plotPoint(graph, index, y, bufferWidth, bufferHeight);
  }
}
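
For reference, the valid pixel coordinates in a bufferWidth x bufferHeight buffer run from 0 to bufferWidth - 1 and from 0 to bufferHeight - 1, so a bounds test has to reject x == bufferWidth and y == bufferHeight too. The plain C sketch below checks that index arithmetic on the host. pixel_offset is a hypothetical helper name, not something from the posted kernel, and this point is separate from the enqueue_kernel problem.

```c
#include <assert.h>

/* Returns the byte offset of pixel (x, y) in a buffer holding 4 bytes per
   pixel, or -1 if the pixel lies outside the buffer. */
long long pixel_offset(int x, int y, unsigned int width, unsigned int height)
{
    assert(width > 0 && height > 0);
    if (x < 0 || y < 0 || (unsigned int)x >= width || (unsigned int)y >= height)
        return -1;
    return ((long long)y * width + x) * 4;
}
```

With a 4 x 3 buffer (48 bytes), the last valid pixel (3, 2) starts at offset 44, while (4, 2) would start at 48, one pixel past the end, and is rejected.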

0 Likes

I checked with some wrapper code, and your kernel code with enqueue_kernel worked fine on my Kaveri machine. I've attached the sample wrapper code here. Please check it at your end. If you still face the problem, please provide your host code and share your setup details.

Regards,

0 Likes
zypo
Journeyman III

Dipak,

First and foremost, I REALLY appreciate your help with this issue of mine.  I see that you went out of your way to create test files for myself, and it does not go unnoticed... so thank you very much!

On the other hand, I compiled and ran your code, and it gives me ##########FAILED########, so I now assume that it is not my code causing the issue. I'm thinking about re-installing drivers (again).

I'm using a discrete R9 290 on Windows 7 w/ the 3.0 beta OpenCL install. I do have an A10-5700 CPU, but it won't run OpenCL 2.0 code, so my GPU has to work.

I looked more into my kernel and the documentation and noticed that if I supply '-g' in the build options for your kernel, it gives more verbose errors. enqueue_kernel is returning CLK_ENQUEUE_FAILURE, but even w/ the -g option, it does not give me any more information.

Other than driver changes, do you know of anything else I can try?

Thanks!

0 Likes

Thanks for your appreciation. We always try our best to provide support to our users and customers.

I don't know whether it's a driver- or hardware-specific issue. I'll try the same on an R9 290 card and share my observations with you.

Regards,

0 Likes

This seems to be a driver issue. Last time, I used an internal driver package (a higher version than the public one), where it worked fine. When I tried with the public Catalyst driver (15.4 or 14.502) on an R9 290X card, I was able to reproduce the issue. I'll check further and get back to you shortly.

Regards,

0 Likes
zypo
Journeyman III

So this is good news! I'll be standing by.

0 Likes

It is indeed a driver issue. It is not reproducible with the latest internal build. Hopefully you'll get a working version with a future Catalyst release. Until then, please be patient.

Regards,

0 Likes

@dipak - feel free to fork a thread for this, but I was not able to create device-side queues on OpenCL 1.2-friendly hardware with:


cl_queue_properties props[] = {  CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE|CL_QUEUE_ON_DEVICE_DEFAULT, 0};



On my Tahiti/7970, this returns failure from clCreateCommandQueueWithProperties with error = -6 (CL_OUT_OF_HOST_MEMORY).
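
When comparing numeric results like the -6 above against the header, a small lookup saves some back and forth. This is a sketch covering only a handful of codes, with the numeric values as defined in CL/cl.h. cl_status_name and format_cl_error are hypothetical helper names, not OpenCL functions.

```c
#include <assert.h>
#include <stdio.h>

/* Maps a few OpenCL status codes to their names (values from CL/cl.h). */
const char *cl_status_name(int status)
{
    switch (status) {
    case   0: return "CL_SUCCESS";
    case  -5: return "CL_OUT_OF_RESOURCES";
    case  -6: return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -33: return "CL_INVALID_DEVICE";
    case -34: return "CL_INVALID_CONTEXT";
    case -35: return "CL_INVALID_QUEUE_PROPERTIES";
    default:  return "unknown status (see CL/cl.h)";
    }
}

/* Formats "<call> failed: <name> (<code>)" into buf and returns
   whatever snprintf returns. */
int format_cl_error(char *buf, size_t size, const char *call, int status)
{
    assert(buf != NULL && call != NULL);
    return snprintf(buf, size, "%s failed: %s (%d)",
                    call, cl_status_name(status), status);
}
```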

What gives? Upon closer inspection, I see the following for both the 7970 and my M290s:

Queue on Device properties:
    Out-of-Order:                          No
    Profiling :                            No


I thought these devices had multiple ACEs? Or is that not all there is to supporting out-of-order execution/device queues for AMD?

Also, presuming one has out-of-order device-side queues, is profiling supported on them in AMD's framework? Not having that makes it a little difficult to compare (via numbers) host vs device queues.

0 Likes

Hi Jason,

Creating a device-side queue requires a device that supports OpenCL 2.0. If you check the clinfo output, it should report the "Device OpenCL C version" as "OpenCL C 2.0" for that particular device. Tahiti (7970) and M290X don't have OpenCL 2.0 support. Currently, OpenCL 2.0 works on cards with GCN 1.1 or greater.

Yes, profiling on device-side queues is supported on devices that support OpenCL 2.0. Here is an example of clinfo output for a Hawaii (R9 290X) device:



Queue on Device properties:
    Out-of-Order:                  Yes
    Profiling :                    Yes
  Platform ID:                     000007FED8B37B60
  Name:                            Hawaii
  Vendor:                          Advanced Micro Devices, Inc.
  Device OpenCL C version:         OpenCL C 2.0
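
The same check can be done in host code on the string that clGetDeviceInfo returns for CL_DEVICE_OPENCL_C_VERSION. This is a rough sketch for the OpenCL 2.0 drivers discussed here, and supports_opencl_c_20 is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a CL_DEVICE_OPENCL_C_VERSION string such as
   "OpenCL C 2.0 " reports major version 2 or higher, 0 otherwise.
   Note: on OpenCL 3.0 platforms device-side enqueue is optional, so the
   version string alone is not a sufficient test there. */
int supports_opencl_c_20(const char *version)
{
    assert(version != NULL);
    int major = 0, minor = 0;
    if (sscanf(version, "OpenCL C %d.%d", &major, &minor) != 2)
        return 0;   /* string does not follow the expected format */
    return major >= 2;
}
```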


Regards,

0 Likes

So out-of-order queues, another OpenCL 1.0 feature, don't work on anything but 2.0 / GCN 1.1 hardware? That's pretty shocking. It's amazing that this is yet another thing that hasn't worked right since 1.0 all these years.

Are there no plans to make this work on GCN 1.0? Is there something fundamental missing in the hardware?

Can you confirm for me the GCN architecture of the M290x? There's very little documentation on mobile chipsets. It's pretty funny how mobile chipsets, built on yesteryear's architectures, have so little in common with their desktop counterparts when grouped by name.

And maybe an incredibly helpful question to others: are there any GCN 1.2  laptop cards?  Kinda seems like laptops are shafted on software for years to come.

0 Likes

Regarding GCN > 1.0 on mobile GPUs:

[Beware of numbering confusion: some press label the generations GCN 1.0, GCN 1.1, GCN 1.2, while AMD's docs label the third generation "Graphics Core Next, Generation 3".]

0 Likes

As you know, according to the OpenCL spec, out-of-order host queues are not mandatory. AFAIK, they are not currently supported on AMD platforms. Out-of-order device queues, on the other hand, are mandatory and are supported by any OpenCL 2.0-enabled device.
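
What a particular device reports can be read from the bitfields that clGetDeviceInfo returns for CL_DEVICE_QUEUE_ON_HOST_PROPERTIES and CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES. The sketch below only decodes such a bitfield in the style of the clinfo lines quoted in this thread. The bit values mirror CL/cl.h, and describe_queue_caps is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions as defined in CL/cl.h for cl_command_queue_properties. */
#define QUEUE_OUT_OF_ORDER_BIT  (UINT64_C(1) << 0)  /* CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE */
#define QUEUE_PROFILING_BIT     (UINT64_C(1) << 1)  /* CL_QUEUE_PROFILING_ENABLE */

/* Hypothetical helper: writes a short Yes/No summary of the capability
   bitfield into buf and returns whatever snprintf returns. */
int describe_queue_caps(uint64_t caps, char *buf, size_t size)
{
    assert(buf != NULL);
    return snprintf(buf, size, "Out-of-Order: %s, Profiling: %s",
                    (caps & QUEUE_OUT_OF_ORDER_BIT) ? "Yes" : "No",
                    (caps & QUEUE_PROFILING_BIT) ? "Yes" : "No");
}
```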

It's really hard to say what prevents GCN 1.0 devices from supporting OpenCL 2.0, or even whether they will be supported in the future. I'm really not aware of any such plans.

For mobile GPUs, I really thank gc9 for sharing that useful information and those links.

Regards,

0 Likes