15 Replies Latest reply on May 19, 2015 2:27 AM by dipak Branched from an earlier discussion.

    Device queues

    zypo

      I don't mean to hijack your thread, but I'm having problems w/ clCreateCommandQueueWithProperties.  Anytime I pass ANY properties through, the function returns CL_INVALID_QUEUE_PROPERTIES.  I would really like to turn on out of order queues.


      I am using a Radeon 290.

      I'm using the 3.0-0 Beta SDK

      I've tried all the newest drivers including beta.

       

      I am also running into kernel errors that aren't supposed to exist in OpenCL 2.0, such as the compiler not knowing what the ndrange_1D(...) function is... this might be related.

       

      Any help would be greatly appreciated!

        • Re: Device queues
          dipak

          Please ensure that you are passing the right arguments to the clCreateCommandQueueWithProperties API. To check the usage, you can look at the OpenCL 2.0 samples in APP SDK 3.0 Beta or refer to the clCreateCommandQueueWithProperties reference page.

           

          For example,

           

          // A host command queue
          cl_queue_properties props[] = {CL_QUEUE_PROPERTIES, CL_QUEUE_PROFILING_ENABLE, 0};
          hostCommandQueue = clCreateCommandQueueWithProperties(context, deviceId, props, &status);

          // A device command queue
          cl_queue_properties prop[] = {CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT, CL_QUEUE_SIZE, maxQueueSize, 0};
          deviceCommandQueue = clCreateCommandQueueWithProperties(context, deviceId, prop, &status);

           

          Now, coming to your kernel-related problem. As per the clBuildProgram documentation, applications must specify the -cl-std=CL2.0 option if they want to compile or build their programs as OpenCL C 2.0. If the -cl-std build option is not specified, the highest OpenCL C 1.x language version supported by each device is used when compiling the program for that device. So please make sure that you have passed that flag correctly.
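One way to make sure the flag is never forgotten is to assemble the build options in one place. The following is only a minimal sketch in plain C: `make_build_options` is a hypothetical helper name, not part of OpenCL or the APP SDK, and it assumes that an options string which already contains a `-cl-std=` setting should be left untouched.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): copies user_opts into out and
 * appends -cl-std=CL2.0 unless the caller already passed a -cl-std= option.
 * Returns 0 on success, -1 if out is too small to hold the result. */
int make_build_options(const char *user_opts, char *out, size_t out_size)
{
    const char *std_flag = "-cl-std=CL2.0";
    int n;

    if (user_opts == NULL)
        user_opts = "";

    if (strstr(user_opts, "-cl-std=") != NULL)
        n = snprintf(out, out_size, "%s", user_opts);      /* respect caller's choice */
    else if (user_opts[0] == '\0')
        n = snprintf(out, out_size, "%s", std_flag);       /* flag on its own */
    else
        n = snprintf(out, out_size, "%s %s", user_opts, std_flag);

    /* snprintf returns the length it wanted to write; detect truncation. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The resulting string would then be passed as the options argument of clBuildProgram, for example `clBuildProgram(program, 1, &deviceId, options, NULL, NULL)`. Note that the option starts with a plain ASCII hyphen.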

           

          If you still face the problem, please provide more details such as the OS, the exact driver version, which type of command queue you are using, etc. Sample code would be very helpful for us.

           

          Regards,

            • Re: Device queues
              zypo

              Thanks Dipak... both of your recommendations worked getting things started.  I have another issue involving enqueue_kernel.

              There are 2 blocks of code below.  They are supposed to do the same thing, but they don't.  The iterative method works, whereas the parallel enqueued method doesn't fully work (I can still see that SOME of the kernel instances are running, via flickers on the screen, but different kernel instances show up each frame).

              I'm thinking it may have to do with a limitation on the device queue, but I'm a noob so what do I know.

               

              Maybe there is an OpenCLism that I don't know about that may help get these kernels working.

               

              In my C++ host code, I am enqueuing via the hostCommandQueue, but in order to prevent a kernel crash, I also create a default device queue (from C++) that just sits there doing nothing.

               

              Here is my kernel code:

               

                // Parallel method that doesn't work.
                enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP, ndrange_1D(numInstances),
                ^{
                    int gid = get_global_id(0);
                    runKernel(gid, var1, var2, var3);
                });

                // Iterative method that works just like it should.
                for (int index = 0; index < numInstances; index++)
                    runKernel(index, var1, var2, var3);

               

               

              Thanks!

                • Re: Device queues
                  dipak

                  It is difficult to follow from your post what your code is trying to do. It would help if you could describe it more clearly.

                    • Re: Device queues
                      zypo

                      This is my kernel. I know it isn't optimized and doesn't really do anything useful, but it illustrates my problem in a small amount of code.  After the kernel runs, I display the buffer on the screen.  It should be solid white (as it is with the iterative method), but when I use the device enqueue, I get a bunch of random white lines, which is not supposed to happen.

                       

                      Thanks!

                       

                      void plotPoint(uchar* graph, int bufferX, int bufferY, unsigned int bufferWidth, unsigned int bufferHeight)
                      {
                        // Convert the coordinates to a memory location and write the color.
                        // Valid coordinates run from 0 to bufferWidth-1 and from 0 to bufferHeight-1.
                        if (bufferX < 0 || bufferY < 0 || bufferX >= bufferWidth || bufferY >= bufferHeight)
                          return;

                        long graphIndex = (bufferY * bufferWidth + bufferX) * 4;
                        for (int index = 0; index < 4; index++)
                          graph[graphIndex+index] = 255;
                      }

                      // graph is a buffer of size 4*bufferWidth*bufferHeight.
                      // There are bufferWidth*bufferHeight instances running.
                      __kernel void testKernel(__global uchar *graph, uint bufferWidth, uint bufferHeight)
                      {
                        // thread index
                        int gid = get_global_id(0);

                        int x = gid % bufferWidth;
                        int y = gid / bufferWidth;

                        if (x == 0) // only allow 1/bufferWidth of the kernel instances to run
                        {
                          // this doesn't work
                          enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP, ndrange_1D(bufferWidth),
                          ^{
                            int gid = get_global_id(0);
                            plotPoint(graph, gid, y, bufferWidth, bufferHeight);
                          });

                          // this does
                          //for (int index = 0; index < bufferWidth; index++)
                          //  plotPoint(graph, index, y, bufferWidth, bufferHeight);
                        }
                      }
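The index arithmetic used by plotPoint can be checked on the host in plain C, independently of the enqueue problem. The sketch below mirrors it under the assumption of 4 bytes per pixel; `pixel_offset` is a hypothetical name introduced only for this illustration. Coordinates equal to the width or height are rejected, because valid indices run from 0 to width-1 and from 0 to height-1.

```c
/* Hypothetical host-side mirror of the plotPoint index calculation.
 * Returns the byte offset of pixel (x, y) in a buffer with 4 bytes per
 * pixel, or -1 if the pixel lies outside the buffer. */
long pixel_offset(int x, int y, unsigned int width, unsigned int height)
{
    /* The sign checks come first, so the unsigned casts below are safe. */
    if (x < 0 || y < 0 || (unsigned int)x >= width || (unsigned int)y >= height)
        return -1;

    return ((long)y * (long)width + (long)x) * 4;
}
```

For a 4x3 buffer, the last valid pixel is (3, 2). A coordinate of x == 4 would otherwise alias the first pixel of the next row, and y == 3 would point past the end of the buffer.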

                        • Re: Device queues
                          dipak

                          I checked with some wrapper code, and your kernel code using enqueue_kernel worked fine on my Kaveri machine. I've attached the sample wrapper code here. Please check it at your end. If you still face the problem, please provide your host code and share your setup details.

                           

                          Regards,

                            • Re: Device queues
                              zypo

                              Dipak,

                              First and foremost, I REALLY appreciate your help with this issue of mine.  I see that you went out of your way to create test files for me, and it does not go unnoticed... so thank you very much!

                               

                              On the other hand, I compiled and ran your software, and your code gives me ##########FAILED########, so I now assume that it is not my code giving the issue.  I'm thinking about re-installing drivers (again). 

                               

                              I'm using a discrete R9 290 on Windows 7 w/ the 3.0 beta OpenCL install.  I do have an A10-5700 CPU, but it won't run OpenCL 2.0 code, so my GPU has to work.

                               

                              I looked more into my kernel and the documentation and noticed that if I supply the '-g' to build options for your kernel, it gives more verbose errors.  enqueue_kernel is returning CLK_ENQUEUE_FAILURE, but even w/ the -g option, it does not give me any more information.

                               

                              Other than driver changes, do you know of anything else I can try?

                               

                              Thanks!

                                • Re: Device queues
                                  dipak

                                  Thanks for your appreciation. We always try our best to support our users and customers.

                                  I don't know yet whether it's a driver- or hardware-specific issue. I'll try the same on an R9 290 card and share my observations with you.

                                   

                                  Regards,

                                  • Re: Device queues
                                    dipak

                                    This seems to be a driver issue. Last time, I used an internal driver package (a higher version than the public one), where it worked fine. When I tried with the public Catalyst driver (15.4 or 14.502) on an R9 290X card, I was able to reproduce the issue. I'll check further and get back to you shortly.

                                     

                                    Regards,

                                    • Re: Device queues
                                      dipak

                                      It is indeed a driver issue. It is not reproducible with the latest internal build. Hopefully you'll get the working version with a future Catalyst release. Until then, please be patient.

                                       

                                      Regards,

                            • Re: Re: Device queues
                              jason

                              @dipak - feel free to fork a thread for this but I was not able to create device side queues with OpenCL 1.2 friendly:

                               

                              cl_queue_properties props[] = {  CL_QUEUE_PROPERTIES, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE | CL_QUEUE_ON_DEVICE|CL_QUEUE_ON_DEVICE_DEFAULT, 0};

                               

                              On my Tahiti/7970, this returns failure from clCreateCommandQueueWithProperties with error = -6 (CL_OUT_OF_HOST_MEMORY).

                               

                              What gives? Upon closer inspection - I see the following for both 7970 and my M290s:

                               

                              Queue on Device properties:                    
                              Out-of-Order:                          No
                              Profiling :                            No

                               

                              I thought these devices had multiple ACEs?  Or was this not all there is to supporting out of order execution/device queues for AMD?

                               

                              Also, presuming one has out-of-order device-side queues, is profiling supported on them in AMD's framework?  Not having that makes it a little difficult to compare (via numbers) host vs device queues.
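When a call such as clCreateCommandQueueWithProperties reports a bare number like -6, a small lookup helps when logging. The following is a minimal sketch in plain C: `cl_error_name` is a hypothetical helper name, and it covers only the success code plus the error codes that clCreateCommandQueueWithProperties is documented to return, using their numeric values from the OpenCL headers.

```c
/* Hypothetical helper (not an OpenCL API): maps the status values that
 * clCreateCommandQueueWithProperties can report to their symbolic names.
 * The numeric values are those defined in CL/cl.h. */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -33: return "CL_INVALID_DEVICE";
    case -34: return "CL_INVALID_CONTEXT";
    case -35: return "CL_INVALID_QUEUE_PROPERTIES";
    default:  return "unknown OpenCL error";
    }
}
```

In host code this could be used as `fprintf(stderr, "queue creation failed: %s\n", cl_error_name(status));` right after the call.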

                                • Re: Device queues
                                  dipak

                                  Hi Jason,

                                  Creating a device-side queue requires a device that supports OpenCL 2.0. If you check the clinfo output, it should report the "Device OpenCL C version" as "OpenCL C 2.0" for that particular device. Tahiti (7970) and M290x don't have OpenCL 2.0 support. Currently, OpenCL 2.0 works on cards based on GCN 1.1 or later.

                                   

                                  Yes, profiling on a device-side queue is supported on devices that support OpenCL 2.0. Here is an example of the clinfo output for a Hawaii (R9 290X) device:


                                  Queue on Device properties:
                                      Out-of-Order:                      Yes
                                      Profiling:                         Yes
                                  Platform ID:                           000007FED8B37B60
                                  Name:                                  Hawaii
                                  Vendor:                                Advanced Micro Devices, Inc.
                                  Device OpenCL C version:               OpenCL C 2.0
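The same version check can be done programmatically before trying to create a device-side queue. The sketch below is plain C and only parses the version string; `supports_opencl_c_2` is a hypothetical name, and the string is assumed to come from clGetDeviceInfo with CL_DEVICE_OPENCL_C_VERSION, which has the form "OpenCL C <major>.<minor> <vendor-specific information>".

```c
#include <stdio.h>

/* Hypothetical helper (not an OpenCL API): returns 1 if a
 * CL_DEVICE_OPENCL_C_VERSION string such as "OpenCL C 2.0 " reports
 * major version 2 or higher, and 0 otherwise (including NULL or
 * malformed input). This is a version check only; the queue
 * capabilities themselves should still be queried from the device. */
int supports_opencl_c_2(const char *version_string)
{
    int major = 0, minor = 0;

    if (version_string == NULL)
        return 0;
    if (sscanf(version_string, "OpenCL C %d.%d", &major, &minor) != 2)
        return 0;

    return major >= 2;
}
```

A device that passes this check can additionally be queried with clGetDeviceInfo and CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES to see whether out-of-order execution and profiling are reported for on-device queues, which corresponds to the "Queue on Device properties" lines in clinfo.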

                                   

                                  Regards,

                                    • Re: Device queues
                                      jason

                                      So out-of-order queues, another OpenCL 1.0 feature, don't work on anything but 2.0 / GCN 1.1 hardware?  That's pretty shocking. It's amazing that this is yet another thing that has not been working right since 1.0, after all these years.

                                       

                                      There are no plans to make this work on GCN 1.0?  Is there something fundamental missing in the hardware?

                                       

                                      Can you confirm for me the GCN architecture of the M290x?  There's very little documentation on mobile chipsets.  Pretty funny how mobile chipsets, built on yesteryear's architectures, have so little in common with their desktop counterparts when grouping by name.

                                       

                                      And maybe an incredibly helpful question to others: are there any GCN 1.2  laptop cards?  Kinda seems like laptops are shafted on software for years to come.