6 Replies Latest reply on Oct 5, 2011 2:40 PM by jprice

    Question about using APUs

    jprice
      Using both GPU and CPU components

      I'm using a system that gives the following platform information:

      Name      : AMD Accelerated Parallel Processing
      Vendor    : Advanced Micro Devices, Inc.
      Version   : OpenCL 1.1 AMD-APP-SDK-v2.5 (684.213)
      Device #1 : BeaverCreek (Advanced Micro Devices, Inc.) [CL_DEVICE_TYPE GPU]
      Device #2 : AMD A8-3850 APU with Radeon(tm) HD Graphics (AuthenticAMD) [CL_DEVICE_TYPE_CPU]

      I'm able to use either of these two devices individually without a problem. However, if I try to use both together it fails: the kernel doesn't appear to execute on the GPU at all, and no error is reported.

      Is there something I need to do to be able to fully utilise the APU, or am I misunderstanding something about how these systems work?

      Thanks.

        • Question about using APUs
          genaganna

           

          Could you please run the SimpleMultiDevice sample and paste the output here?

          SimpleMultiDevice ships with the SDK samples.

          Can you also explain your workflow for running on the two devices?

            • Question about using APUs
              jprice

              $ ./SimpleMultiDevice
              ----------------------------------------------------------
              CPU + GPU Test 1 : Single context Single Thread
              ----------------------------------------------------------
              Total time : 72
              Time of CPU : 68.8067
              Time of GPU : 4.40075
              ----------------------------------------------------------
              CPU + GPU Test 2 : Multiple context Single Thread
              ----------------------------------------------------------
              Total time : 72
              Time of CPU : 71.3133
              Time of GPU : 4.40083
              ----------------------------------------------------------
              CPU + GPU Test 3 : Multiple context Multiple Thread
              ----------------------------------------------------------
              Total time : 73
              Time of CPU : 72.2002
              Time of GPU : 4.40078

               

              I'm using a separate context for each device, enqueuing work into both queues from a single host thread.

              Actually, it looks like the kernel is running on the GPU, but I'm getting back the same start and end times when querying with clGetEventProfilingInfo, which makes it appear that the kernel takes no time to run. I've seen this before with SDK v2.5 while running on another ATI card. Is this a bug you're aware of?

              Thanks.
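For readers following along, here is a minimal sketch of how the two timestamps discussed above turn into a duration. In OpenCL, clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END returns cl_ulong device timestamps in nanoseconds, and the command queue must have been created with CL_QUEUE_PROFILING_ENABLE. To keep the sketch compilable without OpenCL headers, the timestamps are modelled as plain uint64_t values; the type `profile_span` and both helper functions are hypothetical names, not part of OpenCL or of the poster's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two cl_ulong values that clGetEventProfilingInfo
 * returns for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END.
 * Both are device timestamps in nanoseconds.
 * Hypothetical helper type, not part of OpenCL. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} profile_span;

/* Converts a span to milliseconds. Returns false, leaving *out_ms
 * untouched, when the end stamp is earlier than the start stamp. */
static bool span_to_ms(profile_span s, double *out_ms)
{
    assert(out_ms != NULL);
    if (s.end_ns < s.start_ns)
        return false;
    *out_ms = (double)(s.end_ns - s.start_ns) / 1.0e6;
    return true;
}

/* Flags the symptom described in the post: identical start and end
 * stamps, which would make the kernel appear to take no time. */
static bool span_is_zero_length(profile_span s)
{
    return s.start_ns == s.end_ns;
}
```

In real host code the two fields would be filled by two clGetEventProfilingInfo calls on the kernel's event after it has completed. A zero-length span for a kernel that visibly produced results points at the timestamps rather than at the kernel.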

                • Question about using APUs
                  genaganna

                   



                  Can you run the SimpleMultiDevice sample with the -e option? SimpleMultiDevice also uses clGetEventProfilingInfo, and I am not aware of any such issue, so it looks like something is going wrong in your code. Could you please paste your code here?

                    • Question about using APUs
                      jprice

                      I'd forgotten that this system has v2.3 installed alongside v2.5. The previous SimpleMultiDevice output would have come from v2.3, and my code also works fine with v2.3.

                       

                      Here's the output I get when I run against v2.5, using -e:

                      $ ./SimpleMultiDevice -e
                      ----------------------------------------------------------
                      CPU + GPU Test 1 : Single context Single Thread
                      ----------------------------------------------------------
                      Total time : 70
                      Time of CPU : 69.5631
                      Time of GPU : 0.000471
                      Verifying results for CPU : Passed!

                      Verifying results for GPU : Passed!

                      ----------------------------------------------------------
                      CPU + GPU Test 2 : Multiple context Single Thread
                      ----------------------------------------------------------
                      Total time : 70
                      Time of CPU : 68.5048
                      Time of GPU : 0.000217
                      Verifying results for CPU : Passed!

                      Verifying results for GPU : Passed!

                      ----------------------------------------------------------
                      CPU + GPU Test 3 : Multiple context Multiple Thread
                      ----------------------------------------------------------
                      Total time : 69
                      Time of CPU : 68.1591
                      Time of GPU : 0.000234
                      Verifying results for CPU : Passed!

                      Verifying results for GPU : Passed!



                      PASSED!

                      So the sample has the same problem as my code: the start and end times are always almost identical. If I remember rightly, upgrading the driver fixed this issue with v2.5, but that's not an option here since I don't have root access.

                      I'll just use v2.3 of the SDK for now.
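When event profiling cannot be trusted and the driver cannot be upgraded, one possible cross-check is to time the launch on the host and compare. The sketch below is an assumption-laden illustration, not the poster's code: `fake_launch_and_finish` is a hypothetical stand-in for "enqueue the kernel, then call clFinish on the queue", and `device_time_plausible` is a rough heuristic whose threshold is arbitrary. It uses C11 `timespec_get`, which reads the wall clock and can therefore be affected by clock adjustments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for "enqueue the kernel, then clFinish(queue)".
 * It only bumps a counter so the sketch runs without an OpenCL runtime. */
static void fake_launch_and_finish(void *arg)
{
    int *calls = arg;
    ++*calls;
}

/* Runs work(arg) once and returns the elapsed host wall-clock time in
 * milliseconds. The result includes launch and synchronisation overhead,
 * so it is an upper bound on the kernel's own execution time. */
static double host_elapsed_ms(void (*work)(void *), void *arg)
{
    struct timespec t0, t1;
    assert(work != NULL);
    timespec_get(&t0, TIME_UTC);
    work(arg);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec) * 1.0e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1.0e6;
}

/* Heuristic: a profiled device time should not exceed the host time and
 * should be at least min_fraction of it. min_fraction is a tuning knob
 * chosen by the caller, not a value defined by OpenCL. */
static bool device_time_plausible(double device_ms, double host_ms,
                                  double min_fraction)
{
    return device_ms <= host_ms && device_ms >= host_ms * min_fraction;
}
```

A profiled time that is orders of magnitude below the host-side measurement, for a kernel whose results verify correctly, suggests the profiler output is wrong rather than the kernel.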

                        • Question about using APUs
                          genaganna

                           


                          This could be an issue with profiling; the sample itself is passing. It is recommended to use the latest driver with the latest SDK.

                            • Question about using APUs
                              jprice

                              Yes, it looks like my kernels are actually running fine; it's just that the profiler is returning incorrect stats.

                              The last time I tried the latest driver, performance dropped by 10-15% because of the GPU_USE_SYNC_OBJECTS bug, but I'll try again on my own box soon to see whether it's fixed and worth it.

                              Thanks.