19 Replies Latest reply on Nov 18, 2014 7:55 AM by jtrudeau

    OpenCL™ 2.0 Preview - Shared Virtual Memory

    pinform

      Start using OpenCL™ 2.0 today – AMD is providing a sneak peek that works on GPUs and APUs.

       

      We are still working on the beta SDK, which will be available soon. In the meantime, we have example code ready for the adventurous among you, so you can start learning some of the ins and outs.

       

      We are creating a series of blog posts, called OpenCL 2.0 Demystified - One Feature at a Time.  The posts have insights, code snippets, and complete samples that you can download. We are making some serious example code available for you to study and play with.

       

      Read the first blog post, on how shared virtual memory can make your code simpler and more efficient. We have planned several more posts highlighting various features of OpenCL 2.0, so keep your eyes open: we’ll make announcements here to let you know when they are available.

       

      All these links are available directly in the blog.  Make sure you have supported hardware (there’s a complete list on the driver download page).  Play with the examples.  Maybe write your own OpenCL 2.0 samples.  And share your observations with the community here.  With this sneak peek at the example code, when the full release is available you’ll be ahead of the curve.

       

      Additional blogs in the series

        • Re: OpenCL™ 2.0 Preview
          coordz

          Thanks for the nice blog post with a clear intro to coarse SVM. I like the simplicity of the pointers being shared but am less sure about the performance implications in the example presented.

           

          Is there code for your OpenCL 1.2 timing comparisons? I'm not sure I quite believe them: on an APU I have zero copy, so if my buffers are set up correctly I can map them with zero penalty in OpenCL 1.2 just fine. My data structure then only needs a slight tweak to be offset-based rather than (true) pointer-based, and I think I'd get almost the same performance in OCL 1.2 as in OCL 2.0.

            • Re: OpenCL™ 2.0 Preview
              dipak

              Thanks for the feedback. While we cannot share the OpenCL 1.2 source, you can easily write and test it yourself. The main point, however, is that the real penalty lies in translating data structures from pointers to indices, as the table shows. Even if the transfer time is optimized away, the translation time remains.
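
To make the translation step discussed above concrete, here is a minimal host-side sketch in plain C of converting a pointer-linked tree into an index-linked array, which is roughly what an offset-based OpenCL 1.2 port would need before handing the data to a kernel. This is not the code behind the blog's timings (that source was not shared); the type and function names (`PtrNode`, `IdxNode`, `flatten_tree`, `sum_by_index`) are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side node: links are raw pointers, valid only in the host address space. */
typedef struct PtrNode {
    int value;
    struct PtrNode *left;
    struct PtrNode *right;
} PtrNode;

/* Device-friendly node: links are indices into one flat array (-1 = no child). */
typedef struct IdxNode {
    int value;
    int32_t left;
    int32_t right;
} IdxNode;

/* Copy the tree rooted at `node` into `out` in pre-order.
 * Returns the index assigned to `node`, or -1 for an empty subtree.
 * `*count` is the number of slots used so far; `capacity` is the size of `out`. */
static int32_t flatten_tree(const PtrNode *node, IdxNode *out,
                            int32_t *count, int32_t capacity)
{
    if (node == NULL) {
        return -1;
    }
    assert(*count < capacity);
    int32_t self = (*count)++;
    out[self].value = node->value;
    out[self].left = flatten_tree(node->left, out, count, capacity);
    out[self].right = flatten_tree(node->right, out, count, capacity);
    return self;
}

/* Sum all values by following indices only; no pointer links are dereferenced.
 * Written recursively for brevity. OpenCL C kernels cannot recurse, so a
 * kernel doing the same walk would need an explicit stack instead. */
static long sum_by_index(const IdxNode *nodes, int32_t root)
{
    if (root < 0) {
        return 0;
    }
    return nodes[root].value
         + sum_by_index(nodes, nodes[root].left)
         + sum_by_index(nodes, nodes[root].right);
}
```

The flattening pass visits every node once on the host before any kernel can run, and a matching pass is needed if results have to go back into the pointer-based structure. That per-node work is the translation cost referred to above. With SVM the original pointers stay valid on the device, so this pass is not needed.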


              Regards,

            • Re: OpenCL™ 2.0 Preview
              gabest

              I'm a little confused about SVM: how is it different from using an old-style buffer with a host memory pointer? In my experience, changes to this kind of memory are just as visible between the CPU and GPU, without copying or even mapping the buffer(!), as the blog post describes for SVM. Is it about the inner struct pointers being valid, too?

                • Re: OpenCL™ 2.0 Preview
                  coordz

                  My takeaways were:

                  a) The inner structure pointers are valid across devices.

                  b) Creating memory that can be mapped in a zero-copy way is now simple and explicit, rather than a fiddly matter of getting your buffer creation flags correct, etc.

                • Re: OpenCL™ 2.0 Preview
                  aiv

                  Hi,

                   

                  I have two questions:

                   

                  QUESTION 1. My system is an A10-7800. Is it supported by OpenCL 2.0?

                   

                  After installing the driver, I see the following clinfo output:

                   

                  Number of platforms:                       
                    Platform Profile:                         FULL_PROFILE
                    Platform Version:                         OpenCL 2.0 AMD-APP (1598.5)
                    Platform Name:                            AMD Accelerated Parallel Processing
                    Platform Vendor:                          Advanced Micro Devices, Inc.
                    Platform Extensions:                      cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
                    Platform Name:                            AMD Accelerated Parallel Processing
                  Number of devices:                          2                           
                    Device Type:                              CL_DEVICE_TYPE_GPU          

                  .....

                    Platform ID:                                   0x7f587e13d670                    
                    Name:                                          Spectre                           
                    Vendor:                                        Advanced Micro Devices, Inc.      
                    Device OpenCL C version:                       OpenCL C 2.0                      
                    Driver version:                                1598.5 (VM)

                  ....

                    Device Type:                                   CL_DEVICE_TYPE_CPU
                  ....

                    Platform ID:                              0x7f587e13d670  
                    Name:                                     AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G
                    Vendor:                                   AuthenticAMD                           
                    Device OpenCL C version:                  OpenCL C 1.2                           
                    Driver version:                           1598.5 (sse2,avx,fma4)                 

                   

                  That is, the "Device OpenCL C version" of the CPU is still "OpenCL C 1.2".

                   

                  QUESTION 2. Is it normal that the execution time of my program nearly doubles compared with the OpenCL 1.2 driver (fglrx-14.301.1001)?

                   

                  My program is an example that runs a parallel reduction on the GPU many times.

                   

                  real    0m45.524s
                  user    0m0.957s
                  sys     0m0.340s

                   

                  vs

                   

                  real    0m25.686s
                  user    0m0.860s
                  sys     0m0.313s

                   

                  with fglrx-14.301.1001

                   

                  Thanks!

                    • Re: OpenCL™ 2.0 Preview
                      srp1970

                      Regarding your first question: when clinfo reports "OpenCL C 2.0" for a device, that device supports OpenCL 2.0. In this case, as you can see, the CPU reports "OpenCL C 1.2", which means you cannot use the CPU device for OpenCL 2.0 features.
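
The same check can be done in a program instead of by reading clinfo. Below is a minimal sketch in plain C that parses a "Device OpenCL C version" string like the ones in the output above. In a real host program the string would normally come from `clGetDeviceInfo` with `CL_DEVICE_OPENCL_C_VERSION`; that call is omitted so the sketch compiles without an OpenCL SDK. The helper names `parse_opencl_c_version` and `supports_opencl_c_2` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse a version string of the form "OpenCL C <major>.<minor>", such as
 * "OpenCL C 2.0". Returns 1 and fills major/minor on success, or 0 if the
 * string does not have that form. */
static int parse_opencl_c_version(const char *text, int *major, int *minor)
{
    assert(text != NULL && major != NULL && minor != NULL);
    return sscanf(text, "OpenCL C %d.%d", major, minor) == 2;
}

/* Hypothetical helper: returns 1 if the reported OpenCL C version is at
 * least 2.0, otherwise 0 (including when the string cannot be parsed). */
static int supports_opencl_c_2(const char *text)
{
    int major = 0, minor = 0;
    return parse_opencl_c_version(text, &major, &minor) && major >= 2;
}
```

Applied to the strings quoted in this thread, `supports_opencl_c_2("OpenCL C 2.0")` (the Spectre GPU) is true and `supports_opencl_c_2("OpenCL C 1.2")` (the CPU device) is false, matching the answer above.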

                       

                      Regarding your second question: it is not normal for the same application to take twice as long on the 2.0 driver. Can you send us more details (host code, kernels, and other configuration)?

                       

                      Prakash

                        • Re: OpenCL™ 2.0 Preview
                          aiv

                          >> which means you cannot use CPU device for OpenCL 2.0 features.

                           

                          OK, I see. Could you please explain to me the difference between the A10-7850K (which is said to be supported) and the A10-7800 (which apparently isn't)? Do you plan to add support for the A10-7800 at some later time, or does that device fundamentally lack some circuitry that is essential for OpenCL 2.0 features? Why is the TDP of the 7850K so much higher?


                          Ideally, I need the OpenCL 2.0 APU with the lowest power consumption. What would you recommend for that?

                           

                          >> Can you send us more details (like host code, kernels and other configurations) ?

                           

                          Sure, no problem with that. Where shall I send the code to?

                            • Re: OpenCL™ 2.0 Preview
                              dipak

                              You can attach the zip file directly here (using the "use advance editor" option) or provide a link to the zip file uploaded somewhere else. You can also send us the code by email to keep it private.

                               

                              Regards,

                        • Re: OpenCL™ 2.0 Preview
                          pinform

                          Update: Our next blog in the OpenCL 2.0 demystified series is ready. It’s on pipes. While not exactly a piping hot feature, pipes serve many useful purposes.

                           

                          The blog explains how pipes in OpenCL 2.0 can make your code simpler and more readable. As usual, we have insights, code snippets, and complete samples that you can download. Go for it.

                          • Re: OpenCL™ 2.0 Preview
                            aiv

                            I'd rather keep it private and send it by e-mail. Which address shall I use?

                            • Re: OpenCL™ 2.0 Preview
                              aiv

                              Yes, I did. I was travelling to give a talk. Please check your mail: I have forwarded the code snippet to you. I'm interested in hearing your opinion.

                               

                              Best!

                              Alex

                              • Re: OpenCL™ 2.0 Preview - Shared Virtual Memory

                                Update:

                                Our next blog in the OpenCL 2.0 demystified series is ready. It’s on device-side enqueue, a powerful feature that promises to make your applications faster.

                                 

                                The blog explains how device-side enqueue and the new OpenCL 2.0 workgroup built-in functions can provide non-trivial performance gains in your applications. As usual, we have insights, code snippets, and complete samples that you can download. Have fun!

                                 

                                All links are in the blog, but you can access everything from here if you prefer.