
OpenCL

Staff

OpenCL™ 2.0 Preview - Shared Virtual Memory

Start using OpenCL™ 2.0 today – AMD is providing a sneak peek that works on GPUs and APUs.

We are still working on the beta SDK, which will be available soon. In the meantime, we have example code ready for the adventurous among you, so you can start learning some of the ins and outs.

We are creating a series of blog posts, called OpenCL 2.0 Demystified - One Feature at a Time.  The posts have insights, code snippets, and complete samples that you can download. We are making some serious example code available for you to study and play with.

Review the first blog, on how shared virtual memory can make your code simpler and more efficient.  We have planned several posts highlighting various features of OpenCL 2.0.  So keep your eyes open – we’ll make announcements here to let you know when they are available.

All these links are available directly in the blog.  Make sure you have supported hardware (there’s a complete list on the driver download page).  Play with the examples.  Maybe write your own OpenCL 2.0 samples.  And share your observations with the community here.  With this sneak peek at the example code, when the full release is available you’ll be ahead of the curve.

Additional blogs in the series

19 Replies
Adept II

Re: OpenCL™ 2.0 Preview

Thanks for the nice blog post with a clear intro to coarse SVM. I like the simplicity of the pointers being shared but am less sure about the performance implications in the example presented.

Is there code for your OpenCL 1.2 timing comparisons? I'm not sure I quite believe them. On an APU I have zero copy, so if my buffers are set up correctly I can do a zero-penalty map of those buffers in OpenCL 1.2 just fine. My data structure then just needs to be tweaked slightly to be offset-based rather than (true) pointer-based, and I think I'd get almost the same performance in OpenCL 1.2 as in OpenCL 2.0.

Journeyman III

Re: OpenCL™ 2.0 Preview

I'm a little confused about SVM: how is it different from using an old-style buffer with a host memory pointer? In my experience, changes to this kind of memory are just as visible between the CPU and GPU, without copying or even mapping the buffer(!), as the blog post describes for SVM. Is it about the inner struct pointers being valid, too?

Staff

Re: OpenCL™ 2.0 Preview

Thanks for the feedback. While we cannot share the source for the OpenCL 1.2 version, you can easily write and test it yourself. The main point, however, is that the real penalty lies in translating data structures from pointers to indices, as can be seen in the table. Even if the transfer time is optimized, that does little to reduce the translation time.


Regards,
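
To make the pointer-to-index translation concrete, here is a minimal sketch, written in Python rather than OpenCL C for brevity. It assumes a hypothetical binary search tree; the names `Node`, `flatten`, `search`, and `NULL_INDEX` are illustrative and are not taken from the blog sample. It shows the kind of flattening pass an OpenCL 1.2 host would likely need before handing a linked structure to a device, since raw host pointers are not meaningful there.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NULL_INDEX = -1  # stands in for a null pointer in the index-based form


@dataclass
class Node:
    """Pointer-linked tree node (hypothetical example structure)."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def flatten(root: Optional[Node]) -> Tuple[List[int], List[int], List[int]]:
    """Translate a pointer-linked tree into three parallel index arrays."""
    values: List[int] = []
    left: List[int] = []
    right: List[int] = []
    if root is None:
        return values, left, right

    # First pass: assign every node an index in pre-order.
    index_of = {}
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        index_of[id(node)] = len(order)
        order.append(node)
        if node.right is not None:
            stack.append(node.right)
        if node.left is not None:
            stack.append(node.left)

    # Second pass: replace each child reference with the child's index.
    for node in order:
        values.append(node.value)
        left.append(index_of[id(node.left)] if node.left is not None else NULL_INDEX)
        right.append(index_of[id(node.right)] if node.right is not None else NULL_INDEX)
    return values, left, right


def search(values: List[int], left: List[int], right: List[int], key: int) -> int:
    """Binary-search-tree lookup that follows indices instead of pointers."""
    i = 0 if values else NULL_INDEX
    while i != NULL_INDEX:
        if key == values[i]:
            return i
        i = left[i] if key < values[i] else right[i]
    return NULL_INDEX
```

The flattening pass touches every node, so its cost grows with the size of the structure regardless of how cheap the buffer transfer is. That is presumably the translation time referred to above, as distinct from the transfer time.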

Adept II

Re: OpenCL™ 2.0 Preview

Perhaps in the table you could explicitly label what the OpenCL 1.2 timings actually are, i.e. kernel execution time, data structure translation time, and transfer times. As it stands, that is only implied by the text.

Adept II

Re: OpenCL™ 2.0 Preview

My takeaways were:

a) The inner structure pointers are valid across devices.

b) Creating memory that can be mapped in a zero-copy way is now simple and explicit, rather than a fiddly matter of getting your buffer creation flags correct, etc.

Journeyman III

Re: OpenCL™ 2.0 Preview

Hi,

I have two questions:

QUESTION 1. My system is an A10-7800. Is it supported by OpenCL 2.0?

Apparently after installation of the driver I see the following output of clinfo:

Number of platforms:                       
  Platform Profile:                         FULL_PROFILE
  Platform Version:                         OpenCL 2.0 AMD-APP (1598.5)
  Platform Name:                            AMD Accelerated Parallel Processing
  Platform Vendor:                          Advanced Micro Devices, Inc.
  Platform Extensions:                      cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Name:                            AMD Accelerated Parallel Processing
Number of devices:                          2                           
  Device Type:                              CL_DEVICE_TYPE_GPU          

.....

  Platform ID:                                   0x7f587e13d670                    
  Name:                                          Spectre                           
  Vendor:                                        Advanced Micro Devices, Inc.      
  Device OpenCL C version:                       OpenCL C 2.0                      
  Driver version:                                1598.5 (VM)

....

  Device Type:                                   CL_DEVICE_TYPE_CPU
....

  Platform ID:                              0x7f587e13d670  
  Name:                                     AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G
  Vendor:                                   AuthenticAMD                           
  Device OpenCL C version:                  OpenCL C 1.2
  Driver version:                           1598.5 (sse2,avx,fma4)                 

I.e., the "Device OpenCL C version" of the CPU is still "OpenCL C 1.2".

QUESTION 2. Is it normal that the execution time of my program nearly doubles compared with the OpenCL 1.2 driver (fglrx-14.301.1001)?

My program runs a parallel reduction on the GPU multiple times.

real    0m45.524s
user    0m0.957s
sys     0m0.340s

vs

real    0m25.686s
user    0m0.860s
sys     0m0.313s

with fglrx-14.301.1001

Thanks!

Adept I

Re: OpenCL™ 2.0 Preview

Regarding your first question: when clinfo reports "OpenCL 2.0" for a device, that device supports OpenCL 2.0. In this case, as you can see, the CPU reports "OpenCL C 1.2", which means you cannot use the CPU device for OpenCL 2.0 features.
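
The same check can be automated rather than read off clinfo by eye. The sketch below is a minimal Python illustration that only parses the version string; the helper names `opencl_c_version` and `supports_opencl_c_2_0` are made up for this example. In a real host program the string would come from `clGetDeviceInfo` with `CL_DEVICE_OPENCL_C_VERSION`; here the inputs are assumed to be the strings shown in the clinfo output above.

```python
import re
from typing import Tuple


def opencl_c_version(version_string: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as 'OpenCL C 2.0'."""
    match = re.match(r"OpenCL C (\d+)\.(\d+)", version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_opencl_c_2_0(version_string: str) -> bool:
    """True if the device reports OpenCL C 2.0 or newer."""
    return opencl_c_version(version_string) >= (2, 0)


# Strings as reported for the two devices in the clinfo output above.
gpu_ok = supports_opencl_c_2_0("OpenCL C 2.0")  # Spectre GPU
cpu_ok = supports_opencl_c_2_0("OpenCL C 1.2")  # A10-7800 CPU device
```

Checking per device rather than per platform matters here, because the platform reports OpenCL 2.0 while one of its two devices does not.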

Regarding your second question: it is not normal for the same application to take twice as long on the 2.0 driver. Can you send us more details (such as host code, kernels, and other configuration)?

Prakash

Journeyman III

Re: OpenCL™ 2.0 Preview

>> which means you cannot use CPU device for OpenCL 2.0 features.

OK, I see. Could you please explain the difference between the A10-7850K (which is said to be supported) and the A10-7800 (which apparently isn't)? Do you plan to add support for the A10-7800 at some later point, or does that device fundamentally lack some circuitry that is essential for OpenCL 2.0 features? Why is the TDP of the 7850K so much higher?

Ideally I need the OpenCL 2.0 APU with the lowest power consumption. What would you recommend for that?

>> Can you send us more details (like host code, kernels and other configurations) ?

Sure, no problem with that. Where shall I send the code to?

Staff

Re: OpenCL™ 2.0 Preview

Update: Our next blog in the OpenCL 2.0 Demystified series is ready. It’s on pipes. While not exactly a piping hot feature, pipes serve many useful purposes.

The blog explains how pipes in OpenCL 2.0 can make your code simpler and more readable. As usual, we have insights, code snippets, and complete samples that you can download. Go for it.
