pinform
Staff

OpenCL™ 2.0 Preview - Shared Virtual Memory

Start using OpenCL™ 2.0 today – AMD is providing a sneak peek that works on GPUs and APUs.

We are still working on the beta SDK, which will be available soon. In the meantime, we have example code ready for the adventurous among you, so you can start learning some of the ins and outs.

We are creating a series of blog posts, called OpenCL 2.0 Demystified - One Feature at a Time.  The posts have insights, code snippets, and complete samples that you can download. We are making some serious example code available for you to study and play with.

Review the first blog, on how shared virtual memory can make your code simpler and more efficient.  We have planned several posts highlighting various features of OpenCL 2.0.  So keep your eyes open – we’ll make announcements here to let you know when they are available.

All these links are available directly in the blog.  Make sure you have supported hardware (there’s a complete list on the driver download page).  Play with the examples.  Maybe write your own OpenCL 2.0 samples.  And share your observations with the community here.  With this sneak peek at the example code, when the full release is available you’ll be ahead of the curve.

Additional blogs in the series

0 Likes
19 Replies
coordz
Adept II

Thanks for the nice blog post with a clear intro to coarse-grained SVM. I like the simplicity of the pointers being shared, but I am less sure about the performance implications in the example presented.

Is there code for your OpenCL 1.2 timing comparisons? I'm not sure I quite believe them. On an APU I have zero copy, so if my buffers are set up correctly I can do a zero-penalty map of those buffers in OpenCL 1.2 just fine. Then my data structure just needs to be tweaked slightly to be offset based rather than (true) pointer based, and I think I'd get almost the same performance in OCL 1.2 as in OCL 2.0.
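The offset-based workaround mentioned above can be sketched in plain C, without any OpenCL calls. This is only an illustration under assumptions: the `Node` type, the `NIL` sentinel, and both helper functions are hypothetical names, not code from the blog. Because the links are array indices rather than addresses, the same pool could be handed to an OpenCL 1.2 kernel as an ordinary buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no next node". */
#define NIL (-1)

/* Hypothetical node type: the link is an index into the pool, not a pointer,
   so it stays meaningful no matter where the pool is mapped. */
typedef struct {
    int32_t value;
    int32_t next;   /* index of the next node in the pool, or NIL */
} Node;

/* Build the list 0 -> 1 -> ... -> n-1 inside a caller-provided pool,
   storing the values 1..n. */
static void build_list(Node *pool, int32_t n) {
    assert(pool != 0 && n > 0);
    for (int32_t i = 0; i < n; ++i) {
        pool[i].value = i + 1;
        pool[i].next = (i + 1 < n) ? i + 1 : NIL;
    }
}

/* Walk the list by index and sum the values. The same loop could run in a
   kernel over a __global Node buffer. */
static int64_t sum_list(const Node *pool, int32_t head) {
    int64_t sum = 0;
    for (int32_t i = head; i != NIL; i = pool[i].next)
        sum += pool[i].value;
    return sum;
}
```

The trade-off is that every traversal pays for an index-to-address computation, and any existing pointer-based host code has to be rewritten or translated to this layout.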

0 Likes

Thanks for the feedback. While we cannot share the source for the OpenCL 1.2 version, you can easily write and test it yourself. The main point, however, is that the real penalty lies in translating the data structures from pointers to indices, as can be seen in the table. Even if the transfer time is optimized, that does not help with the translation time.
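To make the translation cost concrete, here is a minimal sketch of turning a pointer-based binary tree into an index-based array, the kind of pass an OpenCL 1.2 host program would need before handing the tree to a kernel. The types `PtrTree` and `IdxTree` and the function `flatten` are hypothetical and are not taken from the blog's sample; the blog's actual data structure may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side tree that uses real pointers. */
typedef struct PtrTree {
    int value;
    struct PtrTree *left, *right;
} PtrTree;

/* Hypothetical device-friendly node: children are array indices, -1 = none. */
typedef struct {
    int value;
    int left, right;
} IdxTree;

/* Copy the tree rooted at t into out[] in pre-order and return the index of
   the copied root, or -1 for an empty subtree. *count is the number of slots
   used so far; the caller must size out[] (cap) to hold every node. */
static int flatten(const PtrTree *t, IdxTree *out, int *count, int cap) {
    if (t == NULL)
        return -1;
    assert(*count < cap);
    int self = (*count)++;
    out[self].value = t->value;
    out[self].left  = flatten(t->left,  out, count, cap);
    out[self].right = flatten(t->right, out, count, cap);
    return self;
}
```

This pass touches every node once on the host, and a matching pass is needed if the results must go back into the pointer-based form. With SVM the pointers are valid on both sides, so neither pass is required.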


Regards,

0 Likes

Perhaps in the table you could explicitly label what the OpenCL 1.2 timings actually are, i.e. kernel execution time, data structure translation time, transfer times. As they stand it's just implied by the text.

0 Likes
gabest
Journeyman III

I'm a little confused about SVM. How is it different from using an old-style buffer with a host memory pointer? In my experience, changes to this kind of memory are just as visible between the CPU and GPU, without copying or even mapping the buffer(!), as the blog post describes for SVM. Is it about the inner struct pointers being valid, too?

0 Likes

My takeaways were:

a) The inner structure pointers are valid across devices.

b) Creating memory that can be mapped in a zero copy way is now simple and explicit rather than fiddly trying to get your buffer creation flags correct, etc.

0 Likes
aiv
Journeyman III

Hi,

I have two questions:

QUESTION1. My system is an A10-7800. Is it supported by OpenCL 2.0?

After installing the driver, I see the following clinfo output:

Number of platforms:
  Platform Profile:                         FULL_PROFILE
  Platform Version:                         OpenCL 2.0 AMD-APP (1598.5)
  Platform Name:                            AMD Accelerated Parallel Processing
  Platform Vendor:                          Advanced Micro Devices, Inc.
  Platform Extensions:                      cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
  Platform Name:                            AMD Accelerated Parallel Processing
Number of devices:                          2
  Device Type:                              CL_DEVICE_TYPE_GPU

.....

  Platform ID:                              0x7f587e13d670
  Name:                                     Spectre
  Vendor:                                   Advanced Micro Devices, Inc.
  Device OpenCL C version:                  OpenCL C 2.0
  Driver version:                           1598.5 (VM)

....

  Device Type:                              CL_DEVICE_TYPE_CPU
....

  Platform ID:                              0x7f587e13d670
  Name:                                     AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G
  Vendor:                                   AuthenticAMD
  Device OpenCL C version:                  OpenCL C 1.2
  Driver version:                           1598.5 (sse2,avx,fma4)

That is, the "Device OpenCL C version" of the CPU is still "OpenCL C 1.2".

QUESTION2. Is it normal that the execution time of my program nearly doubles compared with the OpenCL 1.2 driver (fglrx-14.301.1001)?

My program is an example that runs a parallel reduction on the GPU many times.

real    0m45.524s
user    0m0.957s
sys     0m0.340s

vs

real    0m25.686s
user    0m0.860s
sys     0m0.313s

with fglrx-14.301.1001

Thanks!

0 Likes

Regarding your first question: when clinfo reports "OpenCL 2.0" for a device, that device supports OpenCL 2.0. In this case, as you can see, the CPU reports "OpenCL 1.2", which means you cannot use the CPU device for OpenCL 2.0 features.
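The same check can be done programmatically on the version string. The sketch below parses a string in the "OpenCL C <major>.<minor>" form shown in the clinfo output above; in host code that string would typically be obtained from clGetDeviceInfo with CL_DEVICE_OPENCL_C_VERSION, which is left out here so the example stays self-contained. The function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a string such as "OpenCL C 2.0" reports
   at least want_major.want_minor, and 0 otherwise (including when the string
   does not have the expected form). */
static int opencl_c_version_at_least(const char *s, int want_major, int want_minor) {
    int major = 0, minor = 0;
    assert(s != NULL);
    if (sscanf(s, "OpenCL C %d.%d", &major, &minor) != 2)
        return 0;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```

With the two devices listed above, the GPU string "OpenCL C 2.0" passes a check for 2.0 and the CPU string "OpenCL C 1.2" does not, so a program could use this to pick the device before relying on OpenCL 2.0 features.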

Regarding your second question, it is not normal for the same application to take twice as long on the 2.0 driver. Can you send us more details (such as the host code, kernels, and other configuration)?

Prakash

0 Likes

>> which means you cannot use CPU device for OpenCL 2.0 features.

OK, I see. Could you please explain the difference between the A10-7850K (which is said to be supported) and the A10-7800 (which apparently isn't)? Do you plan to add support for the A10-7800 at some later point, or does that device fundamentally lack some circuitry that is essential for OpenCL 2.0 features? Why is the TDP of the 7850K so much higher?


Ideally I need the OpenCL 2.0 APU with the lowest power consumption. What would you recommend for that?

>> Can you send us more details (like host code, kernels and other configurations) ?

Sure, no problem with that. Where shall I send the code to?

0 Likes

You can attach the zip file directly here (using the "use advanced editor" option) or provide a link to the zip file uploaded somewhere else. You can also send us the code via email to keep it private.

Regards,

0 Likes
pinform
Staff

Update: Our next blog in the OpenCL 2.0 demystified series is ready. It’s on pipes. While not exactly a piping hot feature, pipes serve many useful purposes.

The blog explains how pipes in OpenCL 2.0 can make your code simpler and more readable. As usual, we have insights, code snippets, and complete samples that you can download. Go for it.

0 Likes
aiv
Journeyman III

I'd rather send it via e-mail. Which address should I use?

0 Likes

I've sent a message to you. Please send your code to the email address mentioned in that message.

0 Likes

Did you get my message? If not, please let me know.

0 Likes
aiv
Journeyman III

Yes, I did. I was travelling to give a talk. Please check your mail: I have forwarded the code snippet to you. I'm interested in hearing your opinion.

Best!

Alex

0 Likes

Yes. I got the code. We'll check it and get back to you shortly.

Regards,

0 Likes

Hi,

I have a question.

Did you compile the program for 1.2 and 2.0 using the same 14.41 driver, or did you use a different driver for 1.2? If so, please let us know the driver version.

Regards,

0 Likes
aiv
Journeyman III

Hi,

I was reinstalling the driver.

For OCL1.2 I was using amd-catalyst-14-9-linux-x86-x86-64.zip (fglrx-14.301.1001),

For OCL2.0 I was using linux-amd-14.41rc1-opencl2-sep19.zip (fglrx-14.41).

I did a complete removal and reinstallation each time, and I repeated the whole procedure twice with the same results.

Best!

Alex

0 Likes

Please compile the program for both 1.2 and 2.0 (passing the corresponding build option) using the 14.41 driver and share your observations. Meanwhile, we'll try it at our end.
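For reference, the language version is normally selected with the standard -cl-std build option, given in the options string passed to clBuildProgram. The helper below only formats that string and is a sketch with a hypothetical name; it makes no OpenCL calls, and whether the reply meant exactly this option is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "-cl-std=CL<major>.<minor>" into buf.
   Returns 1 on success, 0 if the buffer is too small. */
static int format_cl_std_option(char *buf, size_t cap, int major, int minor) {
    assert(buf != NULL && cap > 0);
    int n = snprintf(buf, cap, "-cl-std=CL%d.%d", major, minor);
    return n > 0 && (size_t)n < cap;
}
```

Calling it with (1, 2) and (2, 0) gives the two option strings for the comparison requested above, so the same kernel source can be built both ways on one driver.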

Regards,

0 Likes
jtrudeau
Staff

Update:

Our next blog in the OpenCL 2.0 demystified series is ready. It’s on device-side enqueue, a powerful feature that promises to make your applications faster.

The blog explains how device-side enqueue and the new OpenCL 2.0 workgroup built-in functions can provide non-trivial performance gains in your applications. As usual, we have insights, code snippets, and complete samples that you can download. Have fun!

All links are in the blog, but you can access everything from here if you prefer.

0 Likes