13 Replies Latest reply on Jul 24, 2013 12:07 PM by himanshu.gautam

    OpenCL 2.0 spec out !






      Main new features:

      • Shared Virtual Memory
        Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices.
      • Dynamic Parallelism
        Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks.
      • Generic Address Space
        Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application.
      • Images
        Improved image support including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop.
      • C11 Atomics
        A subset of C11 atomics and synchronization operations to enable assignments in one work-item to be visible to other work-items in a work-group, across work-groups executing on a device or for sharing data between the OpenCL device and host.
      • Pipes
        Pipes are memory objects that store data organized as a FIFO and OpenCL 2.0 provides built-in functions for kernels to read from or write to a pipe, providing straightforward programming of pipe data structures that can be highly optimized by OpenCL implementers.
      • Android Installable Client Driver Extension
        Enables OpenCL implementations to be discovered and loaded as a shared object on Android systems.
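
As a rough illustration of the C11 atomics item above, here is a minimal sketch in plain host-side C11, not OpenCL C. The names `payload`, `ready`, `counter`, `producer`, `consumer` and `bump` are made up for this example. OpenCL 2.0 kernel code uses a subset of these operations and adds memory-scope arguments that plain C11 does not have, so treat this only as a picture of the release/acquire idea.

```c
#include <stdatomic.h>

/* Plain C11 on the host, not OpenCL C. All names here are hypothetical. */

static int payload;        /* ordinary, non-atomic data */
static atomic_int ready;   /* flag that publishes payload; zero-initialized */
static atomic_int counter; /* shared counter; zero-initialized */

/* Writer: store the data, then publish it with a release store. */
void producer(void)
{
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: an acquire load that observes ready == 1 also sees payload == 42.
   Returns -1 if nothing has been published yet. */
int consumer(void)
{
    if (atomic_load_explicit(&ready, memory_order_acquire))
        return payload;
    return -1;
}

/* Atomic increment; returns the value held before the increment. */
int bump(void)
{
    return atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}
```

In a real program `producer` and `consumer` would run on different threads (or, in OpenCL 2.0 terms, different work-items or the host and a device); the release/acquire pair is what makes the earlier write to `payload` visible to the reader.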


      And now let me tell you again that SPIR is the most critical feature you should implement. 90% of the enterprises I know aren't using OpenCL because they don't want to distribute their kernel source with the app.

        • Re: OpenCL 2.0 spec out !

          Looks like a lot of awesome features.

          Also, the SPIR 1.2 spec has been released for review. Great day for OpenCL people today.

          • Re: OpenCL 2.0 spec out !

            I wonder if there is any current AMD GPU that can support OpenCL 2.0 in hardware, as many people comment that Dynamic Parallelism is not possible on current hardware.

              • Re: OpenCL 2.0 spec out !

                Yes, I'm kind of sceptical about that too. I didn't see recursion for non-kernel functions.


                That doesn't follow the CUDA roadmap. Recursion for non-kernel functions has been available since CUDA 3.0 and Fermi chips; dynamic parallelism was introduced in CUDA 5.0 with Kepler (compute capability 3.5). I thought dynamic parallelism would be harder to implement, so they would begin with classic recursion.


                However, there is pretty good stuff in those specs: I love pipes, pointer-containing structs...



                • Re: OpenCL 2.0 spec out !

                  Dynamic parallelism and recursion are not the same thing. One is a capability of some combination of the compiler and the ISA (in terms of calling conventions, stacks and true function call/return behaviour) while the other is a capability of the queuing infrastructure.

                    • Re: OpenCL 2.0 spec out !

                      OK, I read the specs more closely and I understand what you mean. I had in mind CUDA Dynamic Parallelism, which offers kernel function recursion: a parent kernel can block and wait for its child to execute, then resume its own execution once the child has finished. It seems it's not that extensive in OpenCL 2.0.



                        • Re: OpenCL 2.0 spec out !

                          Correct. It is a slightly lower-level model where the user would have to split the kernel manually into the primary kernel and the post-enqueue continuation, which could then wait on the child because completion events nest. This is a slightly more generic approach given the range of architectures involved. A higher-level compiler could perform this transformation for you; that is just not specified as part of OpenCL 2.
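
To make the split into a primary kernel plus a continuation more concrete, here is a minimal single-threaded sketch in plain C. It is only a toy model: `enqueue`, `drain`, `Task`, `State` and the three stage functions are hypothetical names invented for this example and are not part of the OpenCL API. Integer flags stand in for completion events, and a small array stands in for the device queue.

```c
#include <stddef.h>

/* Toy model of a device queue; every name here is hypothetical. */
#define MAX_TASKS 8

typedef void (*TaskFn)(void *arg);

typedef struct {
    TaskFn fn;
    void *arg;
    const int *wait_on; /* "event" that must be set before running, or NULL */
    int *signal;        /* "event" set when this task completes, or NULL */
    int done;
} Task;

static Task tasks[MAX_TASKS];
static size_t n_tasks;

/* Asynchronous dispatch: records the task and returns at once.
   Tasks beyond MAX_TASKS are silently dropped in this toy model. */
static void enqueue(TaskFn fn, void *arg, const int *wait_on, int *signal)
{
    if (n_tasks < MAX_TASKS) {
        Task t = { fn, arg, wait_on, signal, 0 };
        tasks[n_tasks++] = t;
    }
}

/* Runs every task whose event is satisfied until no progress is made. */
static void drain(void)
{
    int progress = 1;
    while (progress) {
        progress = 0;
        for (size_t i = 0; i < n_tasks; i++) {
            Task *t = &tasks[i];
            if (t->done || (t->wait_on && !*t->wait_on))
                continue;
            t->done = 1;
            t->fn(t->arg);          /* may enqueue further tasks */
            if (t->signal)
                *t->signal = 1;
            progress = 1;
        }
    }
}

typedef struct {
    int input, child_result, final_result;
    int child_done;                 /* stands in for a completion event */
} State;

static void child(void *arg)
{
    State *s = arg;
    s->child_result = s->input * 2;
}

/* The part of the original kernel that came after "wait for child". */
static void continuation(void *arg)
{
    State *s = arg;
    s->final_result = s->child_result + 1;
}

/* The part of the original kernel before the wait; it never blocks. */
static void primary(void *arg)
{
    State *s = arg;
    /* Enqueued first on purpose: ordering comes from the event, not position. */
    enqueue(continuation, s, &s->child_done, NULL);
    enqueue(child, s, NULL, &s->child_done);
}

int run_example(int input)
{
    State s = { input, 0, 0, 0 };
    n_tasks = 0;
    enqueue(primary, &s, NULL, NULL);
    drain();
    return s.final_result;
}
```

Calling `run_example(7)` runs the primary stage, then the child (7 * 2 = 14), then the continuation (14 + 1), giving 15. The point of the sketch is that the parent never blocks: the work that would have followed a blocking wait moves into a separate task that depends on the child's completion.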

                    • Re: OpenCL 2.0 spec out !

                      Why don't you distribute your code in binary or in amd_il format (in an .elf)? When it comes to reverse engineering, even the amd_il format looks like complete garbage, just like the disassembled ISA. Both need a lot of effort if someone wants to understand your program.

                      • Re: OpenCL 2.0 spec out !

                        Maybe the AMD Radeon 7000 series can support OpenCL 2.0?

                        • Re: OpenCL 2.0 spec out !

                          I'm glad you like the changes we've made in the spec; it's been a lot of work to get it this far.


                          Don't forget subgroups in your list (as a KHR extension at the moment). People have long asked for wavefront parameters to be visible to the programmer. Subgroups should provide that in an abstract way. Remember also that a wavefront is a thread, so you can think about what that means for what you can do safely in the execution model.


                          Please do give feedback on the Khronos forums and Bugzilla. We will be making small modifications to the final release spec in a few months' time, based on the feedback we receive on this public review draft, and we hope that will lead to a much stronger final specification.

                          • Re: OpenCL 2.0 spec out !



                            Can you explain a bit more about this? What is the Generic Address Space? Is it for using void* and other pointers without having to specify __global/__local/__private? Why is this useful, and what are the applications, please?


                            And now we can fire kernels from kernels... but are they executed thread by thread, or...? If you fire a kernel from each thread ID, that is going to kill SIMD grouping... I don't understand how this is executed.


                            And I think the spec is lacking two important features we were demanding: a flag to tell OpenCL to disable the watchdog for long-running operations, and a way to use images of different sizes in an array.

                              • Re: OpenCL 2.0 spec out !

                                There are many use cases, but the obvious one is where you want to make a generic function to use for multiple address spaces:


                                int foo(int *a);

                                kernel void bar(local int *a, global int *b)
                                {
                                    foo(a);   /* pointer into local memory */
                                    foo(b);   /* pointer into global memory */
                                }


                                Launching kernels from kernels is like launching from the host - it's an asynchronous dispatch. The launching work items don't block. You get completely new threads once the current ones have completed (or concurrently if you happen to have cores available and the launchers are long running). There's no relationship between the original kernel and the launched kernel.


                                Disabling the watchdog on a per-launch basis may not be practical if it is an OS timer rather than something the driver has control of. I can't really comment on why some features got in and others didn't. There is obviously ongoing negotiation on priorities given the needs of different vendors, their customers and the capabilities of the planned devices.

                              • Re: OpenCL 2.0 spec out !

                                The spec is provisional. FYI.


                                Note: "clCreateCommandQueue" is deprecated. We need to go with "clCreateCommandQueueWithProperties". Bit sad to see such a common, basic API getting deprecated.

                                Other deprecated APIs include clCreateSampler and clEnqueueTask.


                                "Shared Virtual Memory" capability was almost there in AMD cards. The "VM" string in the OpenCL driver suggests the ability to map OpenCL kernel pointers into host RAM. But now it is more formally exposed.

                                It will also help HSA runtimes and HSA-aware OpenCL applications.


                                - Bruhaspati