12 Replies Latest reply on Jul 12, 2012 3:40 AM by d.a.a.

    Alternate C++ wrapper for Platform/Runtime layers?

    craft_coder

      The more I learn of the standard wrapper, the more I find to dislike about it.

       

      Is there an alternate wrapper, perhaps written by someone who understands the RAII pattern?

        • Re: Alternate C++ wrapper for Platform/Runtime layers?
          LeeHowes

          I'm not sure I understand your RAII objection beyond the example you already gave that I just explained the reasons for. Can you give any other examples? Are you talking about what you feel is poor *internal* design (your Platform::get example being one instance that isn't poor design but rather designing to requirements) or poor interface design? There were some mistakes and inconsistencies in the interfaces, some of which we fixed recently, but others I know had to be kept for legacy reasons.

           

          If you explain your specific concerns then we can rectify them or add them as bugs where appropriate. That is apart from documentation, which I will chase up because that is obviously severely lacking.

          1 of 1 people found this helpful
            • Re: Alternate C++ wrapper for Platform/Runtime layers?
              craft_coder

              First, I apologize for my indelicate tone.  I didn't come here to stir up trouble or attack important & valuable contributions in this field.  I should also say that I'd much rather have the C++ API in its current form than not at all, and I'm grateful for AMD's role in bringing HPC technology to the masses and the decision to provide the SDK, tools, and documentation at no cost.

               

              I think my primary issue is that I've become accustomed to RAII, as I feel it's the most important idiom for effective use of C++, in a systems programming context.  I've had such success with it that I have difficulty imagining development of robust C++ software, at scale, without it.  That said, I understand you were under numerous constraints, and I can empathize with trying to navigate the delicate balance between numerous conflicting goals and limitations.

               

              One thing I can say in its favor is that I do like the attention paid to interoperability with the C API.  Though my experience with it is limited, I've also noted & appreciate the attention to minimizing or eliminating overhead.

               

               

              BTW, I recently purchased a copy of your book, Dr. Howes.  Although I'm familiar with the technology and could get by with just what I learned from a slide deck I found online + the spec, I hoped your book would provide tips, insights, and idioms supporting more effective use of OpenCL.  So far, I've found it worthwhile.

               

              In my own ways, I aspire to facilitate adoption of OpenCL, in both the open source & commercial domains.  At some point, I might even try my own hand at a C++ wrapper.  If/when I do, the existence of the Khronos wrapper might even embolden me to use even more ambitious techniques than I otherwise would have, through the knowledge that my effort needn't cater to all needs nor satisfy such an extensive set of constraints as yours.

                • Re: Alternate C++ wrapper for Platform/Runtime layers?
                  LeeHowes

                  No problem. I'm not a primary author of the C++ bindings, though I've been making improvements recently, I just understand the constraints they had when writing it. I don't entirely agree with all the design decisions either but that's hardly unexpected

                   

                  It would be interesting if you have a specific idea of where RAII would help in this case? In general I agree with you that RAII is important. In this case we do have that kind of behaviour for creation of buffers, queues etc. That's one of the reasons for using the C++ bindings in that they do retain/release of underlying resources cleanly. What more specifically were you interested in? Ben Gaster and I still have some changes we're working on to support OpenCL 1.2 features (there are compile and link functions in there now, for example) so we can always consider constructive feedback. Even in the current setup performance hits can be noticable when you copy objects around because the retain/release behaviour ends up taking so many temporary locks internally as it enters and leaves the runtime.

                   

                  I hope the book is useful to you! I've been pleasantly surprised by the feedback we've received on it and I'd like to update the hardware chapters for GCN and HSA at some point.

                    • Re: Alternate C++ wrapper for Platform/Runtime layers?
                      craft_coder

                      LeeHowes wrote:

                       

                      ... In general I agree with you that RAII is important. In this case we do have that kind of behaviour for creation of buffers, queues etc. That's one of the reasons for using the C++ bindings in that they do retain/release of underlying resources cleanly.

                      Actually, cl::CommandQueue is one of the cases that triggered my initial post.  I looked at the sources and saw no destructor and only a trivial copy constructor that appeared to simply copy the ID.  Neither the header included with the APP SDK nor the Khronos 1.1 C++ API doc say anything about copy or destruction semantics.

                       

                      Until now, I hadn't noticed that Wrapper<> actually calls ReferenceHandler<>::retain()/release().  I hadn't thought to look there and the C++ API doc doesn't mention anything about ReferenceHandler<>.

                       

                      LeeHowes wrote:

                       

                      ... Even in the current setup performance hits can be noticable when you copy objects around because the retain/release behaviour ends up taking so many temporary locks internally as it enters and leaves the runtime.

                      Aren't these simple atomic counters?  It seems like the only time you should need a lock is when the refcount reaches zero.  boost::shared_ptr<> is a common example of this.

                       

                      LeeHowes wrote:

                       

                      I hope the book is useful to you! I've been pleasantly surprised by the feedback we've received on it and I'd like to update the hardware chapters for GCN and HSA at some point.

                      Feel free to PM me, if you need any reviewers for these sections!

                       

                        • Re: Alternate C++ wrapper for Platform/Runtime layers?
                          LeeHowes

                          Ah yes, it isn't entirely obvious that there's cleverness in the background for construction and destruction. They abstracted that for reuse. Again, noted about the doc. Maybe we'll be able to track down an intern with a penchant for Doxygen

                            • Re: Alternate C++ wrapper for Platform/Runtime layers?
                              craft_coder

                              It's best if the original author does the doc, since only that person knows the full intention behind a given construct.  The author can also specify at the appropriate level of generality, whereas a 3rd party can do little more than describe the current implementation.  I never define a public function or datatype without documenting it.   I'd estimate that adds only about 10% overhead, as opposed to 15% - 20% in the case of going back and documenting something after-the-fact.

                               

                              Since I'm guessing it's not an option for the original author(s) to do it, would you consider accepting doc patches from community members, such as myself?  I've used doxygen in every header file I've written over the course of the past 9 years.

                               

                              For each function, I typically provide:

                                * a one-line brief description

                                * a detailed description of up to a couple paragraphs (depending on whether it's warranted)

                                * @returns, describing the return value

                                * @throws for each exception thrown, and the conditions under which it can occur

                                * a brief description for each function & template parameter.

                                * @note and @warning blocks, in order to highlight important usage details and hazards.

                                * if usage is unclear, I often provide @code samples at class or file-scope.

                                * @groups, as necessary

                               

                              I can submit patches for a couple classes at a time.  The preprocessed file can probably be diff'd in order to verify that no non-comment changes have been made.  Perhaps code formatter, such as AStyle, can be used between the preprocessor and diff, in order to filter out any newline changes.  aspell can be used on the patches, themselves, in order to catch any spelling errors.

                               

                              What I'd hope get out of it is an opportunity to take a good, close read through the source.

                      • Re: Alternate C++ wrapper for Platform/Runtime layers?
                        d.a.a.

                        Due to the OpenCL C++ wrapper API I could write very clean and elegant OpenCL

                        programs for teaching purposes. Taking this opportunity, nonetheless, I would

                        like to suggest two minor improvements.

                         

                        Firstly, the method cl::Program::build builds for all devices associated with

                        program when the vector argument 'devices' is null. So, for instance,

                         

                           my_program.build( std::vector<cl::Device>() );

                         

                        would build for all devices associated with 'my_program'. However, in those

                        cases, it would be easier to just type

                         

                           my_program.build();

                         

                        but unfortunately this version isn't recognized. Thus, my first suggestion is

                        to add that syntax to the C++ API.

                         

                        The second suggestion is to add an overloaded cl::Context::Context that also

                        accepts a single device as argument (instead of requiring a vector of devices).

                        For example,

                         

                           cl::Context my_context( my_device );

                         

                        where 'my_device' is of type 'cl::Device'. That would slightly improve the

                        readability when the user just wants to use a single device.

                         

                        Finally, what is the appropriated place/channel to suggest contributions to

                        the documentation of the C++ Bindings Specification?

                         

                         

                        Thank you.

                          • Re: Alternate C++ wrapper for Platform/Runtime layers?
                            LeeHowes

                            Those suggestions make sense, I'll take a look as soon as my TODO list isn't too packed. We made a point of adding support for defaults in the latest version which solves some of those problems, too. Defaults and functors mean the shortest possible OpenCL program is now pretty short (10 lines or so, I think).

                             

                            As for documentation I'm checking if there's an official channel. You're welcome to post feedback here or e-mail me at firstname.lastname at amd.com or send me a PM through the forums. I want to go through heavily doxygenning the C++ headers, but it's a lot of work and so will take a while.

                            • Re: Alternate C++ wrapper for Platform/Runtime layers?
                              LeeHowes

                              If you want to report bugs/suggestions you are welcome to go to bugzilla at Khronos if it's something important.

                               

                              https://www.khronos.org/bugzilla/

                               

                              I think it's public registration. Obviously don't overdo it. You are always welcome to post here as well as we maintain it.

                               

                              I looked at your query about build. There is:

                              cl_int build( const char* options = NULL, void (CL_CALLBACK *notifyFptr)(cl_program, void*) = NULL, void *data = NULL );

                               

                              Which calls ::clBuildProgram with a null device list. That should do exactly what you want shouldn't it? The arguments are all optional... so calling build() should be fine.

                                • Re: Alternate C++ wrapper for Platform/Runtime layers?
                                  craft_coder

                                  What about extensions to the C API?

                                   

                                  I'd really like a way to cancel events.  From an implementation perspective, this would only be advisory.  It's something users (e.g. of interactive programs) could use, once the results of some in-flight computation have been invalidated.  The benefit would be to free up device resources ASAP, so that processing of new data could start sooner.

                                   

                                  It would also be useful to have a function like clWaitForEvents() that waits until any of the specified events completes (or a timeout expires).  I'm thinking of something along the lines of select() or epoll().  This is useful for 3 reasons:

                                  1. It enables the waiting thread to be interrupted by a user event, possibly signalling the arrival of new data to be processed.
                                  2. It allows multiple kernels to be multiplexed by a smaller number of host threads.  This could reduce context switching overhead on the host & possibly improve the host CPU cache hit rate if the host CPU is communicating data between said kernels.
                                  3. It allows time-based events to be multiplexed with processing.  This could be useful for things like snapshotting, initiating some periodic computations or updates, and integration with other components of a system that may rely on polling (e.g. GUI refreshes).
                                  • Re: Alternate C++ wrapper for Platform/Runtime layers?
                                    d.a.a.

                                    Hi LeeHowes,

                                     

                                    Sorry, I didn't see your reply (I though I would receive a reply notification via e-mail).

                                     

                                    LeeHowes wrote:

                                     

                                    I looked at your query about build. There is:

                                    cl_int build( const char* options = NULL, void (CL_CALLBACK *notifyFptr)(cl_program, void*) = NULL, void *data = NULL );

                                     

                                    Which calls ::clBuildProgram with a null device list. That should do exactly what you want shouldn't it? The arguments are all optional... so calling build() should be fine.

                                     

                                    This is exactly what I want, thank you for pointing this out. I had an older cl.hpp header that didn't have this overloaded "build()".

                                     

                                    Best regards.