I would like to open this topic as a place for everyone to make suggestions about the 'standard' C++ wrapper of OpenCL, cl.hpp. Let me start off with a little brainstorming: some suggestions and questions about the wrapper.
- The C++ wrapper is an extremely useful initiative. It is high time that an accepted, standard way of wrapping OpenCL saw the light of day (I know it has been around for quite some time, but great improvements have been made in the 2.8 version). I think it is worth putting more effort into it, as it greatly facilitates OpenCL coding.
- In some places I don't understand the motives behind the structure. I really like the idea of interfacing cl::Buffer with STL (compatible) containers, but assuming a default-constructed context in these functions rather shoots the idea in the foot. It seems as if these functions want to provide a limited subset of functionality that looks as simple as C++AMP.
- C++AMP is about as neat as a GPU-capable parallel API can look. (And the best part is that, by integrating into the language, it ceases to be an API.) C++AMP interfaces strongly with the STL, which makes it even easier to use. I believe many people would welcome a similar feel in the wrapper, but in my opinion losing functionality on the way is not the solution. I guess the context parameter is omitted from the STL constructor of cl::Buffer so that people don't have to write cl::Context::getDefault() every time, but this means making assumptions, which is rarely good in a wrapper. There should be two functions, one that does not need a context and one that takes it as a parameter, leaving the choice up to the user.
- I have stumbled upon a strange problem, namely that cl::EnqueueArgs does not appear to be copy-constructible. Writing std::vector<cl::EnqueueArgs> m_args(2); results in a compiler error saying "error C2582: 'operator =' function is unavailable in 'cl::EnqueueArgs' ". This is true; cl::EnqueueArgs is a struct, and I'm no C++ expert, but I would expect the compiler to generate an assignment operator for it. The reason it might not be able to do so is that one of its member types, cl::NDRange, has neither a copy constructor nor an assignment operator. One would think that if the compiler complains about a missing = operator, then this type cannot be put into an std::vector at all, because the vector will definitely not be able to reallocate itself without copy-constructing its elements, but that is not true. VS2012 only complains when I try to default-construct the elements while declaring the vector itself. If I don't write (2) after the variable name, it compiles and works fine. It might be that I am underskilled in C++, but this behavior is definitely unintuitive.
- Introducing functors for a kernel call is a really neat thing, especially since the functor expects an EnqueueArgs that holds the command queue the kernel will be placed on. I would like to ask others whether they think it would be a good idea to have something similar for all command queue operations. cl::CommandQueue could be used similarly to an std::queue, with a << operator, and it would accept OpenCL memory operations (perhaps also as functors) as well as any callable object, such as an std::function. Right now this is only achievable either by creating custom-made queues, or by enqueueing markers associated with user-defined events and setting a callback function, which seems rather complicated. It would be really neat if the cl::CommandQueue wrapper could take care of both OpenCL and standard host-side functions, with its powerful sync capabilities through cl::Event. All of this is host-side magic and is not too complicated to incorporate into the wrapper. If you think this would unnecessarily increase the complexity of a wrapper that is intended to be lightweight, then I accept that, but I believe this is a feature worth considering.
- The new C++ standard uses std::future and std::promise as the means of querying and acquiring the results of async commands. In OpenCL this is all done through cl::Event, which covers both the availability of the result and its actual value. I have no idea how one would go about separating this duality of cl::Event, but it might prove easier to understand in the long run if it looked more like something that programmers who write async C++ code will be using extensively. (Assuming we all agree that the new C++ standard facilities for thread management, atomics, synchronization, PRNGs, and all that goodness are just jolly good, and finally standard.) This only came to my mind because C++AMP uses std::future to signal the availability of async commands and does not introduce a new entity, which I find elegant. I know that cl::Event has to exist, but perhaps cl::future and cl::promise might provide an alternative approach for those more familiar with the C++ way, and both could reference the same cl_event under the hood. (They could inherit from the corresponding std:: types to stay compatible with the STL, or something along those lines.)
- Just out of curiosity, how much C++11 is inside the AMD implementation of OpenCL? I would imagine that everything has already been written without it, so there is no great need to shift the codebase to the new standard. Still, as a person who regularly writes cross-platform code, I find it makes maintaining code much easier if I switch to the standard way of handling threads (without the need for 3rd-party libs). I understand that such a shift of codebase by AMD is only an option once the C++ runtimes are stable enough across all platforms, and as we know, MSVC is sadly far behind in implementing C++11 (the 2012 Nov CTP compiler patches up a lot of things, but still not all). I hope it is not a trade secret whether there are plans to move the code to the new standard.
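To make the std::vector point above a bit more concrete, here is a minimal sketch of what I suspect is going on. It does not use the real cl.hpp types: FakeRange, FakeArgs and make_args are made-up stand-ins, and the const member and missing default constructor are my assumption about how cl::EnqueueArgs is laid out, not something I have checked in every version of the header. It needs C++11 for static_assert, <type_traits> and emplace_back.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Stand-in for cl::NDRange: a plain value type with implicit copy operations.
struct FakeRange {
    std::size_t size;
    explicit FakeRange(std::size_t s) : size(s) {}
};

// Stand-in for cl::EnqueueArgs.
// ASSUMPTION: a const data member and no default constructor.
struct FakeArgs {
    const FakeRange global;
    explicit FakeArgs(FakeRange g) : global(g) {}
};

// The compiler still generates a copy constructor...
static_assert(std::is_copy_constructible<FakeArgs>::value,
              "copy construction is available");
// ...but the const member suppresses the implicit operator=...
static_assert(!std::is_copy_assignable<FakeArgs>::value,
              "copy assignment is not available");
// ...and there is nothing to default-construct elements with.
static_assert(!std::is_default_constructible<FakeArgs>::value,
              "default construction is not available");

// Builds n elements without default construction or assignment:
// emplace_back only needs a constructor call plus copy/move construction.
std::vector<FakeArgs> make_args(std::size_t n, std::size_t global_size) {
    std::vector<FakeArgs> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.emplace_back(FakeRange(global_size));
    }
    return v;
}
```

If this guess is right, it would explain why an empty vector that is filled afterwards works while the (2) form does not: growing the vector only needs copy construction, whereas operations such as erase or insert in the middle also need assignment and should fail to compile for such a type.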
If anyone else has ideas or comments, feel free to share them. And naturally, we are eager to receive corporate feedback.
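P.S. To illustrate the << idea for command queues from the list above, here is a rough host-only sketch. Everything in it is hypothetical: HostQueue, its operator<< and finish(), and run_demo do not exist in cl.hpp, and the lambdas merely log a string where a real version would enqueue a buffer write, a kernel launch, or a host callback. It only shows the calling style I have in mind, with plain in-order execution and no events.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical host-side queue; not part of cl.hpp.
class HostQueue {
public:
    typedef std::function<void()> Command;

    // Accepts any callable convertible to Command and returns *this
    // so that several commands can be chained in one expression.
    HostQueue& operator<<(Command cmd) {
        pending_.push(std::move(cmd));
        return *this;
    }

    // Runs all pending commands in FIFO order and reports how many ran
    // (a stand-in for finishing an in-order command queue).
    std::size_t finish() {
        std::size_t executed = 0;
        while (!pending_.empty()) {
            Command cmd = std::move(pending_.front());
            pending_.pop();
            cmd();
            ++executed;
        }
        return executed;
    }

private:
    std::queue<Command> pending_;
};

// Enqueues three placeholder commands and returns the order they ran in.
std::vector<std::string> run_demo() {
    std::vector<std::string> log;
    HostQueue q;
    q << [&log] { log.push_back("write buffer"); }
      << [&log] { log.push_back("run kernel"); }
      << [&log] { log.push_back("host callback"); };
    const std::size_t executed = q.finish();
    assert(executed == 3);
    (void)executed;  // silences the unused warning when NDEBUG is defined
    return log;
}
```

A real version would of course have to return or accept cl::Event objects so that host-side callables can be ordered against OpenCL commands; this sketch leaves that part out entirely.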