10 Replies Latest reply on Mar 18, 2011 2:38 PM by Jawed

    Wrapping OpenCL host code in C++ class problem

    atlemann

      Does all the OpenCL host code need to be within the same scope? I am having trouble putting my host code in a C++ class that initializes all buffers and builds the kernels in its constructor, with the setting of kernel arguments and the enqueueing done in a separate member function for each kernel.

      I have an instance of this class outside a loop, and from inside the loop I call its member functions, which in turn enqueue the kernels. I do this to avoid duplication and to hide all the ugly host code from the client.

      However, when I do this I no longer get the correct answer, and I get totally different answers on a 9800GT (close to the correct answer each time) and a GTX460 (random, totally wrong answers each time). (Sorry for using nVidia. I ask here since they don't seem so dedicated to OpenCL, and their forum is down...)

      Before, I had all the host code inside one single function, which works and gives the same answer on both cards. But I want to clean it up a bit. I also use QtOpenCL, but that should not matter since it is just a fancy wrapper.

        • Wrapping OpenCL host code in C++ class problem
          Jawed

          See A.2 in the OpenCL specification, version 1.1.

          You may have multiple objects, each of which sets its own kernel arguments. If all the objects happen to share the same kernel object, and each object sets arguments and enqueues independently, then each object will overwrite the arguments set for the kernel by the others and you will get chaos. That's because the kernel object (in the OpenCL runtime) is common to all of your objects that are setting arguments. That object is not thread-safe.

          To get round this, simply make each of your own objects create a private kernel object with clCreateKernel. Each of your objects will create the kernel object from the same program object and kernel name string. But they will now act independently.

          Notes 71 and 72 on page 363 might be preferable to my explanation!
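The overwrite described above, and the private-kernel fix, can be sketched without any OpenCL runtime. This is only a model under stated assumptions: `MockKernel`, `MockQueue`, `createKernel`, `Stage` and `runTwoStages` are hypothetical stand-ins invented for illustration (they are not OpenCL or QtOpenCL API), and the model assumes that an enqueue captures whatever argument values the kernel object holds at that moment.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a kernel object: it only remembers the most
// recent value set for each argument index.
struct MockKernel {
    std::map<int, int> args;
    void setArg(int index, int value) { args[index] = value; }
};

// Hypothetical stand-in for a command queue: enqueueing records the
// argument values the kernel holds at the time of the call.
struct MockQueue {
    std::vector<std::map<int, int> > launches;
    void enqueue(const MockKernel& kernel) { launches.push_back(kernel.args); }
};

// Hypothetical stand-in for creating a kernel from a program and a kernel
// name: every call returns a fresh, independent kernel object.
std::shared_ptr<MockKernel> createKernel() {
    return std::make_shared<MockKernel>();
}

// A host-side wrapper object that sets arguments and enqueues in separate steps.
class Stage {
public:
    Stage(std::shared_ptr<MockKernel> kernel, int value)
        : kernel_(kernel), value_(value) {}
    void setArguments() { kernel_->setArg(0, value_); }
    void enqueue(MockQueue& queue) { queue.enqueue(*kernel_); }
private:
    std::shared_ptr<MockKernel> kernel_;
    int value_;
};

// Returns the value of argument 0 seen by the first and second launch.
std::pair<int, int> runTwoStages(bool privateKernels) {
    std::shared_ptr<MockKernel> first = createKernel();
    // Either both stages share one kernel object, or each gets its own.
    std::shared_ptr<MockKernel> second = privateKernels ? createKernel() : first;
    Stage a(first, 1);
    Stage b(second, 2);
    MockQueue queue;
    a.setArguments();
    b.setArguments();  // with a shared kernel this overwrites a's argument
    a.enqueue(queue);
    b.enqueue(queue);
    return std::make_pair(queue.launches[0].at(0), queue.launches[1].at(0));
}
```

In this model `runTwoStages(false)` yields the pair (2, 2), because the first stage is launched with the second stage's argument, while `runTwoStages(true)` yields (1, 2).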

          Another way around this problem is to have the method that sets the arguments also enqueue execution of the kernel.
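A minimal sketch of that alternative, again using made-up stand-ins rather than real OpenCL calls: `MockKernel`, `MockQueue`, `Stage` and `runSharedKernelStages` are hypothetical names, and the model assumes a single host thread and that an enqueue captures the kernel's current argument values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a kernel object holding the latest argument values.
struct MockKernel {
    std::map<int, int> args;
    void setArg(int index, int value) { args[index] = value; }
};

// Hypothetical stand-in for a command queue that records the arguments
// in effect at each enqueue.
struct MockQueue {
    std::vector<std::map<int, int> > launches;
    void enqueue(const MockKernel& kernel) { launches.push_back(kernel.args); }
};

// Each stage shares the same kernel object, but run() sets the arguments
// and enqueues in one step, so no other stage can change them in between
// (as long as only one host thread is calling run()).
class Stage {
public:
    Stage(MockKernel& sharedKernel, MockQueue& queue, int value)
        : kernel_(sharedKernel), queue_(queue), value_(value) {}
    void run() {
        kernel_.setArg(0, value_);
        queue_.enqueue(kernel_);
    }
private:
    MockKernel& kernel_;
    MockQueue& queue_;
    int value_;
};

// Runs stages a, b, a on one shared kernel and returns argument 0 of each launch.
std::vector<int> runSharedKernelStages() {
    MockKernel kernel;
    MockQueue queue;
    Stage a(kernel, queue, 1);
    Stage b(kernel, queue, 2);
    a.run();
    b.run();
    a.run();
    std::vector<int> seen;
    for (std::size_t i = 0; i < queue.launches.size(); ++i) {
        seen.push_back(queue.launches[i].at(0));
    }
    return seen;
}
```

Here the three launches see 1, 2 and 1, even though both stages share one kernel object. With several host threads the calls to `run()` would still have to be serialized, since the kernel object itself is not thread-safe.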

          • Wrapping OpenCL host code in C++ class problem
            himanshu.gautam

            atlemann,

            Please post some code and your system details (CPU, GPU, SDK, Catalyst, OS).

            Most of the SDK samples are written using classes. That might be helpful to you.

            Thanks

              • Feb2010
                atlemann

                I am using nVidia cards at the moment, a 9800GT and a GTX460, but I think this forum is better. I am running CentOS 5 and OpenCL 1.0, and I am also using QtOpenCL. Everything runs in the same thread, but I want to run on multiple GPUs as well, so I guess I will have to use different threads. (It all works if I have all the host code inside "runSimulation" in one huge function.)

                 

                Here is some code:

                runSimulation()
                {
                    AbsPermHost absPermHost(absPermVariables_, grid_, device_);
                    Timer timer;

                    // Main loop
                    while (...)
                    {
                        for (0...100)
                        {
                            QCLEvent collideEvent = absPermHost.collideAndSwap();
                            collideEvent.waitForFinished();

                            QCLEvent streamEvent = absPermHost.streamBySwapping();
                            streamEvent.waitForFinished();

                            ++nIterations;
                        }
                        (...check convergence...)
                    } // End main loop

                    (...print results...)
                }

                class AbsPermHost
                {
                public:
                    /**
                     * Constructor
                     */
                    AbsPermHost(const AbsPermVariables& absPermVariables,
                                const Grid& grid,
                                QCLDevice& device);

                    QCLEvent collideAndSwap();
                    QCLEvent streamBySwapping();
                    (...)

                private:
                    void createKernels();
                    (...)

                    // OpenCL objects
                    QCLContext context_;
                    QCLDevice device_;
                    QCLKernel collideAndSwapKernel_;
                    QCLKernel streamBySwappingKernel_;
                    QCLKernel computeAverageVelocitiesKernel_;

                    // Work sizes
                    QCLWorkSize simulationSize_;
                    QCLWorkSize nCollisionThreads_;
                    QCLWorkSize localSizeCollision_;
                    QCLWorkSize localSizeStream_;
                    unsigned int localSizeXCompAverage_;

                    // Kernel data
                    Lattices latticesCl_;
                    QCLBuffer bodyForcesCl_;
                    QCLVector<int> obstaclesCl_;

                    // Shared memory sizes
                    unsigned int sharedMemoryByteSizeCollision_;
                    unsigned int sharedMemoryByteSizeCompAvg_;
                };

                AbsPermHost::AbsPermHost(const AbsPermVariables& absPermVariables,
                                         const Grid& grid,
                                         QCLDevice& device)
                    : absPermVariables_(absPermVariables),
                      nInlets_(AbsPermCL::computeInlets(grid)),
                      nPoreCells_(grid.getNPoreCells()),
                      device_(device)
                {
                    // Create a context
                    QList<QCLDevice> devices;
                    devices.push_back(device_);
                    context_.create(devices);

                    createKernels();
                    (...init buffers and stuff...)
                }

                QCLEvent AbsPermHost::collideAndSwap()
                {
                    collideAndSwapKernel_.setGlobalWorkSize(nCollisionThreads_);
                    collideAndSwapKernel_.setLocalWorkSize(localSizeCollision_);

                    // Collide and swap on GPU
                    collideAndSwapKernel_.setArg(0, latticesCl_.f0);
                    collideAndSwapKernel_.setArg(1, latticesCl_.f1);
                    collideAndSwapKernel_.setArg(2, latticesCl_.f2);
                    collideAndSwapKernel_.setArg(3, latticesCl_.f3);
                    collideAndSwapKernel_.setArg(4, latticesCl_.f4);
                    collideAndSwapKernel_.setArg(5, latticesCl_.f5);
                    collideAndSwapKernel_.setArg(6, latticesCl_.f6);
                    collideAndSwapKernel_.setArg(7, latticesCl_.f7);
                    collideAndSwapKernel_.setArg(8, latticesCl_.f8);
                    collideAndSwapKernel_.setArg(9, latticesCl_.f9);
                    collideAndSwapKernel_.setArg(10, latticesCl_.f10);
                    collideAndSwapKernel_.setArg(11, latticesCl_.f11);
                    collideAndSwapKernel_.setArg(12, latticesCl_.f12);
                    collideAndSwapKernel_.setArg(13, latticesCl_.f13);
                    collideAndSwapKernel_.setArg(14, latticesCl_.f14);
                    collideAndSwapKernel_.setArg(15, latticesCl_.f15);
                    collideAndSwapKernel_.setArg(16, latticesCl_.f16);
                    collideAndSwapKernel_.setArg(17, latticesCl_.f17);
                    collideAndSwapKernel_.setArg(18, latticesCl_.f18);
                    collideAndSwapKernel_.setArg(19, bodyForcesCl_);
                    collideAndSwapKernel_.setArg(20, 0, sharedMemoryByteSizeCollision_);
                    collideAndSwapKernel_.setArg(21, absPermVariables_.getOmega());

                    return collideAndSwapKernel_.run();
                }