
What is more efficient ....

Question asked by vbb on Aug 25, 2015
Latest reply on Sep 17, 2015 by dipak



I wrote a simple OpenCL program in which several kernels are executed in a loop. The loop is shown in the code snippet below. The kernel verletStep1 sets the variable *verletNeedsUpdate to true if necessary, which happens roughly every 100 iterations. When it does, a list called verletlist must be updated; this is done by the three kernels erase cells, buildVerlet1 and buildVerlet2. In the solution shown below, memory is mapped from the GPU to host memory in every timestep.

Alternatively, I tried calling the three kernels from the if branch on every iteration and wrapping the entire body of each kernel in an if condition, so that the kernels do nothing when *verletNeedsUpdate is false. On my Radeon R9 280 this second approach is slightly faster than branching on the host (but only by approx. 2 percent). On an Intel HD 4000 device (using the Beignet platform on Linux), however, the solution below is significantly faster than the other one (but approx. 20 times slower than runs on the dedicated Radeon card using AMD's APP).
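To make the second variant concrete, here is a minimal sketch of the guard pattern, written as plain C++ rather than OpenCL C so it can be run without a device. The names eraseCellGuarded, launchEraseCell and cells are hypothetical stand-ins, not my actual kernels; in the real kernel the same check would sit at the top of the __kernel function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one guarded kernel (e.g. the cell-erase step).
// Each call plays the role of a single work-item.
void eraseCellGuarded(const int* verletNeedsUpdate, std::vector<int>& cells,
                      std::size_t workItem) {
    if (*verletNeedsUpdate == 0) {
        return;  // flag not set: the work-item exits without touching memory
    }
    cells[workItem] = 0;  // placeholder for the real work of the kernel
}

// Emulates enqueueing the kernel over the whole range: the launch itself
// always happens, only the work inside is skipped when the flag is clear.
void launchEraseCell(const int* verletNeedsUpdate, std::vector<int>& cells) {
    for (std::size_t i = 0; i < cells.size(); ++i) {
        eraseCellGuarded(verletNeedsUpdate, cells, i);
    }
}
```

The point of the sketch is that the launch cost is paid on every iteration even when the guard makes every work-item return immediately, which may be why the two devices behave so differently.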


Now my questions: Is there a more efficient way to conditionally enqueue kernels, depending on the result of a previous kernel, than the two approaches I used?
If not, which of the two is the better way in OpenCL 1.2?
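As a rough way to frame the trade-off, the two variants can be compared by counting kernel launches and host synchronizations. This is only a back-of-the-envelope sketch: the function names are hypothetical, it assumes the list is rebuilt exactly every updateInterval iterations (in my runs it is only roughly every 100), and it assumes one host-side flag check per iteration for the host-branching variant.

```cpp
#include <cassert>
#include <cstdint>

struct Cost {
    std::int64_t kernelLaunches;  // total kernels enqueued
    std::int64_t hostSyncs;       // times the host must wait for the flag
};

// Variant 1: branch on the host. Two step kernels per iteration, one flag
// check per iteration, three rebuild kernels only when the flag is set.
Cost hostBranchCost(std::int64_t steps, std::int64_t updateInterval) {
    const std::int64_t rebuilds = steps / updateInterval;
    return {2 * steps + 3 * rebuilds, steps};
}

// Variant 2: guard inside the kernels. All five kernels are enqueued on
// every iteration, but the host never waits for the flag.
Cost deviceGuardCost(std::int64_t steps) {
    return {5 * steps, 0};
}
```

For 10000 timesteps and an interval of 100, this gives 20300 launches plus 10000 host synchronizations for variant 1, against 50000 launches and no synchronizations for variant 2. Which one wins then depends on how expensive a launch is relative to a synchronization on the given platform.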


cl::Buffer bufferVerletNeedsUpdate = cl::Buffer(context,
        CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, verletNeedsUpdateSize, verletNeedsUpdate);


for (int i = 0; i < maxTimesteps; ++i) {
            queue.enqueueNDRangeKernel(verletStep1Kernel, cl::NullRange,
                    globalp, localp);

            if (*verletNeedsUpdate) {
                queue.enqueueNDRangeKernel(ereaseCellKernel, cl::NullRange,
                        globalc, localc);
                queue.enqueueNDRangeKernel(buildVerlet1, cl::NullRange, globalp,

                queue.enqueueNDRangeKernel(buildVerlet2, cl::NullRange, globalp,

            queue.enqueueNDRangeKernel(verletStep2, cl::NullRange, globalp,

            if (i % snapshot == 0) {
                std::cout << "Verletlistupdates: " << verletupdates << std::endl;

                if (i > 0) {
                    std::cout << "time " << timestep * i << "snapshot "
                            << *verletNeedsUpdate << std::endl;
                    char filename[16]; // string which will contain the number
                    sprintf(filename, "./data/snap%04d", snapnumber++);
                    saveSnapShot(filename, positions, velocities, accelerations,

                } // Write data from last snapshot to HD
                  // then read the momentary data
                queue.enqueueReadBuffer(bufferPositions, CL_TRUE, 0, datasize,

                queue.enqueueReadBuffer(bufferVelocities, CL_TRUE, 0, datasize,

                queue.enqueueReadBuffer(bufferAccelarations, CL_TRUE, 0,
                        datasize, accelerations);
                queue.enqueueReadBuffer(bufferVerletNeedsUpdate, CL_TRUE, 0,
                        verletNeedsUpdateSize, verletNeedsUpdate);