I am working on a numerical solver for solid mechanics problems based on linear elasticity. My application calls six kernels within a loop (three integration steps and three post-processing steps for file I/O). When I run on a CPU, the application behaves as I expect, but when I run on a discrete GPU, host memory usage grows with each kernel call.
From debugging, the application's memory grows by 516 KB per call to clEnqueueNDRangeKernel. The attached image from Windows Task Manager shows, from left to right: a run on the GPU with 3000 kernel calls, a run on the GPU with 30 kernel calls, a run on the GPU with 1500 kernel calls, and a run on the CPU with 3000 kernel calls. My tests show that this growth does not depend on the size of the data set: the problem persists whether the total data set is around 10 KB or above 1 MB. Most of the cases of interest require more than 100,000 iterations, potentially with more than the current three post-processing kernels.
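To put the measured rate in perspective, here is a minimal C sketch that projects the host-memory growth. It assumes that "KB" means 1024 bytes, that the growth stays linear at 516 KB per kernel call, and that each iteration enqueues exactly six kernels; the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Observed growth per clEnqueueNDRangeKernel call.
   Assumption: 516 KB is read as 516 KiB (516 * 1024 bytes). */
#define GROWTH_PER_CALL_BYTES (516ULL * 1024ULL)

/* Kernels enqueued per loop iteration: 3 integration + 3 post-processing. */
#define KERNELS_PER_ITERATION 6ULL

/* Hypothetical helper: extra host memory after `iterations` loop passes,
   assuming the growth stays linear. */
uint64_t projected_growth_bytes(uint64_t iterations)
{
    return iterations * KERNELS_PER_ITERATION * GROWTH_PER_CALL_BYTES;
}

/* Hypothetical helper: largest number of kernel calls whose accumulated
   growth still fits within `budget_bytes` (integer division rounds down). */
uint64_t max_calls_within(uint64_t budget_bytes)
{
    return budget_bytes / GROWTH_PER_CALL_BYTES;
}
```

Under these assumptions, 100,000 iterations would add about 317 GB (roughly 295 GiB) of host memory, and the growth alone would fill 8 GiB of host RAM after 16,256 kernel calls, i.e. about 2,709 iterations of the six-kernel loop.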
The main buffers are created with CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE, while the device buffers are created with CL_MEM_READ_WRITE only. The integrator requires at least two states in memory (linear elasticity depends on values from the previous configuration), so there are 30 buffers for state variables. Data is synced across devices using clEnqueueCopyBuffer. All buffers have sizes that are multiples of 256 bytes. Every kernel, even the one with the fewest arguments (10), causes an increase in memory usage. Note also that the memory is released when the application exits. So far, I can only run my simulations with driver 16.1.1.
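The buffer layout described above can be sketched in C as follows. The post only states that sizes are multiples of 256 bytes and that there are 30 state buffers for two states; rounding up to the next multiple of 256 and the 15-variables-per-state split are assumptions, and both helper names are hypothetical.

```c
#include <stddef.h>

/* Granularity the buffer sizes are kept to (multiples of 256 bytes). */
#define BUFFER_ALIGN ((size_t)256)

/* Hypothetical helper: round a requested byte count up to the next
   multiple of BUFFER_ALIGN (0 stays 0). */
size_t padded_buffer_size(size_t bytes)
{
    return (bytes + BUFFER_ALIGN - 1) / BUFFER_ALIGN * BUFFER_ALIGN;
}

/* Hypothetical helper: the integrator keeps two states (current and
   previous configuration), so each state variable needs two buffers. */
size_t state_buffer_count(size_t vars_per_state)
{
    return 2 * vars_per_state;
}
```

The padded value is what would be passed as the size argument of clCreateBuffer, with CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE for the main buffers and CL_MEM_READ_WRITE for the device buffers; `state_buffer_count(15)` gives the 30 state buffers mentioned above.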
Is there something I am missing about using the GPU for computation?
OS: Windows 7 64-bit
HOST RAM: 8GB
GPU: R9-280X (8GB RAM)
AMD APP SDK: 220.127.116.11
Radeon Version: 16.1.1
OpenCL Version: 18.104.22.16808
Message was edited by: Ben Claus (included OS in specs; added information from tests regarding driver version)
Message was edited by: Ben Claus (clarified a few sentences to help indicate the problem)
[Attached image: OpenCL_CPU_GPU_comparison3a.PNG, Windows Task Manager memory usage for the four runs described above]