
Memory considerations when enqueueing a long sequence of kernels and reads

Question asked by kholdstare on Jan 15, 2013
Latest reply on Jan 17, 2013 by himanshu.gautam

Hi! I have posted this question on Stack Overflow, and I thought I would post it here too, since I am using AMD's SDK for OpenCL development and the answer may be implementation-defined.


http://stackoverflow.com/questions/14351261/memory-considerations-when-enqueing-a-long-sequence-of-kernels-and-reads


You can read the full question above, but I will summarize here. Given a pipeline of kernel operations like:

data -> kernel1 -> data1 -> kernel2 -> data2 -> kernel3 -> data3 etc. 

I need all the intermediate results to be copied back to the host as well. I want to make everything as asynchronous as possible by specifying only the minimal event dependencies (so each read depends only on the kernel that produced its buffer, and kernels never wait on the reads).
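To make the intent concrete, here is a rough sketch of the enqueue pattern I have in mind. The names N_STAGES, kernels[i], buf[i], host_results[i], global_size and buf_size are made up and stand in for my real setup; error checking is omitted:

    cl_event kernel_done[N_STAGES];
    cl_event read_done[N_STAGES];

    for (int i = 0; i < N_STAGES; ++i) {
        /* kernel i reads buf[i] and writes buf[i+1] */
        clSetKernelArg(kernels[i], 0, sizeof(cl_mem), &buf[i]);
        clSetKernelArg(kernels[i], 1, sizeof(cl_mem), &buf[i + 1]);

        /* each kernel waits only on the previous kernel, never on a read */
        clEnqueueNDRangeKernel(queue, kernels[i], 1, NULL, &global_size, NULL,
                               i == 0 ? 0 : 1,
                               i == 0 ? NULL : &kernel_done[i - 1],
                               &kernel_done[i]);

        /* non-blocking read of the intermediate result; it depends only on
           the kernel that produced buf[i+1] */
        clEnqueueReadBuffer(queue, buf[i + 1], CL_FALSE, 0, buf_size,
                            host_results[i], 1, &kernel_done[i], &read_done[i]);
    }

    clWaitForEvents(N_STAGES, read_done);  /* before touching host_results */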


I have a few questions about managing the memory objects:

  • Do I have to keep references to all cl_mem objects in the long chain of actions and release them only after everything is complete? (See the sketch after this list for the alternative I have in mind.)
  • Importantly, how does OpenCL handle the case where the combined size of all memory objects exceeds the total memory available on the device? At any point a kernel only needs its input and output buffers (which will fit in memory), but if 4 or 5 of these buffers together exceed the total, how does OpenCL allocate/deallocate the memory objects behind the scenes? And how does this affect the reads/DMAs?
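Regarding the first bullet, the only alternative I can think of to holding every cl_mem until the end is to drop my reference to each intermediate buffer as soon as the last command that uses it has been enqueued, and rely on the runtime's reference counting to keep it alive until those commands finish. A hypothetical sketch (same made-up names as above), though I am not sure this is safe or recommended:

    for (int i = 0; i < N_STAGES; ++i) {
        /* ... enqueue kernel i and the read of buf[i+1] exactly as above ... */

        /* buf[i] is not used by any later command; release my reference and
           let the runtime clean it up once the enqueued commands complete */
        clReleaseMemObject(buf[i]);
    }
    /* buf[N_STAGES], the final output, is still held and released at the end */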

So the general question is, how do large task trees interact with large memory objects?


I would be grateful if someone could clarify what happens in these situations, and perhaps there is something relevant to this in the OpenCL spec.


Thank you.
