I have a few questions about kernel instruction flow.
(1) Is a kernel's binary code uploaded to the GPU when clEnqueueNDRangeKernel() is executed?
(2) Is it stored in global memory?
(3) Are there special channels or caches that speed up the instruction flow?
(4) How long does a kernel stay on the GPU?
(5) If I invoke a kernel repeatedly, will the binary code be uploaded over the system bus each time?
(6) FetchSize is an important figure reported by the APP Profiler. It shows the total kilobytes fetched from video memory. Does FetchSize account for the instruction flow, or does it reflect only the data flow?
Thank you very much in advance.