I have a silly question about the fastpath and completepath hardware on the ATI GPU. I checked the programming guide, which says the two paths can do both "load and store" operations. However, the profiler description (and the paper about the OpenCL profiler published in SIGGRAPH) mentions that these two paths are just for "data written to the global memory". I got confused here and thought there might be two possibilities:
(1) these two paths support both load and store, but the profiler only counts the amount of data "written to the global memory";
(2) these two paths only support store (write) operations, so the programming guide has made a mistake on this issue.
I know I might be misunderstanding some key point here, so any clarification on this is welcome.