OpenCL targeting an APU:
1) On an APU, the graphics cores share main memory with the CPU (instead of having dedicated VRAM).
Is it really necessary to copy buffers and to distinguish between local and global memory (except for synchronization)?
Can't we just use host_ptr and mapped memory, since everything resides in main memory anyway?
On an APU, which memory backs the 32 KB of local memory (is there something like a cache)?
2) Is there a sample for APUs?
OpenCL targeting a CPU:
1) I thought the workgroup size on a CPU should be on the order of 1, since CPU cores are not stream processors (I am thinking of the warp size).
But when I query the maximum workgroup size using clGetDeviceInfo, it returns 1024.
What is the best practice for workgroup size on a CPU (similar to 64 on AMD GPUs)?
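
To make the workgroup size question concrete, here is a minimal sketch of how a local size could be chosen on the host side once a preferred value is known. It assumes OpenCL 1.x rules, where the global size must be a multiple of the local size, and that device_max is the value reported for CL_DEVICE_MAX_WORK_GROUP_SIZE (1024 in my case). The helper pick_local_size and its parameters are hypothetical names for illustration, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL call): returns the largest local size
 * that is <= preferred, <= device_max, and divides global evenly.
 * Falls back to 1 when no larger divisor exists. */
size_t pick_local_size(size_t global, size_t device_max, size_t preferred)
{
    assert(global > 0 && device_max > 0 && preferred > 0);

    /* Never exceed the device limit or the total number of work-items. */
    size_t limit = preferred < device_max ? preferred : device_max;
    if (limit > global) {
        limit = global;
    }

    /* Walk down until a value divides the global size exactly. */
    for (size_t local = limit; local > 1; --local) {
        if (global % local == 0) {
            return local;
        }
    }
    return 1;
}
```

For example, pick_local_size(1024, 1024, 64) gives 64, while a global size of 1000 with the same limits gives 50. Whether 64, 1, or some other preferred value is the right input for a CPU device is exactly what I am asking.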