Why is the CPU always slower than the GPU even though it supports a larger maximum work-group size and work-item size? In my case the global work size is the same for the GPU and the CPU, but the local work size is larger on the CPU to maximize CPU utilization. Is there an explanation for this?
For example, if the CPU and the GPU had the same number of compute units but the CPU had a larger maximum work-group size, would the GPU still be faster? Unfortunately I do not have an 8-core or 16-core CPU, so I cannot try this myself.
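
To make the setup concrete, here is a minimal sketch of how the same global work size splits into work-groups on each device. The sizes are hypothetical placeholders, not my actual values, and the helper `work_group_count` is only for illustration. It assumes a 1-D range whose global size is a multiple of the local size.

```python
def work_group_count(global_size: int, local_size: int) -> int:
    """Return the number of work-groups in a 1-D range.

    Assumes global_size is an exact multiple of local_size.
    """
    if local_size <= 0 or global_size % local_size != 0:
        raise ValueError("global_size must be a positive multiple of local_size")
    return global_size // local_size


# Hypothetical sizes, chosen only to illustrate the situation in the question.
GLOBAL_SIZE = 1 << 20      # same global work size on both devices
GPU_LOCAL_SIZE = 256       # assumed local work size on the GPU
CPU_LOCAL_SIZE = 1024      # assumed (larger) local work size on the CPU

gpu_groups = work_group_count(GLOBAL_SIZE, GPU_LOCAL_SIZE)
cpu_groups = work_group_count(GLOBAL_SIZE, CPU_LOCAL_SIZE)

print(f"GPU: {gpu_groups} work-groups of {GPU_LOCAL_SIZE} work-items")
print(f"CPU: {cpu_groups} work-groups of {CPU_LOCAL_SIZE} work-items")
```

So with the larger local work size the CPU gets fewer, bigger work-groups, while the total number of work-items stays the same on both devices.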