I have a workstation (Xeon E5-1650 v2, 64 GB RAM, FirePro W7000) that I use for OpenCL coding. According to the specs, the GPU should be much faster than the CPU for highly parallelizable tasks. However, the CPU performs about 20-30 percent better than the GPU when a kernel does on the order of 1k double-precision floating-point operations (I have taken care that memory access is sequential and that work items do not all try to access the same memory at once).
I've also noticed that OpenCL reports less memory for the GPU than the 4 GB it is supposed to have. Here is the log from the OpenCL initialization:
Platform initialization:
num platform ids: 1
platform version: OpenCL 1.2 AMD-APP (1411.4)
platform profile: FULL_PROFILE
platform name: AMD Accelerated Parallel Processing
platform vendor: Advanced Micro Devices, Inc.
available devices: 2
Device 0; Firepro W7000:
device address space: 32
max work item sizes: 256, 256, 256
device profile: FULL_PROFILE
OpenCL driver version: 1411.4 (VM)
device version: OpenCL 1.2 AMD-APP (1411.4)
device memory size: 3072 MiB
device compute units: 20
max work group size: 256
Device 1; Xeon E5-1650:
device address space: 64
max work item sizes: 1024, 1024, 1024
device profile: FULL_PROFILE
OpenCL driver version: 1411.4 (sse2,avx)
device version: OpenCL 1.2 AMD-APP (1411.4)
device memory size: 65459 MiB
device compute units: 12
max work group size: 1024
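To put numbers on the memory discrepancy, here is a minimal sketch of the arithmetic using only figures from the log above. The helper names `addressable_mib` and `missing_mib` are made up for illustration, and it assumes the advertised 4 GB means 4096 MiB.

```c
#include <assert.h>
#include <stdint.h>

/* Largest memory size, in MiB, addressable with the given pointer width.
   Only meaningful for widths below 64 bits (a 64-bit shift would overflow). */
static uint64_t addressable_mib(unsigned address_bits)
{
    assert(address_bits < 64);
    return (UINT64_C(1) << address_bits) >> 20;
}

/* Gap between the nominal and the reported memory size, in MiB. */
static uint64_t missing_mib(uint64_t nominal_mib, uint64_t reported_mib)
{
    assert(nominal_mib >= reported_mib);
    return nominal_mib - reported_mib;
}
```

With the GPU's "device address space: 32", `addressable_mib(32)` gives 4096, so the full 4096 MiB would in principle fit in the address space. `missing_mib(4096, 3072)` gives 1024, meaning a quarter of the nominal memory is not reported to OpenCL.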
So does anyone know what the issue might be here?