We have a Linux 3.2 server with two Radeon 7970 GPUs and Catalyst 12.10. OpenCL takes more than a second to start: the first OpenCL call in a process, usually clGetPlatformIDs(), takes 1.25 s. All subsequent OpenCL calls in the same process are fast. If I start another process later, it shows the same behavior.
clinfo shows the same overhead.
If I set the COMPUTE environment variable so that only a single GPU is used, the overhead drops to about 1.00 s. If I exclude both GPUs, the overhead disappears. It looks like the driver spends a lot of time initializing the GPU devices (especially the first one) during the first OpenCL call.
Is there a way to reduce this? On NVIDIA, there is a "persistence" mode that keeps the driver from reinitializing the devices every time. Is there anything similar for Catalyst? Note that an X server is running on the machine, so the GPUs aren't idle between my OpenCL processes (NVIDIA's persistence mode isn't needed when an X server is running).
FWIW, I am porting hwloc to OpenCL, and I would rather not waste a second just to list the server topology.