I'm stuck with a CL_INVALID_KERNEL_DEFINITION error when using clAmdBlas.
In my OpenCL program, I have two platforms: one with a single CPU and one with two GPUs (GPU1 and GPU2).
I create one context per platform, so the two GPUs share a context and, ideally, the communication time between them can be zero.
Each device, of course, has its own command queue.
I then run clAmdBlasSgemm on each device. The call works fine on the CPU and on GPU1, but fails with CL_INVALID_KERNEL_DEFINITION when I run it on GPU2.
Here's some pseudocode to make this clearer:
for (every platform)    // first the CPU platform, then the GPU platform
    for (every device)    // CPU platform: only the CPU; GPU platform: GPU1, then GPU2
        launch clAmdBlasSgemm    // CL_INVALID_KERNEL_DEFINITION appears when it is GPU2's turn!
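
To show exactly which call fails, the status returned inside the loop could be logged by name with a small helper like the one below. This is only a sketch: cl_status_name and format_status are hypothetical helpers, not part of OpenCL or clAmdBlas. The numeric values are those from the standard OpenCL header, redefined under a guard so the snippet compiles without the SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values as defined in the standard OpenCL header; guarded so the
   snippet also compiles when <CL/cl.h> has already been included. */
#ifndef CL_SUCCESS
#define CL_SUCCESS                     0
#define CL_INVALID_DEVICE            -33
#define CL_INVALID_CONTEXT           -34
#define CL_INVALID_COMMAND_QUEUE     -36
#define CL_INVALID_KERNEL_DEFINITION -47
#endif

/* Hypothetical helper: map a few status codes to readable names. */
static const char *cl_status_name(int status)
{
    switch (status) {
    case CL_SUCCESS:                   return "CL_SUCCESS";
    case CL_INVALID_DEVICE:            return "CL_INVALID_DEVICE";
    case CL_INVALID_CONTEXT:           return "CL_INVALID_CONTEXT";
    case CL_INVALID_COMMAND_QUEUE:     return "CL_INVALID_COMMAND_QUEUE";
    case CL_INVALID_KERNEL_DEFINITION: return "CL_INVALID_KERNEL_DEFINITION";
    default:                           return "UNKNOWN";
    }
}

/* Hypothetical helper: format one log line per (platform, device) call.
   Returns the value of snprintf. */
static int format_status(char *buf, size_t len,
                         unsigned platform, unsigned device, int status)
{
    assert(buf != NULL);
    return snprintf(buf, len, "platform %u, device %u: %s (%d)",
                    platform, device, cl_status_name(status), status);
}
```

In the loop above, the value returned by clAmdBlasSgemm would be cast to int and passed as status, together with the current platform and device indices.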
What am I doing wrong? How can I solve this problem? Is it not possible to make multiple calls to clAmdBlasSgemm for different devices on the same platform?
Thanks for your help.