I have an OpenCL wrapper that can act as a server or a client and uses all the devices on each PC.
But I have only one PC.
When I increase the number of server instances on the same PC to 9 or more, it fails at the kernel compilation step. If I put the compilation step inside a lock to rule out some kind of race, it works flawlessly and all server instances compute correct results.
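To make the workaround concrete, here is a minimal sketch of the kind of lock I mean. It assumes the server instances are threads inside one process; `KernelBuildGate` and the `build` delegate are hypothetical names standing in for the wrapper's real compile call, not part of any OpenCL binding.

```csharp
using System;

// Hypothetical helper, not part of any OpenCL binding.
public static class KernelBuildGate
{
    // Single gate shared by every server instance (thread) in this process.
    private static readonly object Gate = new object();

    // Runs the supplied build delegate while holding the gate, so that
    // only one kernel compilation is in flight at any time.
    public static T Build<T>(Func<T> build)
    {
        if (build == null) throw new ArgumentNullException(nameof(build));
        lock (Gate)
        {
            return build();
        }
    }
}
```

Each instance would then wrap its compile call, for example `KernelBuildGate.Build(() => CompileKernels(context, source))`, where `CompileKernels` is a placeholder name. A `lock` only serializes threads within one process; if the instances ran as separate processes, a named `System.Threading.Mutex` would be needed instead.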
Is this lack of thread safety in kernel compilation (on different contexts on the same 64-bit Windows 10 machine) a matter of process scope? Device scope? Shared-memory scope (such as NUMA)?
Device: R7_240. The CPU device might trigger the same behaviour, but it compiles so much faster than the GPU that I'm not sure.
OpenCL 1.2 bindings are used in a 64-bit C# program.