Hi,
I'm developing an OpenCL application that assembles a lot of arbitrary kernels at runtime (via genetic programming). Is there a way to build OpenCL kernels in parallel (on the CPU), preferably using OpenCL intrinsics? By parallel I mean building many kernels concurrently.
Try calling clBuildProgram() in multiple threads.
Would it be possible to use a native kernel instead?
I had problems building kernels on separate threads - apparently the compiler isn't actually thread-safe. Mind you, this was a while back, but still...
What worked for me was to spawn separate processes, one per CPU core, and send them work to compile over pipes.
Originally posted by: keldor314 I had problems building kernels on separate threads - apparently the compiler isn't actually thread-safe. Mind you, this was a while back, but still...
According to the OpenCL 1.1 Specification (Section A.2), all API calls should be thread-safe, except clSetKernelArg. Were you using the 1.1 spec?
Hi d.a.a,
Does multithreaded clBuildProgram work for you?
And using native kernels you may be able to build them in parallel, but running them might be problematic, as IIRC GPUs don't support them yet.
Originally posted by: himanshu.gautam Hi d.a.a,
Does multithreaded clBuildProgram work for you?
I'm investigating the native-kernel way of doing it. I'd like to use OpenCL intrinsics; otherwise I would need a (portable) third-party library for multi-threaded execution.
And using native kernels you may be able to build them in parallel, but running them might be problematic, as IIRC GPUs don't support them yet.
Sorry, I don't get it. What exactly is it that GPUs don't support?
A native kernel is just another function call, and there is no guarantee it will execute in parallel; the implementation is free to serialize execution of native kernels.
You should just use a threading library like pthreads, boost::thread, or, with the new C++11, std::thread.
GPUs will never support native kernels, since a native kernel must execute on the host CPU.
Originally posted by: nou A native kernel is just another function call, and there is no guarantee it will execute in parallel; the implementation is free to serialize execution of native kernels.
You should just use a threading library like pthreads, boost::thread, or, with the new C++11, std::thread.
GPUs will never support native kernels, since a native kernel must execute on the host CPU.
You must create multiple queues, otherwise execution will be serialized, since the current AMD APP SDK doesn't support out-of-order queues. And even with multiple queues, parallel execution isn't guaranteed. It would also be cumbersome: native kernels are not intended as a way to create threads in your program.