Hi melonakos,
I wrote my application for my AMD GPU, but it still does not perform well enough (the aim is to beat the same application running with SIMD+OpenMP in terms of run time). So I recently took a look at ArrayFire, and it might be an option. Honestly, I have some trouble understanding how to combine it with OpenCL, in the sense of creating a context, adding devices, creating memory objects, executing kernels, and so on (since this is the way I learned to communicate with an external device and submit work to it). Is there a guide or set of examples that walks the user step by step through configuring the environment and adding kernels to the device, showing side by side how it would be written using only OpenCL and how it would be written using OpenCL+ArrayFire?
Thank you for your attention.
Regards,
Marco