When testing our closed-source OpenCL code, we noticed that kernels run slower when compiled to SPIR. The same kernels in the same application run faster when compiled directly from source. The difference is very significant for some kernels, e.g. more than 100% (source 300 ms, SPIR 650 ms, tested multiple times with R9 290X, R7 260X and W9100 GPUs on Windows and Linux machines). Our developers believe that the problem is related to global memory accesses, so we have stripped one of our kernels down to the bare minimum to demonstrate the problem (kernel source, SPIR binary and test app are attached). The attached demo shows only a small slowdown (about 10%), but it can be clearly measured.
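For clarity, this is how the percentages quoted above are meant: the extra time of the SPIR build relative to the from-source build. A minimal sketch; the function name `slowdown_percent` is only illustrative and is not part of the attached test app.

```python
def slowdown_percent(source_ms: float, spir_ms: float) -> float:
    """Return how much slower the SPIR build is, as a percentage of the
    from-source build time."""
    if source_ms <= 0:
        raise ValueError("source_ms must be positive")
    return (spir_ms - source_ms) / source_ms * 100.0


# Timings quoted in the post: 300 ms (from source) vs. 650 ms (from SPIR).
print(f"{slowdown_percent(300.0, 650.0):.1f} %")  # prints "116.7 %"
```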
The kernel binary was generated using the Khronos-modified clang with the following parameters:
-cc1 -x cl -emit-llvm-bc -triple spir-unknown-unknown -O3 -cl-fast-relaxed-math -cl-spir-compile-options "-cl-fast-relaxed-math" -include opencl_spir.h
The SPIR header file opencl_spir.h can be found here.
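If anyone wants to reproduce the SPIR build from a script, the parameters above could be turned into an argument list roughly like this. This is only a sketch: the compiler path `clang`, the file names `kernel.cl` and `kernel.bc`, and the trailing `-o <output> <source>` arguments are assumptions for illustration and are not part of the command quoted above. Note that the quotes around the value of `-cl-spir-compile-options` are shell quoting, so the value ends up as a single plain argument.

```python
import shlex

# Flags exactly as quoted in the post.
SPIR_FLAGS = (
    '-cc1 -x cl -emit-llvm-bc -triple spir-unknown-unknown -O3 '
    '-cl-fast-relaxed-math -cl-spir-compile-options "-cl-fast-relaxed-math" '
    '-include opencl_spir.h'
)


def spir_compile_argv(clang="clang", source="kernel.cl", output="kernel.bc"):
    """Build an argv list for the SPIR compile step.

    The clang path, source name and output name are hypothetical defaults.
    """
    # shlex.split applies shell-style quoting rules, so the quoted option
    # value becomes one argument without the surrounding quotes.
    return [clang, *shlex.split(SPIR_FLAGS), "-o", output, source]


argv = spir_compile_argv()
print(argv)
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once `clang` points at the Khronos-modified build.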
Both approaches produce correct results; the SPIR version is just significantly slower. Can anyone confirm this, please? Are we doing something wrong?
Thanks for any suggestions,
Message was edited by: Martin Jirman (attached the files)
EDIT 2: renamed kernel.bin to kernel.h