Does this mean that some kernel invocations are taking up much more time than others?
Is it a CodeXL issue, or do you see such performance drops when actually running the kernel in your application?
Perhaps you could share a complete test case that we can compile on our end.
I’m not sure whether the kernel time reported by CodeXL includes both the kernel invocation time and the kernel execution time.
If it does, I’ll focus on the kernel invocation time, and maybe it’s an OCL runtime problem. If it doesn’t, I’m totally confused.
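What exactly CodeXL's "kernel time" covers isn't established in this thread. One way to separate invocation overhead from execution yourself is to read the standard OpenCL event profiling timestamps for each launch: `clGetEventProfilingInfo()` with `CL_PROFILING_COMMAND_QUEUED`, `CL_PROFILING_COMMAND_SUBMIT`, `CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END` returns `cl_ulong` device-time values in nanoseconds, provided the command queue was created with `CL_QUEUE_PROFILING_ENABLE`. The sketch below only shows how those four values could be split up; the struct and function names are hypothetical, and plain `uint64_t` stands in for `cl_ulong` so the snippet compiles without OpenCL headers.

```c
#include <stdint.h>

/* Hypothetical container for the four OpenCL profiling timestamps of one
   kernel launch, in nanoseconds. On the host these would be read with
   clGetEventProfilingInfo() using CL_PROFILING_COMMAND_QUEUED, _SUBMIT,
   _START and _END on the kernel's event. */
typedef struct {
    uint64_t queued;
    uint64_t submit;
    uint64_t start;
    uint64_t end;
} kernel_timestamps;

/* Hypothetical breakdown of one launch into overhead and execution. */
typedef struct {
    uint64_t queued_to_submit_ns; /* enqueued on the host, not yet submitted to the device */
    uint64_t submit_to_start_ns;  /* submitted, not yet running on the device */
    uint64_t execution_ns;        /* kernel actually running */
    uint64_t total_ns;            /* from enqueue to completion */
} kernel_breakdown;

/* Compute the intervals between consecutive profiling timestamps. */
static kernel_breakdown split_kernel_time(kernel_timestamps t)
{
    kernel_breakdown b;
    b.queued_to_submit_ns = t.submit - t.queued;
    b.submit_to_start_ns  = t.start - t.submit;
    b.execution_ns        = t.end - t.start;
    b.total_ns            = t.end - t.queued;
    return b;
}
```

Logging this breakdown for every launch would show whether the slow invocations spend their extra time before START (launch overhead) or between START and END (the kernel itself).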
The test case is isolated from a real video post-processing application that sometimes doesn’t run smoothly. The ISV located the root cause and then wrote a test case to reproduce it. I have attached the test case.
Another odd issue: on some cards (in fact, only on the HD 6670), the test case gives a shorter execution time with 1440×1080 frames (2.25× enlarged from a 640×480 frame) than with 1280×960 frames (2× enlarged from the same 640×480 frame).
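To put the HD 6670 result in numbers: because both dimensions are scaled, the pixel counts differ by the square of the scale factors, (2.25 / 2)² = 1.265625. So the 1440×1080 frames carry about 27% more pixels than the 1280×960 frames. A small C sketch of that arithmetic (the helper names are hypothetical):

```c
/* Hypothetical helper: number of pixels in a width x height frame. */
static unsigned long pixel_count(unsigned long width, unsigned long height)
{
    return width * height;
}

/* Hypothetical helper: pixel count of frame a divided by that of frame b. */
static double pixel_ratio(unsigned long wa, unsigned long ha,
                          unsigned long wb, unsigned long hb)
{
    return (double)pixel_count(wa, ha) / (double)pixel_count(wb, hb);
}
```

For example, `pixel_ratio(1440, 1080, 1280, 960)` gives 1.265625. Assuming the kernel does a fixed amount of work per output pixel, the faster case is the one doing roughly 27% more work per frame.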
Thanks for your help.
SelfExampleScale.7z.zip 2.8 MB
Well, I am having a few issues here:
1. It is a Chinese project, so the comments and readme files are hard to understand.
2. The project does not compile for me at the moment. I am trying it in VS12 Ultimate, and it gives me a "mfc100d.lib not found" error. Probably some of the libraries used in the project were built with an older version of VS.
It would be helpful if you could send a minimal repro case without such dependencies. Do I have to compile your code with VS10?