Archives Discussions

uvedale · ‎06-14-2012

Hi,

I stumbled upon an optimisation that I don't quite understand, and was hoping somebody could shed some light on it.

I have two OpenCL kernels, model_setup and model_run. The model_run kernel basically does a Monte Carlo-type simulation with a large number of different parameter sets. One of the selling points of OpenCL is that you can just give it a huge amount of work, and it will figure out how to schedule it efficiently (if I'm not mistaken?). However, I found a sizable performance increase (around 11%) when I enqueued the kernels several times, each with a smaller NDRange, instead of enqueuing them once with all the data. The work-group size was not changed.

Any ideas?

Dale

realhet · ‎06-14-2012

I don't know what card you are using or how much memory bandwidth your algorithm requires, but generally, when you raise the number of threads, the following things happen:

More Compute Units become active (if not all are active already) -> speedup
More memory is accessed at a given time, so L2 caching becomes less effective -> slowdown

I guess you gain more from the first than you lose from the second...

uvedale · ‎06-14-2012

Aah yes, I see the cache hit rate is higher on the multiple kernel runs. Thanks for the quick and helpful response!
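
For anyone who wants to try the same comparison, here is a minimal sketch of how one large 1-D NDRange could be split into several smaller launches with the work-group size left unchanged. It is not Dale's actual code: `chunk_t`, `plan_chunks` and `max_chunk` are hypothetical names, and the sketch assumes a 1-D range whose total size is a multiple of the work-group size. The OpenCL call is only shown in a comment, so the planning logic compiles without an OpenCL SDK.

```c
#include <assert.h>
#include <stddef.h>

/* One slice of the full 1-D NDRange: its starting global ID and its size. */
typedef struct {
    size_t offset;
    size_t size;
} chunk_t;

/*
 * Split `total` work-items into slices of at most `max_chunk` items.
 * Every slice size is kept a multiple of `local_size`, so the same
 * work-group size can be used for each launch.
 * Writes at most `out_cap` slices to `out` and returns the number of
 * slices needed (which may be larger than `out_cap`).
 */
size_t plan_chunks(size_t total, size_t local_size, size_t max_chunk,
                   chunk_t *out, size_t out_cap)
{
    assert(local_size > 0);
    assert(total % local_size == 0);   /* assumption: range is already padded */
    assert(max_chunk >= local_size);

    /* Round the slice size down to a whole number of work-groups. */
    size_t step = (max_chunk / local_size) * local_size;
    size_t count = 0;

    for (size_t offset = 0; offset < total; offset += step) {
        size_t remaining = total - offset;
        if (count < out_cap) {
            out[count].offset = offset;
            out[count].size = remaining < step ? remaining : step;
        }
        count++;
    }
    return count;
}

/*
 * Each slice would then be enqueued separately, for example
 * (queue, kernel and local_size are assumed to exist already):
 *
 *   clEnqueueNDRangeKernel(queue, kernel, 1,
 *                          &chunks[i].offset,  // global_work_offset
 *                          &chunks[i].size,    // global_work_size
 *                          &local_size,        // local_work_size
 *                          0, NULL, NULL);
 *
 * A non-NULL global_work_offset needs OpenCL 1.1 or later; with it,
 * get_global_id(0) inside the kernel already includes the offset.
 */
```

Whether smaller launches actually help depends on the card and on the kernel's memory access pattern, so the slice size is something to measure rather than assume.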

Kernel execution time reduced using multiple iterations