There's no quick and easy answer to this question in general. Guessing is not the right thing to do.
Take a look at CodeXL if you haven't already.
Profile your application. In the profile results, click the "Kernel Occupancy" value of each call to see several graphs that will help you understand what happens as the work size changes.
Agree with maxdz8.
Thank you for the reply.
I found that the saturation is caused by memory access:
random memory access takes 300 ms per 100,000,000 accesses,
while clEnqueueNDRangeKernel takes 420 ms per 100,000,000 accesses.
In this case, the data processing time depends on memory access
rather than on local_work_size.