We're using the 4670 chip for our OpenCL programming. We implemented matrix multiplication for increasing matrix sizes and compared the time taken on the GPU with the time taken on the CPU.
We noticed that when the GPU code is run, the increase in time is not linear. At a certain point there is a sharp drop in execution time; the time then increases roughly linearly for a while, and then drops again.
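For reference, our measurement loop is roughly the following (a simplified sketch in Python with NumPy standing in for the kernel call; the real code enqueues an OpenCL matrix-multiplication kernel at the marked line and times it the same way):

```python
import time
import numpy as np

def time_matmul(n, repeats=3):
    # Multiply two random n x n matrices and return the best wall-clock
    # time over several repeats, to reduce measurement noise.
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        c = a @ b  # stand-in for enqueueing the OpenCL kernel and waiting on it
        best = min(best, time.perf_counter() - start)
    return best

# Sweep incremental matrix sizes, as in our experiment.
for n in range(64, 513, 64):
    print(f"n={n}: {time_matmul(n):.6f} s")
```

Plotting these times against n is where we see the non-linear jumps described above.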
Is it because the GPU has multiple cores and the number of cores allocated depends on the input size? If not, what might be the reason for the sudden drop?