Hi J B,
Sorry for the delayed response, and thank you for your patience.
I checked with the team working on the libraries and this is what they had to say:
Very valid question. We are working on runtime and compiler improvements that will improve performance for small matrices and lower the matrix size above which the GPU outperforms the CPU. However, there will still be matrices small enough that running them on the CPU is more beneficial.
Any improvement on what I have now will be most welcome.
Bundles of thanks.