Sorting very small arrays (< 16 elements) with a single thread

I am looking for suggestions on how to solve a sorting problem efficiently. Each work item (thread) has to sort its own very small array (fewer than 16 elements).
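As a point of reference, the simplest per-thread option is a plain insertion sort. The sketch below is written in plain C. The question does not name its kernel language, and OpenCL C is only inferred from "work item" and "float16". The function name `insertion_sort` is hypothetical. The inner loop is data-dependent, so on a GPU neighbouring work items may diverge, and the variable indexing can stop the compiler from keeping the array in registers.

```c
#include <stddef.h>

/* Hypothetical baseline: sorts a[0..n-1] ascending in place.
   The number of inner-loop iterations depends on the data. */
void insertion_sort(float *a, size_t n) {
    for (size_t i = 1; i < n; ++i) {
        float key = a[i];
        size_t j = i;
        /* shift larger elements one slot to the right */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;
    }
}
```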

Maybe the array could be stored in a float16 and sorted efficiently using registers only, but which sequential algorithm would fit that case?
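A commonly used register-friendly alternative is a sorting network. This is a fixed sequence of compare-exchange steps whose order does not depend on the data. The sketch below is a bitonic network for exactly 16 floats in plain C. Inputs shorter than 16 are padded with `INFINITY`, which assumes they contain no NaNs. The names `bitonic_sort16`, `sort_small` and `cmp_swap` are hypothetical. All loop bounds are compile-time constants, so a compiler may fully unroll the loops and turn every index into a constant. That is what would let the 16 values live in registers. In OpenCL C the same steps could presumably be written against a `float16` using component access (`.s0` … `.sf`) and `min`/`max`, but that port is not shown here.

```c
#include <stddef.h>
#include <math.h>   /* INFINITY */

#define N 16

/* Compare-exchange: afterwards *a <= *b. Uses selects rather than
   branches so it can be lowered to min/max instructions. */
static inline void cmp_swap(float *a, float *b) {
    float lo = (*a < *b) ? *a : *b;
    float hi = (*a < *b) ? *b : *a;
    *a = lo;
    *b = hi;
}

/* Bitonic sorting network for exactly N = 16 values (80 compare-exchanges).
   The sequence of comparisons is the same for every input. */
void bitonic_sort16(float v[N]) {
    for (size_t k = 2; k <= N; k <<= 1) {          /* size of blocks being merged */
        for (size_t j = k >> 1; j > 0; j >>= 1) {  /* distance between partners */
            for (size_t i = 0; i < N; ++i) {
                size_t l = i ^ j;                  /* partner index */
                if (l > i) {
                    if ((i & k) == 0)
                        cmp_swap(&v[i], &v[l]);    /* ascending half */
                    else
                        cmp_swap(&v[l], &v[i]);    /* descending half */
                }
            }
        }
    }
}

/* Hypothetical wrapper for n <= N: pads with +inf so the padding
   sorts to the end, then copies back the first n results. */
void sort_small(float *a, size_t n) {
    float v[N];
    for (size_t i = 0; i < N; ++i)
        v[i] = (i < n) ? a[i] : INFINITY;
    bitonic_sort16(v);
    for (size_t i = 0; i < n; ++i)
        a[i] = v[i];
}
```

The bitonic layout is chosen here for its regular, easily unrolled structure. The best known 16-input networks need fewer comparators (60), at the cost of an irregular, table-driven comparison order.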