The title says it all: is there any sort instruction for vector types (e.g., int4, int8, int16) that sorts their components? I use a large number of vectors of dimension 4, 8, or 16 (each vector handled by one work-item), and I was wondering whether sorting, say, an int16 vector variable with a hardware-implemented instruction would be faster than sorting an array of scalars (an int array) in a single thread (work-item) with an algorithm like qsort. If no such instruction exists, could it be implemented at low cost, and would it even be worth trying?
Sorting vector types is meaningless because vectors don't really have a defined order, except at the point where they are read from an input buffer. That is the same reason taking the address of a vector element is disallowed (which has the unfortunate knock-on effect of disallowing atomics on vector elements), and it means that array-style indexing of vectors is not in the spec.
For such a short array you can probably work out an optimal sorting algorithm that runs entirely in registers, rather than using a standard sort in memory. Some sort of comparison tree, possibly.
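To make the comparison-tree idea a bit more concrete, here is a minimal sketch in plain C of a fixed sorting network for four values. It uses five compare-exchange steps in a fixed order, so there are no data-dependent loops and a compiler can usually keep everything in registers. The names `cswap` and `sort4` are my own, not part of any library. In an OpenCL kernel you would presumably copy the components (`.s0` to `.s3`) into scalars and use the built-in `min`/`max` for the compare-exchange, but that is an assumption I have not benchmarked.

```c
/* Compare-exchange: after the call, *a <= *b.
   Written with ternaries so it can compile to min/max without a branch. */
static void cswap(int *a, int *b)
{
    int lo = (*a < *b) ? *a : *b;
    int hi = (*a < *b) ? *b : *a;
    *a = lo;
    *b = hi;
}

/* Sorting network for exactly 4 values, ascending order.
   The sequence of comparisons is fixed and does not depend on the data. */
static void sort4(int v[4])
{
    cswap(&v[0], &v[1]);   /* sort the first pair            */
    cswap(&v[2], &v[3]);   /* sort the second pair           */
    cswap(&v[0], &v[2]);   /* smallest value moves to v[0]   */
    cswap(&v[1], &v[3]);   /* largest value moves to v[3]    */
    cswap(&v[1], &v[2]);   /* order the two middle values    */
}
```

Calling `sort4` on an array holding 3, 1, 4, 2 leaves it as 1, 2, 3, 4. The same approach extends to 8 or 16 values with larger networks, at the cost of more compare-exchange steps. Whether it actually beats a generic sort on your device is something you would have to measure.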