Is there a hardware-implemented instruction for this?
The title says it all: is there a sort instruction for vector types (e.g., int4, int8, int16) that sorts a vector's components? I use a large number of vectors of dimension 4, 8, and 16, each handled by a single work-item. I was wondering whether sorting an int16 vector variable with a hardware-implemented instruction would be faster than sorting a scalar array (int array[16]) in a single thread (work-item) with an algorithm such as qsort. If no such instruction exists, is there a low-cost way to implement one, and would it even be worth trying?
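To make "low-cost implementation" concrete, here is a minimal sketch of the kind of thing I have in mind: a fixed sorting network built from compare-exchange steps, written in plain C for 4 elements. The names `cmp_swap` and `sort4` are made up for illustration and do not come from any library. I am assuming that on a vector type each compare-exchange could be expressed with min/max-style operations on components, but I have not verified that this maps to fast hardware instructions.

```c
/* Compare-exchange: after the call, *a <= *b.
 * Written with two selects instead of a swap so the step is
 * branch-free in spirit (the compiler may still emit a branch). */
static void cmp_swap(int *a, int *b)
{
    int lo = (*a < *b) ? *a : *b;
    int hi = (*a < *b) ? *b : *a;
    *a = lo;
    *b = hi;
}

/* Hypothetical helper: sorts 4 ints ascending with a fixed
 * 5-comparator sorting network. The sequence of comparisons does
 * not depend on the data, unlike qsort. */
static void sort4(int v[4])
{
    cmp_swap(&v[0], &v[1]);
    cmp_swap(&v[2], &v[3]);
    cmp_swap(&v[0], &v[2]);
    cmp_swap(&v[1], &v[3]);
    cmp_swap(&v[1], &v[2]);
}
```

The same idea extends to 8 and 16 elements with larger networks. Whether that would beat a scalar qsort inside a single work-item is exactly what I am unsure about.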