I've tried to convert my kernel to vector types instead of manual loop unrolling, and it was a pain in the *****. (In the end I gave up, mainly because I have a lot of constants, see #1.)
Here is what would make developers' lives much nicer :-)
1) Allow vector constructors in expressions, like
uint4 res = uint4(2,3,4,5)+uint4(2,3,4,5);
2) Allow vector*scalar operations, like
uint4 res = uint4(2,3,4,5)*7;
3) Add int5, uint5, float5, etc., or make the vector type a template class if you do not want the code to be hardware-dependent. Some might make use of vectors with 128 elements or more.
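
To make the three requests concrete, here is a rough sketch of the semantics I mean, written in plain host-side C++ rather than in the kernel language. Everything in it (the Vec template, the uint4/uint5 aliases, the demo functions) is a hypothetical name made up for illustration, not something any SDK provides, and it uses brace initialization where the kernel language would use uint4(...).

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-width vector type (illustration only, not an SDK type).
// It stays an aggregate, so Vec<unsigned, 4>{2, 3, 4, 5} can be written
// directly inside an expression.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> v;

    // Request 1: component-wise vector + vector.
    friend Vec operator+(Vec a, const Vec& b) {
        for (std::size_t i = 0; i < N; ++i) a.v[i] += b.v[i];
        return a;
    }

    // Request 2: vector * scalar, applied to every component.
    friend Vec operator*(Vec a, T s) {
        for (std::size_t i = 0; i < N; ++i) a.v[i] *= s;
        return a;
    }

    // scalar * vector, for symmetry.
    friend Vec operator*(T s, Vec a) { return a * s; }
};

// Request 3: any width is just another instantiation of the template.
using uint4 = Vec<unsigned, 4>;
using uint5 = Vec<unsigned, 5>;

// Temporaries built in place and combined in a single expression.
inline uint4 sum_demo()   { return uint4{2, 3, 4, 5} + uint4{2, 3, 4, 5}; }
inline uint4 scale_demo() { return uint4{2, 3, 4, 5} * 7; }
```

With this, sum_demo() yields the components 4, 6, 8, 10 and scale_demo() yields 14, 21, 28, 35. A width the hardware has no native type for (uint5, or 128 elements) would presumably have to be split into native-width operations by the compiler; that part is an assumption and is not shown here.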