GCN GPUs are strict SIMT.
Decorating a kernel with '__attribute__((vec_type_hint(<type>)))' will not influence GPU compilation artifacts.
Since GCN is a scalar architecture, i.e. each thread can execute at most one single-component ALU operation per cycle, there is no point in trying to vectorize the code horizontally.
Thank you so much.
Just one more question: if I have to evaluate a lot of transcendental functions over different data fields, should I write my own (using polynomials that are close to the real ones) if I don't care so much about the error?
In other words, is it true that only the scalar unit can do transcendental functions?
It is the opposite: transcendental functions are only supported on the vector engine, not on the scalar engine.
They operate at quarter rate, so a custom approximation is only faster if it takes fewer than 4 single-precision operations.
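To make that budget concrete, here is a minimal sketch in plain C of an approximation that fits in three single-precision operations. The function name approx_sin_small is hypothetical, and the choice of a cubic Taylor polynomial for sin(x) is only an illustration, not a recommendation. It assumes the input is already small in magnitude (roughly |x| < 1) and that the compiler maps the middle line to a single multiply-add, which is not guaranteed.

```c
/* Hypothetical helper: cubic Taylor approximation sin(x) ~ x - x^3/6.
 * Assumes |x| is small; the error grows quickly outside that range.
 * Written as three single-precision operations. */
float approx_sin_small(float x)
{
    float x2 = x * x;                       /* op 1: multiply */
    float t  = x2 * (-1.0f / 6.0f) + 1.0f;  /* op 2: multiply-add (constant is folded at compile time) */
    return x * t;                           /* op 3: multiply */
}
```

For x = 0.5 this gives about 0.47917 against a true value of about 0.47943. Any range reduction or higher-order term adds operations and quickly uses up the four-operation budget, so it is worth measuring against the native function before committing to a hand-written version.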