VLIW on Cypress and vector addition

Hi everybody,
I'm thinking about VLIW utilization on a Radeon HD 5870.

Suppose you have the following kernel:


__kernel void saxpy(const __global float * x, __global float * y, const float a)
{
    uint guid = get_global_id(0);

    y[guid] = a * x[guid] + y[guid];
}

Each work item operates on a single vector element, and there is no explicit vectorization (e.g. float4).
Is the compiler still capable of packing instructions to exploit the ALUs of each processing element (on Cypress, four general-purpose lanes plus the transcendental unit)?
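
For comparison, this is roughly what I mean by an explicitly vectorized version. It is only a sketch: the name saxpy4 is my own, and it assumes the element count is a multiple of 4 and that the kernel is launched with a global size of n / 4. As far as I understand, packing happens within a single work item's instruction stream, so here the compiler would have four independent multiply-adds per work item to work with instead of one.

```c
// Hypothetical float4 variant of the kernel above (not measured, for comparison only).
// Assumes: element count n is a multiple of 4, global size is n / 4.
__kernel void saxpy4(const __global float4 * x, __global float4 * y, const float a)
{
    uint guid = get_global_id(0);

    // The scalar a is widened to all four components, giving four
    // independent multiply-adds per work item.
    y[guid] = a * x[guid] + y[guid];
}
```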

Is there a tool that shows how the instructions are packed into VLIW bundles?


Thank you very much!