Complex arithmetic on Cypress

Discussion created by mikeaclark on Jul 12, 2010
Latest reply on Jul 21, 2010 by jeff_golds
I'm writing variants of kernels in OpenCL that perform simple linear algebra operations using complex arithmetic.  These are being converted from CUDA kernels, and I'm basically taking a two-step approach here:

1. Convert the CUDA to OpenCL without any optimizations

2. Vectorize the computation using float4s to better match the Cypress architecture

My question relates to step 2.  Can the 4-way vector unit on Cypress handle complex multiplication natively, i.e., multiply two pairs of complex numbers together in the same way that SSE and the Double Hummer FPU on BG/P can, or are extra operations required (inserting the minus signs as appropriate)?
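
To make the question concrete, here is a minimal sketch of what I mean by multiplying two pairs of complex numbers packed into one float4. It is written in plain C so it can be checked on the host. The struct `float4_t`, the function `cmul2`, and the (re0, im0, re1, im1) layout are my own assumptions for illustration, not anything from the OpenCL headers or the Cypress ISA.

```c
/* Plain-C stand-in for an OpenCL float4; the name float4_t is hypothetical. */
typedef struct { float x, y, z, w; } float4_t;

/* Multiply two pairs of complex numbers, each operand packed as
 * (re0, im0, re1, im1). In OpenCL swizzle notation this corresponds to
 *   a.xxzz * b.xyzw + sign * a.yyww * b.yxwz,  sign = (-1, +1, -1, +1),
 * where the sign vector is the "extra" minus signs in question. */
static float4_t cmul2(float4_t a, float4_t b)
{
    float4_t r;
    r.x = a.x * b.x - a.y * b.y;  /* Re(a0 * b0) */
    r.y = a.x * b.y + a.y * b.x;  /* Im(a0 * b0) */
    r.z = a.z * b.z - a.w * b.w;  /* Re(a1 * b1) */
    r.w = a.z * b.w + a.w * b.z;  /* Im(a1 * b1) */
    return r;
}
```

For example, with a = (1, 2, 3, 4) and b = (5, 6, 7, 8), the first pair is (1 + 2i)(5 + 6i) = -7 + 16i and the second is (3 + 4i)(7 + 8i) = -11 + 52i. Whether the sign flips on the x and z lanes cost extra instructions on Cypress is exactly what I'm asking.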