
Complex arithmetic on Cypress
MicahVillmow Jul 12, 2010 4:18 PM (in response to mikeaclark)
The complex data type is not natively supported in either OpenCL or our hardware.
mikeaclark Jul 13, 2010 11:33 AM (in response to MicahVillmow)
Thanks for the update, Micah. Some more questions:
Since Cypress doesn't support complex multiplication natively, do you have any suggestions on how best to implement complex arithmetic? My first thought was to write all operations in terms of four complex numbers at a time, i.e., one float4 holding the real parts and one float4 holding the imaginary parts, which would enable maximum throughput. Unfortunately, this doesn't always map well to my problem. The alternatives would seem to be either to write everything in terms of scalar floats and let the compiler do its best, or to pack two complex numbers into each float4 and perform the requisite twiddling on the components.
Is there a way for the AMD CPU OpenCL implementation to use the complex SSE instructions? Is the compiler able to detect such sequences and issue the correct SSE instructions, or will this require true complex data type support in OpenCL 1.x?
I'm sure you know the complex types are reserved in the current spec. Does anyone know if their implementation is on the OpenCL roadmap?
I realise you can't answer this, but is there any chance of native complex support in Southern Islands? :)
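The first layout described in this post (one float4 of real parts, one float4 of imaginary parts) can be sketched in plain C as follows. This is only an illustration under stated assumptions: the `cfloat4` struct and `cmul4` function are hypothetical names, and ordinary four-element arrays stand in for OpenCL `float4` values so the sketch compiles on a host.

```c
#include <stddef.h>

/* Four complex numbers in split layout: element i is re[i] + i*im[i].
   Plain arrays stand in for OpenCL float4 values (assumption for illustration). */
typedef struct {
    float re[4];
    float im[4];
} cfloat4; /* hypothetical type name */

/* Element-wise complex multiply of four pairs:
   (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br). */
cfloat4 cmul4(cfloat4 a, cfloat4 b)
{
    cfloat4 r;
    for (size_t i = 0; i < 4; ++i) {
        r.re[i] = a.re[i] * b.re[i] - a.im[i] * b.im[i];
        r.im[i] = a.re[i] * b.im[i] + a.im[i] * b.re[i];
    }
    return r;
}
```

In this layout every arithmetic step acts on whole four-wide vectors with no component shuffling, which is why it is attractive for throughput; the cost is that the data must naturally come in groups of four.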


MicahVillmow Jul 13, 2010 1:55 PM (in response to mikeaclark)
mikeaclark,
Why not just use a float8 where float8.hi holds the real parts and float8.lo holds the imaginary parts? Or you can interleave them and use float8.odd as real and float8.even as imaginary.
Also, I obviously can't answer your final few questions.
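The interleaved float8 suggestion above can be sketched in plain C like this. It is a rough illustration, not AMD's recommended code: `packed8` and `cmul_interleaved` are hypothetical names, and an eight-element array stands in for an OpenCL `float8`. In OpenCL, `.even` selects elements 0, 2, 4, 6 and `.odd` selects elements 1, 3, 5, 7; the sketch follows the post's convention of odd elements as real parts and even elements as imaginary parts.

```c
/* Four complex numbers packed into eight floats, standing in for an
   OpenCL float8 (assumption for illustration).
   Convention from the post: odd elements (1,3,5,7) are real parts,
   even elements (0,2,4,6) are imaginary parts. */
typedef struct {
    float s[8];
} packed8; /* hypothetical type name */

/* Multiply four complex pairs stored in the interleaved layout. */
packed8 cmul_interleaved(packed8 a, packed8 b)
{
    packed8 r;
    for (int k = 0; k < 4; ++k) {
        float ar = a.s[2 * k + 1], ai = a.s[2 * k]; /* .odd = real, .even = imaginary */
        float br = b.s[2 * k + 1], bi = b.s[2 * k];
        r.s[2 * k + 1] = ar * br - ai * bi;         /* real part of the product */
        r.s[2 * k]     = ar * bi + ai * br;         /* imaginary part of the product */
    }
    return r;
}
```

The hi/lo variant is the same arithmetic with the real parts read from elements 4 to 7 and the imaginary parts from elements 0 to 3.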
mikeaclark Jul 21, 2010 12:02 PM (in response to MicahVillmow)
Thanks for your reply. Another question.
Does Cypress support fused multiply-subtract on the four-wide vector unit? This would allow me to write vectorized complex arithmetic as two float4s (for real and imaginary) without penalty.
I note that fused multiply-subtract is not supported on Fermi; in fact, the lack of an FMS instruction is what prevents my kernel (a complex-valued outer product sum) from exceeding 1 Tflop/s (it falls short at 950 Gflop/s). Support for FMS on Cypress would be a big plus in my book.

jeff_golds Jul 21, 2010 5:26 PM (in response to mikeaclark)
FMS = FMA: result = a*b - c = a*b + (-c).
The HW has some input modifiers, such as negating an operand, that can be used to optimize certain common operations like this one. You can read more in the ISA spec.
Jeff

