Archives Discussions

Meteorhead · ‎05-25-2011

any way?

I have seen that SiSoft Sandra has built-in double-precision tests for the GPU, even on cards that do not support it. It most likely uses some OpenCL library that does 64-bit calculations on 32-bit hardware.

Am I mistaken that this sort of behavior could be achieved if only the very basic operations (ADD, MUL, FMAD and the like) were implemented at the ISA level, with everything else built automatically on top of them using already existing software? Compute power drops about 10X on my Mobility 5870, which is still quite good.

Would AMD implement pseudo DP on all their cards to enable developers to test DP applications on their mobile workstations as well? It is a really big pain to have to implement everything twice, or to #define REAL as float/double and then use sizeof(REAL) everywhere. I would much rather have some standard way of getting DP on all HW.
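For readers who have not seen the REAL workaround, here is a minimal sketch of it in plain C. The `USE_DOUBLE` switch and the helper names `real_buffer_bytes` and `dot` are made up for illustration; in an actual OpenCL kernel the double branch would also need `#pragma OPENCL EXTENSION cl_khr_fp64 : enable` (or the AMD variant).

```c
#include <stddef.h>

/* USE_DOUBLE is a hypothetical build switch: define it to get double. */
#ifdef USE_DOUBLE
typedef double REAL;
#else
typedef float REAL;
#endif

/* Every buffer size has to go through sizeof(REAL), never a literal 4 or 8. */
static size_t real_buffer_bytes(size_t n)
{
    return n * sizeof(REAL);
}

/* The numeric code is written once against REAL. */
static REAL dot(const REAL *a, const REAL *b, size_t n)
{
    REAL sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

This works, but the switch leaks into every allocation, every kernel argument and every literal, which is the pain described above.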

This could be some vendor-specific extension.

(Edit: I just figured this would fit better into another topic. I'll summarize there also)

galmok · ‎05-25-2011

Or perhaps do what nVidia has done: use float calculations as a fallback. Doubles are specified and 8 bytes (1 double) per value are transferred, but only 4 bytes (1 float) are used for calculations.

LeeHowes · ‎05-26-2011

That doesn't sound practical. It would mean that the double precision calculations would fail double precision precision requirements.

Meteorhead · ‎05-26-2011

Could someone with relevant knowledge explain whether this is possible: to implement only the very basics and let everything else automagically happen on top of it? Or is it something more complicated?

I do not know how it is implemented in the SiSoft test suite (it says emulated DP, and there is a major performance drop, so most likely it is truly DP), but it would really rock to have something like this. I believe this is something that has to be implemented once for the VLIW-5 and the VLIW-4 cards and everybody would be really happy.

galmok · ‎05-26-2011

Originally posted by: LeeHowes: "That doesn't sound practical. It would mean that the double precision calculations would fail double precision precision requirements."

Well, doing the double-as-float trick will make it possible to develop and test (some of) the kernel functionality on a laptop. AMD doesn't have any mobile chipset with double precision capability, and here the trick would be welcome.

I already use it for nVidia's OpenCL where I patch the binary (which is a ptx file) to enable the double-as-float option on my laptop (nVidia GPU with only float support). Sure, it fails the precision test, but I can still see if the results are in the right ballpark (which most likely means the kernel would work on DP-capable hardware).
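A "right ballpark" check of this kind could look roughly like the following sketch in plain C. The function name `in_ballpark` and the tolerance are assumptions for illustration, not part of any SDK; the idea is just to compare the float result against a double reference with a relative tolerance loose enough for single precision.

```c
/* Absolute value without pulling in the math library. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Returns 1 if the float result is within rel_tol of the double reference.
   When the reference is zero, rel_tol is used as an absolute bound. */
static int in_ballpark(double reference, float approx, double rel_tol)
{
    double scale = abs_d(reference);
    double diff  = abs_d((double)approx - reference);
    if (scale == 0.0)
        return diff <= rel_tol;
    return diff <= rel_tol * scale;
}
```

A tolerance around 1e-6 accepts ordinary float rounding but rejects results that are plainly wrong. For numerically sensitive kernels the float run can drift far from the double reference even when the kernel is correct, so a failure here is only a hint.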

This feature is not for actual production use of course.

LeeHowes · ‎05-26-2011

Yes, as a test for you it makes sense (if your maths is not likely to be chaotic). I meant it doesn't make sense for us to implement a double "emulation" mode in that fashion.

There's lots of information out there on double emulation with a bit of googling. It's always rather slow of course.
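One commonly described approach is "float-float" arithmetic, where a value is kept as the unevaluated sum of two floats and built from plain float adds. The sketch below shows only addition, in plain C, with made-up names (`ff_t`, `two_sum`, `ff_add`); it is not necessarily what any particular library or benchmark does. It gives roughly twice the significand bits of a float, which still falls short of IEEE double, and it keeps float's exponent range.

```c
/* A value is the unevaluated sum hi + lo of two floats. */
typedef struct { float hi, lo; } ff_t;

static ff_t ff_from_float(float x)
{
    ff_t r = { x, 0.0f };
    return r;
}

/* Knuth's two-sum: s + err equals a + b exactly (barring overflow).
   volatile keeps the compiler from holding intermediates in wider registers. */
static ff_t two_sum(float a, float b)
{
    volatile float s  = a + b;
    volatile float bb = s - a;
    volatile float t  = s - bb;
    float err = (a - t) + (b - bb);
    ff_t r = { s, err };
    return r;
}

/* Add two float-float values and renormalize the result. */
static ff_t ff_add(ff_t x, ff_t y)
{
    ff_t s = two_sum(x.hi, y.hi);
    float lo = s.lo + (x.lo + y.lo);
    volatile float hi = s.hi + lo;
    ff_t r = { hi, lo - (hi - s.hi) };
    return r;
}

/* Convert back to a host double for inspection. */
static double ff_to_double(ff_t x)
{
    return (double)x.hi + (double)x.lo;
}
```

Adding 1e-8f to 1.0f is lost entirely in a plain float, while here it survives in the lo part. Each emulated add costs several float operations, which is why emulation is always slow, and the trick breaks under fast-math style compiler options that reorder floating point.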

Meteorhead · ‎05-26-2011

I was talking about something like this:

http://oscarbg.blogspot.com/2009/10/double-precision-support-in-gpu.html

I'm not a mathematician, nor can I make the compiler accept the type "double" if it supports neither cl_khr_fp64 nor cl_amd_fp64. Couldn't the next SDK include a vendor-specific extension that utilizes functions like the ones mentioned in the blog above? It would be so much nicer if this were done centrally, rather than everyone hacking something together. The blog even shows timing differences, and it's only a bit more than twice as slow as 'native' DP on a Radeon 5850. It would really rock to have this type of DP on all AMD HW.
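For context, whether "double" is accepted depends on the device's extension string, which host code obtains with clGetDeviceInfo and CL_DEVICE_EXTENSIONS as a space-separated list. A minimal sketch of the check in plain C follows; the helper names `has_extension` and `device_has_fp64` are made up here, and the string is passed in so that the snippet does not need an OpenCL runtime.

```c
#include <string.h>

/* Returns 1 if name appears as a whole token in the space-separated list. */
static int has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;
    if (len == 0)
        return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == ext_list) || (p[-1] == ' ');
        int ends   = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends)
            return 1;
        p += len;  /* partial match, keep searching */
    }
    return 0;
}

/* Either extension is enough for the compiler to accept double. */
static int device_has_fp64(const char *ext_list)
{
    return has_extension(ext_list, "cl_khr_fp64") ||
           has_extension(ext_list, "cl_amd_fp64");
}
```

An emulated mode of the kind requested would presumably show up as one more name in this list, so application code could pick it up without a separate float code path.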

I believe it's not that much of an effort, but then again, I only have a rough idea of how hard this could be to implement. I believe the rest would build on top of this automatically.

LeeHowes · ‎05-27-2011

In principle it's a good idea and I'm sure we'll consider it. It will of course depend on priorities and hardware roadmaps. It's likely more complicated than we would guess, too, because testing the full set of double precision functions on top of emulated basic ops might be complicated to do well.

It's not my area any more than it is yours, and of course even if I knew the answer I couldn't give away plans like that.

pseudo double precision