I am a developer, and I have seen comments on a few sites saying that AMD does not support FMA4 properly. I am not sure whether the CPU microcode has since been patched and whether it works now.
FMA4 is a set of fused multiply-add instructions. FMA4 computes d = (a * b) + c with four operands, so the result goes to a separate register, while FMA3 uses only three operands and overwrites one of the inputs with the result. This makes things trickier, as CPUID does not help me select which instruction to use.
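To make the operand difference concrete, here is a scalar model of one lane of each form, using std::fma for the arithmetic. It is a sketch based on the documented operand order of the packed-single instructions (VFMADDPS for FMA4; VFMADD132PS, VFMADD213PS and VFMADD231PS for FMA3). The C++ function names are hypothetical: they only describe what each instruction computes and are not real intrinsics.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of the packed-single FMA instructions, one lane only.
// Function and parameter names are illustrative, not a real API.

// FMA4 (AMD): VFMADDPS dst, s1, s2, s3  ->  dst = s1 * s2 + s3
// The destination is a separate fourth operand, so no input is overwritten.
float fma4_vfmaddps(float s1, float s2, float s3) {
    return std::fma(s1, s2, s3);
}

// FMA3: the first operand is both an input and the destination.
// The first two digits name the operands that are multiplied, the third the one added.

// VFMADD132PS x1, x2, x3  ->  x1 = x1 * x3 + x2
void fma3_132(float& x1, float x2, float x3) { x1 = std::fma(x1, x3, x2); }

// VFMADD213PS x1, x2, x3  ->  x1 = x2 * x1 + x3
void fma3_213(float& x1, float x2, float x3) { x1 = std::fma(x2, x1, x3); }

// VFMADD231PS x1, x2, x3  ->  x1 = x2 * x3 + x1
void fma3_231(float& x1, float x2, float x3) { x1 = std::fma(x2, x3, x1); }

// d = a * b + c while keeping a, b and c intact: with FMA3 one input must be
// copied into the destination first, which FMA4 does not need.
float mul_add_keep_inputs_fma3(float a, float b, float c) {
    float d = a;          // the extra register copy
    fma3_213(d, b, c);    // d = b * d + c = a * b + c
    return d;
}
```

In practice the compiler picks among the three FMA3 orderings to decide which register gets overwritten, so the extra copy is only needed when all three inputs are still live afterwards.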
I have a Ryzen 5 2400G, which is very recent, so I expect it to support more or less the whole modern instruction set.
I have Visual Studio 2019, which can target every instruction set out there, but I do not want my CPU to "blow up" on me.
Here is an example of what I mean, compiled with the following options:
/O1 /arch:AVX2 /fp:fast
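#include <immintrin.h>

// Scalar: with /fp:fast the compiler may contract a*b + c into one FMA3 instruction
float mul_add(float a, float b, float c) {
    return a*b + c;
}

// 8 floats at once, written as a separate multiply and add
__m256 mul_addv(__m256 a, __m256 b, __m256 c) {
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}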
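With /fp:fast the compiler is allowed, but not required, to fuse the multiply and add in the example. The sketch below asks for the fused operation explicitly: std::fma for the scalar case and the FMA3 intrinsic _mm256_fmadd_ps for eight floats. It assumes an x86-64 target. The TARGET_FMA macro and the array-based signature are illustrative choices so the same file also builds with GCC and Clang, which only accept FMA intrinsics inside functions compiled for FMA; MSVC accepts the intrinsic in any function. Call the vector version only after checking that the CPU reports FMA3.

```cpp
#include <cassert>
#include <cmath>
#include <immintrin.h>

// Scalar: std::fma always rounds once, whatever /fp: setting is used.
// Without hardware FMA it may fall back to a slower software routine.
float mul_add_fused(float a, float b, float c) {
    return std::fma(a, b, c);
}

// GCC and Clang require the function to be compiled for FMA before the
// intrinsic can be used; MSVC does not, so the macro is empty there.
#if defined(__GNUC__)
#define TARGET_FMA __attribute__((target("avx2,fma")))
#else
#define TARGET_FMA
#endif

// Vector: _mm256_fmadd_ps is the FMA3 intrinsic for 8 packed floats.
// Takes plain arrays so it can be called from code not built for AVX.
TARGET_FMA
void mul_addv_fused(const float* a, const float* b, const float* c, float* out) {
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_loadu_ps(c);
    _mm256_storeu_ps(out, _mm256_fmadd_ps(va, vb, vc));   // out = a * b + c
}
```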
Here are all the Ryzen 5 2400G Processor codes from CPU World:
According to the Wikipedia article on the FMA instruction set, your processor does support FMA4.
I have run some tests with both conventional code and the FMA instruction, and so far there have been no problems, but I am not sure whether AMD has fixed the CPU or not.
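FMA4 is easier to use because it takes four register operands and writes the result to a separate register, whereas FMA3 overwrites one of its inputs like an accumulator; avoiding that extra register copy can save a few clock cycles.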
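AVX2 and FMA3 are separate extensions, although Intel introduced both with Haswell, and /arch:AVX2 code uses FMA3 rather than FMA4, so this does not clarify FMA4 support.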
I have also asked Microsoft about compiler support. That should be fine, but the runtime library may not fully support processors that lack the instruction in hardware.
Correct, hardcoregames. Use CPU-Z to see:
Enjoy, John.
Given that neither Intel nor AMD sets a bit for this in CPUID, there is no way to tell whether a given processor can do it.
https://www.agner.org/optimize/microarchitecture.pdf is one of several manuals I have consulted.
Now, the AMD manual suggests that Bulldozer and Piledriver support FMA4:
http://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf
hardcoregames, it looks like only Piledriver. But the manual also says:
"Support for FMA4 is indicated by the value in bit 16 in ECX when calling CPUID function 0x8000_0001."
Enjoy, John.
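A minimal sketch of reading both feature bits, covering MSVC (__cpuid from <intrin.h>) and GCC/Clang (__get_cpuid from <cpuid.h>). FMA3 is reported in ECX bit 12 of leaf 1, and FMA4 in ECX bit 16 of leaf 0x80000001, as in the AMD quote. The helper names are hypothetical. A set bit only means the CPU advertises the feature; before using YMM registers a complete check should also confirm that the OS saves AVX state (OSXSAVE and XGETBV), which is left out here.

```cpp
#include <cassert>
#ifdef _MSC_VER
#include <intrin.h>
#else
#include <cpuid.h>
#endif

struct CpuidRegs { unsigned eax, ebx, ecx, edx; };

// Runs CPUID for `leaf`; returns false if the CPU does not implement that leaf.
static bool query_cpuid(unsigned leaf, CpuidRegs& r) {
#ifdef _MSC_VER
    int info[4];
    __cpuid(info, static_cast<int>(leaf & 0x80000000u));   // highest leaf in this range
    if (static_cast<unsigned>(info[0]) < leaf) return false;
    __cpuid(info, static_cast<int>(leaf));
    r = {unsigned(info[0]), unsigned(info[1]), unsigned(info[2]), unsigned(info[3])};
    return true;
#else
    // __get_cpuid checks the highest supported leaf itself and returns 0 if too high.
    return __get_cpuid(leaf, &r.eax, &r.ebx, &r.ecx, &r.edx) != 0;
#endif
}

// FMA3: CPUID leaf 1, ECX bit 12.
bool cpu_reports_fma3() {
    CpuidRegs r{};
    return query_cpuid(1u, r) && ((r.ecx >> 12) & 1u) != 0;
}

// FMA4: CPUID leaf 0x80000001, ECX bit 16.
bool cpu_reports_fma4() {
    CpuidRegs r{};
    return query_cpuid(0x80000001u, r) && ((r.ecx >> 16) & 1u) != 0;
}
```

Running both on the machine in question shows what that particular CPU advertises; whether it also executes instructions it does not advertise is a separate question that this check cannot answer.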
I have seen some conflicting comments.
I have done some crude testing with Visual Studio 2019 in 64-bit mode, feeding in random numbers and comparing the results, and so far there has been no problem.
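A crude harness along those lines might look like the sketch below. It assumes inputs drawn uniformly from [-1, 1] with a fixed seed and compares std::fma against a separately rounded multiply and add; the struct and function names are hypothetical. Some mismatches are expected, since the fused result is rounded once and the other twice, so a nonzero count is not by itself a sign of a CPU fault; the size of the differences is what is worth watching.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct FmaCompare {
    int samples;
    int mismatches;       // results that differ in at least one bit
    double max_abs_diff;  // largest |fused - unfused| seen
};

// Feed random inputs in [-1, 1] to a fused and an unfused multiply-add.
FmaCompare compare_fma(int samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    FmaCompare r{samples, 0, 0.0};
    for (int i = 0; i < samples; ++i) {
        float a = dist(rng), b = dist(rng), c = dist(rng);
        volatile float product = a * b;   // volatile stops the compiler contracting a*b + c
        float unfused = product + c;      // two roundings
        float fused = std::fma(a, b, c);  // one rounding
        if (fused != unfused) ++r.mismatches;
        double d = std::fabs(double(fused) - double(unfused));
        if (d > r.max_abs_diff) r.max_abs_diff = d;
    }
    return r;
}
```

For inputs of magnitude at most 1 the two results can differ by only a few units in the last place, so a difference of more than about 1e-6 would point at something other than rounding.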
I was hoping AMD could compute vector norms quickly so that physics in games, etc., would run efficiently.
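For reference, a 3-D vector length can be written so that each multiply and add in the sum of squares is a single fused operation. A minimal sketch; norm3 is a hypothetical name, and whether each std::fma becomes one instruction depends on the target and compiler settings (without hardware FMA it may go through a slower library routine).

```cpp
#include <cassert>
#include <cmath>

// Length of a 3-D vector, sqrt(x*x + y*y + z*z), with the sum of squares
// accumulated through fused multiply-adds: one rounding per step instead of two.
float norm3(float x, float y, float z) {
    float sum = z * z;
    sum = std::fma(y, y, sum);   // y*y + z*z
    sum = std::fma(x, x, sum);   // x*x + y*y + z*z
    return std::sqrt(sum);
}
```

For many vectors at once, the same chain maps onto _mm256_fmadd_ps over eight x, y and z values stored in separate arrays.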
Fused multiply-add has been around for a very long time; it is even part of the IEEE 754 standard (since the 2008 revision).
__m256 mul_addv(__m256 a, __m256 b, __m256 c) {
return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}
in one form or another
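The point of the IEEE 754 operation is that a * b + c is rounded only once. A small sketch showing the difference in single precision, using hand-picked inputs (not from the thread) where rounding the product on its own loses the low-order term; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct FmaDemo { float unfused; float fused; };

// With a = 1 + 2^-12 the exact square is 1 + 2^-11 + 2^-24. Rounded to float on
// its own, the 2^-24 term is a tie and rounds away (to even); the fused
// operation keeps it because it rounds only after adding c.
FmaDemo single_rounding_demo() {
    float a = 1.0f + std::ldexp(1.0f, -12);
    float c = -(1.0f + std::ldexp(1.0f, -11));
    volatile float product = a * a;   // rounded to 1 + 2^-11
    FmaDemo r;
    r.unfused = product + c;          // exactly 0
    r.fused = std::fma(a, a, c);      // exactly 2^-24, about 5.96e-8
    return r;
}
```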