PC Processors

hardcoregames_ · ‎06-16-2019

I am a developer and I noticed comments on a few sites saying that AMD does not support FMA4 properly. I am not sure whether the CPU microcode has been patched and whether it works now.

FMA4 is a fused multiply-add instruction set. It supports the four-operand form d = (a * b) + c, whereas FMA3 uses only 3 registers, so the result overwrites one of the inputs. This makes things trickier, as CPUID does not seem to help me select which instruction to use.

I have an R5 2400G which is very recent so I expect it to more or less support the modern instruction set.

I have Visual Studio 2019 which supports everything out there but I do not want my CPU to "blow up" on me.

As an example of what I mean, build the following with these options:

/O1 /arch:AVX2 /fp:fast

float mul_add(float a, float b, float c) {
 return a*b + c;
}

__m256 mul_addv(__m256 a, __m256 b, __m256 c) {
 return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}
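
To make the difference between a separate multiply and add and a true fused operation concrete, here is a minimal sketch in portable C++ that uses std::fma from <cmath> rather than any vendor intrinsic. The function names mul_add_unfused and mul_add_fused are made up for illustration, and the volatile temporary is only there to stop the compiler from contracting the expression into an FMA on its own (which /fp:fast and similar options are allowed to do).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical name: multiply, round the product to float, then add.
// The volatile store forces the intermediate rounding and prevents the
// compiler from fusing the two operations.
float mul_add_unfused(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Hypothetical name: a*b + c computed with a single rounding at the end.
// std::fma uses a hardware FMA when the target has one and a software
// fallback otherwise, so the result is the same either way.
float mul_add_fused(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the unfused version returns 0 because the product rounds to 1 + 2^-11 before the add, while the fused version returns 2^-24. So a fused and an unfused build can legitimately disagree in the last bits when comparing results.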

elstaci · ‎06-17-2019

Here are all the Ryzen 5 2400G Processor codes from CPU World:

According to Wikipedia on FMA, your processor does support FMA4: FMA instruction set - Wikipedia

hardcoregames_ · ‎06-17-2019

I have run some tests with both conventional code and the CPU instruction, and so far there have been no problems, but I am not sure whether AMD has fixed the CPU or not.

FMA4 is the easiest to use, as it takes 4 registers instead of overwriting an accumulator, which saves a few clock cycles.

AVX2 is based on FMA3 so this does not clarify support.

hardcoregames_ · ‎06-17-2019

I have also asked Microsoft about compiler support, which should be fine, but the run-time may not fully support processors that lack the instructions behind the intrinsics.

misterj · ‎06-17-2019

Correct, hardcoregames. Use CPU-Z to see:

Enjoy, John.

hardcoregames_ · ‎06-17-2019

Given that neither Intel nor AMD has a bit set in CPUID, there is no way to tell whether a given processor can do it.

https://www.agner.org/optimize/microarchitecture.pdf

is one of several manuals I have consulted

Now, the AMD manual suggests that Bulldozer and Piledriver support FMA4:

http://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf

misterj · ‎06-18-2019

hardcoregames, looks like only Piledriver. But this is also shown:

"Support for FMA4 is indicated by the value in bit 16 in ECX when calling CPUID function 0x8000_0001."
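
The quoted rule can be turned into a run-time check. Below is a rough sketch, not a definitive implementation: it reads ECX bit 16 of CPUID function 0x8000_0001 for FMA4, as quoted above, and ECX bit 12 of CPUID function 1 for FMA3. The names FmaSupport, bit_set, cpuid_ecx and detect_fma are hypothetical. It assumes MSVC's __cpuid from <intrin.h> or GCC/Clang's __get_cpuid from <cpuid.h>, and reports "not supported" on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#define HAVE_X86_CPUID 1
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

// Hypothetical result type for this sketch.
struct FmaSupport {
    bool fma3;
    bool fma4;
};

// True when the given bit (0..31) of a CPUID result register is set.
inline bool bit_set(std::uint32_t reg, unsigned bit) {
    assert(bit < 32);
    return ((reg >> bit) & 1u) != 0;
}

// Returns ECX for the requested CPUID function, or 0 when the function
// is not available (unsupported leaf or non-x86 target).
inline std::uint32_t cpuid_ecx(std::uint32_t leaf) {
#if defined(HAVE_X86_CPUID) && defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    // EAX of the range base (0 or 0x80000000) holds the highest valid leaf.
    __cpuid(regs, static_cast<int>(leaf & 0x80000000u));
    if (static_cast<std::uint32_t>(regs[0]) < leaf) return 0;
    __cpuid(regs, static_cast<int>(leaf));
    return static_cast<std::uint32_t>(regs[2]);
#elif defined(HAVE_X86_CPUID)
    unsigned a = 0, b = 0, c = 0, d = 0;
    return __get_cpuid(leaf, &a, &b, &c, &d) ? c : 0;
#else
    (void)leaf;
    return 0;
#endif
}

inline FmaSupport detect_fma() {
    FmaSupport s;
    s.fma3 = bit_set(cpuid_ecx(1u), 12);           // function 1, ECX bit 12
    s.fma4 = bit_set(cpuid_ecx(0x80000001u), 16);  // function 0x8000_0001, ECX bit 16
    return s;
}
```

Calling detect_fma() once at start-up and picking a code path from the result is the usual pattern. Note that this sketch only reads the feature bits; a complete check would also confirm that the OS saves the AVX register state (OSXSAVE and XGETBV), which is left out here.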

Enjoy, John.

hardcoregames_ · ‎06-18-2019

I have seen some conflicting comments.

I have done some crude testing with Visual Studio 2019 in 64-bit mode, feeding in random numbers and comparing results, and so far there have been no problems.

I was hoping AMD could handle vector norms fast, so that physics in games and the like would run efficiently.
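
For the vector norm case, a scalar sketch of the idea looks like the following. The name norm_fma is made up for illustration, and it uses std::fma rather than any particular FMA3 or FMA4 intrinsic, so it says nothing about which instruction the compiler will actually emit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: Euclidean norm of n floats, where each step of the
// sum of squares is one fused multiply-add (sum = v[i] * v[i] + sum).
float norm_fma(const float* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum = std::fma(v[i], v[i], sum);
    }
    return std::sqrt(sum);
}
```

For the vector (3, 4, 12) this returns 13. A vectorised version would do the same accumulation eight floats at a time on __m256 values.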

Fused multiply-add has been around for a very long time in one form or another; it's even part of the IEEE 754 standard.

__m256 mul_addv(__m256 a, __m256 b, __m256 c) {
 return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}