Processors

FMA4 support

I am a developer and I noticed comments on a few sites saying that AMD does not support FMA4 properly. I am not sure whether the CPU microcode has been patched and whether it works now.

FMA4 is a fused multiply-add instruction set. FMA4 instructions have the form d = (a * b) + c with a separate destination, whereas FMA3 uses only three operands, so the result overwrites one of the inputs. This makes things trickier, as CPUID does not seem to help me select which instruction to use.

I have an R5 2400G, which is very recent, so I expect it to more or less support the modern instruction set.

I have Visual Studio 2019 which supports everything out there but I do not want my CPU to "blow up" on me.

Here is an example of what I mean, built with the following optimizations:

/O1 /arch:AVX2 /fp:fast

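#include <immintrin.h>  // AVX intrinsics: __m256, _mm256_mul_ps, _mm256_add_ps

// Scalar multiply-add; with /fp:fast the compiler may fuse this into one FMA instruction.
float mul_add(float a, float b, float c) {
    return a * b + c;
}

// 8-wide multiply-add written as a separate multiply and add.
__m256 mul_addv(__m256 a, __m256 b, __m256 c) {
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}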
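
To make the difference between a fused and an unfused multiply-add concrete, here is a minimal portable sketch. It does not use FMA3 or FMA4 intrinsics at all; it relies only on std::fma from the C++ standard library, and the function names and the input values are hypothetical ones picked for illustration. The inputs are chosen so that a * b needs 25 significant bits, one more than a float holds.

```cpp
#include <cassert>
#include <cmath>

// Unfused: the product is rounded to float before the add.
// The volatile temporary keeps the compiler from contracting
// the two steps into a single fused instruction.
float mul_add_unfused(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused: std::fma computes a * b + c with a single rounding at the end.
float mul_add_fused(float a, float b, float c) {
    return std::fma(a, b, c);
}

// Hypothetical demo values: a = 1 + 2^-12, so a * a = 1 + 2^-11 + 2^-24 exactly.
// Rounded to float, the 2^-24 term is lost (a tie that rounds to even).
void demo_fused_vs_unfused() {
    const float a = 1.0f + std::ldexp(1.0f, -12);
    const float c = -(1.0f + std::ldexp(1.0f, -11));

    // Separate multiply and add: the low bit was rounded away, so the sum is 0.
    assert(mul_add_unfused(a, a, c) == 0.0f);

    // Fused: the exact product is kept until the add, so 2^-24 survives.
    assert(mul_add_fused(a, a, c) == std::ldexp(1.0f, -24));
}
```

This is why results from fused and unfused builds can differ in the last bit when comparing test output. On a build without hardware FMA, std::fma is still correct but is usually emulated in software and therefore slow.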
0 Likes
7 Replies

Here are all the Ryzen 5 2400G Processor codes from CPU World:

According to Wikipedia on FMA, your processor does support FMA4: FMA instruction set - Wikipedia 

0 Likes

I have run some tests with both conventional code and with the CPU instruction and so far have seen no problems, but I am not sure whether AMD has fixed the CPU or not.

FMA4 is the easiest to use, as it takes four operands instead of overwriting one of its inputs as an accumulator, which can save a few clock cycles.

Code built with /arch:AVX2 uses FMA3, so this does not clarify FMA4 support.

I have also asked Microsoft about compiler support. That should be fine, but the run-time may not fully support processors that lack native support for the instructions.

0 Likes
misterj
Big Boss

Correct, hardcoregames.  Use CPU-Z to see:

pastedImage_1.jpg

Enjoy, John.

0 Likes

Given that neither Intel nor AMD has a bit set in CPUID, there is no way to tell whether a given processor can do it.

https://www.agner.org/optimize/microarchitecture.pdf

is one of several manuals I have consulted

Now, the AMD manual suggests that Bulldozer and Piledriver support FMA4:

http://developer.amd.com/wordpress/media/2012/10/New-Bulldozer-and-Piledriver-Instructions.pdf

0 Likes

hardcoregames, looks like only Piledriver.  But this is also shown:

"Support for FMA4 is indicated by the value in bit 16 in ECX when calling CPUID function 0x8000_0001."

Enjoy, John.
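
The CPUID check quoted above could be sketched roughly as follows. This is only an illustration, not a vetted detection routine: the names FmaSupport, decode_fma_bits, query_fma_support and check_decoder are hypothetical, the FMA3 bit (CPUID function 1, ECX bit 12) comes from the vendor manuals rather than from this thread, and the sketch skips the separate check that the operating system has enabled AVX state, which real code also needs.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>   // __cpuid
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>    // __get_cpuid
#endif

// Hypothetical result type for this sketch.
struct FmaSupport {
    bool fma3;
    bool fma4;
};

// Decode the two feature words:
//   FMA3: CPUID function 1, ECX bit 12
//   FMA4: CPUID function 0x8000_0001, ECX bit 16
FmaSupport decode_fma_bits(std::uint32_t leaf1_ecx, std::uint32_t ext1_ecx) {
    FmaSupport s;
    s.fma3 = ((leaf1_ecx >> 12) & 1u) != 0;
    s.fma4 = ((ext1_ecx >> 16) & 1u) != 0;
    return s;
}

// Read the feature words from the CPU. On non-x86 targets both stay zero.
FmaSupport query_fma_support() {
    std::uint32_t leaf1_ecx = 0;
    std::uint32_t ext1_ecx = 0;
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);
    leaf1_ecx = static_cast<std::uint32_t>(regs[2]);
    // Only read the extended leaf if the CPU reports that it exists.
    __cpuid(regs, static_cast<int>(0x80000000u));
    if (static_cast<std::uint32_t>(regs[0]) >= 0x80000001u) {
        __cpuid(regs, static_cast<int>(0x80000001u));
        ext1_ecx = static_cast<std::uint32_t>(regs[2]);
    }
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not available.
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) leaf1_ecx = ecx;
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) ext1_ecx = ecx;
#endif
    return decode_fma_bits(leaf1_ecx, ext1_ecx);
}

// Usage sketch: sanity-check the decoder on hand-made register values.
void check_decoder() {
    assert(decode_fma_bits(1u << 12, 0u).fma3);
    assert(!decode_fma_bits(1u << 12, 0u).fma4);
    assert(decode_fma_bits(0u, 1u << 16).fma4);
}
```

A program would call query_fma_support() once at startup and pick the FMA4, FMA3 or plain multiply-and-add code path from the result. Keeping the bit decoding separate from the CPUID call makes the decoding testable on any machine.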

0 Likes

I have seen some conflicting comments.

I have done some crude testing with Visual Studio 2019 in 64-bit mode, feeding in random numbers and comparing the results, and so far there has been no problem.

I was hoping AMD could handle vector norms quickly so that physics in games etc. would run efficiently.

Fused multiply-add has been around for a very long time; it is even part of the IEEE 754 standard

__m256 mul_addv(__m256 a, __m256 b, __m256 c) {
    return _mm256_add_ps(_mm256_mul_ps(a, b), c);
}

in one form or another
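
For the vector-norm use case mentioned above, a minimal scalar sketch using the standard library's std::fma might look like this. It is not the FMA3 or FMA4 intrinsic version and says nothing about speed on any particular CPU; squared_norm and check_squared_norm are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Sum of squares of n floats, accumulated with one fused multiply-add per element.
// Each step computes v[i] * v[i] + acc with a single rounding.
float squared_norm(const float* v, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc = std::fma(v[i], v[i], acc);
    }
    return acc;
}

// Usage sketch: the classic 3-4-5 triangle, where every step is exact.
void check_squared_norm() {
    const float v[2] = {3.0f, 4.0f};
    assert(squared_norm(v, 2) == 25.0f);
    assert(std::sqrt(squared_norm(v, 2)) == 5.0f);
}
```

Whether std::fma maps to a hardware instruction depends on the compiler flags and the target; without hardware FMA it is typically emulated, so a loop like this is only worth using when the build actually targets an FMA-capable CPU.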

0 Likes