 math: fused multiply-add 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1140
Location: Canada
I got to thinking about macro-instruction fusion for the purpose of performing a dot-product operation. Then I started wondering: why have the fused instructions at all? Why not just have instructions that output doubled-precision results (fmul.ssd)? Or take doubled-precision inputs? An fma retains all the product bits, out to doubled precision, and includes them in the add operation before rounding. An fmul instruction with double-precision output would be able to retain the same bits. If this were fed into an fadd instruction that accepted double-precision inputs, wouldn't the results be the same? For instance, fmul.s multiplies two 24-bit significands, producing 48 product bits. This result should fit within the 52 significand bits of a double-precision number without needing any rounding.

There are only a couple of things that an fma instruction provides: 1) slightly increased code density (provided three register read ports are available); 2) more accurately rounded results.
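To make the question concrete, here is a minimal sketch in Python that emulates the three paths for one hand-picked set of single-precision inputs. The helper `to_f32` and the variable names are my own, the "widened" path only stands in for the hypothetical fmul.ssd followed by a double-precision add, and the fused result is emulated with exact rational arithmetic rather than a hardware fma.

```python
import struct
from fractions import Fraction

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hand-picked binary32 inputs: a*b = 1 + 2^-11 + 2^-24, which needs 25 bits.
a = b = to_f32(1 + 2**-12)
c = to_f32(-(1 + 2**-11))

# The product of two 24-bit significands fits in binary64 without rounding.
product_f64 = a * b
assert Fraction(product_f64) == Fraction(a) * Fraction(b)

# Separate single-precision multiply and add: the product is rounded
# to binary32 before the add, so the low 2^-24 term is lost.
unfused = to_f32(to_f32(a * b) + c)

# Hypothetical widening multiply, then a double-precision add,
# narrowed to binary32 only at the end.
widened = to_f32(product_f64 + c)

# Fused reference: exact a*b + c. The exact value is representable
# here, so converting it through float() introduces no extra rounding.
fused = to_f32(float(Fraction(a) * Fraction(b) + Fraction(c)))

print(unfused, widened, fused)
```

For these inputs the widened path matches the fused reference (2^-24) while the separate multiply and add returns 0. That is one example, not a proof: the widened path still rounds twice (once in the double-precision add, once when narrowing to single), so agreement with a true fma in every case is not guaranteed.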

Robert Finch

Mon Jun 01, 2020 2:44 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1431
Interesting. Modern descriptions of FMA highlight the improved accuracy realised by not double-rounding. But the introductory material in this thesis ("Floating-Point Fused Multiply-Add Architectures" by Eric Charles Quinnell) has IBM claiming:
...benefits of combining the floating-point adder and floating-point multiplier into a single functional unit. First, the latency for a multiply-add fused mathematical operation is reduced significantly by having an addition combined with a multiplication in hardware. Second, the precision of the final result is increased, since the operands only go through a single rounding stage. Third, there is a decrease in the number of required input/output ports to the register file and their controlling sub-units. Finally, a reduced area of both the floating-point adder and floating-point multiplier may be realized since the adder is only wired to the output connections of the multiplier.

There seems to be some historical to-and-fro depending on whether FMA is an extra functional unit or one which replaces M and A.

Even though the fused multiply-add architecture has troublesome latencies, high power consumption, and a performance degradation with single-instruction execution, it may be fully expected that more and more x87 designs will find floating-point fused multiply-add units in their silicon.

There are some great diagrams in the early parts of that thesis.

Why not replace FMA with doubled results? Conventionally, I would expect the power, the area, and the time to count against it: for the desired extra one or two bits of accuracy (surely FMA doesn't offer more than that), doubling precision is going to incur a major cost. However, FPGAs may lead to an unconventional answer: the transistors may already be there and unused; the timing may depend on other parts of the design, or be dominated by routing costs; the power budget may be unimportant.

Mon Jun 01, 2020 6:46 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1431
BigEd wrote:
... extra one or two bits of accuracy (surely FMA doesn't offer more than that) ...

A foolish thing for me to say! See, for example, "Accurately computing a 2×2 determinant", where we see some code and the results:
Naive: -7.03944087021569e-07
Kahan: -7.03944088015194e-07
where we seem to get some 20 bits of improvement.
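For anyone who wants to try this without following the link, below is a sketch of the FMA-based 2×2 determinant algorithm usually attributed to Kahan. It is not necessarily the code from the linked page, the inputs are my own and not the ones behind the figures above, and `fma` here is an emulation using exact rationals (recent Python versions also provide `math.fma`, which I have not used so that the sketch runs anywhere).

```python
from fractions import Fraction

def fma(x: float, y: float, z: float) -> float:
    """Emulated fused multiply-add: exact x*y + z, rounded once to binary64."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))

def det_naive(a: float, b: float, c: float, d: float) -> float:
    """a*d - b*c with each product rounded before the subtraction."""
    return a * d - b * c

def det_kahan(a: float, b: float, c: float, d: float) -> float:
    """a*d - b*c using two fused operations to recover rounding error."""
    w = b * c
    e = fma(-b, c, w)   # w - b*c: the rounding error committed in w
    f = fma(a, d, -w)   # a*d - w, rounded only once
    return f + e

# Hand-picked inputs where the two products nearly cancel.
a, b, c, d = 1 + 2**-30, 1.0, 1.0, 1 - 2**-30

exact = Fraction(a) * Fraction(d) - Fraction(b) * Fraction(c)
naive = det_naive(a, b, c, d)
kahan = det_kahan(a, b, c, d)

print("naive:", naive)
print("kahan:", kahan)
print("exact:", float(exact))
```

With these inputs a*d is 1 - 2^-60, which rounds to 1.0 in double precision, so the naive version returns 0 and loses every significant bit, while the fused version returns -2^-60, which is the exact answer. The gain depends heavily on how close the two products are to cancelling.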

Mon Jun 01, 2020 11:57 am