
 Page 1 of 1 [ 3 posts ]
Author Message

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1140
I got to thinking about macro instruction fusion for the purpose of performing a dot-product operation. Then I started wondering: why have the fused instructions at all? Why not just have instructions that output double-precision results (fmul.ssd)? Or that take double-precision inputs (fadd.dds)? An fma retains all the product bits, out to doubled precision, and includes them in the add operation before rounding. An fmul instruction with double-precision output would be able to retain the same bits. If this were fed into an fadd instruction that accepted double-precision inputs, wouldn't the results be the same? For instance, fmul.s multiplies two 24-bit significands, giving 48 product bits. This result should fit within the 53-bit significand (52 stored bits) of a double-precision number without needing any rounding.
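The claim that a single-precision product fits in a double without rounding can be checked with a minimal sketch like the one below. It assumes Python floats are IEEE 754 doubles and uses `struct` to round values to single precision; `to_single` and `product_is_exact` are illustrative helper names, not part of any instruction set discussed here.

```python
from fractions import Fraction
import random
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def product_is_exact(a: float, b: float) -> bool:
    """True if the double-precision product a*b equals the exact rational product."""
    return Fraction(a * b) == Fraction(a) * Fraction(b)

# Hypothetical test data: 10,000 random pairs of single-precision values.
random.seed(0)
samples = [
    (to_single(random.uniform(-1e3, 1e3)), to_single(random.uniform(-1e3, 1e3)))
    for _ in range(10_000)
]

# Two 24-bit significands give at most 48 product bits, which fit in 53.
all_exact = all(product_is_exact(a, b) for a, b in samples)
print(all_exact)

# For contrast, a product of two full doubles generally does need rounding.
print(product_is_exact(0.1, 0.1))
```

This only covers the multiply half of the question (and ignores overflow and underflow at the extremes of the exponent range); whether the following add rounds once or twice is a separate matter.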

There are only a couple of things that an fma instruction provides: 1) slightly increased code density (provided three register read ports are available); 2) more accurately rounded results.
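On the rounding point, here is a rough sketch of one way the two-instruction sequence could differ from a fused operation. It assumes that the hypothetical fadd.dds rounds its sum to double precision first and then to single; if it instead rounded the exact sum once, directly to single, this discrepancy would not arise. The input values are chosen by hand for illustration.

```python
from fractions import Fraction
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Hand-picked single-precision inputs (illustrative, not from the thread).
a = b = to_single(1 + 2.0**-12)
c = to_single(2.0**-80)

p = a * b                    # fmul.ssd: exact in double, 1 + 2^-11 + 2^-24
two_step = to_single(p + c)  # assumed fadd.dds: round to double, then to single

# Exact value of a*b + c, and the single-precision neighbour just above two_step.
exact = Fraction(a) * Fraction(b) + Fraction(c)
neighbour = 1 + 2.0**-11 + 2.0**-23

err_two_step = abs(exact - Fraction(two_step))
err_neighbour = abs(exact - Fraction(neighbour))
print(two_step, neighbour, err_neighbour < err_two_step)
```

Here the tiny addend is lost when the sum is rounded to double, leaving an exact tie that then rounds to even in single precision. A single rounding of the exact sum would land on the neighbouring value instead, so the two-step result is off by one unit in the last place.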

_________________
Robert Finch http://www.finitron.ca

Mon Jun 01, 2020 2:44 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1431
Interesting. Modern descriptions of FMA highlight the improved accuracy realised by not double-rounding. But the introductory material in this thesis ("Floating-Point Fused Multiply-Add Architectures" by Eric Charles Quinnell) has IBM claiming:
Quote:
...benefits of combining the floating-point adder and floating-point multiplier into a single functional unit. First, the latency for a multiply-add fused mathematical operation is reduced significantly by having an addition combined with a multiplication in hardware. Second, the precision of the final result is increased, since the operands only go through a single rounding stage. Third, there is a decrease in the number of required input/output ports to the register file and their controlling sub-units. Finally, a reduced area of both the floating-point adder and floating-point multiplier may be realized since the adder is only wired to the output connections of the multiplier.

There seems to be some historical to-and-fro depending on whether FMA is an extra functional unit or one which replaces M and A.

Quote:
Even though the fused multiply-add architecture has troublesome latencies, high power consumption, and a performance degradation with single-instruction execution, it may be fully expected that more and more x87 designs will find floating-point fused multiply-add units in their silicon.

There are some great diagrams in the early parts of that thesis.

Why not replace FMA with doubled results? Conventionally, I would expect power, area, and time to count against it: for the sake of the desired extra one or two bits of accuracy (surely FMA doesn't offer more than that), doubling the precision is going to incur a major cost. However, FPGAs may lead to an unconventional answer: the transistors may already be there and unused; the timing may depend on other parts of the design, or be dominated by routing costs; the power budget may be unimportant.

Mon Jun 01, 2020 6:46 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1431
BigEd wrote:
... extra one or two bits of accuracy (surely FMA doesn't offer more than that) ...

A foolish thing for me to say! See, for example, "Accurately computing a 2×2 determinant", where we see some code and the results:
Naive: -7.03944087021569e-07
Kahan: -7.03944088015194e-07
where we seem to get some 20 bits of improvement.
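The linked code is not reproduced here, but the idea behind the "Kahan" result can be sketched as follows. This is a minimal illustration, assuming Python floats are IEEE 754 doubles; the `fma` helper is an emulation built on exact rational arithmetic (not a hardware instruction), and the inputs are hypothetical values chosen to provoke cancellation, not the ones that produced the figures above.

```python
from fractions import Fraction

def fma(x: float, y: float, z: float) -> float:
    """Emulated fused multiply-add: exact x*y + z, rounded once to double."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))

def det_naive(a: float, b: float, c: float, d: float) -> float:
    """a*d - b*c with each product rounded before the subtraction."""
    return a * d - b * c

def det_kahan(a: float, b: float, c: float, d: float) -> float:
    """Kahan's fma-based evaluation of a*d - b*c."""
    w = b * c            # rounded product
    e = fma(-b, c, w)    # w - b*c: the rounding error in w, recovered exactly
    f = fma(a, d, -w)    # a*d - w with a single rounding
    return f + e

# Hypothetical inputs: a*d = 1 - 2^-60 rounds to 1.0, cancelling against b*c.
a, d = 1 + 2.0**-30, 1 - 2.0**-30
b = c = 1.0

exact = Fraction(a) * Fraction(d) - Fraction(b) * Fraction(c)
naive = det_naive(a, b, c, d)
kahan = det_kahan(a, b, c, d)
print(naive, kahan, Fraction(kahan) == exact)
```

With these inputs the naive form loses the answer entirely, while the fused form recovers it, which is far more than an extra bit or two.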

Mon Jun 01, 2020 11:57 am