AnyCPU http://anycpu.org/forum/ 

math: fused multiply-add http://anycpu.org/forum/viewtopic.php?f=17&t=734 
Page 1 of 1 
Author:  robfinch [ Mon Jun 01, 2020 2:44 am ] 
Post subject:  math: fused multiply-add 
I got to thinking about macro-instruction fusion for the purpose of performing a dot-product operation. Then I started wondering: why have the fused instructions at all? Why not just have instructions that output doubled-precision results (fmul.ssd), or that take doubled-precision inputs (fadd.dds)? An fma retains all the product bits out to doubled precision, and those bits are carried into the add operation before rounding. An fmul instruction with a double-precision output would be able to retain the same bits. If this were fed into an fadd instruction that accepted double-precision inputs, wouldn't the results be the same? For instance, fmul.s multiplies two 24-bit significands, giving 48 product bits. This result fits within the 53-bit significand (52 stored bits) of a double-precision number without needing any rounding. There are only a couple of things that an fma instruction provides beyond that: 1) slightly increased code density (provided three register read ports are available), and 2) more accurately rounded results. 
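The claim that a single-precision product fits exactly in a double can be checked in a few lines. This is a quick sketch, not anything from the post: it uses Python's native float as the IEEE binary64 format and the `struct` module to round to binary32. Two full 24-bit significands multiply to at most 48 bits, which fits in a double's 53-bit significand, so the double-precision product is exact while rounding it back to single is not.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Two single-precision values with full 24-bit significands:
# 1 + 2**-23 is exactly 1 followed by 22 zero bits and a final 1.
a = to_single(1.0 + 2**-23)
b = to_single(1.0 + 2**-23)

# The exact product 1 + 2**-22 + 2**-46 has 47 significand bits,
# so the double-precision multiply below introduces no rounding.
exact = a * b               # what an fmul.ssd would deliver
rounded = to_single(exact)  # what a plain fmul.s must deliver

print(exact == (1 + 2**-23)**2)  # the double holds every product bit
print(exact != rounded)          # rounding to single loses the low bits
```

Both prints show True: the doubled-precision result keeps information that an ordinary single-precision multiply throws away, which is exactly the bit-budget argument in the post.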
Author:  BigEd [ Mon Jun 01, 2020 6:46 am ] 
Post subject:  Re: math: fused multiply-add 
Interesting. Modern descriptions of FMA highlight the improved accuracy realised by not double-rounding. But the introductory material in this thesis ("Floating-Point Fused Multiply-Add Architectures" by Eric Charles Quinnell) has IBM claiming: 

Quote: ...benefits of combining the floating-point adder and floating-point multiplier into a single functional unit. First, the latency for a multiply-add fused mathematical operation is reduced significantly by having an addition combined with a multiplication in hardware. Second, the precision of the final result is increased, since the operands only go through a single rounding stage. Third, there is a decrease in the number of required input/output ports to the register file and their controlling sub-units. Finally, a reduced area of both the floating-point adder and floating-point multiplier may be realized since the adder is only wired to the output connections of the multiplier. 

There seems to be some historical to-and-fro depending on whether FMA is an extra functional unit or one which replaces M and A. 

Quote: Even though the fused multiply-add architecture has troublesome latencies, high power consumption, and a performance degradation with single-instruction execution, it may be fully expected that more and more x87 designs will find floating-point fused multiply-add units in their silicon. 

There are some great diagrams in the early parts of that thesis. 

Why not replace FMA with doubled results? Conventionally, I would expect the power, the area, and the time to count against it: for the desired extra one or two bits of accuracy (surely FMA doesn't offer more than that), doubling the precision is going to incur a major cost. However, FPGAs may lead to an unconventional answer: the transistors may already be there and unused; the timing may depend on other parts of the design, or be dominated by routing costs; and the power budget may be unimportant. 
Author:  BigEd [ Mon Jun 01, 2020 11:57 am ] 
Post subject:  Re: math: fused multiply-add 
BigEd wrote: ... extra one or two bits of accuracy (surely FMA doesn't offer more than that) ... 

A foolish thing for me to say! See for example "Accurately computing a 2×2 determinant", where we see some code, and the results: 

Naive: 7.03944087021569e-07 
Kahan: 7.03944088015194e-07 

where we seem to get some 20 bits of improvement. 
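The determinant algorithm in question is usually credited to Kahan: one FMA recovers the exact rounding error of b*c, and a second computes a*d - w with a single rounding. The sketch below uses an emulated FMA built on Python's `fractions` module and my own made-up ill-conditioned inputs (the linked post's matrix values are not reproduced here).

```python
from fractions import Fraction

def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def det2x2_naive(a, b, c, d):
    """a*d - b*c with two roundings; cancellation destroys low bits."""
    return a * d - b * c

def det2x2_kahan(a, b, c, d):
    """Kahan's determinant: e recovers the rounding error of w = b*c
    exactly, and f computes a*d - w with a single rounding."""
    w = b * c
    e = fma(-b, c, w)   # e = w - b*c, exact (a known FMA identity)
    f = fma(a, d, -w)   # f = a*d - w, rounded once
    return f + e        # (a*d - w) + (w - b*c) = a*d - b*c

# Ill-conditioned case: a*d and b*c nearly cancel.
a = d = 1.0 + 2**-28
b = c = 1.0
print(det2x2_naive(a, b, c, d))  # 2**-27: the 2**-56 term is lost
print(det2x2_kahan(a, b, c, d))  # 2**-27 + 2**-56: exact for this input
```

Here the naive determinant loses the 2**-56 term when a*d is rounded, while Kahan's version recovers it, illustrating how FMA can buy far more than one or two bits when cancellation is severe.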