
I found an interesting paper: "A survey of hardware designs for decimal arithmetic" by Wang et al., 2010, 15 pages.

You can find it online using DOI: 10.1147/JRD.2010.2040930

It's interesting in part because of numerous historical call-backs...

Quote:

Processor support for decimal arithmetic has existed since the emergence of commercial computers. Early computers, including the ENIAC, UNIVAC, and IBM 650, used decimal arithmetic, because decimal computations mirror manual computations. These processors only perform DXP arithmetic in hardware. Early binary computers such as the IBM S/360 mainframe defined DXP formats that are still implemented today in the IBM zSeries machines. An early processor to provide dedicated hardware support for variable-precision DFP arithmetic is CADAC.

Quote:

Another processor, called BAP-SC, [from 1987] also provides dedicated hardware for DFP arithmetic. BAP-SC is an accumulator-based processor that has DFP arithmetic instructions with most DFP operations implemented using microcode. In addition to basic arithmetic operations, BAP-SC supports the exponential function, the natural logarithm, and trigonometric functions. In BAP-SC, a DFP number is 64 bits, with a 13-digit BCD significand, an 8-bit biased binary exponent, three status bits, and one sign bit.
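
(My own aside, not from the paper.) The field widths in that quote add up exactly: 13 BCD digits at 4 bits each is 52 bits, plus 8 + 3 + 1 gives 64. Here is a rough Python sketch of packing such a word. The quote only gives the widths, so the field order (sign, status, exponent, significand from high to low) and the bias of 128 are purely my assumptions, and `pack_dfp` / `unpack_dfp` are made-up names.

```python
# Hypothetical layout (NOT taken from the paper): the quote gives only the
# field widths, so the ordering and the exponent bias below are assumptions.
SIG_DIGITS = 13          # 13 BCD digits -> 52 bits
EXP_BITS = 8
STATUS_BITS = 3
ASSUMED_BIAS = 128       # assumption for illustration only


def pack_dfp(sign, status, exponent, digits):
    """Pack fields into one 64-bit integer: sign | status | exponent | BCD."""
    if len(digits) != SIG_DIGITS or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("need exactly 13 decimal digits")
    biased = exponent + ASSUMED_BIAS
    if not 0 <= biased < (1 << EXP_BITS):
        raise ValueError("exponent out of range")
    if sign not in (0, 1) or not 0 <= status < (1 << STATUS_BITS):
        raise ValueError("bad sign or status")
    significand = 0
    for d in digits:                     # most significant digit first
        significand = (significand << 4) | d
    word = sign
    word = (word << STATUS_BITS) | status
    word = (word << EXP_BITS) | biased
    word = (word << (4 * SIG_DIGITS)) | significand
    return word


def unpack_dfp(word):
    """Inverse of pack_dfp; returns (sign, status, exponent, digits)."""
    significand = word & ((1 << (4 * SIG_DIGITS)) - 1)
    word >>= 4 * SIG_DIGITS
    biased = word & ((1 << EXP_BITS) - 1)
    word >>= EXP_BITS
    status = word & ((1 << STATUS_BITS) - 1)
    sign = word >> STATUS_BITS
    digits = [(significand >> (4 * i)) & 0xF
              for i in reversed(range(SIG_DIGITS))]
    return sign, status, biased - ASSUMED_BIAS, digits
```

Keeping the significand in BCD means each digit can be read off with a shift and a mask, with no binary-to-decimal conversion.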

Quote:

IBM continued support for decimal numbers from early decimal machines to binary machines. The IBM S/360 architecture and machines support DXP formats of variable length up to 31 digits plus a sign digit in a BCD format. Operations are directly performed on data from memory and returned to memory.
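
(Again my own illustration, not the paper's.) "31 digits plus a sign digit" is 32 nibbles, so the longest operand is 16 bytes. A minimal Python sketch of that kind of packed encoding follows. The sign codes (0xC for plus, 0xD for minus, with 0xB also read as minus) are the conventional packed-decimal ones rather than anything stated in the quote, and the function names are my own.

```python
def to_packed(value, max_digits=31):
    """Encode an int as packed BCD: one digit per nibble, sign nibble last."""
    sign = 0xD if value < 0 else 0xC     # 0xC = plus, 0xD = minus
    text = str(abs(value))
    if len(text) > max_digits:
        raise OverflowError(f"more than {max_digits} digits")
    if len(text) % 2 == 0:               # digits + sign must fill whole bytes
        text = "0" + text
    nibbles = [int(ch) for ch in text] + [sign]
    return bytes((hi << 4) | lo
                 for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def from_packed(data):
    """Decode packed BCD bytes back to an int."""
    nibbles = []
    for byte in data:
        nibbles += [byte >> 4, byte & 0xF]
    *digits, sign = nibbles              # the last nibble carries the sign
    magnitude = int("".join(str(d) for d in digits))
    return -magnitude if sign in (0xB, 0xD) else magnitude
```

For example, `to_packed(-12345)` gives the three bytes `12 34 5D`.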

Quote:

The IBM S/390 microprocessors from the 1997 G4 to the 1999 G6 have a 64-bit binary adder but use a 32-bit decimal adder to perform DXP add, subtract, and compare operations in hardware. This decimal adder uses the adder design from the IBM S/360 model 195 to directly generate a decimal sum without any correction.

The G4 microprocessor uses millicode to perform other decimal instructions such as multiply and divide with no hardware assists other than addition. G5 and G6 microprocessors have dedicated decimal multipliers that use lookup tables to generate partial products.

Quote:

Many general-purpose microprocessors have hardware assists to help perform fixed-point BCD addition. For example, IA-32 and IA-64 use two-digit assist instructions, and the IBM Power Architecture uses an eight-digit assist instruction
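
(My sketch, not from the paper.) To give a feel for what a two-digit assist does, here is a Python emulation modelled on the x86 DAA instruction: do a plain binary add of two packed-BCD bytes, then adjust the result using the carry and half-carry. The function name `add_bcd_byte` is invented for illustration, and it assumes both inputs are valid two-digit BCD bytes.

```python
def add_bcd_byte(a, b, carry_in=0):
    """Add two packed-BCD bytes (two digits each), DAA-style.

    Returns (result_byte, carry_out).
    """
    raw = a + b + carry_in                       # ordinary binary addition
    half_carry = ((a & 0xF) + (b & 0xF) + carry_in) > 0xF
    al = raw & 0xFF
    carry = raw > 0xFF
    old_al = al
    # Low digit overflowed past 9: skip the six unused codes A..F.
    if (al & 0xF) > 9 or half_carry:
        al = (al + 0x06) & 0xFF
    # High digit overflowed past 9: same fix one nibble up, and carry out.
    if old_al > 0x99 or carry:
        al = (al + 0x60) & 0xFF
        carry = True
    return al, int(carry)
```

Longer numbers are handled by chaining this byte by byte through `carry_in`, which is why an eight-digit assist saves a fair number of steps.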

Quote:

Various methods have been developed to accelerate BCD addition and subtraction by avoiding the pre- and postcorrection steps. Schmookler and Weinberger propose a fixed-point decimal carry-lookahead addition scheme that is similar to its binary counterpart ... This approach is used in the IBM S/360 Model 195
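
(My sketch, not from the paper.) For anyone wondering what the pre- and postcorrection steps being avoided actually are, this is the classic software version for eight packed digits: add 6 to every digit, do a binary add, then take the 6 back out of every digit that did not produce a carry. It is not the Schmookler and Weinberger scheme, just the baseline it improves on, and `bcd_add8` is my own name.

```python
def bcd_add8(a, b):
    """Add two 8-digit packed-BCD values; returns (sum, carry_out)."""
    t1 = a + 0x66666666                 # precorrection: +6 in every digit
    t2 = t1 + b                         # binary add; a digit now carries
                                        # out exactly when its sum is >= 10
    # Carry into bit 4, 8, ..., 32 equals the carry out of each digit.
    carries = (t1 ^ b ^ t2) & 0x111111110
    no_carry = ~carries & 0x111111110
    # Build 0b0110 in every digit that did NOT carry.
    fix = (no_carry >> 2) | (no_carry >> 3)
    total = t2 - fix                    # postcorrection: remove the extra 6
    return total & 0xFFFFFFFF, total >> 32
```

For example, `bcd_add8(0x00001234, 0x00005678)` gives `(0x00006912, 0)`.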

Quote:

Decimal multiplication is more complicated than binary multiplication due to the need to handle carries across both digit and bit boundaries, produce several multiples of the multiplicand, and perform correction of product digits. To handle this increased complexity, some decimal multipliers use sequential techniques to accumulate partial products, whereas others add all the partial products and then convert the results back to BCD.
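
(My sketch, not from the paper.) The sequential approach can be mimicked in a few lines of Python: build the multiples 0x to 9x of the multiplicand once, then walk the multiplier one digit at a time, adding the selected multiple into an accumulator with explicit decimal carries. Digit lists are little-endian here, all names are made up, and real hardware would use a cleverer set of multiples than plain repeated addition.

```python
def add_digits(x, y):
    """Add two little-endian decimal digit lists with explicit carries."""
    out, carry = [], 0
    for i in range(max(len(x), len(y))):
        s = (x[i] if i < len(x) else 0) + (y[i] if i < len(y) else 0) + carry
        out.append(s % 10)
        carry = s // 10
    if carry:
        out.append(carry)
    return out


def multiply_sequential(multiplicand, multiplier):
    """Multiply little-endian digit lists by accumulating partial products."""
    # Build the multiples 0x..9x of the multiplicand by repeated addition.
    multiples = [[0]]
    for _ in range(9):
        multiples.append(add_digits(multiples[-1], multiplicand))
    acc = [0]
    for position, digit in enumerate(multiplier):
        partial = [0] * position + multiples[digit]  # shift by whole digits
        acc = add_digits(acc, partial)
    return acc


def to_digits(n):
    """Non-negative int -> little-endian digit list."""
    return [int(ch) for ch in reversed(str(n))]


def from_digits(digits):
    """Little-endian digit list -> int (leading zeros are harmless)."""
    return int("".join(str(d) for d in reversed(digits)))
```

Usage: `from_digits(multiply_sequential(to_digits(1234), to_digits(5678)))` returns the same value as `1234 * 5678`.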

As for motivation, they offer:

"one study estimates that a large telephone billing system can accumulate errors of up to $5 million per year, if using BFP arithmetic, rather than decimal floating-point (DFP) arithmetic"

"the performance of certain financial applications can be improved by more than a factor of 5 if DFP hardware is provided in the processor data path."

83 references, and worth a read if you like this sort of thing!

See also the bibliography at Speleotrove:

[list=]

Decimal Arithmetic: Floating-point[/list]