AnyCPU
http://anycpu.org/forum/

Readings on high-performance CPU designs
http://anycpu.org/forum/viewtopic.php?f=15&t=556

Author:  BigEd [ Sun Dec 16, 2018 6:12 pm ]
Post subject:  Readings on high-performance CPU designs

I thought it might be good to have a thread of resources and examples of complex CPUs, even though most homebrew CPUs fall firmly in the category of simple CPUs. (For myself, I'd be happy to invent or implement any working CPU, let alone one which overlaps fetch and execute!)

First off, some resources courtesy of robfinch, from this thread.

A Superscalar Out-of-Order x86 Soft Processor for FPGA (Thesis by Henry Ting-Hei Wong, PDF, 275 pages)

Quote:
14.2 Contributions
This thesis presents the design of a superscalar out-of-order x86 FPGA soft processor. In this thesis, we make the following contributions:
    • Study how the FPGA substrate affects circuits and out-of-order processor microarchitecture (Chapter 3)
    • Design a microarchitecture that correctly implements the x86 instruction set sufficiently well to boot modern operating systems (Chapters 4 and 5)
    • Design FPGA circuits for key processor components that implement the out-of-order microarchitecture (Chapters 6 to 12)
    • Evaluate the performance and costs of an out-of-order soft processor compared to a commercial in-order soft processor and x86 hard processors (Chapter 13)


There's a great deal in here and it's very well written. I've only scratched the surface.

Quote:
Contents
1 Introduction
1.1 Why a Faster Single-Threaded Soft Processor?
1.2 Why Out-of-Order?
1.3 Why x86?
1.4 Contributions
1.5 Organization
2 Background
3 Comparing FPGA vs. Custom CMOS Circuits
4 CPU Design Methodology
5 Proposed Processor Microarchitecture
6 Instruction Fetch and Decode
7 Register Renaming
8 Reorder Buffer and Instruction Commit
9 Instruction Scheduling
10 Register Files and Instruction Execution
11 Out-of-order Memory Execution Schemes
12 Memory and Cache System
13 Processor Performance
14 Conclusions
A Floating-Point Emulation Mechanism
B Integer Division Algorithm and Circuit
C Test System Configuration Details
D List of Micro-Ops and Logical Registers
E Comparison with Intel Haswell


Rob also reminds us that the Usenet group comp.arch is still alive and sometimes very interesting. For example:

Linked from that thread, there's a series of university lectures by Prof. Ajit Pal here:

There's also much interesting information about IBM's ill-fated and mismanaged ACS supercomputer project from the 1960s. IBM invented and then forgot about Dynamic Instruction Scheduling. By not understanding how the ACS worked, they thought it was not worth producing. Maybe start here:
https://people.cs.clemson.edu/~mark/acs_technical.html
Quote:
I built a Precursor that ran at a 10ns cycle. 5 levels of logic. Each chip could dissipate 3 watts and with 625 chips we had to have coolant. We used FC78 as a liquid coolant. The precursor was a path through a 24 bit adder.
-- Bill Mooney, personal correspondence, describing a test of ACS circuitry in 1968


Or maybe start here:
https://people.cs.clemson.edu/~mark/acs.html
with particular reference to Lynn Conway's historical PDFs. Her much more recent paper reflecting on ACS and how it all went wrong is also very interesting:

Author:  robfinch [ Thu Jan 10, 2019 9:42 am ]
Post subject:  Re: Readings on high-performance CPU designs

Found this thesis about a "braided" architecture:

https://repositories.lib.utexas.edu/bit ... f71786.pdf

It relies on software to identify "braids", which are subsets of basic blocks. The workload is distributed among eight braid execution units (BEUs) in the core.

Each BEU has its own register file, and there is a global register file as well, so there are two levels of register files. There's the potential for a smaller footprint for processor structures while still maintaining high performance.
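
To make the braid idea a bit more concrete, here is a rough Python sketch of how software might carve a basic block into braids. It assumes (this is my reading, not something confirmed from the thesis text here) that a braid is a set of instructions within one basic block connected by values produced and consumed inside that block. The instruction encoding, the function names, and the round-robin assignment to BEUs are all hypothetical and only for illustration.

```python
from collections import defaultdict

NUM_BEUS = 8  # the post mentions eight braid execution units


def find_braids(block):
    """Group the instructions of one basic block into dataflow-connected sets.

    block: list of (dest, srcs) tuples in program order; dest may be None
    (e.g. for a store). This encoding is an assumption for illustration.
    Returns a list of braids, each a list of instruction indices.
    """
    parent = list(range(len(block)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the earliest instruction as the representative.
            parent[max(ra, rb)] = min(ra, rb)

    last_writer = {}  # register name -> index of its latest in-block producer
    for i, (dest, srcs) in enumerate(block):
        for s in srcs:
            # Only values produced inside this block link instructions;
            # values live-in from outside the block do not.
            if s in last_writer:
                union(i, last_writer[s])
        if dest is not None:
            last_writer[dest] = i

    groups = defaultdict(list)
    for i in range(len(block)):
        groups[find(i)].append(i)
    return [groups[root] for root in sorted(groups)]


def assign_to_beus(braids, num_beus=NUM_BEUS):
    """Hypothetical policy: hand braids to BEUs round-robin."""
    return {n: n % num_beus for n in range(len(braids))}


example_block = [
    ("r1", ["r10", "r11"]),  # 0: r1 = r10 + r11
    ("r2", ["r1", "r12"]),   # 1: r2 = r1 * r12
    ("r3", ["r13", "r14"]),  # 2: r3 = r13 + r14
    ("r4", ["r3", "r15"]),   # 3: r4 = r3 - r15
    (None, ["r2"]),          # 4: store r2
]

braids = find_braids(example_block)
placement = assign_to_beus(braids)
```

In this toy block, instructions 0, 1 and 4 form one braid and instructions 2 and 3 form another, so the two chains could run on separate BEUs. Under the assumption above, values that stay inside a braid would live in that BEU's own register file, and only values crossing between braids or blocks would need the global one.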

Author:  BigEd [ Thu Jan 10, 2019 10:13 am ]
Post subject:  Re: Readings on high-performance CPU designs

Thanks for the link - that's a very promising title: "Braids: Out-of-Order Performance with Almost In-Order Complexity"

All times are UTC