In a nearby thread we
started talking about branch prediction - see below for quotes.
It turns out
the Wikipedia page is quite good, and in it I found
this article (long PDF - see page 12 onwards) surveying modern branch prediction and these two interesting historical snippets - self-modifying code isn't the half of it, this is machine-modified code:
Quote:
The Burroughs B4900, a microprogrammed COBOL machine released around 1982, was pipelined and used branch prediction. The B4900 branch prediction history state is stored back into the in-memory instructions during program execution. The B4900 implements 4-state branch prediction by using 4 semantically equivalent branch opcodes to represent each branch operator type. The opcode used indicated the history of that particular branch instruction. If the hardware determines that the branch prediction state of a particular branch needs to be updated, it rewrites the opcode with the semantically equivalent opcode that hinted the proper history. This scheme obtains a 93% hit rate. US patent 4,435,756 and others were granted on this scheme.
Quote:
The IBM 7030 Stretch, designed in the late 1950s, pre-executes all unconditional branches and any conditional branches that depended on the index registers. For other conditional branches, the first two production models implemented predict untaken; subsequent models were changed to implement predictions based on the current values of the indicator bits (corresponding to today's condition codes). The Stretch designers had considered static hint bits in the branch instructions early in the project but decided against them. Misprediction recovery was provided by the lookahead unit on Stretch, and part of Stretch's reputation for less-than-stellar performance was blamed on the time required for misprediction recovery. Subsequent IBM large computer designs did not use branch prediction with speculative execution until the IBM 3090 in 1985.
Garth wrote:
robfinch wrote:
I finally understood how the g-share branch predictor could be implemented.
The core uses a PowerPC style branch predictor arrangement. The branch address is fed for instruction fetch from the branch target buffer in the first cycle of a branch instruction fetch. During the next cycle an adaptive correlating branch predictor may override the address from the BTB.
I have not paid much attention to matters of processor design; but I have wondered why not let the programmer tell it if it's likely to branch or not. He always knows the goal of the particular part of the code, and what it's used for, unlike even the smartest compilers. Then the processor could use information from the programmer to prepare the most efficient way to do what he says will usually happen.
My under-researched response was as follows, now moved from that thread to here:
That has been done, and I'd suggest, for a while it was a great idea. For best-in-class branch prediction, it's no longer helpful: these predictors can even pick up on patterns like 3 taken then the fourth time around not taken, IIRC.
But as we recapitulate history in our machines, it might well be that we re-visit this point where branch hints (or two types of branches - likely taken vs likely not-taken) are a useful idea. And in any case, always as interesting idea!
My model/sketch/fictional history of the evolution of branch prediction would run something like this
- no branches, just self-modifying code
- branches just work
- taken branches take a hit, untaken branches are cheapest
- backward branches are expected to be taken, forward branches not taken
- branch delay slot allows cheaper branching by hiding the penalty
- likely branches vs unlikely branches
- excellent branch prediction puts us back to plain branches (which need least encoding)
Edit: forgot to mention speculative execution: both taken and not-taken are executed for a while, until it's known which one should have been taken. Takes a lot more execution hardware and also control logic complexity.
Edit 2: OK that's not quite right: speculative execution is merely proceeding on the probable path, it's
eager execution which proceeds on both paths.