 on branch prediction 

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
In a nearby thread we started talking about branch prediction - see below for quotes.

It turns out the Wikipedia page is quite good, and in it I found this article (long PDF - see page 12 onwards) surveying modern branch prediction and these two interesting historical snippets - self-modifying code isn't the half of it, this is machine-modified code:

Quote:
The Burroughs B4900, a microprogrammed COBOL machine released around 1982, was pipelined and used branch prediction. The B4900 branch prediction history state is stored back into the in-memory instructions during program execution. The B4900 implements 4-state branch prediction by using 4 semantically equivalent branch opcodes to represent each branch operator type. The opcode used indicated the history of that particular branch instruction. If the hardware determines that the branch prediction state of a particular branch needs to be updated, it rewrites the opcode with the semantically equivalent opcode that hinted the proper history. This scheme obtains a 93% hit rate. US patent 4,435,756 and others were granted on this scheme.
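As a rough C sketch of that opcode-rewriting idea (the opcode values here are invented, and of course the real machine did this in hardware, not software):

Code:
/* Sketch of the B4900 scheme with made-up opcode values: the two-bit
 * prediction state lives in the branch opcode itself, so updating the
 * predictor means rewriting the instruction in memory. */
#include <stdint.h>

enum {                                   /* four semantically equivalent branch opcodes */
    BR_STRONG_NOT_TAKEN = 0xB0,
    BR_WEAK_NOT_TAKEN   = 0xB1,
    BR_WEAK_TAKEN       = 0xB2,
    BR_STRONG_TAKEN     = 0xB3
};

static int predict_taken(uint8_t opcode)
{
    return opcode == BR_WEAK_TAKEN || opcode == BR_STRONG_TAKEN;
}

/* Once the branch resolves, write back the opcode that encodes the next
 * saturating-counter state. */
static uint8_t update_opcode(uint8_t opcode, int taken)
{
    if (taken && opcode < BR_STRONG_TAKEN)
        return opcode + 1;
    if (!taken && opcode > BR_STRONG_NOT_TAKEN)
        return opcode - 1;
    return opcode;
}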


Quote:
The IBM 7030 Stretch, designed in the late 1950s, pre-executes all unconditional branches and any conditional branches that depended on the index registers. For other conditional branches, the first two production models implemented predict untaken; subsequent models were changed to implement predictions based on the current values of the indicator bits (corresponding to today's condition codes). The Stretch designers had considered static hint bits in the branch instructions early in the project but decided against them. Misprediction recovery was provided by the lookahead unit on Stretch, and part of Stretch's reputation for less-than-stellar performance was blamed on the time required for misprediction recovery. Subsequent IBM large computer designs did not use branch prediction with speculative execution until the IBM 3090 in 1985.


Garth wrote:
robfinch wrote:
I finally understood how the g-share branch predictor could be implemented.
The core uses a PowerPC style branch predictor arrangement. The branch address is fed for instruction fetch from the branch target buffer in the first cycle of a branch instruction fetch. During the next cycle an adaptive correlating branch predictor may override the address from the BTB.

I have not paid much attention to matters of processor design; but I have wondered why not let the programmer tell it if it's likely to branch or not. He always knows the goal of the particular part of the code, and what it's used for, unlike even the smartest compilers. Then the processor could use information from the programmer to prepare the most efficient way to do what he says will usually happen.


My under-researched response was as follows, now moved from that thread to here:

That has been done, and I'd suggest that for a while it was a great idea. Against best-in-class branch prediction, though, it's no longer helpful: these predictors can even pick up on patterns like three times taken then the fourth time around not taken, IIRC.
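For a flavour of how a correlating predictor picks up such patterns, here's a very rough gshare-style sketch in C (the table size and history length are arbitrary, and real implementations live in hardware, of course):

Code:
#include <stdint.h>

#define GSHARE_BITS 12
#define GSHARE_SIZE (1u << GSHARE_BITS)

static uint8_t  counters[GSHARE_SIZE];   /* two-bit saturating counters, 0..3 */
static uint32_t global_history;          /* last GSHARE_BITS outcomes, 1 = taken */

/* Index by PC XOR global history, so the same branch gets different
 * counters depending on the recent pattern of outcomes. */
static int gshare_predict(uint32_t pc)
{
    uint32_t idx = (pc ^ global_history) & (GSHARE_SIZE - 1);
    return counters[idx] >= 2;           /* weakly or strongly taken */
}

static void gshare_update(uint32_t pc, int taken)
{
    uint32_t idx = (pc ^ global_history) & (GSHARE_SIZE - 1);
    if (taken && counters[idx] < 3)  counters[idx]++;
    if (!taken && counters[idx] > 0) counters[idx]--;
    global_history = ((global_history << 1) | (taken ? 1u : 0u)) & (GSHARE_SIZE - 1);
}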

But as we recapitulate history in our machines, it might well be that we re-visit this point where branch hints (or two types of branches - likely taken vs likely not-taken) are a useful idea. And in any case, it's always an interesting idea!

My model/sketch/fictional history of the evolution of branch prediction would run something like this:
- no branches, just self-modifying code
- branches just work
- taken branches take a hit, untaken branches are cheapest
- backward branches are expected to be taken, forward branches not taken (see the sketch just after this list)
- branch delay slot allows cheaper branching by hiding the penalty
- likely branches vs unlikely branches
- excellent branch prediction puts us back to plain branches (which need least encoding)
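The backward/forward heuristic in the list amounts to nothing more than this (a purely illustrative C sketch):

Code:
#include <stdint.h>

/* Static prediction by direction alone: loop-closing branches point
 * backwards, so predict those taken and forward branches not taken. */
static int predict_taken_static(uint32_t branch_pc, uint32_t target_pc)
{
    return target_pc < branch_pc;
}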

Edit: forgot to mention speculative execution: both taken and not-taken are executed for a while, until it's known which one should have been taken. Takes a lot more execution hardware and also control logic complexity.

Edit 2: OK that's not quite right: speculative execution is merely proceeding on the probable path, it's eager execution which proceeds on both paths.


Wed Jun 05, 2019 7:59 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
but I have wondered why not let the programmer tell it if it's likely to branch or not.

I think it may be partly due to the lack of a way to represent branch predictions in a high-level language. I had this in CC64, where an extra prediction constant could be passed in an 'if' statement; the 'if' statements eventually resolve down to branches.

_________________
Robert Finch http://www.finitron.ca


Thu Jun 06, 2019 4:35 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I was just adding static branch prediction to the rtfItanium and I noticed it requires multiplexors on the input and output of the branch predictor. I wonder if this extra hardware is the reason why static branch predictions are not present on some processors. The extra hardware might add delay to the prediction path.

_________________
Robert Finch http://www.finitron.ca


Thu Jun 06, 2019 5:21 am

Joined: Sat Oct 28, 2017 12:16 am
Posts: 7
I found this article on the subject pretty good; it was linked on Hacker News some time ago.

Regarding prediction hinting, there seems to be some support in compilers (GCC at least); see for example here.
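For reference, the GCC hint looks something like this - __builtin_expect is the real GCC builtin, while the likely/unlikely macros are just the usual wrappers, named here for illustration:

Code:
#include <stdio.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int process(int value)
{
    if (unlikely(value < 0)) {        /* error path, hinted as rare */
        fprintf(stderr, "bad value\n");
        return -1;
    }
    return value * 2;                 /* hot path, laid out as the fall-through */
}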


Thu Jun 06, 2019 10:00 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
In cc64 it’s (or rather was depending on the version) just a one or a zero in the ‘if’ statement.
Code:
if (x==10; 1)   // predict always true
I was thinking of extending this to allow the branch predictor to be chosen, if there were multiple predictors.
If the semi-colon + value is left out, then it’s left up to the processor to predict.

I think the semantics are harder to follow the way it's implemented in GCC, but that's just me. I'm not fond of the word 'likely'.

_________________
Robert Finch http://www.finitron.ca


Fri Jun 07, 2019 4:36 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
hmn wrote:
I found this article on the subject pretty good, it was linked on hacker news some time ago.

Oh, thanks, that is a good article/talk! (Here's the discussion on HN) And with references to real-world CPUs using the increasingly sophisticated tactics:

Quote:
Despite being simple, this works quite well, and we might expect to see something like 90% accuracy for a two bit predictor, which gives us a cost of 1.38 cycles per instruction.


One natural thing to do would be to generalize the scheme to an n-bit saturating counter, but it turns out that adding more bits has a relatively small effect on accuracy.
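In C terms, the two-bit scheme described there boils down to something like this (a minimal sketch; the table size is arbitrary):

Code:
#include <stdint.h>

#define TABLE_SIZE 4096
static uint8_t counters[TABLE_SIZE];     /* two-bit saturating counters, 0..3 */

static int bimodal_predict(uint32_t pc)
{
    return counters[pc % TABLE_SIZE] >= 2;
}

static void bimodal_update(uint32_t pc, int taken)
{
    uint8_t *c = &counters[pc % TABLE_SIZE];
    if (taken && *c < 3)  (*c)++;        /* saturate at strongly taken */
    if (!taken && *c > 0) (*c)--;        /* saturate at strongly not taken */
}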


Fri Jun 07, 2019 7:23 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I wonder if doing something bizarre like factoring in the time-of-day would help with branch prediction. Many systems run the same software at specific times of day.

_________________
Robert Finch http://www.finitron.ca


Sat Jun 08, 2019 3:35 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
robfinch wrote:
I wonder if doing something bizarre like factoring in the time-of-day would help with branch prediction. Many systems run the same software at specific times of day.
Lol, that's a good one. But I guess most systems that run at "specific times" don't even run compiled languages nowadays. That's an unfortunate trend... So at most, branch prediction benefits the interpreter, not so much the actual application code, which obviously runs much more slowly on top of the interpreter.

Of course, another benefit comes from avoiding as many branches as possible before the code is even set for execution. Modern compilers really work hard for that goal, even at the expense of creating much larger code, for example by copying blocks of code instead of branching. Architectures with predicated instructions like the ARM also help to reduce the number of branches. Another branch-reduction approach, which I chose for my cpu74 implementation, is the incorporation of conditional 'set' and 'select' instructions. To some extent they act as predicated instructions, and they are able to replace a lot of branching in situations where branching is particularly expensive.
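As a trivial illustration of the conditional-select idea in C (this is what a compiler might do with it, not cpu74 code):

Code:
/* With a select/conditional-move instruction the compiler can turn this
 * into compare + select, so a hard-to-predict condition costs no
 * misprediction penalty. */
static int max_select(int a, int b)
{
    return a > b ? a : b;
}

/* The same thing written out with a mask (assumes two's complement). */
static int max_masked(int a, int b)
{
    int take_a = -(a > b);               /* all ones if a > b, else zero */
    return (a & take_a) | (b & ~take_a);
}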


Sat Jun 08, 2019 3:31 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
A recent article on this topic shows the difficulty of benchmarking code on modern processors - the branch prediction gets better each time you run the test, if you don't take care to have enough data:
https://lemire.me/blog/2019/10/16/bench ... -branches/


Tue Dec 03, 2019 9:56 pm