AnyCPU
http://anycpu.org/forum/

Modern high performance CPUs (with ref to Apple/ARM)
http://anycpu.org/forum/viewtopic.php?f=15&t=804

Author:  BigEd [ Tue Nov 24, 2020 3:34 pm ]
Post subject:  Modern high performance CPUs (with ref to Apple/ARM)

I read a couple of good deep descriptions of the machinery in Apple's new M1 chip, an ARM (or AArch64) implementation which seems to have great performance and insanely great performance per unit power. (I exaggerate.)

Apple's Humongous CPU Microarchitecture

A few quick takeaways:
- fast micros these days (since Alpha!) don't have a single-cycle L1 cache; M1 has 3 cycles for L1D, which is best in class
- M1's Firestorm A14 cores have a really wide decode stage, helped by fixed length instructions
- there are just huge amounts of in-flight state
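
A toy sketch of why so much in-flight state helps (an illustration only, not taken from the linked article). The model assumes unit-latency instructions, unlimited issue width and in-order retirement, and it only lets the machine look at the `window` oldest unretired instructions. The name `cycles_with_window` and the dependency-list format are made up for this example.

```python
def cycles_with_window(deps, window):
    """Cycles needed to run a trace on a toy out-of-order machine.

    deps[i] lists the earlier instructions whose results instruction i needs.
    Assumptions: unit latency, unlimited issue width, in-order retirement,
    and only the `window` oldest unretired instructions can be considered.
    """
    issued_at = {}   # instruction index -> cycle in which it issued
    head = 0         # oldest instruction that has not yet retired
    cycle = 0
    while head < len(deps):
        visible = range(head, min(head + window, len(deps)))
        ready = [i for i in visible
                 if i not in issued_at
                 and all(issued_at.get(p, cycle) < cycle for p in deps[i])]
        for i in ready:
            issued_at[i] = cycle
        while head in issued_at:   # retire in program order
            head += 1
        cycle += 1
    return cycle


# A 4-long dependent chain followed by 12 independent instructions.
trace = [[], [0], [1], [2]] + [[] for _ in range(12)]
for window in (4, 16):
    print(window, cycles_with_window(trace, window))
```

Under these assumptions the 4-entry window needs 7 cycles, while the 16-entry window reaches the 4-cycle limit set by the dependent chain, because it can see the independent work queued up behind the chain.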

Apple's A14:
Attachment: Apple-A14-Firestorm-AnandTech.png


Another recent high-performance implementation:
Hot Chips 2020 Live Blog: IBM's POWER10 Processor on Samsung 7nm
Attachment: IBM-POWER10-AnandTech.png (File comment: IBM-POWER10)


Related previous threads:

Author:  oldben [ Tue Nov 24, 2020 8:42 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

IBM has a new standard for BCD math (2000?) for the new 360-style mainframe computers (I have no idea what they are called today), for things like COBOL, taxes or banking. I don't see that feature listed with modern CPUs. Is it patented just for IBM, hidden as some special upgrade feature, or are they so busy building an all-singing, all-dancing CPU that they have never thought about that feature?

Author:  robfinch [ Tue Nov 24, 2020 9:01 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

Quote:
IBM has a new standard for BCD math (2000?) for the new 360-style mainframe computers (I have no idea what they are called today), for things like COBOL, taxes or banking. I don't see that feature listed with modern CPUs. Is it patented just for IBM, hidden as some special upgrade feature,
I wonder if it is considered part of the floating-point unit (decimal floating-point).
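
Why decimal arithmetic matters for banking-style work can be shown in a few lines. This sketch uses Python's standard `decimal` module, which does the arithmetic in software; it only illustrates the numeric behaviour and says nothing about how any particular CPU implements it.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so sums drift.
binary_equal = (0.1 + 0.2 == 0.3)
binary_total = sum([0.1] * 10)

# Decimal arithmetic keeps amounts like ten cents exact.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))

print(binary_equal, binary_total == 1.0)    # False False
print(decimal_equal, decimal_total == 1)    # True True
```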

I think: yikes! On a branch mispredict, a lot of cycles would be wasted on such a humongous architecture.

Author:  BigEd [ Thu Dec 03, 2020 6:18 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

According to this overview
https://threadreaderapp.com/thread/1331 ... 03104.html
Apple have added some mechanisms to help their case: a mode with strong memory ordering to help with x86 emulation; something which speeds up reference counting which helps Swift programs; something which specifically helps with JavaScript.

Author:  robfinch [ Fri Dec 04, 2020 5:43 am ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

The mysterious “something”. I would think custom instructions in the AArch64 set could have compatibility issues with future ARMs.

I wonder what the percentage improvement in processing speed is compared to a smaller four-wide machine, and how it compares on power consumption as well. For most apps it’s lucky if two instructions execute at the same time. I would think an eight-wide machine would sit idle a lot of the time. With all the functional units, the bypass network must be pretty large. Fixed-length instructions probably really help the design; otherwise a lot of pipelining would be needed in the decode.

Author:  BigEd [ Fri Dec 04, 2020 10:34 am ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

> For most apps it’s lucky if two instructions execute at the same time
It does feel like that, from a coding perspective. But seeing how many machines go to such expense to execute more, it can't be so! Perhaps we need a nice graphical simulation of an out of order machine to see which instructions get dispatched, which get stalled, and which get retired.

(It is possible that boring everyday code doesn't have much instruction level parallelism to exploit, but highly optimised crucial routines do, and that's where performance really counts.)
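
Short of a graphical simulator, a toy model can already show both sides of this argument. The sketch below assumes unit latency, perfect branch prediction and an unlimited instruction window, so it is far simpler than any real core. `simulate` and the dependency-list format are hypothetical names chosen for this example.

```python
def simulate(deps, width):
    """Cycles to issue a trace when at most `width` instructions issue per cycle.

    deps[i] lists the earlier instructions whose results instruction i needs.
    Oldest ready instructions are picked first; results are usable one cycle
    after issue.
    """
    issued_at = {}
    pending = list(range(len(deps)))
    cycle = 0
    while pending:
        ready = [i for i in pending
                 if all(issued_at.get(p, cycle) < cycle for p in deps[i])]
        for i in ready[:width]:
            issued_at[i] = cycle
        pending = [i for i in pending if i not in issued_at]
        cycle += 1
    return cycle


chain = [[]] + [[i] for i in range(7)]     # each instruction needs the previous one
independent = [[] for _ in range(8)]       # no dependencies at all

for width in (2, 8):
    print(width, simulate(chain, width), simulate(independent, width))
```

In this model the dependent chain takes 8 cycles at any width, while the independent code takes 4 cycles at width 2 and 1 cycle at width 8: a wide machine only pays off where the code has parallelism to offer.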

Author:  joanlluch [ Sat Dec 05, 2020 12:29 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

It also looks plausible to me that Apple is working heavily on their own fork of the LLVM compiler to take the most advantage of their new A14 processors. From the point of view of developers, it should only take a recompile of their apps to get the benefits.

Author:  oldben [ Sun Dec 06, 2020 12:04 am ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

It's hard to say how much faster it will be. Best-case timing is not average timing, and the memory cache affects the timing of the whole system. You need a balanced system so everything gets a fair share of memory.

Faster is relative to the user. A better mouse click response affects more people than a 5% increase in floating point division.

Author:  BigEd [ Sun Dec 06, 2020 4:14 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

Good point that compiler improvements might bring further gains. (Similarly compiler related, I gather it's an advantage to the M1's emulation of x86 that modern x86 code tends to use a fairly regular subset of x86.)

I found a really nice presentation on the limits to ILP: taking an ideal CPU, seeing how much instruction level parallelism might ideally be extracted from an instruction stream, and then gradually refining the machine (and compiler) to more realistic conditions:
http://www.cse.uaa.alaska.edu/~afkjm/cs ... ations.pdf

Probably via this discussion, one of many about M1: https://news.ycombinator.com/item?id=25257932
where we see
Quote:
...for a long time people were saying that "CISC is just compression for RISC, making virtue of necessity", but it seems like M1 serves as a good counterexample where a simpler ISA is scaled up to modern transistor counts...
and
Quote:
I can't comment on the economics of it but I can comment on the technical difficulties. The issue for x86 cores is keeping the ROB fed with instructions - no point in building a huge OoO if you can't keep it fed with instructions.
Keeping the ROB full falls on the engineering of the front-end, and here is where CISC v RISC plays a role. The variable length of x86 has implications beyond decode. The BTB design becomes simpler with a RISC ISA since a branch can only lie in certain chunks in a fetched instruction cache line in a RISC design (not so in CISC). RISC also makes other aspects of BPU design simpler - but I digress. Bottom line, Intel and AMD might not have a large ROB due to inherent differences in the front-end which prevent larger size ROBs from being fed with instructions.


See also
Why do ARM chips have an instruction with Javascript in the name (FJCVTZS)?
which is probably the JS assist: one instruction replaces several, for an operation commonly needed by JS engines.
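
For reference, the operation involved is JavaScript's double-to-int32 conversion (ToInt32 in the ECMAScript spec): truncate toward zero, wrap modulo 2^32, reinterpret the result as a signed 32-bit value, and map NaN and infinities to 0. A rough Python model of that rule follows. The name `js_to_int32` is made up here, and this sketches the language semantics only, not the instruction's flag behaviour.

```python
import math


def js_to_int32(x: float) -> int:
    """Model of JavaScript's ToInt32 conversion for a double."""
    if math.isnan(x) or math.isinf(x):
        return 0
    n = math.trunc(x)            # round toward zero
    n &= 0xFFFFFFFF              # wrap modulo 2**32
    if n >= 0x80000000:          # reinterpret as signed 32-bit
        n -= 0x100000000
    return n


print(js_to_int32(-1.9))             # -1
print(js_to_int32(2.0**32 + 5))      # 5
print(js_to_int32(2147483648.0))     # -2147483648
```

Without a dedicated instruction, the out-of-range cases need extra compare-and-fix-up code after an ordinary convert, which is the "one instruction replaces several" saving.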

Author:  MichaelM [ Mon Dec 07, 2020 2:36 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

BigEd wrote:
(Similarly compiler related, I gather it's an advantage to the M1's emulation of x86 that modern x86 code tends to use a fairly regular subset of x86.)

Interesting that you brought that up. When I was looking to port a compiler to my M65C02A soft-core, I did a quick tally of the x86 instructions that the compiler used. As you suggest, for that Pascal compiler, the list shown below is remarkably short compared to the total number of instructions that the x86 processor itself supports.
Code:
    mov dst,src
    rep movsb
    lea dst,src
    cmp dst,src
    repe cmpsb
    push src
    pop dst
    not dst
    and dst,src
    or dst,src
    add dst,src
    sub dst,src
    imul src
    idiv src
    call dst
    ret n
    jmp dst
    jl dst
    jle dst
    je dst
    jne dst
    jg dst
    jge dst
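
A tally like this is easy to script. The sketch below is not the method used for the list above, just one possible way to do it. It assumes an Intel-syntax listing with one instruction per line, `;` comments and labels on their own lines; `tally_mnemonics` and the sample listing are made up for illustration.

```python
from collections import Counter

PREFIXES = {"rep", "repe", "repne", "lock"}


def tally_mnemonics(listing: str) -> Counter:
    """Count instruction mnemonics in an assembly listing."""
    counts = Counter()
    for line in listing.splitlines():
        code = line.split(";", 1)[0].strip()     # drop comments
        if not code or code.endswith(":"):       # skip blanks and labels
            continue
        words = code.split()
        if words[0].lower() in PREFIXES and len(words) > 1:
            mnemonic = f"{words[0]} {words[1]}".lower()   # keep "rep movsb" together
        else:
            mnemonic = words[0].lower()
        counts[mnemonic] += 1
    return counts


sample = """
start:
    mov eax, 1      ; load a constant
    mov ebx, eax
    rep movsb
    add eax, ebx
    jne start
"""
print(tally_mnemonics(sample).most_common())
```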

Author:  oldben [ Mon Dec 07, 2020 10:15 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

Confused here. I have not followed Apple, but just what are we emulating that needs x86 code?

Author:  BigEd [ Mon Dec 07, 2020 10:40 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

Apple is in the process of changing their consumer computers from x86 to ARM, and to provide some backward compatibility they have an ahead-of-time translation from x86 to ARM. (Also some just-in-time capability, I think.) So, with this technology, which they call Rosetta 2, users of the new computers can run older software which hasn't yet been ported to ARM. It turns out the performance isn't too bad, which is notable.

Author:  oldben [ Mon Dec 07, 2020 11:12 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

Apple proves what "cheap is" umm best CPU, works for computers.
I still remember Dr. Dobb's and putting 512KB on a Mac. Ben.

Author:  joanlluch [ Mon Dec 07, 2020 11:14 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

There's little doubt that Apple will provide a smooth transition. In the past they already switched from 68000 to PowerPC and then to Intel. I recall the times when "Rosetta" executed 68000 code on PowerPC Macs and you didn't even know unless you looked at the system monitor. IIRC, the technology consists of real-time, just-in-time conversion of the emulated machine's code into native code, block by block as execution progresses, in a way that is transparent to the user and is performed only once for any given piece of machine code. So it's not a machine code interpreter but a true machine code translator, which is why it is so fast. I also believe that translation from Intel instructions to ARM code is potentially a lot more efficient than 68000 to PowerPC, because ARM instructions and addressing modes are much more similar to Intel's than PowerPC ever was to 68000. [Time to buy a "short" position in Intel stock...]
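
The "translate each block once, then reuse it" idea can be sketched with a toy guest instruction set. Everything here is invented for illustration (the four guest ops, `translate_block`, `run`); it generates Python source where a real translator would emit native code, and it makes no claim about how Apple's translators actually work.

```python
def translate_block(guest, pc):
    """Translate one basic block of a toy guest program into a Python function."""
    lines = ["def block(s):"]
    while True:
        op, arg = guest[pc]
        pc += 1
        if op == "add":            # acc += constant
            lines.append(f"    s['acc'] += {arg}")
        elif op == "dec":          # named register -= 1
            lines.append(f"    s['{arg}'] -= 1")
        elif op == "jnz":          # a branch ends the block
            reg, target = arg
            lines.append(f"    return {target} if s['{reg}'] != 0 else {pc}")
            break
        elif op == "halt":
            lines.append("    return None")
            break
        else:
            raise ValueError(f"unknown guest op {op!r}")
    namespace = {}
    exec("\n".join(lines), namespace)    # "compile" the translated block
    return namespace["block"]


def run(guest, state):
    """Run the guest program; return how many blocks had to be translated."""
    cache = {}                           # guest address -> translated block
    pc = 0
    while pc is not None:
        if pc not in cache:              # translate only on first visit
            cache[pc] = translate_block(guest, pc)
        pc = cache[pc](state)            # each block returns the next guest pc
    return len(cache)


# Toy loop: add 5 to acc, n times.
guest = [("add", 5), ("dec", "n"), ("jnz", ("n", 0)), ("halt", None)]
state = {"acc": 0, "n": 1000}
blocks_translated = run(guest, state)
print(state["acc"], blocks_translated)
```

The loop body runs 1000 times but is translated only once, so the translation cost is spread over every later execution of that block.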

Author:  BigEd [ Tue Dec 08, 2020 12:49 pm ]
Post subject:  Re: Modern high performance CPUs (with ref to Apple/ARM)

Nice graph in here of Apple's ARM experience: they've been making fully custom ARMs for 6 years now.
https://www.cs.utexas.edu/~bornholt/post/z3-iphone.html

Also a piece here (from 2018) about cache latencies over various generations:
https://www.anandtech.com/show/13392/th ... -secrets/3

All times are UTC