Author |
Message |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1799
|
I found a couple of good papers on the Cray-1. They chose a fast technology, built almost the whole machine from one type of logic gate, controlled and equalised all the wire lengths, and built the machine in cylindrical form to minimise distance. And then they had to cool it. "The 12.5 ns clock is divided into eight “gate times” of about 1.5 ns each. Roughly half the gate time is due to circuit propagation delay, and half is due to board-foil delay." Two papers for those interested:
|
Fri Nov 20, 2020 9:09 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2187 Location: Canada
|
I remember seeing a CRAY at the University of Toronto many years ago. It was quite a machine. It is amazing that a supercomputer from just 40 years ago is probably beaten by the average laptop.
_________________Robert Finch http://www.finitron.ca
|
Fri Nov 20, 2020 2:43 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1799
|
"The CRAY-1 was rated at 160 MIPS (and 5.5 tons) - The Raspberry Pi -C is 2441 MIPS and 42 grams" (according to someone on the internet!)
|
Fri Nov 20, 2020 4:11 pm |
|
|
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 649
|
Scale the Cray to today's gate speeds and I am sure we would have the speediest machine. New chips seem to be getting bigger core-wise, so sending signals is getting slower relative to gate speeds. Ben.
|
Fri Nov 20, 2020 8:46 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2187 Location: Canada
|
I get the impression that wire delay is now relatively large compared to gate delay. I think that means that more gates can be used with the same wire balance. Maybe the eight gate times of the CRAY can be increased, leading to a slightly different architecture? IIRC some newer designs use 12 or 16 gate delays. In the future it may be possible to use more gates per clock if wire delays increase? The 5/6-input NOR gates should map nicely to the six-input LUTs in an FPGA. I would think it should be possible to map gate by gate if creating an FPGA version.
I took a look at the CRAY-1 on an FPGA (a Spartan-3E, I think) that is on the web. I think a CRAY-4 (architected for newer tech) on an FPGA would be interesting.
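As a rough illustration of the gate-to-LUT mapping idea above, here is a minimal Python sketch that computes the truth-table constant a LUT would hold for an n-input NOR. The helper names `lut_init` and `nor` are made up for this example, and the bit ordering (bit i of the constant holds the output for input pattern i, with input 0 as the least significant bit) is an assumption that matches the usual INIT convention but should be checked against the target FPGA's documentation.

```python
def lut_init(func, n_inputs):
    """Pack the truth table of func into an integer.

    Bit i of the result is the output for input pattern i,
    where input 0 is the least significant bit of i (assumed convention).
    """
    init = 0
    for pattern in range(1 << n_inputs):
        bits = [(pattern >> k) & 1 for k in range(n_inputs)]
        if func(bits):
            init |= 1 << pattern
    return init


def nor(bits):
    """NOR is true only when every input is 0."""
    return not any(bits)


nor5 = lut_init(nor, 5)  # 32-entry table for a 5-input NOR
nor6 = lut_init(nor, 6)  # 64-entry table for a 6-input NOR

print(f"5-input NOR table: {nor5:#010x}")
print(f"6-input NOR table: {nor6:#018x}")
```

Either gate fits in a single six-input LUT, since only the all-zeros input pattern produces a 1.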
|
Sat Nov 21, 2020 10:49 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1799
|
There's some commentary on the Cray-1 and more modern CPUs in the perhaps provocatively titled The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays [2002] by M.S. Hrishikesh; N.P. Jouppi; K.I. Farkas; D. Burger; S.W. Keckler; P. Shivakumar. There's a slide deck which draws on the same material here.
Quote:
- 6FO4 is the amount of useful logic per stage that will provide the best performance
- Below this value, the improvement in clock frequency cannot compensate for a decrease in IPC and vice-versa
- Optimal clock frequencies are dependent on on-chip microarchitectural structures, these structures need to be pipelined to operate at high frequencies
(Not to say that I necessarily agree with the ideas within - above my pay grade)
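To put rough numbers on the FO4 idea, here is a small Python sketch under stated assumptions. It treats the clock period as useful logic plus a fixed per-stage overhead, both measured in FO4 delays. The overhead of 1.8 FO4, the rule of thumb of about 360 ps of FO4 delay per micron of drawn gate length, and the 0.1 micron process are all assumptions chosen for illustration, not figures taken from this thread. It only models clock frequency; it says nothing about the IPC loss that the quote mentions.

```python
def clock_period_fo4(useful_fo4, overhead_fo4):
    """Clock period in FO4 delays: useful logic plus latch/skew/jitter overhead."""
    return useful_fo4 + overhead_fo4


def fo4_delay_ps(drawn_gate_length_um, ps_per_um=360.0):
    """Rule-of-thumb FO4 delay in picoseconds (ps_per_um is an assumed constant)."""
    return ps_per_um * drawn_gate_length_um


def clock_ghz(useful_fo4, overhead_fo4, drawn_gate_length_um):
    """Clock frequency in GHz for a given logic depth and process."""
    period_ps = clock_period_fo4(useful_fo4, overhead_fo4) * fo4_delay_ps(drawn_gate_length_um)
    return 1000.0 / period_ps


OVERHEAD_FO4 = 1.8     # assumed per-stage overhead
GATE_LENGTH_UM = 0.1   # assumed process

for useful in (4, 6, 8, 12, 16):
    period = clock_period_fo4(useful, OVERHEAD_FO4)
    ghz = clock_ghz(useful, OVERHEAD_FO4, GATE_LENGTH_UM)
    print(f"{useful:2d} FO4 useful: period {period:4.1f} FO4, "
          f"{useful / period:.0%} useful, {ghz:.2f} GHz")
```

The point of the sketch is the diminishing return: as the useful logic per stage shrinks, the fixed overhead takes a growing share of each clock period.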
|
Sat Nov 21, 2020 11:22 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2187 Location: Canada
|
A lot above my pay grade too. I wonder if the following remains true (the article is dated 2002). I’ve read elsewhere on the web that wire delay is increasing relative to gate delay (I wonder if I misinterpreted something) at the smaller sizes and note in the article:
“While we did not consider the effects of slower wires,”
“To first order, wire delays remain constant as a fixed design is scaled to smaller feature sizes [15]. Although wire resistance increases, wire lengths decrease, thus preserving the absolute wire delay across technologies.”
|
Sat Nov 21, 2020 1:12 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1799
|
It's a good question. I think in 2002 the idea of wire delays becoming more important was becoming current. As I recall, one of the consistent ideas of the EDA companies was that their tools were happy digesting entire chip designs and globally and locally optimising everything. One of the consistent ideas in the places I worked was that such an approach would not produce high performance CPUs, so we didn't believe it and we didn't try it. By building chips and CPUs from smaller parts and paying attention to placement and some pre-routing, we possibly put ourselves into a different balance of logic vs wire.
In other words, something might be true for some design flows and false for others.
|
Sat Nov 21, 2020 4:39 pm |
|
|
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 649
|
Alas, nobody can afford real chips to try out the ideas. I get the feeling it is still hand layout that gets you the speed. Ben.
|
Sat Nov 21, 2020 7:59 pm |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
I remember seeing the Cray-1a (Serial Number 011) at the Science Museum, London many years ago.
As well as being an iconic design from the late 1970s, and very expensive office furniture - it was the hand-assembled backplane wiring that was most impressive. I thought to myself - how can this rat's nest possibly work!
I wonder what the world's collective computer processing resources now accumulate to - if measured in units of 1 Cray.
I think the ostentatious series of Cray machines may have had a little influence on Douglas Adams.
|
Sun Nov 22, 2020 7:38 pm |
|
|
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 649
|
Not a rat's nest? I suspect the mice from The Hitchhiker's Guide to the Galaxy lived there instead for that model. Adams seemed to write everything from Dr Who to radio plays. Now back to Crays.
|
Sun Nov 22, 2020 8:05 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1799
|
You can get up close and personal to (most of) a Cray-1S/2000 at The National Museum of Computing (on the Bletchley Park campus.) Some photos here: https://photos.app.goo.gl/UYeoq9aMxVcfUMeX6
|
Sun Nov 22, 2020 9:26 pm |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
The 80 MHz clock frequency and 160 MFLOPS, whilst unimpressive today, were not its only secret weapon.
I was informed that it had a huge memory bandwidth on account of its vector processing architecture so that vast quantities of data could be processed in parallel.
For 10 million 1976 dollars - I'd want more than a modern laptop...
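The clock and peak figures above hang together with the "eight gate times" quote earlier in the thread, as this quick Python check shows. The 64-bit word size is the Cray-1's; the rate of one word per clock used for the bandwidth line is an assumption for illustration, and the constant names are made up for this example.

```python
CLOCK_MHZ = 80.0            # clock frequency quoted above
PEAK_MFLOPS = 160.0         # peak rate quoted above
GATE_TIMES_PER_CLOCK = 8    # from the quote earlier in the thread
WORD_BYTES = 8              # 64-bit words
WORDS_PER_CLOCK = 1         # assumed transfer rate, for illustration only

clock_period_ns = 1000.0 / CLOCK_MHZ                      # one period in nanoseconds
gate_time_ns = clock_period_ns / GATE_TIMES_PER_CLOCK     # the "about 1.5 ns" gate time
results_per_clock = PEAK_MFLOPS / CLOCK_MHZ               # floating-point results per clock
bandwidth_mb_per_s = CLOCK_MHZ * WORDS_PER_CLOCK * WORD_BYTES

print(f"clock period:      {clock_period_ns} ns")
print(f"gate time:         {gate_time_ns} ns")
print(f"results per clock: {results_per_clock}")
print(f"memory bandwidth:  {bandwidth_mb_per_s} MB/s at {WORDS_PER_CLOCK} word per clock")
```

Two results per clock is what you get when two functional units each deliver one result every clock period.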
|
Mon Jan 18, 2021 7:54 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1799
|
Yes, as I recall, wide memory with lots of interleaving. One way to look at it is that the vector business is a way of making sure you can launch lots of memory requests and do useful work as they complete.
The price is interesting too - no relation to cost of making it, but instead a statement that if you want the fastest machine in the world, this is what you pay.
|
Mon Jan 18, 2021 8:21 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2187 Location: Canada
|
Cost 10 million in 1976 using 115 kW of power; available today in an FPGA on a board costing 1 thousand and using 5 W.
I was just reading up on the Cray-1 memory organization. It should be possible to come up with a good facsimile of Cray-1 on an Artix7 FPGA although with some memory size limitations. But then why stop at a Cray-1? More is known about computing today.
These stats are taken from a slide by Portland State University (cray-1-t3e, pdx.edu):
- 1M words of memory
- 16 banks of 64kB each
- 4 clock period bank cycle time (20 MHz)
- Transfer 1 word per cycle to B, T, and V registers
- 1 word every other cycle to A and S registers
- 4 words every cycle to instruction buffers
My comment: The instruction buffers act a lot like a 4-way set-associative cache with a 256-bit line size. The stats look “easy” to achieve in an FPGA.
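The bank figures above can be sanity-checked with a toy model in Python. This is only a sketch: it assumes low-order interleaving (word address modulo the number of banks picks the bank), one request attempted per clock, and a bank that stays busy for 4 clocks after each access. The function name `stream_clocks` is made up for this example and the model ignores everything else about the real machine.

```python
N_BANKS = 16           # number of memory banks
BANK_BUSY_CLOCKS = 4   # bank cycle time in clock periods


def stream_clocks(n_words, stride, n_banks=N_BANKS, busy=BANK_BUSY_CLOCKS):
    """Clocks needed to issue n_words requests at a fixed address stride.

    One request is attempted per clock; a request stalls until its
    target bank has finished its previous access.
    """
    free_at = [0] * n_banks   # clock at which each bank is next available
    clock = 0
    for i in range(n_words):
        bank = (i * stride) % n_banks       # assumed low-order interleaving
        clock = max(clock, free_at[bank])   # stall while the bank is busy
        free_at[bank] = clock + busy
        clock += 1                          # next request issues a clock later at the earliest
    return clock


for stride in (1, 4, 8, 16):
    clocks = stream_clocks(64, stride)
    print(f"stride {stride:2d}: {clocks} clocks for 64 words")
```

In this model a stride of 1 keeps up one word per clock because each bank is revisited only every 16 clocks, while a stride of 16 hits the same bank every time and drops to one word per 4 clocks.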
|
Wed Jan 20, 2021 2:38 am |
|