 ns32k and swordfish - VAX-like instructions on a RISC core 

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Following on from here:
joanlluch wrote:
BigEd wrote:
Having said which, it's not just x86. I was told by someone who ought to know that the front end of a high performance ARM is just as expensive as an x86 one. I found it hard to believe but there you are. It turns out high performance ARMs might also use micro-ops internally, for similar reasons:
https://www.quora.com/Why-do-ARM-proces ... operations

Wow, that’s really interesting. This means that once we enter the field of high-performance processors, their native instruction encodings don’t really matter. Which makes me dream of the return of an instruction set as beautiful as the Digital VAX-11, with fully regular, compact, variable-size encodings, allowing any arbitrary combination of addressing modes and operations in a single instruction, and both compiler and human friendly! Imagine that instruction set being processed as micro-ops by an imaginary CPU with the complexity of the x86, achieving the same or better performance... wouldn’t that be delightful?
http://bitsavers.trailing-edge.com/pdf/dec/vax/archSpec/EY-3459E-DP_VAX_Architecture_Reference_Manual_1987.pdf
(How was this allowed to die in favour of the x86? Well, I know, so that’s just a rhetorical question.)


It's often said that NatSemi's ns32k offers a VAX-like instruction set. (It was unsuccessful in part because it took very many iterations to get right - up to revision H at least - and so it could be described either as late or as buggy, or both. It's a hidden cost of CISC that verification is really difficult. Indeed, the version you can run today as a second processor on a BBC Micro, emulated on a Pi, took a fair bit of debugging, and may or may not still have bugs, but they are not yet found.)

But after the original series of revisions, there was a new implementation, with interesting characteristics. Perhaps it was VLIW on the inside, and presented itself either as running a RISCy instruction set or a most-of-the-ns32k instruction set (with the exceptions trapped and emulated.)

See the following takes:
http://www.cpushack.com/CPU/cpu3.html#320xx

Quote:
Elegance and regular design was a main goal of this processor, as well as completeness. It was similar to the 68000 in basic features, such as byte addressing, 24-bit address bus in the first version, memory to memory instructions, and so on (The 320xx also includes a string and array instruction). Unlike the 68000, the 320xx had eight instead of sixteen 32-bit registers, and they were all general purpose, not split into data and address registers. There was also a useful scaled-index addressing mode, and unlike other CPUs of the time, only a few operations affected the condition codes (as in more modern CPUs).
...
The Swordfish implemented the NS32K instruction set using a reduced instruction core - NS32K instructions were translated by the cache decoder into either: one internal instruction, a pair of internal instructions in the cache, or a partially decoded NS32K instruction which would be fully decoded into internal instructions after being fetched by the CPU. The Swordfish also had dynamic bus resizing (8, 16, 32, or 64 bits, allowing 2 instructions to be fetched at once) and clock doubling, 2 DMA channels, and in circuit emulation (ICE) support for debugging.

The Swordfish was later simplified into a load-store design and used to implement an instruction set called CompactRISC (also known as Pirhana, an implementation independent instruction set supporting designs from 8 to 64 bits). CompactRISC has been implemented in three stage, 16-bit (CR16A), 20-bit (CR16B), and 32-bit (CR32A) address versions (CR16B also included bit-oriented memory operations).



https://people.cs.clemson.edu/~mark/swordfish.html

Quote:
The Swordfish is a unique design with a superscalar external appearance but a long-instruction-word (LIW) internal microarchitecture based on a decoded instruction cache (DINC).



https://virtuallyfun.com/category/ns32032/

Quote:
Enter the Definicon DSI-32 co-processor card. It’s a simple 8bit ISA card containing the NS32032 processor, some memory and sockets for both a math co-processor, and a MMU. The NS32032 is also somewhat infamous as being rather ‘VAX like’, and being difficult for compilers of the era to properly optimize for.

Steve Furber says:
Quote:
We’d been out to Israel to visit National Semiconductor’s design centre there and the 16032 team were on Rev H of the 16032. They started at Rev A, B, C, D. So, you know, they’d clearly got it wrong several times. And they were still debugging this thing ‘cause it was so complex and they had huge teams that were way beyond Acorn’s means.


Wed Jan 08, 2020 9:37 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Thanks for sharing this, I didn't know about the ns32k. It's a pity it failed, but I can understand why it was eventually surpassed in sales and performance by the Motorola 68K series, which was less ambitious with regard to instruction set orthogonality but struck a much better balance in a very nice processor. With the technology of the time, I think that contemporary RISC processors would easily have outperformed both (I'm thinking of the Digital Alpha processors). The weird thing is that the 8086 eventually evolved into the superscalar processor it is nowadays. I suppose that a VAX-like instruction set was too much to fit into a single chip at the time and still get any real performance out of it: too many addressing modes and combinations of operators, too many data types (the instruction set even supported 128-bit integers and floats!), too many different instruction sizes, too many complex instructions, and so on. After all, the original VAX-11 used many cycles to execute instructions. So it's probably only nowadays that the technology is there to maybe embed it all in a complex microcoded processor and regain the full glory of that iconic instruction set at speeds never attainable before.


Fri Jan 10, 2020 10:42 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I think it’s certainly possible to implement a VAX instruction set compatible processor using RISC-like instructions. What about a project to implement a 64-bit processor with backwards compatibility to the VAX?

I did a quick Google on “FPGA VAX” to try and find such a project and there appear to be a couple of commercial / industrial projects to implement a VAX in an FPGA, but I don’t see any hobby projects. I was rather hoping to find something I could dissect. It’s probably too big and complex. I like the VAX architecture, but it is a bit dated. There isn’t a large enough address space and there aren’t enough registers. Also, having the PC as part of the general register file is a bit passé. Some of the string operations would need to be updated for modern character sets >= 16 bits. It looks like it would be challenging to implement.

_________________
Robert Finch http://www.finitron.ca


Sat Jan 11, 2020 4:16 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Hi Rob, just to make myself clearer:

I didn't mean to produce a VAX-11 compatible processor, but one with an instruction set architecture heavily inspired by it. Note that I mentioned a "VAX-like" instruction set, not a VAX-compatible one or even one using the same encodings.

What I like about the VAX architecture is that instruction opcodes are encoded in one or two bytes (generally just one), followed by zero or more bytes indicating the addressing modes, in all possible combinations of modes and data types. This produces a totally flexible, fully expandable, very compact instruction set with virtually no encoding limitations. That's it.

However, I do not advocate a fully compatible version of the original VAX, because there are more problems than the ones you describe. Another one is that the VAX sets status flags for more instructions than necessary, which complicates things for compilers and prevents many compiler optimisations.

I hope this makes my dream clearer now.


Sat Jan 11, 2020 5:02 pm

Joined: Mon Dec 28, 2015 11:37 am
Posts: 13
BigEd wrote:
Perhaps it was VLIW on the inside, and presented itself either as running a RISCy instruction set or a most-of-the-ns32k instruction set (with the exceptions trapped and emulated.)

I was looking for some info on Swordfish's predecessor, the NS32764, some weeks ago and stumbled upon an article written by one of its designers, Don Alpert. I couldn't find the article I was reading then, but I found a page containing a quote about the VLIW influence on the core.

Swordfish was most strongly influenced by:
- MIPS-X at Stanford. We followed a similar integer pipeline and looked at their branch handling as well. I visited Stanford in summer 1987(?) and was exposed to the work in detail.
- Multiflow VLIW. I had met Josh Fisher once when I was a student at Stanford, then heard him give a talk about Multiflow at UC Berkeley in 1987 (?). We were trying to figure out how to get parallelism out of multiple functional units, and adopted a microarchitecture that was like VLIW: each FU was assigned to fixed slots in a 2-wide instruction word fetched from the cache. We had the HW detect dependencies as instructions were placed in the cache slots, so it was a superscalar architecture with a VLIW machine organization. To improve icache efficiency we allowed dependent instructions to be packed together with a bit per pair of instructions that indicated whether or not they were dependent. Independent instructions could be executed in parallel, dependent instructions had to be executed sequentially, but still on the pipeline assigned to that slot. Just about the only wasted cache slots were for FP instructions that could not be paired with a load or integer op.

http://www.cpu-ns32k.net/Swordfish.html
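As I read the quote, the scheme boils down to something like the following Python sketch: instructions are packed two at a time into cache slots, the dependency check is done once at fill time, and a single bit per pair tells the pipelines whether to run the two in parallel or one after the other. The `Instr` and `Pair` types, the register names and the one-cycle/two-cycle cost model are my own assumptions for illustration; the real hardware surely checked more than this.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str                # register written
    srcs: Tuple[str, ...]    # registers read


@dataclass(frozen=True)
class Pair:
    first: Instr
    second: Optional[Instr]  # None when the stream has an odd length
    dependent: bool          # the one "dependency bit" stored per pair


def depends(first, second):
    """Simplified check: second reads or overwrites what first writes."""
    return first.dest in second.srcs or first.dest == second.dest


def fill_cache(instrs):
    """Pack instructions pairwise, computing the dependency bit at fill time."""
    pairs = []
    for i in range(0, len(instrs), 2):
        first = instrs[i]
        second = instrs[i + 1] if i + 1 < len(instrs) else None
        dep = second is not None and depends(first, second)
        pairs.append(Pair(first, second, dep))
    return pairs


def issue_cycles(pairs):
    """Assumed cost model: independent pair = 1 cycle, dependent pair = 2."""
    return sum(2 if p.dependent else 1 for p in pairs)


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),   # reads r1, so this pair is dependent
    Instr("mov", "r6", ("r7",)),
    Instr("or", "r0", ("r2", "r3")),    # independent of the mov
]
pairs = fill_cache(program)
print([p.dependent for p in pairs], issue_cycles(pairs))
# [True, False] 3
```

Note that the dependent pair still occupies a single cache line, which is the icache-efficiency trick the quote describes: nothing is padded out with no-ops just because two neighbours cannot run together.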

Edit: Eh, that quote was already in links from Ed's post.


Thu Feb 13, 2020 10:49 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Ah, thanks, that's a hint to search for Donald Alpert's writings. He runs a consultancy, and on his website we read:

Quote:
Unfortunately, the design and validation techniques used by NSC and other chip makers were inadequate to debug a complete computer system.

For the next generation NS32132 CPU NSC had developed much more effective validation techniques to create a very reliable product, but the market opportunity had passed for workstations.


and

Quote:
The Swordfish microprocessor was developed to meet the requirements of NSC’s workstation and server customers, as well as for laser printers and other embedded applications. These sometimes conflicting objectives were met by implementing a stripped-down subset of the NS32000 architecture directly in hardware and emulating more complex operations with an on-chip ROM. The result was a single-chip microprocessor that was possibly the first to integrate superscalar integer pipelines, pipelined double-precision floating point pipeline, memory management, and cache.
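The split described there (a small subset done directly in hardware, the complex operations emulated from an on-chip ROM) can be sketched roughly as below in Python. Everything here is hypothetical: the three "hardware" operations, the `blockcopy` operation standing in for a complex string-style instruction, and the ROM routine that expands it are invented for the illustration and are not taken from the NS32000 instruction set.

```python
# Hypothetical model: a few simple ops run directly, anything else is
# expanded by a "ROM" routine into a sequence of those simple ops.

HARDWARE_OPS = {"load", "store", "addi"}


def run_simple(op, args, regs, mem):
    """Execute one operation that the core handles directly."""
    if op == "load":       # load rd, [ra]
        rd, ra = args
        regs[rd] = mem[regs[ra]]
    elif op == "store":    # store rs, [ra]
        rs, ra = args
        mem[regs[ra]] = regs[rs]
    elif op == "addi":     # addi rd, imm
        rd, imm = args
        regs[rd] += imm
    else:
        raise ValueError(f"not a hardware op: {op}")


def rom_blockcopy(args, regs):
    """ROM routine: copy regs[count] words from [src] to [dst]."""
    src, dst, count = args
    for _ in range(regs[count]):
        yield ("load", ("tmp", src))
        yield ("store", ("tmp", dst))
        yield ("addi", (src, 1))
        yield ("addi", (dst, 1))


ROM = {"blockcopy": rom_blockcopy}


def execute(program, regs, mem):
    """Run a program; return how many simple ops were actually executed."""
    executed = 0
    for op, args in program:
        if op in HARDWARE_OPS:
            run_simple(op, args, regs, mem)
            executed += 1
        else:
            # Complex operation: fetch its expansion from the ROM table.
            for simple_op, simple_args in ROM[op](args, regs):
                run_simple(simple_op, simple_args, regs, mem)
                executed += 1
    return executed


regs = {"r0": 0, "r1": 4, "r2": 3}
mem = [10, 20, 30, 0, 0, 0, 0, 0]
count = execute([("blockcopy", ("r0", "r1", "r2")), ("addi", ("r2", 1))], regs, mem)
print(mem, count)
# [10, 20, 30, 0, 10, 20, 30, 0] 13
```

The programmer sees two instructions, while the core only ever executes its three simple operations, thirteen of them in this run.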


Lots of links to papers too!

And there's more nearby, for example a 1998 presentation on The Future of Microprocessor Architecture

I also found an earlier shortish paper by Alpert in which he concludes that VISP is the future: virtual instruction set processors:


Fri Feb 14, 2020 7:45 am