 Thoughts about a high-density ISA 

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1632
(I suspect it was never going to be high density, but I've just learned that Intel's iAPX 432 had arbitrary-length instructions up to 64K bits.
https://en.wikipedia.org/wiki/Intel_iAP ... ion_format
)


Sun Feb 07, 2021 5:52 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 205
Location: Huntsville, AL
The only architecture that I am aware of that used a bit-oriented instruction pointer was the Burroughs B1700/B1800 computers whose purpose was to implement "interpreters" for foreign instruction sets.

Using a bit-oriented instruction packing scheme may require the least significant bits of the instruction pointer to be dedicated to pointing at the starting bit of the target instruction. That would naturally decrease the byte-addressable memory range a fixed-length instruction pointer could cover, unless the pointer was correspondingly widened.
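A quick way to see that trade-off, as a sketch (the pointer widths are just examples):

```python
# Bit-addressed instruction pointers trade address range for granularity.
# With a fixed-width pointer, 3 low bits must select one of the 8 bits in
# a byte, leaving fewer bits for the byte address itself.

def byte_range(pointer_bits: int, bit_addressed: bool) -> int:
    """Number of bytes addressable by a pointer of the given width."""
    effective = pointer_bits - 3 if bit_addressed else pointer_bits
    return 1 << effective

# A 16-bit pointer: 64 KiB byte-addressed, but only 8 KiB bit-addressed.
print(byte_range(16, False))  # 65536
print(byte_range(16, True))   # 8192
```

So a bit-addressed machine like the B1700 either gives up 7/8 of its reach or carries a wider instruction pointer everywhere.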

_________________
Michael A.


Sun Feb 07, 2021 7:40 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Here's an idea: Segmented memory.

Every block of code that can be jumped to is in its own segment, and only segments are valid jump targets. Inside a segment, instructions can be compressed using a state machine.

This state machine's state is preserved on interrupt, but if the interrupt changes the segment it clears out that state machine. Jumps also clear the state machine. Calls would need to work like a jump and clear the state machine, which means a return would have to either return to an arbitrary segment specified in the call, or return to the segment after the calling segment.

So each segment would have to be independently compressed. But inside a segment, decompression can depend on previous bytes because there's no way to jump into the middle of a segment.

Then you can pick pretty much any compression scheme that can be easily implemented in hardware and gives really good compression at a block-by-block level. And you don't have the overhead of bit-addressed program memory. My guess is it would be a compression algorithm that supports a shared dictionary.

But then you have the overhead of a memory manager that has to know how to pack segments into memory efficiently, and do the segment translations. But I mean, if the goal is to set a world record for smallest ISA, having tons of hardware to help with that goal might be worth it?

There's also the fact that programmers hate segmented memory, but I wonder if the problem is mainly the 286 implementation of it was bad, not the concept itself?

edit: and only the program memory would need to be segmented, the data memory could be linear/paged like "normal"


Tue Feb 09, 2021 12:21 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1632
There could be something in this! Note, though, that branches are very common in 8-bit code. Then again, perhaps with an explicit looping instruction we don't need to place the loop start or end on a segment boundary.

It's an interesting (perhaps shocking) thought that jumping from one segment to another could cost quite a few cycles in translations or mappings.

Are you thinking segments are as small as basic blocks, or as large as procedures?

I once worked in a team making a CPU that split jumps into two instructions: the first prepares the destination address and the second does the transfer. One can perhaps imagine a small cache or stack of destination addresses which can be referred to by slot number.
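The split-jump idea above could be sketched like this (the slot count and method names are invented for illustration):

```python
# Sketch of a split jump: one instruction loads a destination into a small
# slot file, and a later instruction transfers control by slot number.
# The destination can be prepared well ahead of the branch point, which
# gives the fetch unit time to act on it.

class SplitJumpCPU:
    def __init__(self):
        self.pc = 0
        self.slots = [0] * 4  # small cache of prepared destinations

    def prepare(self, slot: int, target: int):
        """First half: stash the branch target (can be hoisted early)."""
        self.slots[slot] = target

    def transfer(self, slot: int):
        """Second half: the actual control transfer, now just a slot lookup."""
        self.pc = self.slots[slot]

cpu = SplitJumpCPU()
cpu.prepare(2, 0x1234)  # destination known well before the branch
cpu.transfer(2)
assert cpu.pc == 0x1234
```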


Tue Feb 09, 2021 1:53 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Yes, the idea would be that segments are the size of basic blocks. I suppose the larger the block the better the compression. But if you have a good compression algorithm, I am not sure how you would make it so you can jump into the middle of a compression block.


Tue Feb 09, 2021 1:58 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1632
I suppose one answer is a cache for decompressed code - not totally different perhaps from the idea of a cache for micro ops in a CISC machine.

So it becomes a bit like a compressing memory interface: 96 instructions fits into 64 bytes or words. Except of course there will be variable amounts of compression.
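The implied density of that figure, worked out (taking the 64 as bytes):

```python
# "96 instructions in 64 bytes" implies an average of 64 * 8 / 96,
# i.e. about 5.33 bits per instruction - a 3:1 ratio versus a 16-bit
# fixed-width encoding, or 1.5:1 versus single bytes.

line_bytes = 64
instructions = 96
bits_per_insn = line_bytes * 8 / instructions
print(round(bits_per_insn, 2))  # 5.33
```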


Tue Feb 09, 2021 2:46 pm

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 5
Possibly a little late to this, but I can think of two answers, and both (I suspect) are closely related. The first is a virtual one, but one that was in regular use until recently: CINTCODE, the default target of the current BCPL compiler by Martin Richards. It stands for Compact INTermediate CODE. It's a byte-code for a machine with 3 main registers plus a program counter, stack pointer and globals pointer. The registers work as a 2-deep stack, with the 3rd register used for data byte addressing. Opcodes and operands are not word-aligned, but data often is. The operands are variable length, and the opcodes were chosen based on analysis of compiled code, so (AIUI) the cycle was something like:

Invent some opcodes
Write a compiler
Analyse the output
Improve the opcodes
Repeat

So there are a number of one-byte opcodes for loading small constants - i.e. 0 to 10 and -1 - or loading a variable from a small stack offset, or some arithmetic. Imagine a FOR-type loop: how often do you increment by anything other than 1? And so on. There are even two instructions to deal with the high-level language SWITCH/CASE commands.

So this is obviously domain-specific, but I have tried some trivial experiments compiling something else into CINTCODE and it's good enough for my own tests - for now. It was originally a 16-bit VM, but was expanded to 32 and now 64 bits without too many issues. (So there is a load small constant - 1 byte, load byte - 2 bytes, load halfword - 3 bytes, load word - 5 bytes, and so on.) Obviously it's up to the compiler to output the best instruction for this - same for accessing local/stack variables: the compiler sorts the variables by reference count and uses a single-byte load/store for the most frequently used ones, and so on. The current ISA has 255 instructions, including a NOP and BRK.
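The shortest-encoding rule described above might look like this as a sketch - the opcode values here are invented, only the length trade-off mirrors CINTCODE:

```python
# Hypothetical "pick the shortest load" encoder: small constants get a
# dedicated one-byte opcode, larger ones pay for operand bytes.
# Opcode numbering is made up for illustration.

def encode_load_const(n: int) -> bytes:
    if 0 <= n <= 10 or n == -1:       # dedicated one-byte opcodes
        return bytes([0x10 + (n & 0x0F)])
    if 0 <= n <= 0xFF:                # opcode + one operand byte
        return bytes([0x20, n])
    if 0 <= n <= 0xFFFF:              # opcode + halfword operand
        return bytes([0x21]) + n.to_bytes(2, "little")
    # opcode + full 32-bit word operand
    return bytes([0x22]) + (n & 0xFFFFFFFF).to_bytes(4, "little")

assert len(encode_load_const(7)) == 1        # load small constant
assert len(encode_load_const(200)) == 2      # load byte
assert len(encode_load_const(40000)) == 3    # load halfword
assert len(encode_load_const(1 << 20)) == 5  # load word
```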


The other one I can think of has already been mentioned - a real implementation in microcode, with clear similarities to CINTCODE - and that is the Inmos Transputer: 3 registers working as a similar stack, and variable-length instructions working in much the same manner as the above. As well as OCCAM, compilers for C and FORTRAN were created for the Transputer - which at the end of the day really was a good, general-purpose 16- or 32-bit microprocessor with a few special quirks (high-speed communication links and multi-threading in hardware).

My own interest in this is in CINTCODE and BCPL - and I've recently made my own implementation of CINTCODE as part of a retro-computer project I'm playing with - it runs on the WDC65c816 CPU but I'd love to make a "real" CPU that executed CINTCODE natively one day... (Obviously FPGA/CPLD, etc. however that's real enough for me!)

Cheers,

-Gordon


Fri Feb 12, 2021 10:20 pm

Joined: Mon Aug 14, 2017 8:23 am
Posts: 157
rj45 wrote:
Here's an idea: Segmented memory.


I was thinking about something similar based on an idea from Marcel's Gigatron.

He wrote an interpretive language vCPU, with about 30 primitive instructions and coded them into the bottom few pages of the 64K x 16 ROM.

Every instruction had to take fewer than 32 clock cycles in order to fit into the horizontal blanking of the video. These primitive instructions were written in the 8-bit native assembly language of the Harvard-architecture Gigatron, but provided the means to create a 16-bit von Neumann virtual machine - massively extending the scope and usefulness of the Gigatron.

At the time of writing the first version of the ROM, instructions were between 14 and 28 clock cycles long - the longest, at 28 cycles, implemented a 16-bit addition. All other vCPU instructions were shorter.

As ROM is cheap, it would be sensible to put the instructions at 32 word boundaries, which simplifies the task of decoding the opcodes for the vCPU instructions. Any vCPU instruction taking less than the full 32 locations is terminated early with a return to the interpreter.
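With handlers on 32-word boundaries, dispatch collapses to a shift - a sketch (these values are illustrative, not the actual Gigatron ROM layout):

```python
# When every vCPU handler starts on a 32-word boundary, the ROM address of
# the native code for an opcode is simply opcode * 32, i.e. opcode << 5.
# No lookup table is needed; the opcode IS the dispatch index.

HANDLER_STRIDE = 32  # words per handler slot

def handler_address(opcode: int, base: int = 0) -> int:
    """ROM address of the native code implementing this vCPU opcode."""
    return base + (opcode << 5)

assert handler_address(0) == 0
assert handler_address(3) == 96
assert handler_address(3) == 3 * HANDLER_STRIDE
```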

The code in ROM is almost the equivalent of microcode.


Mon Feb 15, 2021 11:38 pm

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 5
monsonite wrote:
rj45 wrote:
Here's an idea: Segmented memory.


I was thinking about something similar based on an idea from Marcel's Gigatron.

He wrote an interpretive language vCPU, with about 30 primitive instructions and coded them into the bottom few pages of the 64K x 16 ROM.

Every instruction had to take fewer than 32 clock cycles in order to fit into the horizontal blanking of the video. These primitive instructions were written in the 8-bit native assembly language of the Harvard-architecture Gigatron, but provided the means to create a 16-bit von Neumann virtual machine - massively extending the scope and usefulness of the Gigatron.

At the time of writing the first version of the ROM, instructions were between 14 and 28 clock cycles long - the longest, at 28 cycles, implemented a 16-bit addition. All other vCPU instructions were shorter.

As ROM is cheap, it would be sensible to put the instructions at 32 word boundaries, which simplifies the task of decoding the opcodes for the vCPU instructions. Any vCPU instruction taking less than the full 32 locations is terminated early with a return to the interpreter.

The code in ROM is almost the equivalent of microcode.


That's how I'm treating my CINTCODE interpreter. I'd love to have the ability to build a real (microcoded) cpu that runs it natively. I'm sure it's possible, given the similarities with the (microcoded) Transputer, so maybe one day.

My VM suffers from inefficiencies from running on a hybrid 8/16-bit CPU (the 65816), which has a banked memory system. Working around its 64 KB banks to give the system a linear address space adds extra cycles, as do the 8/16-bit mode switches when needed. On my 16 MHz '816 the VM runs at an equivalent of about 500 kHz at best and 100 kHz at worst (not counting things like MUL, DIV and MOD, which are slower again).

Cheers,

-Gordon


Tue Feb 16, 2021 9:04 am

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 71
monsonite wrote:
He wrote an interpretive language vCPU, with about 30 primitive instructions and coded them into the bottom few pages of the 64K x 16 ROM.


ZPU does something similar - some of its instructions are optional and may or may not be implemented in hardware. Those that aren't are trapped and emulated with "microcode" stored in the first page of the ROM.

Quote:
As ROM is cheap, it would be sensible to put the instructions at 32 word boundaries, which simplifies the task of decoding the opcodes for the vCPU instructions. Any vCPU instruction taking less than the full 32 locations is terminated early with a return to the interpreter.


Again, that's how ZPU does it - the emulation code begins on 32-byte boundaries - any emulation code which is larger than 32 bytes simply branches to a static subroutine which follows the vectored code.


Tue Feb 16, 2021 11:57 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1632
Some thoughts: there are a couple of angles, one being to make opcodes especially useful, so you need fewer of them, and the other to make especially efficient encodings, so you use fewer bytes.

I'm reminded of the idea from early RISC days - this might be Steve Furber actually - that for a given transistor budget, you can spend it on heaps of microcode to implement complex instructions, or spend it on lots of cache, which in effect gives you something like program-specific on-chip microcode. Which also reminds me of those microcoded machines with a writeable control store, so you really could have domain-specific opcodes. (I used such a program, on a VAX, which did fault simulation by implementing a sort of SIMD approach.)

And for sure, a simple machine which runs a slightly more complex virtual machine is another point in this space.


Tue Feb 16, 2021 3:18 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 256
The Xerox Alto put a large amount of code in microcode. Not sure if it saved space, but it did save time on I/O and bit-mapped graphics (B&W). Need a new language? Write some microcode and your virtual machine is ported.

Ben.


Wed Feb 17, 2021 2:57 am