


 24-bit word-oriented computing, some thoughts 
Author Message

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Revaldinho and I have been kicking around some ideas for a possible OPC24 - that is, a one-page-computing machine with a 24 bit wide databus. We thought it might be good to share our thoughts and see what other ideas and responses people have.

The starting point is Ferranti's early 16 bit microprocessor, the F100-L, which we're currently researching, but only so far as it led us to look into their 24 bit minicomputer architecture, the FM1600. This was rather long-lived, from the 60's to the 80's at least, in part because it was reimplemented in several successive technologies and in part because it was designed into various military products, which then have a long service life. But also, surely, because it's a wonderful and marvellous architecture.

But we probably can't write an FM1600 in one page of code, so the question becomes, what can we design by way of a 24 bit word-oriented CPU?
  • It might be that 24 bits is wide enough to have fixed length single word instructions.
  • Viewed as an upgrade from our 16 bit ideas, we could double the size of the register file.
  • Viewed as a downgrade from our 32 bit ideas, we could remove predication and halve the size of the register file.
  • We could have a small register file and have three addresses in each instruction, or a larger one and stick with a two address machine.
  • The address bus could be 24 bits to match the databus, or 16 bits to match the FM1600

As ever, there's the question of addresses and constants. A machine with fixed length single word instructions can't embody a full width constant in an instruction.
  • We can have an optional operand word for long values
  • We can have a load top half and load bottom half instruction
  • We can just say all long values must be found in data memory
The third option is quite interesting. For someone familiar with the 6502, where the short addresses only reach 128 address-sized objects, it would be very radical to remove all absolute addresses and all three byte instructions. But if the direct page is as large as 4k words, which is only a 12 bit address and still only half our instruction, then it's not so radical to say all constants and all subroutine addresses can be found in direct page. (One could also indirect into a table in memory such that it's only the table's base address which needs to live in direct page.)

Looking then at our catalogue of OPC machines, we have the following machines to build on:
  • OPC-5 - 16 bits, single fixed format, optional operand, predication, 16 registers (including 0 and PC).
  • OPC-5ls - as OPC-5 but a load-store machine, has 18 instructions instead of 8 instructions in two modes.
  • OPC-6 - as OPC-5ls but extended to 27 instructions by using one of the 8 predication codes as an extension bit.
  • OPC-7 - 32 bits, single word instruction, two instruction formats, now 5 bits of opcode, 31 instructions, still a predicated two address machine with 16 registers. Constants are 16 bits or 20 bits, sign-extended.
It seems that we like predication and we like a 16 register machine. Only the OPC-6 has a push and pop. Only the OPC-7 has a byte permutation instruction. Only the OPC-7 has fixed length instructions.


Fri Mar 01, 2019 10:36 am
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
So, here are some of our thoughts from our emails:

Quote:
A 24-bit derivative of OPC-7... yes, we could probably drop 8 bits, with a smaller regfile and dropping predication. Keeping two instruction formats, maybe something like
1 length + 3 src + 5 opcode + 3 dst + 12b operand
1 length + 3 reg + 4 opcode + 16b operand

Feels a bit messy to separate the two register fields. But maybe it's OK. Maybe the length bit is implicit in the opcode. Maybe we don't need quite so much space, so
5 opcode + 3 src + 3 dst + 1 spare + 12b operand
5 opcode + 3 src + 16b operand

Or maybe that spare bit could be useful for a predication or an indirection bit?

Actually if we allow indirection we can maybe do without the long form entirely? All long operands have to be fetched from memory, from the first 8k.


Quote:
I think, for OPC-ness, I might be tempted to go for the two instruction word option like OPC6 which keeps all the decoding v. brief - ie all fields in a constant place/function. That also opens out a 24b address space easily with a full 24b immediate and maybe gives the option of more rather than fewer registers ! A simple hack to OPC7 on those lines would give 32 registers, 24b address space and naturally these addressing modes

    D <- Pred.Op(D, S+Imm6)
    D <- Pred.Op(D, S+Imm24)
    D <- Pred.Op(D, mem[S+Imm6])
    D <- Pred.Op(D, mem[S+Imm24])
(OPC7 didn't have the pre/post inc/dec which was in OPC6)

24b is a really odd size. Maybe 3 is the right number for the machine though - so 3 bits for registers would be 8 registers as you say and trying to keep to a single instruction word, but poss also 3 operands? we haven't tried any 3 operand machines.


Quote:
Interesting tradeoff - whether 3 addresses but only 8 registers, or 2 addresses and as many as 32 registers. I think you said BCPL uses very few registers?

Perhaps 24 isn't small enough to force the two-word instruction, but not quite big enough for a one-word instruction.


Quote:
Quote:
3 addresses but only 8 registers
In our OPC though 8 is really 6, as top and bottom register are PC and all-zeros, so 6 seems suddenly not many.

2 or 3 operands is interesting. I don't think SIAL uses 3 operands much if at all; SIAL does have 3 accumulators, but it feels as though accumulator 'C' was added much later on and doesn't take part in much of anything. So we're not going to see any win for a 3 operand machine there by just doing the simplest 1:1 mapping. We would need at least a keyhole optimisation to try and make the most of it.

In fact the SIAL isn't going to make use of three operands or an expanded register file. You do really need to write for the machine in hand to make use of these things, but even so BCPL does feel like a good way of producing standard tests quickly to flush out bugs.

...

I think that some of the choices come down to whether we really stick with the one page of readable code limit or not. Predication together with R0 and PC in R15 was all about making things regular and so needing very little code. Having a dedicated set of jumps/branches may be better in many ways, but I think would probably take more lines of code. Some of these other options fall in the same category.


Fri Mar 01, 2019 10:37 am
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Also coming up in discussion:
- what's the killer app, if any, for a 24 bit machine? Ferranti found it very useful for realtime control. It might be a good size for Mandelbrot fractals.
- what fixed or floating point layouts would make sense? 8.16 fixed point, perhaps, or 6 bit exponent and 18 bit mantissa? Or perhaps 1.23 (signed) fixed point, like a very accurate binary slide rule?
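To make the fixed-point options concrete, here is a minimal Python sketch of 8.16 and 1.23 two's-complement formats in a 24-bit word. The function names are made up for illustration and nothing here comes from the OPC code; it only shows the ranges and scaling each layout implies.

```python
WORD_BITS = 24
WORD_MASK = (1 << WORD_BITS) - 1
SIGN_BIT = 1 << (WORD_BITS - 1)

def to_signed(word: int) -> int:
    """Interpret a 24-bit word as a two's-complement integer."""
    word &= WORD_MASK
    return word - (1 << WORD_BITS) if word & SIGN_BIT else word

def to_fixed(x: float, frac_bits: int) -> int:
    """Encode x with frac_bits fractional bits (16 for 8.16, 23 for 1.23)."""
    scaled = round(x * (1 << frac_bits))
    if not -SIGN_BIT <= scaled < SIGN_BIT:
        raise OverflowError(f"{x} does not fit with {frac_bits} fractional bits")
    return scaled & WORD_MASK

def from_fixed(word: int, frac_bits: int) -> float:
    """Decode a 24-bit fixed-point word back to a float."""
    return to_signed(word) / (1 << frac_bits)

def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two words of the same format; the low product bits are dropped."""
    return ((to_signed(a) * to_signed(b)) >> frac_bits) & WORD_MASK

# 8.16 covers -128 <= x < 128; 1.23 covers -1 <= x < 1.
print(hex(to_fixed(1.5, 16)))    # 0x18000
print(hex(to_fixed(-0.5, 23)))   # 0xc00000
```

In 1.23 the value 1.0 itself is not representable, which is the same -1 ≤ x < 1 convention as the early fraction-only machines.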


Fri Mar 01, 2019 2:43 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Back in 2001 I designed a 24 bitter to fit into an XC4010, at about 800 LUTs. All sorts of apps could work well with a 24-bit machine. I think it’s more the amount of memory available and address space that limits the apps. 16MW of memory is a lot. A compiler or other HLL software needs as many registers as can be spared. I’d go with 32 registers and three operands. With a lot of memory available perhaps an MMU could be implemented as well. (It may be possible to run something like Linux.)
I’ve been working on a version of PACMAN as an app for my own system. Perhaps a simple game would be an app (invaders / life / asteroids).
About floating point, I’m assuming software would be used as fitting hardware FP into one page would be real tough. I have some hardware floating point modules already “done” that might be okay for a starting point if hardware is desired.

_________________
Robert Finch http://www.finitron.ca


Fri Mar 01, 2019 7:37 pm
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Interesting to hear that you've previously visited 24 bit computing Rob! It might make an interesting history/anecdote thread to hear about that machine.

For the floating or fixed point format question, we were only thinking of a software library: revaldinho is in the habit of writing arithmetic and square root routines, but of course you do need to decide on a format.

I wish I could rediscover the early machine which I'd read about, with a fixed point arithmetic where every bit is below the point. It just might have been the Pilot ACE from the 50s:
Quote:
The Pilot ACE held signed 32-bit numbers, and a number could be regarded either as a fraction in the range -1 ≤ x < 1 or as an integer in the range -2^31 ≤ x < 2^31. Both conventions were used at NPL, and there was no very consistent application of either.

(Campbell-Kelly, M. (1981). Programming the Pilot ACE: Early Programming Activity at the National Physical Laboratory. IEEE Annals of the History of Computing, 3(2), 133–162. doi:10.1109/mahc.1981.10015)

As the ACE, so the DEUCE, of course. And the Ferranti Mark 1* was also fixed-point ±1 at the hardware level, this time 40 bits (the machine word being 20 bits):
Quote:
Arithmetic assumed that all numbers were between 1 and minus 1 (with the sign bit at the most significant end, and the implied decimal point after the 2nd m.s. bit).


Sun Mar 03, 2019 10:04 am
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Update from revaldinho: first preliminary spec and code pushed for what's now called OPC-8, see
https://revaldinho.github.io/opc/opc8spec.html

The main update has been an increase from 7 to 8 bits for the short immediate that's packed into the instruction word: the more often we can use the short immediate, the less often we'll need the two-word instruction format. To find that bit, we had to remove the 'length' bit, so our 32 instructions are now broken down into 8 long-form and up to 24 short-form. An interesting question is exactly which 8 instructions benefit most from the long form. We've chosen AND and XOR for now, among others, so we have no long-form OR. Of course a program can always load a 24 bit value into a register and use that - we have lots of registers - so this is all about efficiency and density.

Some interesting possible tradeoffs we've been thinking about:
- JSR is not strictly needed, so it's gone for now, for a little simplicity at the expense of run time.
- The interrupt mechanism could either use a shadow PC value, or could stash the PC into a particular register, R14 being obvious, and that would remove some hardware.
- JSR and interrupts might possibly have some commonality which might simplify things, in which case maybe JSR will come back.


Sun Mar 03, 2019 12:54 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Now we have an 8 bit embedded literal, it kind of gives us a zero page - 128 words at the bottom of RAM which might be useful for the OS (or application) to use for frequently accessed state or even for frequently used subroutines. Similarly, perhaps for the 128 words at the top of memory - maybe that's even a better place for an ABI, like Acorn's?

In an unexpected turn of events, JSR is back in! So XOR has to lose its long-form, and AND is the only logical operation which allows a full 24 bit operand. (Of course, other operations can make use of a temporary register instead.)

Here's the to-and-fro:

Quote:
Just looking through the list again I see one non-standard instruction which is the JSR. Originally we did that in two instructions

Code:
MOV Rlink, PC + 2
MOV PC, Rdest, immediate

.. so this isn't completely necessary and does add a little complexity but it was worthwhile for the 16b machine - saved asm code space and of course if calls are frequent in inner loops then it's a performance improvement too. It's definitely an optional instruction though if wanting to get back to minimalism.

If JSR went it would free up another long immediate instruction, so we could get OR or XOR in there too.


Quote:
interesting... I think then JSR should be out, and long-mode XOR
should be in. Then we can flip a bit. Or a bunch of bits.


Quote:
Ok, let’s do that. JSR out and long XOR in.


Quote:
On JSR I think we may need to pop it back in. The one thing the macro version doesn’t give us is predicated execution, so there aren’t easy conditional calls. There are some of those in the pi spigot hand-written code, so they must be a Good Thing. That means moving XOR back to short immediate only, unless you prefer to move AND instead?


Quote:
So JSR is back in (and that has made porting the OPC6 assembler a little less painful than it would otherwise have been) and AND is the only long logical operation. I agree that seems right and was also my first choice with only one slot open. For simplifying the assembler you need to prefix the long instructions with an 'l'. I did think of making the assembler do the work for you, but in our one-page version (actually spilling over a lot at the moment) it's a real pfaff to work out how large the immediates are on pass-0 if they refer to labels. I'm sure this would be fixable with a better and larger assembler, but for now you need to use 'lmov' and 'mov' and the assembler will show an error if your constant is out of range for the short form.


Wed Mar 06, 2019 12:18 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
How about adding an interrupt polling instruction to OPC8? It could probably be done by making the software interrupts conditional upon there being a hardware interrupt present. Having an extra bit to gate the SWI’s bits with the external hardware input in the PUTPSR instruction might do it.

_________________
Robert Finch http://www.finitron.ca


Thu Mar 07, 2019 7:41 pm
Profile WWW

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
I was going to comment yesterday, but time pressures got in the way.

First, why is it 128 words since your immediate constant is 8 bits wide instead of 7 bits wide?

Given, the number of registers, I don't quite see the need for any of your long arithmetic and logical instructions. I would think that the RISC-like nature of your machine would be better served by instructions like the WAI instruction that Rob suggested. The long instructions may be useful, but I'm not sure that their usage frequency justifies their inclusion.

Some instructions like the 6502 TSB/TRB instructions may be advantageous for system functions: they support semaphores, mutexes, etc. A modification that may fit with your predicated instruction architecture is a test and increment/decrement, where the increment/decrement operation is conditional. These would be atomic and essentially implement counting semaphores.

Since you have included an RTI instruction, I am tempted to suggest RTS.

Another instruction that I found very useful for my M65C02A, to provide better support for stack frames, was my ADJ #imm instruction. The x86 allows the allocated stack frame to be discarded using a parameter included with the RTN instruction. As implemented, that feature is specifically oriented toward the implementation of stack frames in the Pascal model where it's the callee's responsibility to remove the local variables and the function parameters from the stack. For the M65C02A, I perform that operation immediately after the RTS instruction in the caller. Thus, the compiler simply puts in the code to remove the local variables of the subroutine, using a frame pointer, and that satisfies the Pascal model, and makes the instruction useful in a C environment. For the 6502/65C02, manipulation of the stack pointer is difficult and requires quite a few instructions. With the ADJ #imm instruction, the operation is simple and uses only 2 cycles under the more common conditions.

Have enjoyed reading this thread. Looking forward to more.

_________________
Michael A.


Thu Mar 07, 2019 11:49 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Thanks for the comments. I think one thing to bear in mind is the one-page nature of the OPC8 machine. Not only do we only have 24 bits of instruction to parcel out, but we only have 66 lines of python and verilog to describe the machine (for emulation and for synthesis, respectively) - so that rather forces a fairly regular machine, since deviations from regularity take code to describe. Actually, we also have a one page assembler, and that can be a nudge too, as seen upthread in the case of JSR.

(I suppose I have titled this thread as 24-bit word-oriented computing, so perhaps it's a bit wrong of me to concentrate pretty much entirely on the one-page-computing implementation of that.)

Another thing to bear in mind is that all our OPC machines - almost all machines in fact - are Turing complete, in the sense that they can do anything, given enough memory and time. So, pretty much every addition or change is driven by some kind of consideration of efficiency, or ease of programming, or implementation. It's always a judgement call rather than an absolute necessity.

robfinch wrote:
How about adding an interrupt polling instruction to OPC8? It could probably be done by making the software interrupts conditional upon there being a hardware interrupt present. Having an extra bit to gate the SWI’s bits with the external hardware input in the PUTPSR instruction might do it.


Revaldinho notes that we can sit in a tight jump-to-self loop, if we're waiting for an interrupt. But this might not be quite what you have in mind. He also notes that overall interrupt response is likely to be dominated by register saves and reloads - we could possibly fit in an alternate register set, like Z80, without much code.

MichaelM wrote:
First, why is it 128 words since your immediate constant is 8 bits wide instead of 7 bits wide?

That's an easy one: our immediate constant is signed. That's an example of regularity: sometimes it makes great sense to be signed, so making it always signed is a win for the one-page aspect. It's interesting that the small negative addresses therefore make an area at the top of memory easy to access. That could be for an ABI, or for memory-mapped I/O, or both.
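A small sketch of which addresses a signed 8-bit immediate reaches when the base register is R0 (always zero). It assumes a 24-bit address space, which is only one of the options discussed in this thread, and the helper names are hypothetical.

```python
ADDR_MASK = 0xFFFFFF  # assumption: 24-bit addresses

def sign_extend8(imm8: int) -> int:
    """Treat the low 8 bits as a two's-complement value in -128..127."""
    imm8 &= 0xFF
    return imm8 - 0x100 if imm8 & 0x80 else imm8

def short_address(imm8: int) -> int:
    """Address formed from R0 (zero) plus the signed short immediate."""
    return sign_extend8(imm8) & ADDR_MASK

reachable = sorted({short_address(i) for i in range(256)})
bottom, top = reachable[:128], reachable[128:]
print(hex(bottom[0]), hex(bottom[-1]))  # 0x0 0x7f
print(hex(top[0]), hex(top[-1]))        # 0xffff80 0xffffff
```

So the 256 encodings split into 128 words at the bottom of memory and 128 words at the top, rather than one contiguous 256-word page.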

Quote:
Given, the number of registers, I don't quite see the need for any of your long arithmetic and logical instructions. I would think that the RISC-like nature of your machine would be better served by instructions like the WAI instruction that Rob suggested. The long instructions may be useful, but I'm not sure that their usage frequency justifies their inclusion.

So, the long/short question was an interesting one. If we have all instructions available in both long and short, that means we get 16 instructions times two modes. But 16 isn't very many. If, instead of half the 5-bit space for each, we divide into one quarter and three quarters, we get 8 long instructions and 24 short. But, for regularity, we more or less implicitly decided that the 8 long instructions also need short counterparts. So now we have 16 instructions which are short-only, and 8 instructions which can be in either mode.
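The counting here can be checked by enumeration. This is a sketch only: it assumes a 5-bit opcode field in which the long forms are the opcodes whose top bits are all ones (as described later in the thread), and the function name is made up.

```python
def split_opcodes(prefix_ones: int, opcode_bits: int = 5) -> dict:
    """Count long, short and short-only opcodes for a given long-form prefix length."""
    all_ones = (1 << prefix_ones) - 1
    shift = opcode_bits - prefix_ones
    opcodes = range(1 << opcode_bits)
    long_form = [op for op in opcodes if (op >> shift) == all_ones]
    short_form = [op for op in opcodes if (op >> shift) != all_ones]
    # Each long-form opcode is assumed to have a short counterpart,
    # so that many short slots are not available for other instructions.
    return {
        "long": len(long_form),
        "short": len(short_form),
        "short_only": len(short_form) - len(long_form),
    }

print(split_opcodes(1))  # {'long': 16, 'short': 16, 'short_only': 0}
print(split_opcodes(2))  # {'long': 8, 'short': 24, 'short_only': 16}
```

A one-bit prefix gives the "16 instructions times two modes" case, and a two-bit prefix gives the 8 long, 24 short arrangement with 16 short-only instructions.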

At that point it comes down to filling the 8 slots: load and store are obvious. Register move is useful, not only for constants but also long jumps. JSR (given that we do have a JSR) is useful. As you note, that leaves 4 slots, and we chose the 3 arithmetic instructions and then one logical. One might wonder whether a long-form OR or a long-form CMP is a better choice, for example - I don't know, and I think we'd need to write some code to find out.

Quote:
Some instructions like the 6502 TSB/TRB instructions may be advantageous for system functions: they support semaphores, mutexes, etc. A modification that may fit with your predicated instruction architecture is a test and increment/decrement, where the increment/decrement operation is conditional. These would be atomic and essentially implement counting semaphores.

I suspect the addition of RMW or act-and-branch would complexify the state machine. I'm sure you're right, in the absence of a one-page constraint, these instructions would compete against the others for inclusion. I think it's always necessary to write a variety of typical application code, to see which choices seem best. It's always a judgement call. And in the case that there might be a HLL compiler, that's another consideration, as to which instructions can be used by a compiler.

We'll certainly think a bit more about this point. You might well be right that predication is readily extended to the idea of a conditional update of a register.

Quote:
Since you have included an RTI instruction, I am tempted to suggest RTS.

I think we get RTS for free:
Code:
mov pc, rlink,0

It's a little like JSR but even easier. A macro assembler can have a macro to provide RTS as such. But RTI is special because it has to access the processor status.

Quote:
Another instruction that I found very useful for my M65C02A, to provide better support for stack frames, was my ADJ #imm instruction...


We think we might have this already - revaldinho says
If R14 is the stack pointer then mov r14, r14, #imm does the manipulation.

Quote:
Have enjoyed reading this thread. Looking forward to more.

Great to hear it!


Fri Mar 08, 2019 1:10 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Revaldinho notes that we can sit in a tight jump-to-self loop, if we're waiting for an interrupt.

I was thinking of a poll operation that didn't block, so no loop. But I've since realized this might be accomplished by simply enabling interrupts for a clock cycle, then disabling them again in the next cycle. If an enable is followed immediately by a disable of interrupts will the processor still pick up the interrupt? Or does enable have to be active for some number of cycles?

Would you consider making this a two-page challenge instead of one page? One page of code is great but there are all kinds of additional operations that could be done with a slightly larger processor and the OPC processor seems to be growing. Struggling to cram things all onto one page seems to me to be a dubious effort. Adding a simple mmu might be handy.

I agree with Michael, the long instructions probably just aren’t worth the trouble and not having them would open up the availability of more instructions. A 24x16 multiply might be handy. To form long constants, a constant postfix or prefix instruction could be used. Opcodes reserved for floating-point could be useful.

Attached is a copy of the second page of the data sheet, containing the guts of the ISA for my own 24-bit processor effort from 18 years ago. I never did finish the project (it migrated to a 32-bit CPU when a larger FPGA was available). Byte, double-byte and word accesses to memory were all supported.
Attachment:
File comment: 24-bit Sparrow CPU datasheet
BC24-2.pdf [39.57 KiB]

_________________
Robert Finch http://www.finitron.ca


Sat Mar 09, 2019 8:17 am
Profile WWW

Joined: Tue Apr 25, 2017 7:33 pm
Posts: 32
robfinch wrote:
I was thinking of a poll operation that didn't block, so no loop. But I've since realized this might be accomplished by simply enabling interrupts for a clock cycle, then disabling them again in the next cycle. If an enable is followed immediately by a disable of interrupts will the processor still pick up the interrupt? Or does enable have to be active for some number of cycles?


Yes, I think this would work. I would need to check when I have a verilog sim running.

robfinch wrote:
I agree with Michael, the long instructions probably just aren’t worth the trouble and not having them would open up the availability of more instructions. A 24x16 multiply might be handy. To form long constants, a constant postfix or prefix instruction could be used. Opcodes reserved for floating-point could be useful.


I think we may be talking about the same thing here. OPC8 really only has one size of instruction - all operations are 24 bits. If an instruction opcode starts with the top two bits set, then an additional constant is read in instead of the immediate. So instructions beginning '10' and '11' behave identically except that the latter use the next word as a long immediate. Is that effectively the same thing as constant postfix or prefix instructions?
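As an illustration of that fetch rule, here is a hedged Python sketch of a decoder. The field widths add up to 24 bits (5 opcode, 3 predicate, 4 destination, 4 source, 8 immediate), but the field order and the names `encode`, `fetch_decode` and `Decoded` are assumptions made for this example; the OPC-8 spec linked earlier is the authority on the real layout.

```python
from typing import NamedTuple, Sequence

WORD_MASK = 0xFFFFFF

class Decoded(NamedTuple):
    opcode: int
    pred: int
    dst: int
    src: int
    operand: int
    words: int   # 1 for short form, 2 when a long immediate follows

def encode(opcode: int, pred: int, dst: int, src: int, imm8: int) -> int:
    """Pack fields into one 24-bit word (hypothetical field order)."""
    return ((opcode & 0x1F) << 19 | (pred & 0x7) << 16 |
            (dst & 0xF) << 12 | (src & 0xF) << 8 | (imm8 & 0xFF))

def fetch_decode(mem: Sequence[int], pc: int) -> Decoded:
    """Decode the instruction at pc, reading the next word for long forms."""
    word = mem[pc] & WORD_MASK
    opcode = word >> 19
    pred = (word >> 16) & 0x7
    dst = (word >> 12) & 0xF
    src = (word >> 8) & 0xF
    if (opcode >> 3) == 0b11:
        # Top two opcode bits set: the operand is the whole next word.
        return Decoded(opcode, pred, dst, src, mem[pc + 1] & WORD_MASK, 2)
    imm8 = word & 0xFF
    operand = imm8 - 0x100 if imm8 & 0x80 else imm8  # signed short immediate
    return Decoded(opcode, pred, dst, src, operand, 1)

program = [encode(0b10001, 0, 1, 2, 0xFE),
           encode(0b11001, 0, 1, 2, 0), 0x123456]
print(fetch_decode(program, 0).operand)        # -2
print(hex(fetch_decode(program, 1).operand))   # 0x123456
```

Because the two forms differ only in where the operand comes from, the rest of the datapath does not need to know which one was fetched.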

Stats are good, so here are some taken from the first few tests which the emulator can run

- fib.s
- pi-spigot-rev.s
- bigsieve.s

These are the dynamic stats from all trace files concatenated together:

Code:
Dynamic Instruction Usage from tmp.trace

All Instructions

           add     209654 : *****************************************************************
           mov     100416 : *******************************
   mov[dst=pc]      98889 : *******************************
           sub      92849 : *****************************
           cmp      82643 : **************************
           rol      82176 : **************************
           lsr      12721 : ****
          ljsr       7065 : ***
           sto       3688 : **
            ld       3648 : **
  lmov[dst=pc]       3565 : **
           lld        368 : *
          lsto        345 : *
           jsr        317 : *
           and        200 : *
           not        120 : *
          lmov        104 : *
           ror        100 : *
            or        100 : *
          lcmp         63 : *
    ld[dst=pc]         29 : *
   sub[dst=pc]          5 : *
          halt          2 : *

Instructions using predication

   mov[dst=pc]      91689 : ****************************
           sub      82176 : **************************
           mov      82176 : **************************
           add      15645 : *****
  lmov[dst=pc]       3563 : **
           jsr        199 : *
           and        100 : *
            or        100 : *
          lsto        100 : *
           sto         32 : *
   sub[dst=pc]          5 : *

Predicate usage

             c     180028 : *****************************************************************
            nz      91233 : *********************************
             z       3487 : **
            nc        700 : *
            mi        169 : *
            pl        168 : *


Instruction Summary by Type

All instructions          :     699067
- Single word             :     687557 (98.4%)
- Two word                :      11510 (1.6%)
Predicated instructions   :     275785 (39.5%)
Jumps                     :      98889



..and these are the static stats from the program listings.

Code:
Static Instruction Usage from tmp.lst

All Instructions

           mov         66 : *****************************************************************
           add         24 : ************************
   mov[dst=pc]         20 : ********************
           sto         19 : *******************
           jsr         16 : ****************
            ld         14 : **************
          lmov         11 : ***********
           sub         10 : **********
           lsr          8 : ********
          ljsr          6 : ******
           cmp          6 : ******
  lmov[dst=pc]          5 : *****
          lsto          5 : *****
          halt          3 : ***
           lld          3 : ***
          lcmp          2 : **
    ld[dst=pc]          2 : **
           and          2 : **
           rol          1 : *
   sub[dst=pc]          1 : *
           not          1 : *
           ror          1 : *
            or          1 : *

Instructions using predication

   mov[dst=pc]         11 : ***********
           add          4 : ****
  lmov[dst=pc]          3 : ***
           sub          2 : **
           jsr          2 : **
           sto          1 : *
           mov          1 : *
   sub[dst=pc]          1 : *
           and          1 : *
            or          1 : *
          lsto          1 : *

Predicate usage

            nz         10 : *****************************************************************
             c          7 : *********************************************
             z          4 : **************************
            nc          4 : **************************
            mi          2 : *************
            pl          1 : *******


Instruction Summary by Type

All instructions          :        227
- Single word             :        195 (85.9%)
- Two word                :         32 (14.1%)
Predicated instructions   :         28 (12.3%)
Jumps                     :         20


On this small set of code it looks like instructions with the following long constant are quite handy when writing the code (14% of all instructions), but given the large number of registers it's pretty easy to keep them out of the tight inner loops so that they barely register in execution (1.6% of instructions).
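The two percentages can be reproduced from the listings above by summing the counts of the 'l'-prefixed mnemonics. A short Python check, with the numbers copied from the two tables and a helper name of my own:

```python
# Counts of two-word (long-form) instructions, copied from the tables above.
dynamic_long = {"ljsr": 7065, "lmov[dst=pc]": 3565, "lld": 368,
                "lsto": 345, "lmov": 104, "lcmp": 63}
static_long = {"lmov": 11, "ljsr": 6, "lmov[dst=pc]": 5,
               "lsto": 5, "lld": 3, "lcmp": 2}

def long_share(long_counts: dict, total: int) -> float:
    """Percentage of all instructions that use the two-word form."""
    return 100.0 * sum(long_counts.values()) / total

static_pct = long_share(static_long, 227)
dynamic_pct = long_share(dynamic_long, 699067)
print(f"{static_pct:.1f}% static, {dynamic_pct:.1f}% dynamic")
# 14.1% static, 1.6% dynamic
```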

And on the short immediates, it might make more sense if we made the short immediate unsigned only when the source register is R0. That would let you mask bytes and mean that the 'page 0' area is contiguous from 0-0xFF.

Code:
AND r1,r0, #0xFF     ;   r1 <-   r1 AND (0xFF)
JSR  pc, r0, #0xF0   ;   jump to 0x0000F0
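
The proposed rule is easy to state in code. This is a sketch of the suggestion only, not of the current OPC8 behaviour, and `short_operand` is a hypothetical helper name.

```python
def short_operand(src_reg: int, imm8: int) -> int:
    """Short immediate: zero-extended if the source is R0, else sign-extended."""
    imm8 &= 0xFF
    if src_reg == 0:
        return imm8                                 # proposed special case for R0
    return imm8 - 0x100 if imm8 & 0x80 else imm8    # usual signed immediate

# With R0 as source, 0xFF is a byte mask rather than -1 ...
print(hex(0x123456 & short_operand(0, 0xFF)))   # 0x56
# ... and the short addresses cover one contiguous page.
page0 = sorted(short_operand(0, i) for i in range(256))
print(page0[0], page0[-1])                      # 0 255
```

The cost is one more special case tied to R0, which is the kind of irregularity the one-page constraint tends to push against.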


I've downloaded your PDF - that looks like a much more comprehensive machine. And three operands too. Did you look at the FM1600 instruction set at all and compare it with your RISC? OPC8 is nothing like the FM1600, but that was the inspiration to look at 24-bit computing. I have to admit I've just taken a short cut here and modified the OPC model to fit, but really I would quite like to look at emulating the actual FM1600 machine sooner or later.


Sat Mar 09, 2019 2:22 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Thanks for the stats, always good to have those. Those long-form instructions are a bit of a help for density, of course, even if not for performance.

A couple of thoughts on instruction encoding:
- as with OPC6, we could possibly steal one of the 8 predication codes as an opcode bit, to add many instructions which don't need predication
- we could change the long-form prefix from 11 to 111. Instead of 16 short-only and 8(+8) dual-mode instructions, we'd have 24 short-only and 4(+4) dual-mode instructions. Enough for LMOV, LJSR, LLD and LSTO. We'd lose the long-form arithmetic and logical, but gain 8 encodings for new short-only instructions.

I think I like the unsigned and therefore double-size zero page idea, but I might need to think about it!

Rob: you're right of course, with a two-page machine you could do much more, probably more than twice as much, or relax the dense code style a bit.

Now, to do away with the long-format and therefore the full-size constant, you'd need to form full-size constants in 8-bit pieces, or load them from code memory. It might be OK: if a routine is preceded (or followed) by a small constant pool, one or a few registers could be loaded in the preamble using short relative offsets. I think I prefer that, and keep the instruction stream itself extremely simple and single-format. Although the transputer-style building up of longer instructions has a certain charm.
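To illustrate the "8-bit pieces" route, here is a minimal Python sketch of splitting a 24-bit constant into three bytes and rebuilding it by shift-and-OR, roughly what a prefix-style instruction sequence would do. No such instruction exists in OPC8 as described; the names are hypothetical and the pieces are treated as unsigned.

```python
WORD_MASK = 0xFFFFFF

def split_pieces(value: int) -> list:
    """Split a 24-bit constant into three 8-bit pieces, most significant first."""
    value &= WORD_MASK
    return [(value >> shift) & 0xFF for shift in (16, 8, 0)]

def build_constant(pieces) -> int:
    """Rebuild the constant: shift the accumulator left 8 bits, then OR in each piece."""
    acc = 0
    for piece in pieces:
        acc = ((acc << 8) | (piece & 0xFF)) & WORD_MASK
    return acc

pieces = split_pieces(0x123456)
print([hex(p) for p in pieces])       # ['0x12', '0x34', '0x56']
print(hex(build_constant(pieces)))    # 0x123456
```

That is three steps per full-width constant, against a single load when the value sits in a nearby constant pool, which is the density tradeoff being weighed here.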

Edit: Oh, and thanks Rob for the PDF datasheet of your 24 bit machine! I'll need to study that.


Sat Mar 09, 2019 3:30 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
A thought about neatly sidestepping the one page limitation (which, if we remember, was about setting your own rules): we did in one or two situations make use of a small cache, written by Dave [hoglet], and that was a separate module and didn't count against the one page limitation. One good motivation for that is that it's optional, and it's arguably not part of the CPU but part of the memory system.

By the same token, an MMU could be a separate module, as could an FPU, or a multiplier, or a barrel shifter, or any other kind of memory-mapped coprocessor or transport-triggered device. Personally I'd make such devices memory-mapped in some way, or I/O peripherals, rather than adding coprocessor instructions to the main CPU, or doing the x87 thing and having the device snoop on the instruction stream. Some of the early machines (ACE, DEUCE, I think) had autonomous multiplication and division hardware, so the main CPU could continue in parallel, computing sign-compensation or rounding while the unsigned result was being calculated. Even worse, there was a trick to "convert binary to BCD by changing the content of the divisor and dividend during division" and similar tricks which fiddled with the multiplier while it was in mid-computation. So, autonomous hardware allows parallel programming, which is interesting. Those early machines also used dead-reckoning to pick up the completed result, so no need to poll a status bit, and no status bit to poll.


Sat Mar 09, 2019 3:44 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
(There might be some updates coming to OPC8, but meantime, I'd completely failed to notice, until revaldinho mentioned it, that the current encoding already has 4 free slots for short-form instructions. So we are by no means full, even before considering the ideas I mentioned earlier today.)

Edit: linkify


Sat Mar 09, 2019 6:59 pm
Profile