 DIP-8: 8-bit TTL computer 
Joined: Sat Nov 19, 2022 7:42 pm
Posts: 10
Location: Europe/London
This is my 8-bit TTL computer/CPU design, DIP-8. It's my little platform for exploring the bottom of the computing "stack" - hardware, assemblers, OS design and so on. I've been making little sketches of CPUs and messing around in Logisim for a while, so my main goal for this project was to actually get something physical that works. That's why I've kept the hardware relatively simple and used EEPROMs for the ALU and instruction decoding. This does not make it the "purest" of TTL CPU designs, but it's nice that I can continue to experiment with the ISA by reprogramming these. It's also not the fastest, although performance is better than expected and it runs happily at 4 MHz.

I built an earlier version on Eurocards - here it is generating a Mandelbrot set: https://youtube.com/shorts/CFzhd7aeNsY

The new version is a single board and features 64K of RAM, 64K of ROM, two serial ports, and three expansion slots (which share another 64K of IO space). I put a little prototyping area on there, and immediately had to make use of it, as I messed up the memory banking logic.

My experiments now are on the software side - can I make this thing nicer to program, with a higher-level language, or maybe a virtual machine? How far can I go making a little multitasking OS? The dream would be to have it serve its own web page like some of the other machines on the homebrew CPU web ring.


Attachments:
20230102_0151.jpg
20230102_0151.jpg [ 689.39 KiB | Viewed 4796 times ]

_________________
-Kyle
dip16.com
Wed Jan 04, 2023 11:22 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Very neatly laid out. Do you have a block diagram or schematic?

_________________
Robert Finch http://www.finitron.ca


Thu Jan 05, 2023 4:02 am

Joined: Sat Nov 19, 2022 7:42 pm
Posts: 10
Location: Europe/London
Attachment:
dip8_diagram.png
dip8_diagram.png [ 29.61 KiB | Viewed 4766 times ]


Some things of note:

Instructions consist of 1 opcode byte, plus any number of immediate bytes. The 8-bit opcode, plus a 4-bit sequence counter and 3 flags (carry, zero, negative), go into the control store. So instructions can be up to 16 cycles long - in the current ISA, the longest is 11 and most take 2 or 3.
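
To make the arithmetic concrete, here is a minimal sketch of how those three fields could be packed into a control store address. The field widths (8 + 4 + 3 = 15 bits) come from the post; the bit order and the function name `control_store_address` are assumptions for illustration only, not the actual DIP-8 wiring.

```python
# Hypothetical bit layout: the post gives the field widths, not their order.
OPCODE_BITS = 8
SEQ_BITS = 4      # sequence counter, so at most 16 cycles per instruction
FLAG_BITS = 3     # carry, zero, negative

# Number of control store entries addressed by the three fields together.
CONTROL_STORE_WORDS = 1 << (OPCODE_BITS + SEQ_BITS + FLAG_BITS)


def control_store_address(opcode, step, carry, zero, negative):
    """Pack opcode, cycle counter and flags into one 15-bit address."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= step < (1 << SEQ_BITS):
        raise ValueError("step must fit in 4 bits")
    flags = (int(carry) << 2) | (int(zero) << 1) | int(negative)
    # Assumed order: flags in the top bits, then opcode, then the step counter.
    return (flags << (OPCODE_BITS + SEQ_BITS)) | (opcode << SEQ_BITS) | step
```

With this layout every (opcode, flags) pair owns 16 consecutive entries, one per cycle, which is where the 16-cycle upper limit comes from.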

When you have many registers, addressing modes, and ALU operations to combine together, 256 opcodes isn't that many. So some decisions were made in the ISA:

  • There is an "adr" instruction with an opcode for each addressing mode. This generates an address and puts it in the address register for following instructions to use. My assembler hides this, so "ldx sp+3" becomes two instructions:
    Code:
    06 03   adr sp+3      ; A = stack ptr + 3
    1f      ldx           ; x = mem[A]

  • The temporary register "T" is exposed in the ISA, although again the assembler hides it from you. ALU opcodes all look like "add x, t", "add bh, t" etc. So when you write "add x, y" it becomes:
    Code:
    56    mov t, y    ; t = y
    60    add x, t    ; x = x + t
    There's an extra version of each ALU opcode of the form "add x, #L" which takes an immediate value.
    The 2nd operand can also be a memory location. "add x, [sp+3]" becomes:
    Code:
    06 03    adr sp+3
    1e       ldt
    60       add x, t

  • In the hardware, there are nine identical registers in the register file, but in the ISA they are treated differently. X, Y, BH, BL, CH and CL are all general-purpose 8-bit registers that can be used more or less interchangeably. BH/BL and CH/CL, as their names imply, form two 16-bit registers B and C, and there are extra instructions for working on these - loads, stores and 16-bit ALU operations. B and C can also be used in some addressing modes - "b", "b+x", "b+immediate" etc. SPH and SPL form the stack pointer, which can be pushed to/popped from, added to/subtracted from, and moved in and out of B/C.
    Code:
    00 f0 3f 21    ldb $3ff0            ; load b from abs. address
    e2 34 12       addw b, #$1234       ; add(word) $1234 to b
    3b             push b
    The address register is actually a counter, so it can be incremented to deal with 16-bit loads and stores.

  • Why are there 9 registers? There were 8 originally and then I added the M register. This is yet another register that the assembler can hide from you, and it allows the first operand of ALU instructions to be a memory location. The location gets loaded into M and then written back to with the result. This combined with the 16-bit ALU ops and addressing modes leads to some quite powerful instructions, and makes writing assembly quite fun:
    Code:
    0a ea d2 04     addw [b+x], #1234        ; add 1234 to the 16-bit value at b+x
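
The expansions the assembler performs for "adr" and the T register can be sketched as a small text-to-text rewrite. This is only an illustration of the rules described in the bullets above: the function name `expand`, the register set, and the decision to treat every one-operand instruction as a load are assumptions, and it does not cover the M register or the real opcode encodings.

```python
# 8-bit general-purpose registers named in the post.
REGISTERS = {"x", "y", "bh", "bl", "ch", "cl"}


def expand(line):
    """Expand one source-level instruction into primitive DIP-8 instructions."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest else []

    if len(operands) == 1:
        # Simplification: treat "ldx sp+3" style loads as adr + bare opcode.
        return [f"adr {operands[0]}", mnemonic]
    if len(operands) != 2:
        raise ValueError(f"cannot expand {line!r}")

    dst, src = operands
    if src == "t" or src.startswith("#"):
        # Already a primitive: "add x, t" or the immediate form "add x, #L".
        return [f"{mnemonic} {dst}, {src}"]
    if src.startswith("[") and src.endswith("]"):
        # Memory operand: form the address, load T, then operate on T.
        return [f"adr {src[1:-1]}", "ldt", f"{mnemonic} {dst}, t"]
    if src in REGISTERS:
        # Register operand: copy it to T first.
        return [f"mov t, {src}", f"{mnemonic} {dst}, t"]
    raise ValueError(f"unknown operand {src!r}")
```

For example, `expand("add x, [sp+3]")` yields the same three-instruction sequence as the listing in the second bullet.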

Here's an example of the M register in action, from my mandelbrot program:
Code:
top             call mandelbrot
                ldx ival            ; print char[ival]
                mov b, #chars
                ldy b+x
                sty UART_DR

                addw [x0], #XSTEP   ; x0 += XSTEP
                add [px], #1        ; px += 1
                cmp [px], #XCOLS
                jcc top

                stl #$0a, UART_DR   ; newline
                stl #0, px          ; px = 0
                mov b, #X0_LEFT     ; x0 = X0_LEFT
                stb x0
                addw [y0], #YSTEP   ; y0 += YSTEP
                add [py], #1        ; py += 1
                cmp [py], #YROWS
                jcc top

_________________
-Kyle
dip16.com


Thu Jan 05, 2023 6:54 pm

Joined: Sat Nov 19, 2022 7:42 pm
Posts: 10
Location: Europe/London
Here are all the opcodes. Looking at this, there are a few here that I don't think I've ever used (do I need an "sp +x" addressing mode? Have I ever needed to "stbl" or "sbc x,t"?) Maybe some pruning is required. I found that when I added the 16-bit ALU instructions, they're so useful that they end up being used constantly, at the expense of the 8-bit ones. So there are 28 opcodes for adc/sbc, which are pretty much never used. If I got rid of them maybe I could turn XY into a third 16-bit reg pair.

Attachment:
2023-01-05-191948_1686x402_scrot.png
2023-01-05-191948_1686x402_scrot.png [ 136.89 KiB | Viewed 4763 times ]

_________________
-Kyle
dip16.com


Thu Jan 05, 2023 8:20 pm

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
Very cool project - it's nice to see someone else using the "temp register" pattern!

dip16 wrote:
(do I need an "sp +x" addressing mode?)


Perhaps not when writing in assembly, but I think you'll be very grateful for that addressing mode if you ever attempt to write a C compiler backend for your ISA.


Thu Jan 05, 2023 10:24 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 585
A few 16-bit register operations would be handy. Many compilers, like Small-C, tend to generate code like load HL, load DE, ADD HL,DE,
or load HL,(HL) or store HL,(DE).


Fri Jan 06, 2023 7:57 am

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 68
Location: Michigan USA
Hi Kyle, That is a really nice looking computer! I especially admire the compact hardware layout.

I'm having a little difficulty reading the part numbers on the ICs... a schematic would be helpful. I have quite a few questions, but will start with one that has been a challenge on LALU. How are you creating the files for operation of the ALU and Controller? (On LALU I create the files using an EXCEL spreadsheet and then manually cut-and-paste them using a HEX editor.)


Sat Jan 07, 2023 12:34 pm

Joined: Sat Nov 19, 2022 7:42 pm
Posts: 10
Location: Europe/London
Thanks mmruzek! I've attached the schematic - any questions, just fire away.

To generate the ROMs I created some Python scripts. The ALU one is here: https://github.com/kylesrm/dip8-compute ... /genalu.py

There is a function for each ALU operation, then a big loop that loops through all possible inputs to the chip (all possible addresses). Both of my ALU ROMs use the same image, but A15 is low on one and high on the other, so they know which one they are.
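
A minimal sketch of that generation pattern is below. It is not the linked genalu.py: the address layout (two 4-bit operands, a carry-in bit, an operation field, and A15 as the "which chip am I" bit), the output format, and all names are assumptions made up for illustration.

```python
# One function per ALU operation; each returns an unmasked result.
def op_add(a, b, cin):
    return a + b + cin


def op_and(a, b, cin):
    return a & b


def op_or(a, b, cin):
    return a | b


def op_xor(a, b, cin):
    return a ^ b


OPS = [op_add, op_and, op_or, op_xor]


def alu_byte(addr):
    """Compute the ROM output byte for one address (hypothetical layout)."""
    a = addr & 0xF                 # A0-A3: first operand nibble
    b = (addr >> 4) & 0xF          # A4-A7: second operand nibble
    cin = (addr >> 8) & 1          # A8: carry in
    op = (addr >> 9) & 0x3F        # A9-A14: operation select
    high_slice = (addr >> 15) & 1  # A15: tied low on one chip, high on the other
    if op >= len(OPS):
        return 0xFF                # unused operation: leave as erased value
    raw = OPS[op](a, b, cin)
    out = raw & 0xF                # bits 0-3: result nibble
    out |= ((raw >> 4) & 1) << 4   # bit 4: carry out
    if high_slice and out & 0x8:
        out |= 1 << 5              # bit 5: negative flag, upper slice only
    return out


def build_image():
    """Loop over every possible address, as the post describes."""
    return bytes(alu_byte(addr) for addr in range(1 << 16))
```

Because A15 is part of the address, a single 64K image can hold both behaviours, and each chip selects its half purely by how that pin is strapped.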

My script for the control ROM is a bit more complicated. It's here: https://github.com/kylesrm/dip8-compute ... decoder.py
It uses another file which contains the instruction definitions: https://github.com/kylesrm/dip8-compute ... uctions.py

I used some features of Python to make that instruction file as easy to read and write as possible. Each instruction is a multi-line string, where each line corresponds to one cycle. On each line are the names of the signals that are asserted during that cycle. A nice feature of Python is that, in an f-string, you can include one string within another by putting its name in {braces}. So at the top I can define some common patterns such as:

Code:
LoadLiteral     = 'pcinc memrd'                 # literal -> data bus

And then for example "jmp" looks like:
Code:
inst[0x30] = f'''jmp ADDR
    {LoadLiteral} alwr
    {LoadLiteral} ahwr
    aout pcwr'''


That loads 8 bits into AL, 8 bits into AH, and then writes A into PC. The first line, "jmp ADDR" is just a comment, but there is a fetch cycle added to the front of each instruction. So I can look at the definition for jmp and see that there are four lines and know that it takes four cycles.

I had to add a way to define conditional instructions as well, which was a bit hacky, but it works. The nice thing about the script is that it does some error checking, so if I've made a typo and referenced a signal that doesn't exist, or if an instruction takes too many cycles, I will get an error.
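
Here is a rough sketch of how such definitions could be turned into control words with the error checking described above. It is not the linked decoder script: the signal list is limited to the names quoted in this post plus a made-up `irwr` for the fetch cycle, and the one-bit-per-signal encoding and `compile_instruction` are assumptions.

```python
# Signals named in the post, plus a hypothetical "irwr" (instruction register write).
SIGNALS = ["pcinc", "memrd", "alwr", "ahwr", "aout", "pcwr", "irwr"]
MAX_CYCLES = 16                      # 4-bit sequence counter
FETCH = "pcinc memrd irwr"           # assumed contents of the implicit fetch cycle

LoadLiteral = "pcinc memrd"          # literal -> data bus

inst = {}
inst[0x30] = f'''jmp ADDR
    {LoadLiteral} alwr
    {LoadLiteral} ahwr
    aout pcwr'''


def compile_instruction(text):
    """Turn one definition into a list of control words, one per cycle."""
    lines = text.splitlines()
    name = lines[0]                                  # first line is only a comment
    cycles = [FETCH] + [line.strip() for line in lines[1:]]
    if len(cycles) > MAX_CYCLES:
        raise ValueError(f"{name!r} takes {len(cycles)} cycles, max is {MAX_CYCLES}")
    words = []
    for cycle in cycles:
        word = 0
        for signal in cycle.split():
            if signal not in SIGNALS:
                raise ValueError(f"unknown signal {signal!r} in {name!r}")
            word |= 1 << SIGNALS.index(signal)       # one output bit per signal
        words.append(word)
    return words
```

Compiling `inst[0x30]` gives four words, matching the four cycles counted for jmp above, and a misspelled signal name raises an error instead of silently producing a bad ROM.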


Attachments:
File comment: Rev 1 schematic
dip8rev1.pdf [455.34 KiB]
Downloaded 185 times

_________________
-Kyle
dip16.com
Sat Jan 07, 2023 3:51 pm

Joined: Sat Nov 19, 2022 7:42 pm
Posts: 10
Location: Europe/London
robinsonb5 wrote:
Perhaps not when writing in assembly, but I think you'll be very grateful for that addressing mode if you ever attempt to write a C compiler backend for your ISA.


Yes, I guess this would be useful for local arrays? I actually do have the beginnings of a VBCC backend, but it can only compile some simple programs at the moment.

Just been reading your EightThirtyTwo thread and agree with your comment about it being difficult to allow the compiler to make use of the specific features of your CPU.

I should take a look at the backend again actually. I'm not that interested in getting big C programs running, but it's useful to see what things a high-level language needs from an architecture.

_________________
-Kyle
dip16.com


Sun Jan 08, 2023 11:06 pm

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
dip16 wrote:
Yes, I guess this would be useful for local arrays?


Certainly, but what I actually had in mind was just general access to variables on the stack, since the compiled code will be doing that a lot on an ISA with only a handful of registers.

Quote:
I actually do have the beginnings of a VBCC backend, but it can only compile some simple programs at the moment.


Oh very nice! I do like vbcc - especially the fact that it's lightweight enough to self-compile even on its more minimal targets. I'd like EightThirtyTwo to be self-hosting one day - but for now I'm satisfied that the compiler, assembler and linker can be built and run under AmigaOS!

Quote:
Just been reading your EightThirtyTwo thread and agree with your comment about it being difficult to allow the compiler to make use of the specific features of your CPU.


Indeed - the addressing mode system provides a useful hook for this, but you have to do the heavy lifting yourself of identifying candidate sequences for replacement.

Quote:
I should take a look at the backend again actually. I'm not that interested in getting big C programs running, but it's useful to see what things a high-level language needs from an architecture.


Absolutely - but a compiler is also the quickest path to having a lot more code available for testing the CPU. I think I found more CPU bugs while getting Dhrystone up and running than at any other point in the process. (The compiler milestones for me were printf, dhrystone, FAT32 filesystem and - most recently - libjpeg.)


Mon Jan 09, 2023 10:54 pm

Joined: Sat Nov 19, 2022 7:42 pm
Posts: 10
Location: Europe/London
I went over the basics of the vbcc backend again, to make sure it's doing the right thing.

An intermediate code representing an add is of the form Z = Q1 + Q2, where any of the operands can be in registers or in memory. I need to turn this into a 2-operand form that makes use of DIP-8's instructions. The most general way to do that seems to be:
Code:
    mov Rtemp, Q1
    add Rtemp, Q2
    mov Z, Rtemp

The movs could represent loads or stores if Q1 or Z are in memory. Q2 can be anything - a register, a memory location or a literal, as the instruction set supports this easily.

Using a temporary register is really a last resort though, as I'm very limited on registers - for 16-bit operations I only have two, b and c! So I need to look at Z and Q1 and see if they can be used instead.
If Z == Q1 then the instruction is already in 2-operand form, which is ideal:
Code:
    add Z, Q2

If Z is a register, then it can be used as Rtemp, removing the final mov:
Code:
    mov Z, Q1
    add Z, Q2

There is also the possibility of using Q1:
Code:
    add Q1, Q2
    mov Z, Q1

I can only do this if the value of Q1 isn't needed beyond this instruction. I can check that by looking ahead to the next IC and seeing if it's "freereg Q1". I found that this was worth adding, and it gets invoked for longer expressions like "x + y + z", where the compiler allocates its own scratch register to store intermediate values. It makes sense for the backend to also use that register, if possible!

If none of these apply then a new temporary register is needed. If none are free then one needs to be spilled to the stack, but there's another possible optimisation here: if Q1 is a register, it's the one that should be spilled, because then it can be used as Rtemp, knowing that its previous value is going to be restored.

With all these rules working, the generated code actually seems pretty good, given I only have two registers. One more would be nice though, so I'm going to look into that.
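
The selection rules above can be summarised as a small decision function. This is a sketch in Python rather than vbcc backend code, and the names (`lower_add`, `q1_dead`, `free`) are hypothetical. "mov" stands for whatever load, store or move is needed, and the `z != q2` guard is an extra assumption of this sketch (moving Q1 into Z first would overwrite Q2 if they were the same register).

```python
REGS = ("b", "c")   # the two 16-bit registers mentioned in the post


def lower_add(z, q1, q2, q1_dead=False, free=()):
    """Return 2-operand code for z = q1 + q2.

    q1_dead: True if the next IC is "freereg q1".
    free:    registers that currently hold nothing live.
    """
    if z == q1:
        # Already in 2-operand form.
        return [f"add {z}, {q2}"]
    if z in REGS and z != q2:
        # Z doubles as the temporary, so no final mov is needed.
        return [f"mov {z}, {q1}", f"add {z}, {q2}"]
    if q1 in REGS and q1_dead:
        # Q1 is not needed afterwards, so it can be overwritten.
        return [f"add {q1}, {q2}", f"mov {z}, {q1}"]
    if free:
        t = free[0]
        return [f"mov {t}, {q1}", f"add {t}, {q2}", f"mov {z}, {t}"]
    if q1 in REGS:
        # Spill Q1 itself: it already holds the first operand.
        return [f"push {q1}", f"add {q1}, {q2}", f"mov {z}, {q1}", f"pop {q1}"]
    # Last resort: spill some register that is not Q2.
    t = next(r for r in REGS if r != q2)
    return [f"push {t}", f"mov {t}, {q1}", f"add {t}, {q2}",
            f"mov {z}, {t}", f"pop {t}"]
```

For example, `lower_add("[sp+4]", "c", "#1", q1_dead=True)` produces two instructions, while the same call without `q1_dead` falls back to the four-instruction spill of c.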

_________________
-Kyle
dip16.com


Mon Jan 16, 2023 2:37 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 585
While slow, one could do an extracode call. This is a call with the register pattern after the call as a literal.
call op, #, order code reg pattern.
You would add pop and push that take the register pattern and load just the ac/ix constant into the ir, thus giving you
indirect pop/push,
pushi:= ir.ac[]=read(sp-1),push reg ac[],ir=*pc++;
push2i:= ir.ix[]=read(sp-3),push reg ix[],ir=*pc++;
popi = ir.ac[]=read(sp-3),pop reg ac[],ir=*pc++;
reti = sp=sp+1, jump (sp++);

While this may not solve this problem as an extended op, you may define new ops like multiply, divide, shift r n.
-----
routine
pushi -- save ac'
push2i -- save ix'
save regs
do fancy stuff with ac' ix'
save new ac' at old top of stack
restore regs
add sp #2
popi -- set new ac' value
reti


Mon Jan 16, 2023 4:52 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
dip16 wrote:
I went over the basics of the vbcc backend again, to make sure it's doing the right thing.
...
With all these rules working, the generated code actually seems pretty good, given I only have two registers. One more would be nice though, so I'm going to look into that.

Interesting - and successful! Well done.


Tue Jan 17, 2023 8:18 am

Joined: Sat Nov 19, 2022 7:42 pm
Posts: 10
Location: Europe/London
@oldben: interesting about "extracode". That sent me down a rabbit hole reading about the AGC and Atlas.

Took a look at sign extension, as it keeps popping up and I don't have a nice way to do it at the moment. My first attempt was:
Code:
; sign extend bl -> b
    mov t, #0
    mov bh, t
    cmp bl, t
    jns skip      ; skip if sign bit not set
    sub bh, #1
skip


That takes 10-12 cycles, which isn't so bad, but 9 bytes of code. I then came up with a slightly better version:
Code:
    mov bh, #$ff
    sig
    cmp bl, #0   ; carry set if positive
    adc bh, #0


That one takes 10 cycles and 7 bytes. The "sig" instruction makes the next comparison signed, and the way it works is a bit of a hack, as it was a late addition! It runs the "sig" ALU operation, which does nothing but set an otherwise impossible combination of flags (Z, N), and then "cmp" is decoded differently based on these as if it were a conditional instruction. (As well as making comparisons signed, I realised that "sig" could more generally act as a "signal" to modify the behaviour of other instructions - but I've not found another use for it yet.)

So maybe sign extension would be a good candidate for another ALU operation. I think it would be a two-parter, where the first part sets N or C if the low byte is negative, and the second part outputs the correct value for the high byte based on that.
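
As a sanity check, both sequences can be modelled in Python and compared against a reference for all 256 values of bl. This is a behavioural sketch only: it assumes "jns" tests bit 7 of the comparison result, and that the signed compare sets carry when bl >= 0 (so zero counts as positive), which is how the comment in the listing reads.

```python
def sign_extend_reference(bl):
    """High byte of the 16-bit sign extension of an 8-bit value."""
    return 0xFF if bl & 0x80 else 0x00


def sign_extend_branching(bl):
    """Model of the first version: clear bh, then subtract 1 if bl is negative."""
    bh = 0                      # mov t, #0 / mov bh, t
    if bl & 0x80:               # cmp bl, t / jns skip (branch not taken)
        bh = (bh - 1) & 0xFF    # sub bh, #1
    return bh


def sign_extend_adc(bl):
    """Model of the second version: start at $ff and add the carry."""
    bh = 0xFF                                 # mov bh, #$ff
    signed = bl - 256 if bl & 0x80 else bl    # sig: treat bl as signed
    carry = 1 if signed >= 0 else 0           # cmp bl, #0 (assumed carry rule)
    return (bh + 0 + carry) & 0xFF            # adc bh, #0


def all_agree():
    """True if both models match the reference for every input byte."""
    return all(
        sign_extend_reference(bl) == sign_extend_branching(bl) == sign_extend_adc(bl)
        for bl in range(256)
    )
```

The second version works because $ff plus a carry of 1 wraps around to $00, so the branch is replaced by an 8-bit overflow.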

_________________
-Kyle
dip16.com


Wed Jan 18, 2023 9:06 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 585
I found that in most cases it is best to have bytes unsigned, as you can often clear the upper 8 bits easily.
The only place I use signed bytes is for constants like # -3 or #'a .


Thu Jan 19, 2023 1:32 am