Last visit was: Mon Aug 15, 2022 1:40 am
It is currently Mon Aug 15, 2022 1:40 am

 [ 237 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 16  Next
 rj16 - a homebrew 16-bit cpu 
Author Message

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
oldben wrote:
Lea is a needed instruction, once you have more complex addressing modes.

Noted. I sort of have special forms of that instruction but it would be nice to have an orthogonal one that works with any addressing mode.

oldben wrote:
64Kb is just ample for a tiny OS like cp/m or mini-flex. They have a simple file format that
is compact but at the price of reading small sectors and small floppy discs.

If I could find one that's easy to port (that is, written in C with a minimum of assembly), that would be great. So far the best I have found is pico]OS. But I want an OS with a shell of some sort that can load and run programs from disk, and pico]OS is not really that. But I will keep looking. I also found fuzix but it's rather z80 specific. There's also ELKS linux and minix, but I think those are too much work to port. I don't know if there's a C version of CP/M -- the versions I have found so far are assembly.

oldben wrote:
A MMU is needed with a more moden O/S, replacing use of base registers for time sharing.
OS/9 level II (6809) was good example. 74LS610 / 74LS612 does all the work, if can find one.

OS/9 level II is very interesting, thanks for mentioning that one, I hadn't heard of it. The way it uses the relocatable code features of the 6809 to allow programs to share a single page/bank of memory is very interesting. Larger pages allows a much simpler MMU implementation with fewer registers which is appealing.

The 74LS612/610 was useful/popular enough that I have seen some designs for a ttl version, and maybe if I keep looking I can find a verilog version that could fit in a CPLD or small FPGA. So that's an interesting option.

BigEd wrote:
I think the 6502 perhaps does the least by way of 16 bit affordances: any 16 bit addresses are held in memory, and you can only apply an 8 bit index to them.

robfinch wrote:
The 6502 has an interesting way of allowing access to memory via indirect pointers stored in zero page. The 65816 can use 24-bit pointers stored in zero-page memory even though the registers are only 16-bit. Maybe some sort of memory indirect addressing would work. 6809 also have indirect memory addressing.

I do agree that the 6502 is probably the simplest possible implementation.

rj16 doesn't have a zero page, and memory indirect addressing would be tricky to implement on a RISC machine without first loading the value into a register big enough for it to fit.

But I suppose it could be possible to stall the processor to do a multi-step memory operation, where it loads an address into a temporary register from memory in the first step, then loads the value from memory at that address in the second step. Kinda feels like this would be a simpler solution for a CISC machine with microcoded instructions though. I think it gets a bit complex when trying to do the same thing in a RISC machine.

The SH-2's solution is rather interesting. It's a 32-bit RISC machine with only 16-bit wide instructions. Any addresses or other 32-bit constants are stored as literals in the program, and it provides a rich set of PC-relative addressing instructions to make it easier to work with them. But of course, it has registers wide enough to store a 32-bit value, and a 32-bit ALU to work with them.

BigEd wrote:
About complexity though: you have to build it, and test it, and document it, and convince your compiler or your users to make good use of it. So I would always advise trying to simplify: perhaps keep some ideas in reserve, and explore a minimal viable subset.

Yeah, I have been trying to keep it simple but I have multiple competing goals with this project. A minimum viable subset is a really good idea though, I wonder if I can try to design rj16 in such a way that it could be extended with some of these features, but not implement them now.

I guess what I really need to do is define exactly what my goals are for this project and the value I want to deliver, then come up with priorities so that I can eliminate tangents and know when I am drifting from my goals and going down a rabbit trail. Otherwise, well, I will never complete this project.

Sun Dec 13, 2020 4:01 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
This morning's work:


The memory access instructions need more thought/work. I might be missing instructions. I also need to think more about the shift instructions now that I would like to implement a funnel shifter. And the sizes of the immediates will likely change too.

This processor only has a single flag: t. It will be either the carry/borrow (adc and sbc only) or the result of a compare (eq/lt/ge). Also, t only lasts to the next instruction (except for nop and prefix instructions), and sequences of instructions that use t are not interruptible to avoid needing to save t on interrupt. There's a conditional move (mvt/mvf) that can use the t flag, as well as conditional branches (brt/brf), and the add/sub instructions use it for carry.

rd is either the first register source operand, the destination register, or both. rs is the second operand.

There is a prefix that can add rt and ru registers as well, and one of them can be shifted by 1,2,4, or 8 bits to the left. I am still unsure if I want this prefix instruction since it adds a lot of complexity, but it's useful for specifying the extra registers for the funnel shifter.

Anyway, I built this because I had the idea to design the instruction encodings around the different forms of each instruction, so I wanted to enumerate them all. I am still in the process of designing a new version of the encodings that allows the extra addressing forms. I am not sure if the new version will be better than what I currently have or not, but it's worth an explore before I start updating the v6 simulator to the new version.

You do not have the required permissions to view the files attached to this post.

Tue Dec 15, 2020 2:42 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
EDIT: Click here to jump to the latest version of the ISA, a 32-bit version (though the machine is still planned to be mostly 16-bits)

The ISA encodings have been updated. I did indeed have to shrink the immediates by a bit. I could almost make it fit with 7 bit immediates, but I had to move the register-immediate space from 24 to 32 opcodes, and pretty much all of them are used.


This looks more complicated, but I just duplicated some of the instruction formats to better group them by instruction type. It's only slightly more complicated to pre-decode in that I think it will take 3 layers of mux -- but I will put the circuit in Digital and see if it can come up with something simpler.

I am still not sure about the base + index prefix, but in this version there's lots of room to reserve an opcode for it. I may or may not implement it. I don't think it will be needed for funnel shift ops.

Here's what the opcode space looks like. There is a pre-decoder that pulls the opcode bits from the instruction, compresses the prefix codes, and produces a linear 7-bit opcode. The predecoder logic for that is on the left:


So, the opcodes are more regular than the previous version. The only exception is op0 becomes op1 in the very last row. So all that's needed is to compress the prefix codes into 3 bits. The "Row", "Col", "Cel" are for the op map which is almost done, but not quite... maybe I can post that later today.

You do not have the required permissions to view the files attached to this post.

Last edited by rj45 on Thu Dec 31, 2020 12:47 am, edited 1 time in total.

Wed Dec 16, 2020 2:48 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
And here's a preliminary version of the op map.


This takes the row and col from the "opcode space" diagram and maps the opcodes into a table. The point is to try to map them semantically so hopefully it will reduce the number of gates required to decode them, but I may need to write an optimizer to make it more optimal. The first three bits are the row, the next three bits are the column, and the last bit is the cell -- 0 for the upper cell, 1 for the lower one. The cells are coloured by what instruction format they have, which is indicated on the right side as well.

These are the internal "machine" instructions, so they are suffixed with i, ipc, isp, ibp etc to indicate the format and base register. The instructions are loosely risc v-ish, so hopefully they will be self-explanatory for someone familiar with risc v. If not, well, at some point I will define them all in a document.

If there's a bunch of lwrahsp gibberish instructions, they are probably safe to ignore, I will likely remove them. I am experimenting to see if some ideas fit. They do, but I am not sure the extra complication will be worth it.

The pc-relative load/stores are also experimental. It would be nice to keep literals/constants and code together, but if you take a pointer to them, it's impossible to tell that pointer from a pointer to data memory except in the case of function pointers. I don't think const pointers would only point to literals/constants, I think they can also point to things on the stack or heap. But my solution so far has been to store literals and constants in data memory instead, and have the bootloader copy them from ROM.

Anyway, next step is to update the C compiler and make sure all the instructions it needs are represented. Then I need to update the simulator to use the latest version of the ISA. That will probably force some updates to these diagrams and hopefully some simplifications as well.

If you have thoughts on these diagrams let me know. Especially if any simplifications come to mind, or problems I may have missed, that's very helpful.

You do not have the required permissions to view the files attached to this post.

Thu Dec 17, 2020 12:23 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1702
It does seem a good plan to be co-developing the instruction set, the compiler, the encoding, the simulator.

Thu Dec 17, 2020 9:45 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
So, I would like to run an OS on this CPU. Preferably a unix-like one (though a DOS-like one would be fine too) with a shell and a way to run programs.

But there's a couple problems with this goal:

1. I don't want to write an OS from scratch, I would like to port an existing one. Finding an OS that supports 16-bit pointers is proving to be a real challenge. There's some, but they aren't very portable. They are usually designed for the z80 or 8086 and haven't been ported to any other cpu before. As soon as you move up to 32-bit pointers, though, tons of options open up for really easy to port OSes that support lots of different architectures, hence my post asking if there's an easy way to add 32-bit pointers.

2. LCC produces assembly, and the assembler I am using doesn't support object files and there is no linker. But the build system of all the OSes I have tried to build so far assumes that you compile C files into object files, then link them together at the end. I really don't enjoy reworking build systems, and I don't really want to build an assembler that supports object files, and then also a linker.

So I was looking at what architectures GCC supports, and I just happened to investigate the Motorola M*Core architecture.

And lo and behold! It's pretty much identical to the CPU I have been designing. It has 16 registers, 16-bit instructions, 2-operands, and the instruction set is very similar. It even has a C flag that's used in the same way as my T flag is, with conditional moves and everything. I feel like every decision I have made so far in designing rj16 are the same decisions Motorola made in designing the M*Core. And it has a full featured GCC toolchain that's supported in the latest versions of GCC.

Having a full featured toolchain would be awesome, and I feel like M*Core is not a huge departure from rj16 at all. I could reuse a lot of the design I have so far, and it shouldn't be hard to modify the simulator. And actually, if I want to stick with the ISA I have designed, I could write a transpiler easy enough, and it would be easy to add a transpile as the last build step.

But here's the thing: it's a 32-bit processor. I have mixed feelings about switching to 32-bits.

On one hand, finding an OS I can port becomes so much easier, given I no longer have to contend with toolchain and address space limitations. And let's be honest here, the simplest way to give rj16 32-bit pointers is to make the registers and ALU 32-bits. Anything else is a lot more complex.

On the other hand, I was rather attached to the idea of building a 16-bit CPU. There's something to be said for having constraints, they definitely spark creative solutions. And a 16-bit CPU is realistic to implement in TTL... 32-bit is more than I would like to tackle. There's so many 32-bit FPGA CPUs out there... I feel like there's far fewer 16-bit ones.

Anyway, what do you all think?

Sun Dec 20, 2020 4:42 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1702
Nice find to see similarities in an existing core, but yes 32 bits seems like a lot, especially for a discrete implementation. 24 bit turns out to be a good size though.

I fully expect you'd need to take an existing minimal simple OS and port it to your machine. But a C compiler is always a tricky one, unless you're already really handy with one of the existing ones and have some idea how to add a code generator.

Which reminds me, of some famous FPGA CPU which was co-designed with a C compiler, with the aim of making just enough CPU to support the compiler. Ah, here it is, Jan Gray's xr16:

While I'm here and I have a tab open, I also found John Doran's D16/M Minicomputer

Sun Dec 20, 2020 5:21 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 331
Nice find to see similarities in an existing core, but yes 32 bits seems like a lot, especially for a discrete implementation. 24 bit turns out to be a good size though.

32 bits is possible if you can split your ALU into 2 16 bit segments. Toying with 32 bit computer design with a 20 bit address space, I plan to have the control and alu cards have ribbon cable running across the top of the cards for the control bus. A 100 pin card edge connector is for the memory, I/O bus. This is where using 2901's is nice, the datapath is easy to extend. The control section however gets more complex with byte,short,long data sizes and varied addressing modes.

Sun Dec 20, 2020 8:58 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123

Well, I would have a lot more fun building a discrete logic CPU than I would building it in an FPGA. And I think most other people would find a tangible CPU a lot more interesting. But I want an easier time on the software side, and 32-bits with a full GCC toolchain will go a long ways towards that.

How about this:

- rj16 stays a 16-bit computer, but it has a 32-bit instruction set architecture
- instead of random logic control, I move to microcode control (but still pipelined)
- most instructions take 2 cycles to calculate 16-bits at a time
- the programmer sees 16 32-bit registers, but really there's 32 16-bit registers
- all data busses are 16 bits
- the external address bus is 24-bits giving a nice 16 MB of memory
- move to a von neumann architecture, since fetches happen every other cycle
- instead of a funnel shifter, shifts are done up to 4 bits at a time in microcode
- multiply and divide are similar, running a microcode routine
- it will barely support virtual memory, with as much handled in software as possible
- it will have just supervisor and user modes

But the main goal of this processor is to easily port an OS to it. Second priority is simplicity, especially if it's built in discrete logic. Third priority is making it fast.

Mon Dec 21, 2020 1:18 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Patent and copyright laws make me nervous to use the M*Core architecture, and I don't even want to mention it or have any trace of it in my source code. I think the patents have all expired, but I believe freescale still licenses this CPU and wants to collect royalties. I don't feel comfortable using their ISA.

Plus, I have spent a couple days trying to figure out how to decode this instruction set, and it's riddled with weird exceptions that require special logic to detect. On top of that, the instruction set is designed to be as extensible as possible, devoting far too many bits to the opcode space and having really small immediates, especially for a 32-bit processor. In other words, I don't like it, it's going to add a lot of complexity to this project that I don't want.

The most flexible solution to this is to modify GCC's M*Core support to support my own ISA. But that's a lot of work I don't really want to do, and I would have to learn a lot about GCC. I would rather learn LLVM but that's even more work as my C++ is super rusty.

So I think what I am going to try to do is produce a binary translator from M*Core's instruction set to my own. Likely the easiest way to do this is to disassemble M*Core binaries into the assembly language my existing assembler uses. I can then use GCC to convert C and C++ code to object files and link them together into a binary, and there's just an extra step to disassemble that and reassemble it into a binary for my CPU.

So rj16 will likely end up with an ISA very similar to what I have already described above, I just need to add any instructions that are necessary to ensure the binary translator doesn't have to do anything super complicated like register re-allocation.

Wed Dec 23, 2020 4:01 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1702
As a general principle, there are so many ways to proceed that my advice would be to proceed in the direction which seems best to you - whether that's most fun, fastest progress, greatest novelty, or whatever.

Wed Dec 23, 2020 4:06 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
So, an update:

It looks like the disassemble + reassembly route is going to work just fine.

I was thinking I might have trouble determining what is instructions and what is data in the .text section, but it turns out that is not so hard, and I have that much working already. GCC puts a pool of locally referenced literals between functions, but the symbol table notes where functions begin and end, so I know where the literal pools are. For long functions, sometimes GCC puts a literal pool in the middle of a function, but by noting all the references, I was able to figure out that one too.

Jump tables are still not working, but thankfully those can be turned off, so I won't worry about them yet. They do follow a pattern that could be parsed though, so it should be possible.

I also have most of the memory references properly labelled now so there's no absolute memory references. This lets the assembler calculate the addresses for everything, which makes it so I can replace rarely used instructions with a sequence of other instructions, and not have to implement them in my CPU. I am also free to increase the immediate sizes and inline constants if I want.

The trick to getting that to work reliably was to have the linker build a "relocatable" executable. In other words, it resolves all the symbols and merges references, but emits relocation tables rather than absolute memory addresses, making it possible to parse the relocation tables to know where all the references are and what they point to.

What's left is making sure the remaining references are labelled, a big refactor so I won't hate myself next time I need to look at the code, emitting the non-.text sections with labels, and some polish.

Sun Dec 27, 2020 7:13 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
So, the disassembler is working as well as I can test it without being able to reassemble to verify the labels are all in the correct locations. I have all the sections fully labelled and being emitted.

The code is terrible, but I think I will treat this like a prototype and rewrite it at some point. Now that I know what I need I can make it a lot faster. But I will likely do that later and just try to record my learnings for then.

Just in case you're curious how it works, I will write that up here. Feel free to skip if this isn't your jam:

1. Compile the program and link as a relocatable executable (ld -r or --relocatable option)
2. Load the ELF file. Go has an ELF reader in the standard library, so that part was pretty much done for me.
3. ELF files are split into sections:
    - .text is the program proper, and readonly data (literals) if the linker is configured to combine .rodata with .text
    - .rodata is not always included but is the readonly data (literals)
    - .data is already initialized variables and their initial contents are stored in this section
    - .bss is any variables that aren't initialized, and takes up no space in the file
4. Parse the symbol table and build an ordered list per section, sorted by offset into the section
    - Each entry needs to know the offset into the section, its size and whether it's a function (code) or object (data)
4. Parse the relocation tables and build an ordered list per section of the referenced symbols, again sorted by offset into the section
    - Each entry needs to know the offset into the section where the reference happens
    - For some architectures you need to know what kind of reference it is so you know how to patch the instruction at that offset
    - You need to know which symbol from the prior list is being referenced, or copy some info from the symbol table like the name (label)
5. Some references in the relocation table are anonymous, and are just represented as offsets from the beginning of the section
    - These need to have labels generated for them and added to the list of symbols
    - References will need to have proper addresses assigned and point to the generated label
    - In-program literals, like string literals, are often what these refer to
6. You need to be able to disassemble instructions to a structure describing the operands
    - binutils has a crude disassembler meant for debugging the compiler more than anything, but can be a good initial reference
    - The programmers manual can be useful too, but I also found bugs in that
    - You need the operands so you can decipher/replace any addresses and pc-relative offsets
    - A function to pretty print the struct as an instruction with the replacement labels is useful
7. An initial pass through the code needs to be done to find any PC-relative references (esp branches & literal pool references)
    - Much like the anonymous references in the relocation tables, you need to generate labels for these
    - References need to be generated and added to the reference table for each instruction
    - Care to de-duplicate them is needed
8. A second pass through the code to emit the instructions
    - Any references to symbols must be replaced by labels in emitted code / literal blocks
    - I added a psuedoinstruction to the assembler to emit label addresses as data
    - Any symbols need to have a label generated just before them
    - This will include functions and variables
9. .rodata and .data segments should be emitted, similar to the .text segment but without any code
10. .bss is special, you need to just iterate over the symbols for the section without any data with it
    - customasm has "non-written" banks that aren't written to disk but you can reserve memory addresses in it with labels
    - The #res directive can reserve some bytes
    - For alignment there may need to be some filler bytes reserved between symbols
11. Polish output until it reassembles properly
    - You don't want any absolute or relative memory references left, you want everything labeled so you are free to transform the code
    - You may need to handle duplicate identifiers/symbols

All in all, not too hard, especially for a fix-size instruction set with literals stored separately. I would say sh-2 and msp430 would be other very good candidates for binary translation. msp430 is interesting as it's 16 bit. AVR might be okay, I am not super familiar with it.

Tue Dec 29, 2020 12:15 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Time for more pretty pictures :-)

Here's version 8 of the ISA, the preliminary 32-bit version (there might still be changes as I implement it):


Not sure if this is just gibberish or if it's easy enough to understand. But each row is an instruction format with each field laid out. Op0-3 are the opcode bit fields. The goal is to try to line up the sign bit of the immediates as best as possible (jp/jalp have an unsigned immediate so not as important). The next goal is to line up the registers. Now, something has to move around if the immediates and registers line up, so that's the opcode bits. Then finally you need to be able to uniquely identify what bits of the instruction are the opcode, so "prefix codes" are used -- which is just a pattern of bits that it's easy to decode using some logic.

Next, the opcode bits are fed through some multiplexers to build the internal opcode which is used to address a row in a big lookup table for the control signals. This diagram shows the pre-decode logic on the left, and the pre-decoded opcode on the right:


My goal is to try to minimize the size of the lookup table required for the control signals and/or microcode. That's mainly important for cheap FPGAs where there's only a few blockrams that you can use as ROMs for this lookup table. But it's also important if you plan to use programmable logic and/or CPLDs instead of great big ROMs.

Another thing that I think will help with minimizing the programmable logic, is to organize the opcodes semantically into a map, trying to maximize the number of input bits that produce very similar control signals, which I believe is the point of an opcode map. Each cell contains two instructions depending on the most significant bit of the opcode, then the next three bits form the row, and the least significant bits form the column. I colour coded the instruction formats, with a legend on the right in case the colours are hard for you to see:


Grey is unused or unusable opcode space. The nop in the top left corner is an internal nop, generated when the pipeline stalls and a bubble needs to be created.

So this is the process I use to figure out exactly what encoding each instruction will have. I renamed most of the instructions to protect the innocent, and because I prefer these names more. There's still far too many instructions so I may revisit this and see if I can cut it down even more, but for now I am going to run with this.

So, now that I know what the encodings look like I can move on to the assembler/simulator.

You do not have the required permissions to view the files attached to this post.

Thu Dec 31, 2020 12:41 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
So, instead of trying to edit down something like 80 hours of video for the rj16 project, I decided to rebrand as rj32 and build the processor in Digital so it's more visual and hopefully a bit more engaging. Maybe I will eventually upload the rj16 content, but it's almost entirely programming... I have my doubts that people would want to watch it. But who knows.

So anyway here's the intro video:

Let me know what you think.

edit: updated video link to a newer, shorter intro video that better represents the series

Last edited by rj45 on Sat Jan 23, 2021 3:28 am, edited 1 time in total.

Sun Jan 03, 2021 12:46 am
 [ 237 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 16  Next

Who is online

Users browsing this forum: AhrefsBot, CCBot and 0 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Jump to:  
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software