While the CPU is a classic 16-bit RISC, I want to add some support for extending the memory space to 32 bits;
viewtopic.php?f=3&t=989 discusses this a bit. The idea is to use a VM to emulate a 32-bit machine. There may be some hardware and instruction support to help make this more efficient. I think it should be possible to match a 68k clock for clock.
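To give a feel for what the VM has to do on a 16-bit datapath, here is a minimal sketch of a 32-bit add built from 16-bit halves. The function name `add32` and the (low, high) argument order are hypothetical, and nothing here is the actual VM design; it only shows the carry that has to be passed between the two halves, which is the sort of thing hardware support could make cheaper.

```python
MASK16 = 0xFFFF  # width of the physical datapath

def add32(a_lo, a_hi, b_lo, b_hi):
    """Add two 32-bit values held as (low, high) 16-bit halves.

    Hypothetical helper: models two 16-bit adds with a carry in between.
    """
    lo = a_lo + b_lo
    carry = lo >> 16              # 1 if the low halves overflowed 16 bits
    hi = a_hi + b_hi + carry      # second 16-bit add consumes the carry
    return lo & MASK16, hi & MASK16
```

Without an add-with-carry style instruction the carry has to be recovered with an extra compare, so each emulated 32-bit add costs more than two native operations.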
Current WIP CPU features:
-16-bit RISC 2-stage barrel processor with 32-bit instructions. Getting a RISC ISA to fit in 16 bits is stressful and not recommended.
-Harvard architecture. Instruction memory is 256k x 32-bits and write only. Data is 1024k x 16-bits. Both are fast 10ns SRAM.
-Pipeline:
Code:
Stage 1, phase 1: writeback / instruction fetch*
         phase 2: operand fetch / instruction decode
Stage 2, phase 1: ALU or effective address / zero and negative detect for branches / pixel shift and LUT / shifter
         phase 2: Data memory / zero and negative detect for comparisons / PC increment or branch displacement or load / instruction fetch* / shifter
*Instruction fetch and branching are in a state of flux. If decode can't be done in time, instruction fetch can be slotted into stage two.
-Clock cycle worst case should be under 100ns
-For shifts I'd like to do a 32-bit shift in 2 cycles. My first thought was a funnel shifter, but that may be too big/slow.
-The register file is made of single port SRAM wired up for 3R/1W in a write through mode. The extra read is for 32-bit addressing.
-31 registers plus zero. Writes to R0 are zeroed out on the write side rather than forcing zero on the read side, as that hardware can be reused for set on condition instructions.
-Registers are 32 bits wide (16-bit physical) but split into two 16-bit halves. For 16-bit ops I think this will be treated as 64 registers.
-Although 32-bit writes are possible for storing the PC, the upper 16-bit word isn't shadowed to the other banks, so the register will have to be moved to itself afterwards, which copies it to all banks.
-There's plenty of space for multiple register files so there will be an instruction to switch register file.
-Initially branches were going to be similar to Alpha and branch on comparison with zero. However, moving the instruction fetch to the next stage means there is enough time for compare and branch instead.
-Compare and set instructions will be available
-Unconditional jumps will be rolled into Jump and Link, and Jump and Link Register instructions. JAL will probably be PC relative. The link register can be any register.
-External DRAM. Access time will probably be two cycles unless some kind of forwarding is implemented.
-It would be nice for the VM to have support for virtual memory, but I haven't come up with a performant way of dealing with this.
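The 32-bit shift in two cycles mentioned above can be sketched with a 16-bit funnel shifter used twice. This is only an illustration under assumptions: `funnel16` and `shl32` are hypothetical names, the word swap for shift amounts of 16 or more is one possible way to keep the funnel amount in the 0..15 range, and the mapping of the two passes onto two cycles is not confirmed by the design.

```python
MASK16 = 0xFFFF

def funnel16(upper, lower, n):
    """16-bit funnel shift: top 16 bits of (upper:lower) shifted left by n."""
    assert 0 <= n <= 15
    return ((((upper << 16) | lower) << n) >> 16) & MASK16

def shl32(lo, hi, n):
    """Logical left shift of a 32-bit value held as (lo, hi) halves."""
    n &= 31
    if n >= 16:
        # Assumed word swap: shifting by 16 or more moves lo into hi.
        hi, lo = lo, 0
        n -= 16
    new_hi = funnel16(hi, lo, n)   # first pass: high half pulls bits from lo
    new_lo = funnel16(lo, 0, n)    # second pass: low half pulls in zeros
    return new_lo, new_hi
```

For example, `shl32(0x8001, 0x0000, 1)` returns `(0x0002, 0x0001)`, i.e. 0x00008001 shifted left by one is 0x00010002.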
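The register file points above can also be modelled in a few lines. This is a rough sketch based on my reading of the list, not the actual wiring: it assumes one SRAM copy per read port with every write going to all copies, a half-register mapping of Rn low half to index 2n and high half to 2n + 1, and that both halves of R0 read as zero. The class `RegFile` and its method names are hypothetical.

```python
MASK16 = 0xFFFF
NUM_READ_PORTS = 3  # 3R/1W, as in the list above

class RegFile:
    def __init__(self):
        # One single-port SRAM copy per read port, 64 halves of 16 bits each.
        self.banks = [[0] * 64 for _ in range(NUM_READ_PORTS)]

    @staticmethod
    def half_index(reg, high):
        # Assumed mapping: Rn low half -> 2n, high half -> 2n + 1.
        return 2 * reg + (1 if high else 0)

    def write16(self, reg, high, value):
        # Writes to R0 are forced to zero on the write side.
        data = 0 if reg == 0 else value & MASK16
        idx = self.half_index(reg, high)
        for bank in self.banks:    # the single write port updates every copy
            bank[idx] = data

    def read16(self, port, reg, high):
        return self.banks[port][self.half_index(reg, high)]

    def read32(self, reg):
        # A 32-bit read uses two read ports, one per half.
        lo = self.read16(0, reg, False)
        hi = self.read16(1, reg, True)
        return (hi << 16) | lo
```

Because the zeroing happens on the write path, the read path needs no special case for R0. The sketch deliberately leaves out the 32-bit PC write whose upper word is not shadowed to the other banks.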