 Some Minimal Instruction Set CPUs 

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
(From this old-school but contemporary page-of-links which is full of fascinating material - here's a quick description of the PDP-8 (pdf))

Three CPUs built with minimal hardware and therefore needing very small instruction sets:

Open 7400 Logic Competition: The BrainF* Machine by Alexis Bezverkhyy
This one is new to me: it has 10 instructions and is built to run assembled BrainF* programs. It uses just 17 chips, all 74-series apart from the UART, and runs at 2 MHz. It can print an ASCII Mandelbrot in a couple of minutes. The source package includes Life and Mandelbrot programs; you'll see the latter was written in a C-like macro language.
There's also a pointer to a C to BrainF* compiler.
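For anyone who hasn't met BrainF*, a minimal Python interpreter for the eight standard commands shows why it maps so well onto a handful of chips: a data pointer, a byte tape, and loops that become plain jump targets once the brackets are matched in advance (loosely what "assembling" a program means here). The machine's own 10-instruction set isn't described in this post, so this is a sketch of the language, not of Alexis Bezverkhyy's hardware; the 30,000-cell tape and treating end-of-input as 0 are assumptions.

```python
def run_bf(program: str, input_bytes: bytes = b"") -> str:
    """Interpret the eight standard BrainF* commands on a 30,000-cell byte tape."""
    # Pre-pass: pair up the brackets so loops become direct jumps at run time.
    jump: dict[int, int] = {}
    open_brackets: list[int] = []
    for pc, ch in enumerate(program):
        if ch == "[":
            open_brackets.append(pc)
        elif ch == "]":
            start = open_brackets.pop()
            jump[start], jump[pc] = pc, start

    tape = [0] * 30000
    ptr = pc = 0
    inp = iter(input_bytes)
    out: list[str] = []
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = next(inp, 0)  # end of input reads as 0 (assumption)
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip forward to the matching ]
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]  # loop back to the matching [
        pc += 1
    return "".join(out)

print(run_bf("++++++++[>++++++++<-]>+."))
```

The example sets cell 1 to 8 x 8 + 1 = 65 and prints it as a character.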

From there, a link to PISC by Bradford J. Rodriguez
A Minimal TTL Processor for Architecture Exploration
It uses 22 74-series chips, including the useful but possibly hard-to-find '181 ALU and '172 register file.

Finally, Tiny CPU in a CPLD, by Steve Chamberlin, with a minimized 6502-like instruction set, also 2010/04/18/tiny-cpu-architecture

Interesting technology note on the 12-bit PDP-8:
At the time of its design, the most expensive components were the driver circuits for the magnetic core memory, and a driver circuit for an address line cost about the same as a driver circuit for a bit line. Thus, for a memory system containing 2^N words of N bits per word (N=12 on the PDP-8) it was roughly the same cost to double the number of narrow memory words as it was to increase the width of memory by one bit! So, if you had a memory system containing 4K 12-bit words, adding one driver circuit gives you the choice between 4K 13-bit words or 8K 12-bit words.
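The arithmetic can be checked with a deliberately crude Python model that follows the note's simplification: each address driver doubles the word count, each bit driver adds one bit of width, and both cost the same. Treat it as an illustration of the note's cost model rather than of a real core-memory design; the function name is hypothetical.

```python
def memory_config(address_drivers: int, bit_drivers: int) -> tuple[int, int, int]:
    """Return (words, bits_per_word, total_bits) under the note's cost model,
    in which N address drivers select 2**N words."""
    words = 2 ** address_drivers
    return words, bit_drivers, words * bit_drivers

base = memory_config(12, 12)    # 4K words x 12 bits: 24 drivers
deeper = memory_config(13, 12)  # extra driver on an address line: 8K x 12
wider = memory_config(12, 13)   # extra driver on a bit line: 4K x 13

for name, cfg in (("base", base), ("deeper", deeper), ("wider", wider)):
    print(name, cfg)
```

Both upgrades need 25 drivers instead of 24, but the deeper memory doubles total storage (98304 bits) while the wider one adds only about 8% (53248 bits).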

Fri May 22, 2015 1:49 pm

Joined: Tue Jan 15, 2013 5:43 am
Posts: 189
Thanks, Ed. Here is a little MISC of my own. Six chips! Not merely a design exercise, this device added functionality to my customer's printing presses.
-- Jeff
This page describes a tiny computer made from an EPROM and a few logic chips. Although its specifications are ridiculously modest, the machine readily satisfied application requirements.

* Clock Rate: 60 Hertz
* Instruction Repertoire: 1
* Registers: 1 (a one-bit status Flag Bit)
* I/O-mapped memory (not memory-mapped I/O) — 2 bits



Sat May 23, 2015 2:02 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
It's a beaut! Feels like you could control a mechanical Turing Machine with this, so I think that makes it a fully qualified computer!

I'd be tempted to hook up an "Activity" LED to Q7.

I do like the subroutine mechanism - reminiscent of the one used by Sinclair in their scientific calculator. I was pleased to have speculated that they'd done what they had in fact done - so that's at least three of us who had the same thought there.

What was your approach to building the assembler?

Sat May 23, 2015 2:55 pm

Joined: Tue Jan 15, 2013 5:43 am
Posts: 189
The assembler was unremarkable -- written in Forth, but it was just a hasty hack. I had no incentive to make it fancy, as the Printing Press program was the only thing I ever assembled! This CPU was "one-application" as well as "one-bit"! :)

There are lots of comments here on Hackaday. A few folks thought this machine would be a brain-twister to program, but in fact it's mostly quite straightforward. The assembler takes successive source lines and, by default, outputs the corresponding binary instructions to successive addresses, so things are not as helter-skelter as they may seem. You can ignore the fact that every instruction is in fact a two-way branch.
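To illustrate why the two-way branches stay out of the way, here's a hypothetical Python sketch of such an assembler. The instruction fields (an action plus one successor address for each Flag Bit value) and the `if_set`/`if_clear`/`always` conditions are assumptions for illustration, not the real encoding of Jeff's EPROM machine. When a source line names no branch, both successors simply point at the next address, so straight-line code reads as sequential.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    action: str         # hypothetical: what the instruction does to the outputs
    next_if_clear: int  # successor address when the Flag Bit is 0
    next_if_set: int    # successor address when the Flag Bit is 1

def assemble(lines: list[tuple[str, Optional[int], Optional[str]]]) -> list[Instr]:
    """Each source line is (action, target, condition); condition may be None."""
    program = []
    for addr, (action, target, cond) in enumerate(lines):
        nxt = addr + 1  # default: fall through to the next address
        if cond == "always":
            program.append(Instr(action, target, target))
        elif cond == "if_set":
            program.append(Instr(action, nxt, target))
        elif cond == "if_clear":
            program.append(Instr(action, target, nxt))
        else:  # no branch written: both ways lead to the next address
            program.append(Instr(action, nxt, nxt))
    return program

def trace(program: list[Instr], flags: list[int], start: int = 0) -> list[int]:
    """Return the addresses visited, given the Flag Bit value seen at each step."""
    addr, visited = start, []
    for flag in flags:
        visited.append(addr)
        instr = program[addr]
        addr = instr.next_if_set if flag else instr.next_if_clear
    return visited

program = assemble([
    ("sample_input", 0, "if_clear"),  # 0: spin here until the input sets the flag
    ("output_on", None, None),        # 1: plain sequential step
    ("output_off", 0, "always"),      # 2: back to the start
])
print(trace(program, [0, 0, 1, 0, 0]))
```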

I was completely caught off guard by how much attention this project got! :shock: Reddit and Hacker News also picked it up; in addition, there were dozens of tweets and retweets, and someone even translated my page into Spanish and reposted it. These ultra-simple CPUs really seem to intrigue and inspire people. Thanks for the new topic, Ed.

-- Jeff

ps- see also the Wikipedia article, One instruction set computer.


Sat May 23, 2015 4:14 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
Thanks for those extra pointers Jeff.

I'm going to indulge myself slightly and present Atari's vector drawing engine as found in the Asteroids video game - it's a very simple 6502 system, with simple timer-based sound generation, but it has an interesting coprocessor to drive the vector graphics to the monochrome XY tube. And that coprocessor is in TTL. It has drawing instructions and a subroutine mechanism, but isn't a general-purpose CPU. I think it's interesting anyway.

The primary source is the schematic, found here (pdf) but the relevant parts have been marshalled by Jed Margolin here (pdf). Much more accessible is a document here (pdf) written by Phil Pemberton. He says:
The Digital Vector Generator — or DVG — is a custom-designed CPU, built entirely from small-scale TTL ICs. It has an architecture totally unlike that of any CPU that existed at the time, and was designed specifically to drive vector-beam monitors. This was done because no CPU available at the time Asteroids was designed had enough power to manage game logic and draw vectors at the same time.
Offloading the task of drawing the display on to the DVG allows the CPU to be dedicated to the task of running the game logic.
The DVG features:
• 12-bit program counter, with 13-bit address space
• Four-level stack
• State machine microsequencer with eight micro-instructions
• Vector timer
• Two 12-bit binary rate multipliers to vary the timing relationship between the X and Y vectors
• 16-level brightness control
• 1024x1024 display resolution
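Per the list, the two binary rate multipliers vary the timing relationship between the X and Y vectors. Over one full cycle of a 12-bit counter, a multiplier set to M emits exactly M pulses spread across the cycle, so feeding one with the X length and the other with the Y length steps the beam in the right ratio. Below is a behavioral Python model of a generic 7497-style BRM; whether the DVG's multipliers match it gate for gate is an assumption.

```python
def brm_pulses(rate: int, bits: int = 12) -> list[int]:
    """One full counter cycle (2**bits clocks) of a binary rate multiplier.

    Counts whose lowest set bit is bit t (t trailing zeros) form a pulse stream of
    2**(bits-1-t) pulses, gated by rate bit (bits-1-t). The streams never overlap,
    so the total pulse count equals `rate`.
    """
    train = []
    for count in range(2 ** bits):
        if count == 0:
            train.append(0)  # count 0 belongs to no stream
            continue
        trailing_zeros = (count & -count).bit_length() - 1
        rate_bit = bits - 1 - trailing_zeros
        train.append((rate >> rate_bit) & 1)
    return train

# A vector with dx=6, dy=3 on a 3-bit toy version: X steps twice as often as Y.
x_steps = brm_pulses(6, bits=3)
y_steps = brm_pulses(3, bits=3)
print(x_steps, sum(x_steps))
print(y_steps, sum(y_steps))
```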


The PC and stack are implemented with three 74LS670 register files and an up/down counter.
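Each 74LS670 is a 4-word by 4-bit register file, so three of them side by side hold four 12-bit return addresses. A Python sketch of that arrangement follows, assuming the up/down counter serves as the stack pointer and that the 2-bit pointer simply wraps on overflow (silently overwriting the oldest entry); neither detail is stated in the documents above.

```python
class ReturnStack:
    """Four 12-bit entries (three 4x4 register files side by side) plus a
    2-bit up/down counter used as the stack pointer (an assumption)."""
    DEPTH = 4
    MASK = 0xFFF  # 12-bit program counter values

    def __init__(self) -> None:
        self.regs = [0] * self.DEPTH
        self.sp = 0  # the up/down counter

    def push(self, pc: int) -> None:
        self.regs[self.sp] = pc & self.MASK
        self.sp = (self.sp + 1) % self.DEPTH  # count up; wraps, overwriting the oldest entry

    def pop(self) -> int:
        self.sp = (self.sp - 1) % self.DEPTH  # count down
        return self.regs[self.sp]

stack = ReturnStack()
stack.push(0x123)  # outer subroutine call
stack.push(0x456)  # nested call
print(hex(stack.pop()), hex(stack.pop()))
```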


The coprocessor shares RAM and ROM with the 6502, with the CPU having priority. (The CPU also has private ROM and RAM. There's a hardware trick whereby pages 2 and 3 can be swapped, which allows the two sets of state for a two-player game to be accessed without indirection.)
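The page-swap trick amounts to flipping one address bit in the CPU's address decoding. A Python sketch, assuming a single swap control bit (its name and where it lives are hypothetical): game code always refers to the same page for the current player's state, and toggling the bit between turns makes the other player's data appear there.

```python
def cpu_physical_address(addr: int, swap_pages: bool) -> int:
    """Decode a 16-bit 6502 address, exchanging pages 2 and 3 when swap_pages is set."""
    page, offset = addr >> 8, addr & 0xFF
    if swap_pages and page in (2, 3):
        page ^= 1  # pages 2 and 3 differ only in the lowest page-number bit
    return (page << 8) | offset

# The same load from $0210 reaches a different player's data depending on the
# swap bit, so no pointer arithmetic is needed.
for player_two in (False, True):
    print(hex(cpu_physical_address(0x0210, player_two)))
```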


Mon May 25, 2015 3:51 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
Quick pointer to a related discussion over on
where the question is
Dajgoro wrote:
What about a 4 bit cpu, with 16 instructions, with fewer than 100 gates, or fitting it in an XC9572?
Is it possible?

and among the answers, we have 8BIT:

and Arlet:
Well, it's a CPU on a XC9572. I haven't looked at the architecture.

Edit: MCPU is by Tim Böscke, and is now found at - it even fits in a 9536.

and Dajgoro reports some progress:
Here is the project, and what I managed to do so far:
I still haven't found a way to make a compact ALU.

and MichaelM works on a serial ALU:
PS: fitted into an XC9536-5PC44, the speed estimate for the ALU increases from ~70 MHz to ~86 MHz.

source posted at
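MichaelM's source isn't reproduced here, but the general idea of a serial ALU is easy to sketch: one full adder handles one bit per clock, least significant bit first, with a single flip-flop carrying between clocks. That trades speed for a very small amount of logic. A generic Python model of a bit-serial adder (the `width` parameter is an assumption):

```python
def serial_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Add two width-bit values one bit per 'clock', LSB first, using a single
    full adder and a carry flip-flop. Returns (sum, carry_out)."""
    carry = 0   # the carry flip-flop, cleared before the operation
    result = 0
    for i in range(width):          # one iteration per clock cycle
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry           # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))  # full-adder carry, latched for the next bit
        result |= s << i            # in hardware this bit shifts into the result register
    return result, carry

print(serial_add(100, 27), serial_add(200, 100))
```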

MichaelM has another contribution too:
Instead of designing a fixed architecture for the state machine, I have instead chosen to use a microprogram controller/sequencer as the basis of my programmable state machines. I have placed an example of a modified version of the Fairchild F9408 Microprogram Controller in my MAM65C02 GitHub repository. I do tend to use the F9408 Microprogram Controller (MPC) because it provides most of the basic functions required for sequencing, branching (unconditional, conditional, and multi-way), and also supports micro-subroutines.
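The sequencing functions listed there (continue, unconditional, conditional and multi-way branches, micro-subroutines) all reduce to a next-address selector. Here is a hypothetical Python sketch of one; the operation names and the use of a Python list as the micro-subroutine stack are assumptions, not the F9408's actual instruction set.

```python
def next_address(upc: int, op: str, target: int, cond: bool,
                 mw_bits: int, stack: list[int]) -> int:
    """Select the next micro-address for one microinstruction."""
    if op == "CONT":             # continue: step to the next microinstruction
        return upc + 1
    if op == "JMP":              # unconditional branch
        return target
    if op == "JCOND":            # conditional branch on a status input
        return target if cond else upc + 1
    if op == "JMW":              # multi-way branch: OR in low bits, e.g. an opcode field
        return target | mw_bits  # assumes the target's low bits are zero
    if op == "CALL":             # micro-subroutine call: save the return address
        stack.append(upc + 1)
        return target
    if op == "RET":              # micro-subroutine return
        return stack.pop()
    raise ValueError(f"unknown sequencer op {op!r}")

micro_stack: list[int] = []
print(next_address(0x10, "JMW", 0x40, False, 0b011, micro_stack))  # dispatch on a 2-bit field
```

The multi-way branch is what makes an instruction decoder cheap: one microinstruction dispatches straight to a per-opcode routine.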

Finally, Arlet says
The ZPU is another very simple CPU. It's 32 bit, but only has 18 essential instructions, all 8 bit wide, and no operands because it's a stack based machine. In addition to the 18 essential instructions, there are also 25 additional instructions. These can be implemented in software, or in hardware, depending on the desired speed and hardware size. The cool part about this CPU is that it comes with a GCC port.
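How can a machine with 8-bit, operand-free instructions load 32-bit constants? A common answer, loosely modeled on the ZPU's IM instruction, is a chained immediate: an opcode with the top bit set pushes a sign-extended 7-bit value, and each immediately following one shifts 7 more bits in. The Python sketch below uses made-up opcode values for everything else; it is not the real ZPU opcode map.

```python
# Hypothetical opcode values; only the "1xxxxxxx = 7-bit immediate" idea mirrors the ZPU.
HALT, NOP, ADD, AND, NOT, DUP = 0x00, 0x01, 0x02, 0x03, 0x04, 0x05
MASK32 = 0xFFFFFFFF

def run(code: bytes) -> list[int]:
    """Execute operand-free 8-bit instructions on a 32-bit data stack."""
    stack: list[int] = []
    prev_was_imm = False
    for byte in code:
        if byte & 0x80:                   # 1xxxxxxx: immediate with 7 payload bits
            bits = byte & 0x7F
            if prev_was_imm:              # consecutive immediates extend the constant
                stack[-1] = ((stack[-1] << 7) | bits) & MASK32
            else:                         # the first one pushes a sign-extended value
                value = bits - 0x80 if bits & 0x40 else bits
                stack.append(value & MASK32)
            prev_was_imm = True
            continue
        prev_was_imm = False
        if byte == HALT:
            break
        if byte == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK32)
        elif byte == AND:
            b, a = stack.pop(), stack.pop()
            stack.append(a & b)
        elif byte == NOT:
            stack.append(~stack.pop() & MASK32)
        elif byte == DUP:
            stack.append(stack[-1])
        # NOP (and any unassigned opcode) does nothing except end an immediate chain
    return stack

# 40 + 2: the NOP keeps the two immediates from merging into one constant.
print(run(bytes([0x80 | 40, NOP, 0x80 | 2, ADD])))
```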

Finally finally, for the lowdown on efficient implementations on FPGA, there's a recommendation to read Creating Embedded Microcontrollers
(Programmable State Machines)
by Ken Chapman: pdf here.

Tue May 26, 2015 1:17 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
On the subject of MISC, and explorations on small CPUs and Forth-like machines, see Ken Boak's new blog at ... -machines/

Sat Apr 15, 2017 6:31 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
Another mention needed of Ken Boak's blog, I think. He's developed a tiny Forth-like language called SIMPL which he can then use to explore CPU architectures by emulation. Here are his recent posts:

I have written a lot about SIMPL over the years, but it is my conviction that it has uses as a tool to help bootstrap various novel processors – and to ease the early stages of processor code development. It’s not a fully fledged interpreted language like Forth or BASIC but just enough to make writing code easier on unfamiliar processors.

Thu Apr 27, 2017 4:03 pm

Joined: Mon Aug 14, 2017 8:23 am
Posts: 157

Thanks for the shout out regarding my blog - much appreciated.

SIMPL is not much more than a case statement running within a loop - and was inspired by Ward Cunningham's Txtzyme

It decodes single ASCII characters within the case statement, and this allows several unique operations to be performed just by typing a character at the serial terminal.

Txtzyme was limited to just 13 operations, such as setting a port pin, taking an ADC reading, printing an integer number or a text string and running loops. I quickly realised that I could extend the instruction set and make it a lot more useful.
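A "case statement running within a loop" is easy to picture in code. This Python sketch is hypothetical and only loosely Txtzyme-like: a run of digits loads a single working value `x`, `p` prints it, and `h`/`l` set a simulated pin numbered `x` high or low; everything else is ignored. The real command sets and their exact semantics are in Ward Cunningham's and Ken's sources.

```python
DIGITS = "0123456789"

def txtzyme_like(source: str) -> tuple[str, dict[int, int]]:
    """Decode one character at a time; returns (printed text, pin states)."""
    x = 0                       # the single working value
    out: list[str] = []
    pins: dict[int, int] = {}
    i = 0
    while i < len(source):
        ch = source[i]
        if ch in DIGITS:        # a run of digits replaces x with a new number
            x = 0
            while i < len(source) and source[i] in DIGITS:
                x = x * 10 + int(source[i])
                i += 1
            continue
        if ch == "p":           # print x
            out.append(f"{x}\n")
        elif ch == "h":         # drive simulated pin x high
            pins[x] = 1
        elif ch == "l":         # drive simulated pin x low
            pins[x] = 0
        # any other character (including space) is a no-op here
        i += 1
    return "".join(out), pins

text, pins = txtzyme_like("123p 4p 13h 13l 7h")
print(repr(text), pins)
```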

It has been implemented on Arduino (AVR), MSP430 and ARM.

It occurred to me that the character decode could be used as the basis for simulating a CPU, and one of my first attempts was James Bowman's J1 Forth CPU.

There is a strong resemblance between his Verilog code for the instruction decoder and my instruction-decoding case statements, which has become fairly obvious now that I am embarking on Verilog programming.

Whilst James's J1a design does not quite qualify as an OPC design, I find his Verilog style very clear, which as a beginner I find important. ... rilog/j1.v

James went on to implement the J1a on a Lattice iCE40 with 1K logic elements, and there's a YouTube video of him demonstrating this.

Recently I have been involved with a mate in producing a low-cost FPGA board, BlackIce, which is also an iCE40, but with "4K" logic elements.

I put this in quotes because it's actually an 8K die, and it's only Lattice's proprietary programming software that nobbles half of the chip.

When programmed with Clifford Wolf's IceStorm open source tool chain - it literally blossoms into a fully functioning 8K device.

I'm looking forward to contributing to the OPC project - even if my contribution is in the form of open source FPGA dev boards



Mon Aug 14, 2017 1:11 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
Great to see you here Ken. BlackIce looks very attractive, with the fast 16 bit wide SRAM. (And getting double the FPGA capacity is a win too.)

At the beginning of this year I was mainly looking for dev boards with 32 bit wide memory, but since the OPC idea took off, 16 bits seems just right! (In fact, it even turned out that running OPC6 with a bytewide memory and a tiny code cache works pretty well - it's 16 bits on the inside, 8 bits on the outside.)

Word-addressed machines appeal to me at present - they are simpler than byte-addressable, naturally have a bandwidth advantage over machines with only byte-wide memory, and while they might lose out a little when working on byte-sized data such as strings or opcodes in some interpreted language, they should win when handling pointers or integers or floats. Also they have some historical interest - before bytes were invented, words were all there was.
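To make that tradeoff concrete, here's a Python sketch of byte access on a 16-bit word-addressed memory: a byte load is a word fetch plus a shift and mask, and a byte store becomes a read-modify-write, while a full word costs a single access. Putting the even-addressed byte in the high half is an assumption; machines differ.

```python
def load_byte(memory: list[int], byte_addr: int) -> int:
    """Read one byte from 16-bit word-addressed memory (even byte in the high half)."""
    word = memory[byte_addr >> 1]   # one word fetch covers two bytes
    if byte_addr & 1:
        return word & 0xFF          # odd address: low byte
    return (word >> 8) & 0xFF       # even address: high byte

def store_byte(memory: list[int], byte_addr: int, value: int) -> None:
    """Read-modify-write: a byte store costs a word read plus a word write."""
    i = byte_addr >> 1
    if byte_addr & 1:
        memory[i] = (memory[i] & 0xFF00) | (value & 0xFF)
    else:
        memory[i] = (memory[i] & 0x00FF) | ((value & 0xFF) << 8)

mem = [0x4869]                      # "Hi" packed into one word
store_byte(mem, 1, ord("!"))
print(hex(mem[0]), chr(load_byte(mem, 0)), chr(load_byte(mem, 1)))
```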

So, EDSAC, J1, and OPC all seem to be in the same space, in that way at least.

And, a word-wide opcode has plenty of bits to make instruction decoding simpler.

Mon Aug 14, 2017 2:17 pm

Joined: Mon Aug 14, 2017 8:23 am
Posts: 157
Hi Ed,

When my friend Alan Wood and I specified the BlackIce design, I was influenced by James's J1, and word addressing appeared the obvious choice. The word-wide 10 ns SRAM was only a couple of dollars, so it seemed sensible to include it on the back of the PCB, very closely coupled to the FPGA.

Whilst there are about 50 of the myStorm boards in circulation - I don't think anyone has used the SRAM in earnest yet.

There have been a few designs, notably those of Chuck Moore, that pack several 4 or 5 bit opcodes into a 16-bit word and execute them in sequence. This creates a pseudo-pipeline which speeds up the throughput, but also allows the use of slower RAMs.
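Packing instructions into a word is mostly shifts and masks. A Python sketch with three 5-bit slots in a 16-bit word (the slot layout, first opcode in the high bits and opcode 0 as a NOP, is an assumption, not the format of any particular Moore design): one memory fetch supplies three instructions, which is where the slack for a slower RAM comes from.

```python
SLOT_BITS = 5
SLOTS = 3                      # 3 x 5 = 15 bits; the 16th bit is left spare here
SLOT_MASK = (1 << SLOT_BITS) - 1

def pack(opcodes: list[int]) -> int:
    """Pack up to three 5-bit opcodes into one 16-bit word, first opcode in the high bits.
    Unused slots stay 0, which this sketch assumes is a NOP."""
    if len(opcodes) > SLOTS:
        raise ValueError("too many opcodes for one word")
    word = 0
    for slot, op in enumerate(opcodes):
        if not 0 <= op <= SLOT_MASK:
            raise ValueError(f"opcode {op} does not fit in {SLOT_BITS} bits")
        word |= op << (SLOT_BITS * (SLOTS - 1 - slot))
    return word

def unpack(word: int) -> list[int]:
    """Split one fetched word back into opcodes, in execution order."""
    return [(word >> (SLOT_BITS * (SLOTS - 1 - slot))) & SLOT_MASK
            for slot in range(SLOTS)]

word = pack([1, 2, 3])
print(hex(word), unpack(word))
```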

Matthias Koch of Mecrisp Forth has got the J1 running on the myStorm board, along with a version of Mecrisp that runs on the ARM CPU.

My ultimate aim was to understand which instructions are essential for an interpreter like SIMPL, and to try to implement these efficiently. So I coded SIMPL up in very crude MSP430 assembly language, so that I have a feel for what is needed to implement it, at least on a 16-bit von Neumann CPU with a reasonable number of registers. As it happens I only use about 6 registers. However, there's quite a lot of transferring data between those 6 registers, so a hybrid machine that has a stack architecture but also allows efficient register-to-register instructions might be an advantage.

On OPC6 I like being able to modify the registers by a short integer, including the PC, which gives short jumps. I'd be tempted to extend this idea to allow an 8-bit "payload" so that efficient jump tables can be created, and an instruction like DJNZ would also be useful for implementing efficient loops.
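A Python sketch of the suggested DJNZ with an 8-bit payload, under a hypothetical 16-bit format (4-bit opcode, 4-bit register, 8-bit signed offset) that is not OPC6's actual encoding. Taking the offset relative to the following instruction is also an assumption.

```python
OP_DJNZ = 0xA  # hypothetical opcode number

def encode(op: int, reg: int, payload: int) -> int:
    """Pack a hypothetical 16-bit instruction: op[15:12] reg[11:8] payload[7:0]."""
    return (op << 12) | (reg << 8) | (payload & 0xFF)

def sign8(value: int) -> int:
    """Interpret an 8-bit payload as a signed offset (-128..127)."""
    return value - 0x100 if value & 0x80 else value

def step(regs: list[int], pc: int, instr: int) -> int:
    """Execute one instruction and return the new PC. Only DJNZ is modelled;
    every other opcode is treated as a no-op."""
    op, reg, payload = instr >> 12, (instr >> 8) & 0xF, instr & 0xFF
    if op == OP_DJNZ:
        regs[reg] = (regs[reg] - 1) & 0xFFFF  # decrement...
        if regs[reg] != 0:                    # ...and jump if not zero
            return pc + 1 + sign8(payload)    # offset relative to the next instruction
    return pc + 1

def loop_body_runs(n: int) -> int:
    """Count executions of a one-word loop body closed by DJNZ r1, -2."""
    regs = [0] * 16
    regs[1] = n
    program = [0x0000, encode(OP_DJNZ, 1, -2)]  # word 0: loop body (a no-op here)
    pc = runs = 0
    while pc < len(program):
        if pc == 0:
            runs += 1
        pc = step(regs, pc, program[pc])
    return runs

print(loop_body_runs(5))
```

One instruction does the decrement, the test and the branch, so the loop overhead is a single word.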

A register file in on-chip RAM could be used to create a parameter stack for a true stack machine, but as SIMPL generally only uses 2 operands, even a full stack is possibly somewhat redundant.

I'll try to come up with a proper description of SIMPL to share with the forum. I've tried several times to describe it, but it's constantly evolving, and finding new applications. I've added a summary below, but the full code (MSP430 ASM) is on github ... asm_15.asm

I think that it's best to think of it as a toolkit to help exercise experimental hardware, with an instruction set that is highly mnemonic - so that you can virtually write directly in the machine language of the processor - and still read the code.

; SIMPL - a very small Forth Inspired Extensible Language
; Implementing the Initialisation, TextRead, TextEval and UART routines in MSP430 assembly language
; A Forth-Like Language in under 1024 bytes

; Ken Boak May June 2017

; Loops, I/O, Strings and Delays added
; Jump table reduced by 36 entries (72 bytes)
; times_32 subroutine further reduces codesize

; This version 860 bytes 9600 baud communications with 1MHz DCO

; Input and output to port P2 of Launchpad added with "i" and "o" commands

; SIMPL_430ASM_16

; Primitive Instructions - all have a fairly mnemonic ascii character that is the machine instruction!

; These allow basic maths and logical instructions on 16-bit integers + - / * & | ^ ~
; Stack Manipulation DUP DROP PUSH POP SWAP OVER " ' , . $ %
; Memory transfers with FETCH and STORE @ !
; Compilation mode with : and ;
; Simple decrementing loops (..........)
; Input and Output
; Print a string _Hello World_

; Note as of 13/06-2017 - not all of these are fully implemented

; ADD +
; SUB -
; SHR /
; SHL *
; AND &
; OR |
; XOR ^
; INV ~
; DUP “
; DROP `
; PUSH ,
; POP ‘
; SWAP $
; OVER %
; CALL :
; JMP \
; JE =
; JGT >
; JLT <
; TO-R {
; FROM-R }
; LOOP-Strt (
; LOOP-End )
; IN [
; OUT ]
; KEY ?
; NOP Space
; LIT #

; Lower case letters are used for more complex commands

;h set port pin high
;i input byte from port
;k access the loop counter variable
;l set port pin low
;m milliseconds delay
;o output byte to port
;p print the top of stack to terminal
;q print the ascii character at given RAM location
;r read input pin
;s sample the ADC
;u microseconds delay

; Upper case letters are used to define Users "words"

; User Routines are defined by capital letters starting with colon : and end with semicolon ;

; eg :F10(h100ml200m); ; Flash the LED 10 times - high for 100mS and low for 200mS

; You can play sequences of notes through a small speaker ABC etc

; :A40{h1106ul1106u); musical note A
; :B5{h986ul986u); musical note B
; :C51{h929ul929u); musical note C
; :D57{h825ul825u); musical note D
; :E64{h733ul733u); musical note E
; :F72{h690ul691u); musical note F
; :G81{h613ul613u); musical note G
; :H_Hello World, and welcome to SIMPL_; A Banner Message


; Examples of SIMPL phrases

; eg add 123 and 456 and print the result to the terminal

; 123 456+p

; Loop 10 times printing "Spurs are Fab!"

; 10(_Spurs are Fab!_)

; Flash a LED 10 times 100mS on 200mS off

; 10(h100ml200m)

; Toggle a port pin at 1MHz 1000(hlhlhlhlhlhlhlhlhlhl)

; That's all folks!
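To see how the listing hangs together, here's a small Python interpreter for a subset of it: numbers, + - & | ^ ~, p, _text_, n(...) loops and :X...; user words. It is a sketch, not Ken's MSP430 code. That numbers are pushed on a stack, that a loop takes its count from the stack, and that p pops what it prints are assumptions, and hardware commands such as h, l and m are simply ignored.

```python
import operator

# Binary operators from the primitive list; results are kept to 16 bits.
BINARY_OPS = {"+": operator.add, "-": operator.sub,
              "&": operator.and_, "|": operator.or_, "^": operator.xor}
DIGITS = "0123456789"

def simpl(source: str) -> str:
    """Evaluate a small SIMPL-like subset and return everything it printed."""
    stack: list[int] = []
    out: list[str] = []
    words: dict[str, str] = {}  # user words: capital letter -> body text

    def run(code: str) -> None:
        i = 0
        while i < len(code):
            ch = code[i]
            if ch in DIGITS:  # a run of digits pushes one number
                j = i
                while j < len(code) and code[j] in DIGITS:
                    j += 1
                stack.append(int(code[i:j]) & 0xFFFF)
                i = j
                continue
            if ch == "_":  # _text_ prints the text between underscores
                end = code.index("_", i + 1)
                out.append(code[i + 1:end])
                i = end
            elif ch == ":":  # :Xbody; defines user word X
                end = code.index(";", i)
                words[code[i + 1]] = code[i + 2:end]
                i = end
            elif ch.isupper():  # call a user word
                run(words.get(ch, ""))
            elif ch == "(":  # n(body) repeats body n times; loops may nest
                depth, j = 1, i + 1
                while depth:
                    depth += {"(": 1, ")": -1}.get(code[j], 0)
                    j += 1
                body = code[i + 1:j - 1]
                for _ in range(stack.pop()):
                    run(body)
                i = j - 1  # resume at the matching ")"
            elif ch in BINARY_OPS:
                b, a = stack.pop(), stack.pop()
                stack.append(BINARY_OPS[ch](a, b) & 0xFFFF)
            elif ch == "~":  # INV
                stack.append(~stack.pop() & 0xFFFF)
            elif ch == "p":  # print (and, in this sketch, pop) the top of stack
                out.append(str(stack.pop()))
            # everything else (h, l, m, u, spaces, ...) is a no-op here
            i += 1

    run(source)
    return "".join(out)

print(simpl("123 456+p"))
print(simpl(":H_Hello World, and welcome to SIMPL_;H"))
print(simpl("3(_Spurs are Fab! _)"))
```

The whole interpreter is one dispatch on the current character, with recursion only for loop bodies and user words, which is why SIMPL fits in under 1 KB of MSP430 code.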

Mon Aug 14, 2017 3:39 pm

Joined: Tue Dec 11, 2012 8:03 am
Posts: 285
Location: California
monsonite, you can put [code] and [/code] around your code to make the forum software preserve your white space and make the section monospaced. Then you can get for example:

;   DUP       “
;   DROP      `
;   PUSH      ,
;   POP       ‘
;   SWAP      $
;   OVER      %
;   FETCH     @
;   STORE     !
;   CALL      :
;   RETURN    ;
;   JMP       \
;   JE        =

It also won't try to translate character combinations into emojis like it did in your line

; :D57{h825ul825u); musical note D

Instead, you'll get

; :D57{h825ul825u); musical note D


Mon Aug 14, 2017 7:23 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
The inc and dec commands were fairly late additions to the OPC6 - it's certainly tempting to put short constants into the instruction word, but of course it makes decode a bit more difficult. In fact I think we can see a slight gradual decline in the speed of the machine as we added features and the logic became more complex. It's also true for a one-page machine that the code becomes increasingly dense - hopefully still not yet to the point of being obfuscated.

It's an interesting practical question though, what a multi-way branch would look like with the present machine, and how it might be improved if that seemed like a good idea.

It's also interesting that the machine was an adequate computer, Turing complete, from the outset. In a sense, everything we've added has been unnecessary, merely shifting the tradeoff away from machine simplicity and towards higher performance and better code density.

Mon Aug 14, 2017 7:31 pm

Joined: Tue Dec 18, 2018 11:25 am
Posts: 42
Location: Hampshire, UK.
An 8 bit MISC implemented in a XC9536XL CPLD, for a data acquisition application.
The VHDL implementation of an 8-bit minimal CPU described by Boscke (2002) was adopted as the basis for a very simple processor with a tiny instruction set.

The CPU has only three instructions:-
In the initial design, the instructions were encoded using two bits, which made the maximum size of the instruction set to be four. However, only three instructions were actually required, namely, RD: read 8-bit word from the ADC and store it in data register, WR: send the value in data register (DR) to the memory module, and JP: set programme counter (PC) to zero.

And the short application code is inside the CPLD as well:-
The implementation of the CPU required only a small program memory. In the initial design, the size of the memory was limited to four two-bit words. The incorporation of memory inside reduces the external device count and increases the speed of program execution. Although this constitutes to hard-wiring the program in to the CPU, there is no loss of flexibility because the CPLD can be easily reconfigured.
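The whole machine is small enough to model in a few lines. A Python sketch of the three-instruction loop (RD, WR, JP) running the read/write/jump program; the 2-bit encodings are hypothetical, and so is the assumption that the memory module advances its own address on each write.

```python
# Hypothetical 2-bit encodings; the paper's actual bit patterns aren't quoted here.
RD, WR, JP = 0b00, 0b01, 0b10

PROGRAM = [RD, WR, JP]  # the entire application; the 4th program word is unused

def acquire(adc_samples, memory_size: int) -> list[int]:
    """Run the three-instruction loop until the memory module is full."""
    samples = iter(adc_samples)
    memory: list[int] = []  # stands in for the external SRAM
    dr = 0                  # data register
    pc = 0                  # program counter
    while len(memory) < memory_size:
        op = PROGRAM[pc]
        if op == RD:        # read an 8-bit word from the ADC into DR
            dr = next(samples) & 0xFF
            pc += 1
        elif op == WR:      # send DR to the memory module
            memory.append(dr)  # assumed: the memory module advances its own address
            pc += 1
        elif op == JP:      # set PC to zero
            pc = 0
        else:
            raise ValueError(f"undefined opcode {op:02b}")
    return memory

print(acquire([12, 34, 56, 78], memory_size=3))
```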

And takes advantage of the JTAG boundary scan support available in the XC9536XL for transferring data to a PC:-
This data acquisition system can function as a standalone device and record data until the memory becomes full. For transferring the saved data to a PC, the JTAG boundary scan (IEEE standard 1149) [7] available in the XC9536XL was used. The main purpose of JTAG boundary scan is to provide a convenient method of diagnosing problems in complex circuits. Chips that support this standard allow the isolation of the core of the chip from its pins and setting values to output pins and reading the status of input pins using commands sent to the chip through the JTAG port.
Making use of this facility, the data transfer to the PC was carried out by connecting the JTAG port to the parallel port of a PC using the same cable utilized for configuring the CPLD. Each word stored in the 6116 SRAM was read to the PC using the boundary scan commands.

Mon Jan 13, 2020 2:55 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1754
That's so minimal! Their instruction set contains 3 instructions, as does their program in ROM. (The already-small CPU has been cut down so much, it's become a fancy implementation of a finite state machine.)

But the reference to
an 8-bit minimal CPU described by Boscke (2002)

could be very interesting. Here's a version of the paper
MCPU - A Minimal 8Bit CPU in a 32 Macrocell CPLD by Tim Böscke (2001, revised 2004)
as found in the repository
which links to a minimal emulator and compiler for the machine:
The MCPU project now lives on GitHub, including a 6-page PDF. But this post is not about duplicating his design, or simulating it at the logic level - instead, I’d like to show how to create an assembler and emulator for this CPU with only a few dozen lines of code. At the end, we’ll use this to run a little prime-number generator written in MCPU-assembler, and then re-establish the earth-shattering fact that the 52nd prime is 239!
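In the same spirit, here's a small Python assembler and emulator. The instruction set used (NOR, ADD, STA, JCC; a 2-bit opcode plus a 6-bit address; 64 words of memory; JCC jumps if carry is clear and then clears carry) is my reading of Böscke's paper rather than anything stated in this thread, so check it against the PDF. Treating "JCC to its own address" as a halt is this sketch's own convention.

```python
# Instruction set as I understand Böscke's MCPU paper (verify against the PDF):
# 8-bit words, 2-bit opcode in bits 7-6, 6-bit address in bits 5-0, 64 words of memory.
OPCODES = {"NOR": 0b00, "ADD": 0b01, "STA": 0b10, "JCC": 0b11}

def assemble(program: list[tuple[str, int]], data: dict[int, int]) -> list[int]:
    """Turn (mnemonic, address) pairs plus initial data into a 64-word memory image."""
    mem = [0] * 64
    for addr, (mnemonic, operand) in enumerate(program):
        mem[addr] = (OPCODES[mnemonic] << 6) | (operand & 0x3F)
    for addr, value in data.items():
        mem[addr] = value & 0xFF
    return mem

def emulate(mem: list[int], max_steps: int = 10_000) -> list[int]:
    """Run until a taken JCC targets its own address (this sketch's halt convention)."""
    acc = carry = pc = 0
    for _ in range(max_steps):
        op, addr = mem[pc] >> 6, mem[pc] & 0x3F
        if op == OPCODES["NOR"]:    # acc = NOT (acc OR mem)
            acc = ~(acc | mem[addr]) & 0xFF
            pc += 1
        elif op == OPCODES["ADD"]:  # 8-bit add, carry out into C
            total = acc + mem[addr]
            acc, carry = total & 0xFF, total >> 8
            pc += 1
        elif op == OPCODES["STA"]:  # store accumulator
            mem[addr] = acc
            pc += 1
        else:                       # JCC: jump if carry clear; carry is cleared
            if carry == 0 and addr == pc:
                return mem
            pc = addr if carry == 0 else pc + 1
            carry = 0
        pc &= 0x3F
    raise RuntimeError("no halt within max_steps")

def add_bytes(a: int, b: int) -> int:
    """Tiny MCPU program: result = a + b (mod 256)."""
    program = [("NOR", 62), ("ADD", 60), ("ADD", 61), ("STA", 63),
               ("JCC", 4), ("JCC", 4)]  # halt loop; the second JCC absorbs a set carry
    mem = assemble(program, {60: a, 61: b, 62: 0xFF})  # NOR with 0xFF clears acc
    return emulate(mem)[63]

print(add_bytes(20, 22), add_bytes(200, 100))
```

With no load instruction, the accumulator is cleared by NORing it with an all-ones constant; that kind of idiom is what makes a four-instruction machine workable.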

Mon Jan 13, 2020 4:51 pm