


 [ 17 posts ]  Go to page 1, 2  Next
 Tridora-CPU - an FPGA stack machine CPU for Pascal 

Joined: Sun Oct 20, 2024 9:43 pm
Posts: 7
Hi everyone,

this is my first post here and I would like to present the project I have been working on for the last three years.

The Tridora-CPU: A homebrew CPU written in Verilog with a stack machine architecture designed for the Pascal compiler that I also
wrote for it.

The CPU runs together with some other I/O-controllers on an Arty-A7 board at 83 MHz. It uses a serial console via USB
and also has a VGA output to display graphics.

The Pascal compiler runs natively on the machine and there is an editor and a minimal operating system running from
a micro-SD-card so you can develop programs directly on the board.

This is my first Verilog/HDL project, so it is probably full of bugs and design errors, but at least it runs stably for now.

Source code, some documentation and a couple of demo videos are available here:

https://gitlab.com/slederer/Tridora-CPU

An emulator is also available.

Currently, the only supported board is the Arty-A7-35T which is no longer in production. It should be easily
adaptable to the Arty-A7-100T, but I am also looking into the Nexys 4 and the Olimex GateMateA1-EVB.

This has been such a fascinating project for me, creating a complete computer system from the ground up. Creating
the CPU logic and seeing it slowly come to life, then creating all the different bits of software to make it work, first
in assembly language, then in a compiled language, is such a unique experience that you cannot get from modern computers.

It was also quite fascinating to see, when writing the compiler, how well the stack machine architecture and the workings of the compiler (of the good old recursive-descent-parser type) fit together.
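
To illustrate that fit, here is a minimal sketch in Python. It is not taken from the Tridora compiler, and the instruction names (PUSH, ADD, SUB, MUL, DIV) are made up for this example. Each parsing routine emits code as soon as it has recognized its operands, and the result is already in the order a stack machine needs, so no register allocation or reordering is required.

```python
import re

def tokenize(text):
    # Integers, operators and parentheses; anything else is silently skipped.
    return re.findall(r"\d+|[-+*/()]", text)

class Compiler:
    """Recursive-descent parser emitting code for a hypothetical stack machine."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # expression = term { ("+" | "-") term }
        self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            self.term()
            self.code.append(("ADD",) if op == "+" else ("SUB",))

    def term(self):
        # term = factor { ("*" | "/") factor }
        self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            self.factor()
            self.code.append(("MUL",) if op == "*" else ("DIV",))

    def factor(self):
        # factor = number | "(" expression ")"
        tok = self.take()
        if tok == "(":
            self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
        elif tok is not None and tok.isdigit():
            self.code.append(("PUSH", int(tok)))
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

def compile_expr(text):
    compiler = Compiler(tokenize(text))
    compiler.expression()
    if compiler.peek() is not None:
        raise SyntaxError(f"unexpected token {compiler.peek()!r}")
    return compiler.code

def run(code):
    # Tiny evaluator for the hypothetical instructions emitted above.
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
            continue
        b = stack.pop()
        a = stack.pop()
        if instr[0] == "ADD":
            stack.append(a + b)
        elif instr[0] == "SUB":
            stack.append(a - b)
        elif instr[0] == "MUL":
            stack.append(a * b)
        elif instr[0] == "DIV":
            stack.append(a // b)
    return stack.pop()

print(run(compile_expr("2+3*4")))    # 14
print(run(compile_expr("(2+3)*4")))  # 20
```

For "2+3*4" the emitted sequence is PUSH 2, PUSH 3, PUSH 4, MUL, ADD, which is exactly the order in which the parser visited the operands and operators.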

So it is also a kind of a retro computing experience.
The system has features from the 8-bit, the 16-bit and the early 32-bit eras, and it
might remind you of the UCSD-P-System and early Turbo-Pascal versions.

Other inspirations were, among others, in no particular order:

- the Novix 4016 CPU (a stack machine CPU designed for Forth, mainly by Charles Moore)
- the J1 CPU by James Bowman (which is not entirely unlike the Novix 4016)
- the Lilith computer by Niklaus Wirth and his team (a stack CPU designed for Modula-2)
- the PERQ workstation (also a stack CPU designed for Pascal)
- the Magic-1 by Bill Buzbee
- the OPC by revaldinho

The source code for the CPU is rather small (~500 lines of Verilog for the CPU core), and because of the stack machine architecture,
the compiler was also easy to write, resulting in about 9000 lines of code.

I tried to keep everything as simple and readable as possible, so that I could understand my own code a few months later.
I also hope that it might help other people to see how a compiler works, and how it fits with the CPU design.

Let me know what you think! Feel free to point out my design mistakes, bugs, or other suggestions.




Tue Oct 22, 2024 10:09 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1832
Welcome, and what an excellent first post! That looks great - and thanks for open sourcing and sharing the code.

Most impressed to see you edit, build and run on the machine itself. The video was very informative - what's next for you? Perhaps a linker? What do you have by way of tracing, debugging, profiling? I imagine it's been quite a journey with lots of thinking time. But I note you do have an emulator so that would have been very useful.


Wed Oct 23, 2024 6:27 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2357
Location: Canada
Wow! Impressive. Welcome.

_________________
Robert Finch http://www.finitron.ca


Wed Oct 23, 2024 2:52 pm WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1832
(I see this project has been written up on Hackaday - congratulations!)


Wed Oct 23, 2024 9:32 pm

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 92
Location: Michigan USA
This project is something else! I especially appreciate the quality of the documentation. I see you went with 3 stacks. I am partial to that architecture. Did you start out with that plan, or did it develop over time?

Attachment:
stacks.png




Wed Oct 23, 2024 11:24 pm WWW

Joined: Sun Oct 20, 2024 9:43 pm
Posts: 7
BigEd wrote:
Welcome, and what an excellent first post! That looks great - and thanks for open sourcing and sharing the code.

Most impressed to see you edit, build and run on the machine itself. The video was very informative - what's next for you? Perhaps a linker? What do you have by way of tracing, debugging, profiling? I imagine it's been quite a journey with lots of thinking time. But I note you do have an emulator so that would have been very useful.


Thanks for your feedback!

It was indeed a long journey. Debugging the software part was mostly done in the emulator, and there is an earlier incarnation of the emulator that is written in Python which has lots of debugging features like single-stepping, stack-tracing and heap-tracing. The old emulator was very slow and did not support the VGA output, so I decided to write a new one which is fast and shiny but has none of the debugging features.

Debugging the hardware was done with the built-in logic analyzer in Vivado. At some point I also used colored squares from the VGA signal generator to display internal register bits of the CPU while running at 1Hz.

Next is probably the move to a different FPGA board since the Arty-A7-35T is no longer available. I have a Nexys-A7 lying around here but I am not looking forward to fighting with the DRAM interface again ;)

On the software side, a linker would definitely be nice to bring down the turnaround times. It will probably be something very simplistic just to import a precompiled image of the standard library at fixed addresses.

And I guess along the way I will find lots of bugs in the compiler and the runtime to fix, as it has been for the last year :)


Thu Oct 24, 2024 12:02 am

Joined: Sun Oct 20, 2024 9:43 pm
Posts: 7
mmruzek wrote:
This project is something else! I especially appreciate the quality of the documentation. I see you went with 3 stacks. I am partial to that architecture. Did you start out with that plan, or did it develop over time?

Attachment:
stacks.png


Thanks!

I did not really have it as a goal from the start, but the three-stacks architecture was decided very early on. I'd say it comes naturally, if you want a simple design for the CPU internals, to have separate stacks for the different functions.

My first vague ideas came from the UCSD-P-System, which has one stack, and then I saw the J1 CPU which has two (and is super-optimized for Forth).

So I went for something like the simple and clean design of the J1, and took the evaluation stack for fast access and the return stack for easy address calculation. Then I needed another stack for the local variables of compiled programs, and so I ended up with three stacks.
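
A toy model of that split, as a rough Python sketch: the instruction names (PUSH, ADD, CALL, RET, ENTER, LEAVE, LOADL, STOREL, HALT) and the frame layout are invented for illustration and are not the Tridora instruction set. It only shows how each of the three stacks has a single job.

```python
class ThreeStackMachine:
    """Toy machine with separate evaluation, return and local-variable stacks."""

    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.eval_stack = []    # operands and intermediate results
        self.return_stack = []  # return addresses only
        self.local_stack = []   # one frame of local variables per active call

    def step(self):
        op, *args = self.program[self.pc]
        self.pc += 1
        if op == "PUSH":
            self.eval_stack.append(args[0])
        elif op == "ADD":
            b = self.eval_stack.pop()
            a = self.eval_stack.pop()
            self.eval_stack.append(a + b)
        elif op == "CALL":
            self.return_stack.append(self.pc)  # address of the next instruction
            self.pc = args[0]
        elif op == "RET":
            self.pc = self.return_stack.pop()
        elif op == "ENTER":
            self.local_stack.append([0] * args[0])  # allocate a frame
        elif op == "LEAVE":
            self.local_stack.pop()                  # release the frame
        elif op == "STOREL":
            self.local_stack[-1][args[0]] = self.eval_stack.pop()
        elif op == "LOADL":
            self.eval_stack.append(self.local_stack[-1][args[0]])
        elif op == "HALT":
            return False
        else:
            raise ValueError(f"unknown instruction {op!r}")
        return True

    def run(self):
        while self.step():
            pass
        return self.eval_stack

# Main program calls a routine at address 3 that doubles its argument
# by storing it in a local variable and adding it to itself.
program = [
    ("PUSH", 20),   # 0
    ("CALL", 3),    # 1
    ("HALT",),      # 2
    ("ENTER", 1),   # 3: one local variable
    ("STOREL", 0),  # 4
    ("LOADL", 0),   # 5
    ("LOADL", 0),   # 6
    ("ADD",),       # 7
    ("LEAVE",),     # 8
    ("RET",),       # 9
]

machine = ThreeStackMachine(program)
print(machine.run())  # [40]
```

Because return addresses and local variables never share a stack with operands, a call does not have to shuffle anything on the evaluation stack, which is the kind of simplicity in the CPU internals described above.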

Come to think of it, I guess there are (or have been) some commercially available CPUs with two stacks (e.g. the RTX2000), but none with three stacks. Is this something you have seen somewhere else?


Thu Oct 24, 2024 1:05 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2357
Location: Canada
Quote:
Come to think of it, I guess there are (or have been) some commercially available CPUs with two stacks (e.g. the RTX2000), but none with three stacks. Is this something you have seen somewhere else?

There is the 4stack machine which uses multiple stacks, though I am not sure how.

https://bernd-paysan.de/4stack.html

The return stack could maybe be in dedicated BRAM. It is not very often that returns more than a few dozen deep are required. With a single block RAM it could go 1024 deep.

_________________
Robert Finch http://www.finitron.ca


Fri Oct 25, 2024 5:47 am WWW

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 92
Location: Michigan USA
In his book "Stack Computers: The New Wave" Koopman created a taxonomy for stack machines. Here are 2 excerpts from the book explaining his organization, and an interesting table of historical examples.

The LALU computer that I have described on anycpu (in a separate thread) has 3 stacks: Data, Return and Keyboard. The Keyboard stack holds the user-input text stream, which is analyzed by the NEXT engine of the interpreter.




Fri Oct 25, 2024 12:16 pm WWW

Joined: Sun Oct 20, 2024 9:43 pm
Posts: 7
The Tridora-CPU has gotten an update!

First of all, I reduced the clock speed from 83 MHz to 77 MHz. I encountered seemingly random timing problems whenever I changed something, so I guess the logic design was at its speed limits.

At 77MHz, those problems went away and I was able to add something: An instruction cache.

It is very simplistic: It just caches the 16 bytes that the DRAM controller delivers anyway. When the next instruction fetch address is inside the same 16-byte range, the instruction word is taken from those cached 16 bytes. Actually, the CPU always reads a 32-bit word for an instruction fetch and then throws one half away to get a 16-bit instruction. That of course makes it even more important to have an instruction cache.

So in the optimal case, if you execute 8 instructions out of DRAM that start at a 16-byte boundary, and no memory transfers are done for data, these 8 instructions will take 53 clock cycles instead of 200.
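
As a back-of-the-envelope model of those numbers, here is a small Python sketch of a single-line instruction cache. The per-fetch costs are assumptions chosen only because they reproduce the figures above (200 / 8 = 25 cycles for a fetch from DRAM, and (53 - 25) / 7 = 4 cycles for a fetch from the cached line); they are not documented timings of the hardware.

```python
LINE_BYTES = 16    # size of the block delivered by the DRAM controller
MISS_CYCLES = 25   # assumption: 200 cycles / 8 instructions
HIT_CYCLES = 4     # assumption: (53 - 25) / 7

def fetch_cycles(addresses, use_cache=True):
    """Total cycles for a sequence of instruction fetch addresses."""
    cached_line = None
    total = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if use_cache and line == cached_line:
            total += HIT_CYCLES       # served from the cached 16 bytes
        else:
            total += MISS_CYCLES      # goes out to DRAM and refills the line
            cached_line = line
    return total

# Eight 16-bit instructions starting at a 16-byte boundary above 64KB.
aligned = [0x10000 + 2 * i for i in range(8)]
print(fetch_cycles(aligned))                   # 53
print(fetch_cycles(aligned, use_cache=False))  # 200

# The same eight instructions starting in the middle of a line need two refills.
unaligned = [0x10008 + 2 * i for i in range(8)]
print(fetch_cycles(unaligned))                 # 74
```

Under these assumed costs, the benefit depends on how many consecutive fetches stay inside one 16-byte line, so branches and misaligned starts reduce it.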

I have created some simple benchmarks written in Pascal, and the improvement for some of them is about a factor of two: for example the empty loop, reading an integer variable, adding integers and string indexing. Others show very little improvement because they mostly call subroutines which reside in the lower 64KB of RAM, which is BRAM/SRAM without latency (e.g. integer multiplication and division).

For reference, the benchmark program is here: https://gitlab.com/slederer/Tridora-CPU/-/blob/main/examples/benchmarks.pas
The results are here: https://gitlab.com/slederer/Tridora-CPU/-/blob/main/examples/benchmarks.results.text

Many normal programs are not noticeably faster though, because they fit inside the SRAM area anyway. For the benchmark to show any difference, I need to hack the assembly code to insert a large block of null bytes, so that the code ends up above the 64KB boundary.

What I also found interesting in the benchmark results is that floating point multiplication is a tiny bit faster than integer multiplication. Both are done with assembly language subroutines, and the algorithm is essentially the same. But my floating point implementation uses only 23 bits for the fraction, so it has to do fewer loop iterations than for 32-bit integers. Add some instructions for unpacking/normalizing/packing, and you still end up a bit faster than integer multiply. Compare that with additions where integer addition is 12 times faster than float addition (executing in SRAM).
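
To make the iteration count concrete, here is a sketch in Python assuming a plain shift-and-add loop that examines one multiplier bit per iteration. That is a guess at the general shape of the routines, not a transcription of the Tridora assembly code, and whether the float routine loops over 23 or 24 bits is not something this sketch settles.

```python
def shift_add_multiply(a, b, bits):
    """Multiply unsigned a by unsigned b (b must fit in `bits` bits).

    Returns (product, iterations); the loop always runs once per bit.
    """
    product = 0
    iterations = 0
    for _ in range(bits):
        if b & 1:
            product += a   # add the shifted multiplicand for each set bit
        a <<= 1
        b >>= 1
        iterations += 1
    return product, iterations

# 32-bit integer operands: 32 loop iterations.
print(shift_add_multiply(123456, 7890, 32))

# 23-bit fraction operands: 23 loop iterations for the same kind of loop.
print(shift_add_multiply(0x5A5A5A, 0x3C3C3C, 23))
```

With a fixed per-iteration cost, the narrower operand saves roughly a quarter of the loop, which leaves room for the extra unpacking, normalizing and packing steps.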

There have also been some bug fixes for the compiler and assembler, making it possible to compile larger and more complex programs. So now I have ECL-Rogue running on the Tridora, which is a Pascal variant of Rogue that was written on the PDP-10. Porting it to FreePascal and Tridora-Pascal is a story of its own ;)

Here is a video of me playing ECL-Rogue on the Tridora-CPU: https://youtu.be/6yK7TejsFas

Next steps will probably be to decide if I want to do some more 2D software sprite graphics or rather go to 3D graphics, and to choose another FPGA board now that the Arty-A7-35T is no longer in production.




Sun May 25, 2025 10:16 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 791
Quote:
Next steps will probably be to decide if I want to do some more 2D software sprite graphics or rather go to 3D graphics, and to choose another FPGA board now that the Arty-A7-35T is no longer in production.

Check ebay for a used Arty-A7-35T before you move to a newer FPGA. Every new FPGA will have its own can of worms.
Is there ample room on the FPGA to have an ALGOL system rather than PASCAL?


Mon May 26, 2025 2:41 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1832
Great update, thanks Sebastian!


Mon May 26, 2025 6:35 am

Joined: Sun Oct 20, 2024 9:43 pm
Posts: 7
oldben wrote:
Check ebay for a used Arty-A7-35T before you move to a newer FPGA. Every new FPGA will have its own can of worms.

Yeah, I noticed :) I already tried some low-cost board which seemed nice and costs 1/6 of the Arty-A7-100, but I got nothing working and the lack of good documentation and examples was too much for me. So I shelved that.

oldben wrote:
Is there ample room on the FPGA to have an ALGOL system rather than PASCAL?


I have never seen an ALGOL compiler (neither 60 nor 68), but I believe it should be possible. The CPU has 32 bits of address space, supports large stack frames and has no limits on the return stack, so there should be no problems there. The Arty board has 256MB of DRAM, so that should be enough for ALGOL, Modula-3 or Ada.


Mon May 26, 2025 4:29 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 791
What do you have for mass storage?
Paper tape does work, if you have a PDP-8.
http://pdp8.de/download/RogAlgol.pdf


Tue May 27, 2025 2:22 am

Joined: Sun Oct 20, 2024 9:43 pm
Posts: 7
oldben wrote:
What do you have for mass storage?

The Tridora uses a microSD card with a simple flat filesystem. Theoretical file size limit is 2GB because of signed 32-bit integers.
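
The arithmetic behind that limit, as a short Python check. It assumes only what is said above, namely that file sizes are held in signed 32-bit integers; the constant name is made up for this example.

```python
# Largest value a signed 32-bit integer can hold: one bit is used for the sign.
MAX_FILE_SIZE = 2**31 - 1

print(MAX_FILE_SIZE)                # 2147483647 bytes
print(MAX_FILE_SIZE < 2 * 2**30)    # True: one byte short of 2 GiB
```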

oldben wrote:
Paper tape does work, if you have a PDP-8.
http://pdp8.de/download/RogAlgol.pdf


That is so cool. An Algol-60 compiler for the 12-bit PDP-8, using a virtual machine, in 8k (presumably 8k words) of RAM. The author was at the Department of Zoology in Oxford? And of course he had to emulate a stack, because the PDP-8 did not have one, like many smaller systems at that time.


Tue May 27, 2025 10:23 pm