 TOYF (Toy Forth) processor 

Joined: Sun Jul 23, 2017 1:06 am
Posts: 50
monsonite wrote:
Hugh,

The Lattice ICE40 range of FPGAs are becoming popular - as a result of an open-source tool chain called Project IceStorm.

There are several development boards that have recently become available - as a direct result of the emergence of the open source tools.

The ICE40HX4K part is really a 7680 "8K" logic element die - that was artificially disabled to 4K by the Lattice proprietary toolchain.

They are not the biggest or fastest FPGAs - but they are low cost and ideal for implementing 8/16 bit cpus - up to about 40MHz usable clock frequency.

Dave Banks (Hoglet67) has successfully implemented 6502, Z80 and OPC6 processors on this device - plus complete machines including Acorn Atom, BBC Model B, CP/M machine and Jupiter Ace.

The OPC6 processor used about 20% of the 960 available logic blocks.

The BBC Model B computer was based on Arlet's verilog 6502 implementation - using 144 of the 960 blocks for the 6502 cpu.

https://github.com/Arlet/verilog-6502


The complete machine with video generator etc used about 85% of the logic blocks - https://forum.mystorm.uk/t/bbc-model-b- ... ice/258/56

The MiniForth came out in 1995 on the Lattice isp1048 PLD --- this was not an FPGA --- since that time the MiniForth has been renamed RACE and implemented on a Lattice FPGA for a several-fold increase in speed.

monsonite wrote:
Speaking of Forth - you might wish to look at James Bowman's J1 Forth processor - which has also been ported to the Lattice ICE40

https://github.com/jamesbowman/j1

I'm aware of the J1 --- this is pretty primitive --- it doesn't have local variables.
Bernd Paysan's B16 and the RTX-2000 also lack local variables.


Sun Oct 29, 2017 1:58 am

Joined: Sun Jul 23, 2017 1:06 am
Posts: 50
monsonite wrote:
The Lattice ICE40 range of FPGAs are becoming popular - as a result of an open-source tool chain called Project IceStorm.

There are several development boards that have recently become available - as a direct result of the emergence of the open source tools.

The ICE40HX4K part is really a 7680 "8K" logic element die - that was artificially disabled to 4K by the Lattice proprietary toolchain.

They are not the biggest or fastest FPGAs - but they are low cost and ideal for implementing 8/16 bit cpus - up to about 40MHz usable clock frequency.

I have a major update on my TOYF design (attached).

I now have a CX register. It is used for the counter in loops, or the node in a loop traversing a linked-list.
Also, CX and DX have to be saved/restored by the I/O polling code --- they are expected to survive POL.

I made several other changes. I fixed the multiply so it would work (there was a bug previously) and changed the design so it is now 1 clock cycle per bit, which is 16 total.
At 40 MHz, this is 0.4 microseconds (16 cycles at 25 ns each) --- that should be adequate for a PID controller.
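
As a rough check of that timing, here is a minimal Python sketch of an unsigned shift-and-add multiply that counts one cycle per multiplier bit. It is only a software model under stated assumptions: the function names are hypothetical, and the actual TOYF datapath (described in the attached toyf.txt) may organize the steps differently.

```python
def shift_add_multiply(a, b, bits=16):
    """Unsigned shift-and-add multiply, one loop pass ("cycle") per multiplier bit.

    Hypothetical model, not the TOYF hardware: returns (product, cycles).
    """
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    product = 0
    cycles = 0
    for i in range(bits):
        if (b >> i) & 1:          # this multiplier bit is set
            product += a << i     # add the shifted multiplicand
        cycles += 1               # one clock per bit, set or not
    return product, cycles


def multiply_time_us(cycles, clock_mhz):
    """Time in microseconds for the given cycle count at clock_mhz."""
    return cycles / clock_mhz


product, cycles = shift_add_multiply(300, 200)
print(product, cycles, multiply_time_us(cycles, 40))
```

The loop always runs 16 times for 16-bit operands, so the time is fixed regardless of the operand values.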

thanks for your interest --- Hugh


Attachments:
File comment: major upgrade --- CX register for loops --- multiply is 1 clock cycle per bit
toyf.txt [45.91 KiB]
Downloaded 11 times
Mon Oct 30, 2017 4:14 am

Joined: Tue Dec 31, 2013 2:01 am
Posts: 95
Location: Sacramento, CA, United States
I don't fully grok all that you have shared yet, but I think I noticed that you sometimes hold the recently discarded TOS in DX. If my limited understanding is correct, would it be a) "easy" b) "difficult" c) "impossible" or d) "WTH are you talking about Mike?" to keep TOS in DX most of the time, especially between primitives?

Mike B.


Mon Oct 30, 2017 6:32 am

Joined: Sun Jul 23, 2017 1:06 am
Posts: 50
barrym95838 wrote:
I don't fully grok all that you have shared yet, but I think I noticed that you sometimes hold the recently discarded TOS in DX. If my limited understanding is correct, would it be a) "easy" b) "difficult" c) "impossible" or d) "WTH are you talking about Mike?" to keep TOS in DX most of the time, especially between primitives?

Mike B.

TOS is in BX all of the time.

In some cases, a primitive will leave data in DX rather than push it onto the stack.
So, temporarily, TOS (top-of-stack) is in DX and the BX that is normally TOS is now SOS (second-of-stack).

For example:
Code:
: foo  9 + ;

Without optimization, this compiles as:
Code:
lit9            ; pushes BX to the stack in memory, loads 9 into BX
plus           ; pulls the SOS from the stack in memory to a register, adds it to BX
exit

With optimization, this compiles as:
Code:
lit9_dx       ; loads 9 into DX
fast_plus    ; adds DX to BX

So, the compiler has to be smart enough to compile code that uses DX when it can.
This shouldn't be too difficult. It can be a traditional single-pass Forth compiler.
This is just peephole optimization. The compiler first compiles LIT9. Then when it finds the next thing to compile is PLUS it backs up and uncompiles the LIT9, compiles LIT9_DX instead, then compiles FAST_PLUS rather than PLUS.
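
The back-up-and-recompile step can be sketched in a few lines of Python. This is an illustration under assumptions, not the planned TOYF compiler: the `PEEPHOLE` table and `compile_words` are hypothetical names, and the table holds only the one pair from the example above.

```python
# Hypothetical rule table: (previous primitive, next primitive) -> replacement pair.
PEEPHOLE = {
    ("lit9", "plus"): ("lit9_dx", "fast_plus"),
}


def compile_words(tokens):
    """Single-pass compile with a one-primitive peephole window."""
    code = []
    for tok in tokens:
        if code and (code[-1], tok) in PEEPHOLE:
            first, second = PEEPHOLE[(code[-1], tok)]
            code.pop()               # "uncompile" the primitive just emitted
            code.append(first)       # emit the DX-loading variant instead
            code.append(second)      # emit the fast variant of the operator
        else:
            code.append(tok)         # no rule applies: compile as usual
    return code


print(compile_words(["lit9", "plus", "exit"]))
```

The compiler never looks further back than the last primitive it emitted, so it stays a single-pass design; primitives that match no rule are compiled unchanged.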

VFX has an "analytic compiler." It compiles everything into a data-structure, then analyzes the data-structure and generates code from that in a second pass. It likely makes several passes over the data-structure.

TOYF doesn't need an analytic-compiler. We only have one free register, which is DX. A simple single-pass compiler with peephole optimization should be adequate to generate reasonably good code.

I have never written an analytic-compiler. I have read about this and I think I know the basic idea, but don't have any experience. I don't want to delve into figuring out an analytic-compiler right now. The TOYF is a "Toy Forth," so I'm planning on a pretty simple straightforward compiler. If I ever decide to write a Forth compiler for the ARM Cortex or the dsPIC I will have to write an analytic-compiler in order to take advantage of all those registers. That would be a lot of work! I am avoiding all that work by inventing my own processor that doesn't require me to learn anything new (the TOYF is pretty similar to the MiniForth/RACE that I have experience with).

thanks for your interest --- Hugh


Mon Oct 30, 2017 5:13 pm

Joined: Sun Jul 23, 2017 1:06 am
Posts: 50
monsonite wrote:
The BBC Model B computer was based on Arlet's verilog 6502 implementation - using 144 of the 960 blocks for the 6502 cpu.

https://github.com/Arlet/verilog-6502

The complete machine with video generator etc used about 85% of the logic blocks - https://forum.mystorm.uk/t/bbc-model-b- ... ice/258/56

What is a "video generator"? Is that something similar to the Vic-II chip used in the venerable C64? You get double-buffered screens and sprites?

I wonder if my TOYF could become a game machine --- that would fit in well with the "Toy Forth" name. ;-)

The major weakness of the TOYF is lack of interrupts. The POLL code is executed every time that POL is used to end a primitive (rather than NXT that just ends the primitive but doesn't poll the I/O).
A game machine doesn't have a lot of I/O though.

1.) It needs to watch the clock so it can run the game at a smooth speed. This is pretty pedestrian though --- a heartbeat of maybe 100 milliseconds --- changing the screen much faster than this will just be a blur for the human viewer.

2.) It needs to poll the input device, which is likely a joystick. This is pretty pedestrian though --- the human can't change directions very quickly.

So, the lack of fast I/O support shouldn't be a problem. :-)
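
The POL versus NXT distinction can be modelled loosely in Python. This is a sketch only: `run_primitives` and the triple format are hypothetical, and the real POLL code is machine code that runs on the TOYF itself.

```python
def run_primitives(primitives, poll_io):
    """Run (name, action, ending) triples; ending is "POL" or "NXT".

    poll_io is called after every primitive that ends in POL, and never
    after one that ends in NXT. Returns the number of polls performed.
    """
    polls = 0
    for name, action, ending in primitives:
        action()
        if ending == "POL":
            poll_io()                # check the clock, joystick, etc.
            polls += 1
        elif ending != "NXT":
            raise ValueError(f"unknown ending for {name}: {ending}")
    return polls


events = []
program = [
    ("a", lambda: events.append("a"), "NXT"),
    ("b", lambda: events.append("b"), "POL"),
    ("c", lambda: events.append("c"), "NXT"),
]
poll_count = run_primitives(program, lambda: events.append("poll"))
print(poll_count, events)
```

In this model the worst-case delay before an input is noticed is the longest run of primitives between two POL endings, which is why slow inputs such as a heartbeat or a joystick are a comfortable fit.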

edit: fix typo


Wed Nov 01, 2017 2:01 am

Joined: Sun Jul 23, 2017 1:06 am
Posts: 50
Hugh Aguilar wrote:
I wonder if my TOYF could become a game machine --- that would fit in well with the "Toy Forth" name. ;-)

The major weakness of the TOYF is lack of interrupts. The POLL code is executed every time that POL is used to end a primitive (rather than NXT that just ends the primitive but doesn't poll the I/O).
A game machine doesn't have a lot of I/O though.

1.) It needs to watch the clock so it can run the game at a smooth speed. This is pretty pedestrian though --- a heartbeat of maybe 100 milliseconds --- changing the screen much faster than this will just be a blur for the human viewer.

2.) It needs to poll the input device, which is likely a joystick. This is pretty pedestrian though --- the human can't change directions very quickly.

So, the lack of fast I/O support shouldn't be a problem. :-)

Actually, the TOYF won't work well in a game machine. We need high-speed interrupts for playing music. This would only be realistic if the 65ISR-chico was used as a coprocessor. The TOYF main-program could upload a music score (described in some easy-to-interpret code) to the coprocessor and it would actually play the music.

Anyway, I have a new design. :-)
I added support for division.
I also added support for linked-lists and wrote a lot of code to support linked-lists --- this looks like it should be efficient --- I used linked-lists as my standard data-structure in the novice package and intend to do so here also.

I can really start on the assembler/simulator now --- I don't think there is any more that can be done to the design.

thanks for your interest --- Hugh


Attachments:
File comment: support for division --- support for linked-lists
toyf.txt [58.96 KiB]
Downloaded 12 times
Wed Nov 08, 2017 2:07 am

Joined: Sun Jul 23, 2017 1:06 am
Posts: 50
Hugh Aguilar wrote:
Anyway, I have a new design. :-)

I have a very minor upgrade. I just got rid of the ISR instruction and added the CNX instruction --- this saves one clock cycle inside of the POLL code.


Attachments:
File comment: got rid of ISR and added CNX --- very minor upgrade
toyf.txt [58.71 KiB]
Downloaded 7 times
Sat Dec 16, 2017 3:43 am

Joined: Sun Jul 23, 2017 1:06 am
Posts: 50
I have another upgrade (attached).

I added an EX register, and I also upgraded my 16x16 multiplication to generate a 32-bit product --- previously my 16x16 multiplication only generated a 16-bit product.
I had not done this in the past because I was concerned that adding another register would cause the TOYF to require a bigger and more expensive FPGA. Now I decided to do this anyway. The full multiplication is pretty useful.

I still have very limited support for division. I can divide a 16-bit numerator by an 8-bit denominator for a 16-bit quotient and 8-bit remainder. This is adequate for converting 16-bit numbers into ASCII strings for display.
I don't think division is common enough in most micro-controller applications that I want to provide hardware support for it.
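
To illustrate why a 16-by-8 divide is enough for number display, here is a minimal Python sketch that converts a 16-bit value to a string by repeated division by the base. The names `div16_by_8` and `u16_to_ascii` are hypothetical, and the divide is modelled with Python's `divmod` instead of the TOYF instructions.

```python
DIGITS = "0123456789ABCDEF"


def div16_by_8(numerator, denominator):
    """Model of a 16-bit / 8-bit unsigned divide: 16-bit quotient, 8-bit remainder."""
    if not 0 <= numerator <= 0xFFFF:
        raise ValueError("numerator must fit in 16 bits")
    if not 1 <= denominator <= 0xFF:
        raise ValueError("denominator must be 1..255")
    # The remainder is always smaller than the denominator, so it fits in 8 bits.
    return divmod(numerator, denominator)


def u16_to_ascii(value, base=10):
    """Convert an unsigned 16-bit value to a digit string, least significant digit first."""
    if not 2 <= base <= 16:
        raise ValueError("base must be 2..16")
    digits = []
    while True:
        value, remainder = div16_by_8(value, base)
        digits.append(DIGITS[remainder])
        if value == 0:
            break
    return "".join(reversed(digits))


print(u16_to_ascii(65535), u16_to_ascii(255, 16))
```

Each pass peels off one digit, so a 16-bit number needs at most five divides in base 10.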

My goal is not to design the most powerful processor that I can --- my goal is to support motion-control (the PID algorithm) and keep the cost down as much as possible --- also, by making it a Forth processor it is more fun!


Attachments:
File comment: Upgraded with an EX register in order for multiplication to generate a 32-bit product.
toyf.txt [59.7 KiB]
Downloaded 7 times
Thu Jan 11, 2018 9:26 pm

Joined: Sun Jul 23, 2017 1:06 am
Posts: 50
Hugh Aguilar wrote:
My goal is not to design the most powerful processor that I can --- my goal is to support motion-control (the PID algorithm) and keep the cost down as much as possible --- also, by making it a Forth processor it is more fun!

I have yet another upgrade (attached).

I provided more support for 32-bit arithmetic. I have instructions for shifting down the product --- this divides the product by unity assuming that unity is a power of 2 --- this makes the multiplication more useful.
I also have support for adding and subtracting double-precision numbers --- still pretty slow though --- realistically, if anybody needs 32-bit numbers, they should just use the ARM Cortex that has 32-bit registers.
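
Both ideas can be sketched in Python: scaling a product back down by shifting, and adding 32-bit numbers as two 16-bit halves with a carry. This is a model under assumptions, not the TOYF instruction sequence: the function names are hypothetical, the values are unsigned, and unity is taken to be 256 (8 fraction bits) purely for illustration.

```python
FRAC_BITS = 8                 # assumption: unity = 2**8 = 256
UNITY = 1 << FRAC_BITS


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Unsigned fixed-point multiply: 16x16 -> 32-bit product, then shift down by unity."""
    product = (a & 0xFFFF) * (b & 0xFFFF)      # full 32-bit product
    return (product >> frac_bits) & 0xFFFF     # divide by unity, keep 16 bits


def add32(a_hi, a_lo, b_hi, b_lo):
    """Add two 32-bit numbers held as 16-bit halves; returns (hi, lo), wrapping at 32 bits."""
    lo = a_lo + b_lo
    carry = lo >> 16                           # carry out of the low half
    hi = (a_hi + b_hi + carry) & 0xFFFF
    return hi, lo & 0xFFFF


# 1.5 * 2.0 = 3.0 when unity is 256
result = fixed_mul(int(1.5 * UNITY), int(2.0 * UNITY))
print(result, result / UNITY, add32(0x0001, 0xFFFF, 0x0000, 0x0001))
```

Because unity is a power of 2, dividing the product by unity is only a shift, which is what makes the 32-bit product practical for scaled arithmetic on a 16-bit machine.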

I moved the stacks down to the bottom of zero-page. This is useful in an FPGA that doesn't have 256 words (512 bytes) of RAM on-board --- if the FPGA has 128 words (256 bytes), the stacks will still be in internal RAM --- I want the TOYF to be reasonably efficient on very small inexpensive FPGA chips (it is likely that low cost will be the only advantage it has over other designs).

I added better support for working with byte data --- this could speed up string handling.

I changed how literal values are loaded into AX. Now fewer instructions are needed. I purposely left some instructions undefined so they can be used for application-specific purposes.

I still don't have support for division with a 32-bit numerator --- this is going to be a big target for criticism --- I don't think division is very useful in micro-controllers though, so I'm not going to worry about it.

thanks for your interest --- Hugh


Attachments:
File comment: better support for 32-bit data, better support for FPGA chips with only 128 words of internal RAM, and better support for strings
toyf.txt [60.83 KiB]
Downloaded 9 times
Sun Jan 21, 2018 4:54 am