


 [ 91 posts ]  Go to page 1, 2, 3, 4, 5 ... 7  Next
 8 bit CPU challenge 

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
As a result of a thread on 6502.org, I would like to propose a challenge.

The challenge is to create a 6502-era CPU in an FPGA, using roughly the same amount of resources as were available to the 6502 designers. The CPU needs to have capabilities similar to the 6502's: a 16 bit address bus, an 8 bit data bus, 2 interrupts, reset, and RDY. To make design easier, the data bus may be split into separate in/out buses. Instead of an NMI, you can make a higher priority maskable IRQ. It should interface to either a block RAM or an external async SRAM. It doesn't need to be 6502 compatible, but you should be able to port typical 6502 programs to it.

Maximum area is 128 slices on a Spartan 6 (XC6SLX4), which is about what my NMOS 6502 core requires. Use of block RAMs or DSP blocks is not permitted inside the CPU, but these resources may be used outside the CPU to build a complete working system.

The goal is to make something as powerful as possible that could theoretically have existed as a 40 pin DIP in the 70's, hopefully better than the 6502 itself. One of the goals is to keep room for future improvement, so filling up the opcode space is not encouraged.

Edit: changed limit from 120 to 128 slices.


Last edited by Arlet on Fri May 05, 2017 3:53 am, edited 2 times in total.



Thu May 04, 2017 4:23 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1796
An interesting challenge! I suspect if it's a joint submission with the One Page Computing challenge that would prove to be quite a limitation, so probably not a good idea - but 128 slices (please - not 120!) is a good limit.


Thu May 04, 2017 7:13 pm

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
I've changed the limit to 128 slices. I had picked 120 originally, because it's 1/5th of the FPGA, but a power of two is even better.


Fri May 05, 2017 3:42 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1796
Here's my first thought: I think I'm leaning towards a 16-bit version of ARM but with no barrel shifter and with only a byte wide memory interface. Something quite symmetrical and with simple decode. I'd have dead cycles and wasted bits and no pipelining, because being efficient would use more resources. I'd like to support byte wide data too, but that would be the last thing to add. Maybe I'd have a pair of operand bytes as well as the pair of opcode bytes. Simpler than trying to pack the operand in with the opcode, and it gives full-width constants and addresses every time. I think the result would be nice and easy to use, but not terribly efficient.

Then I had a second thought: the PDP-11. It's a capable and famous architecture, and from about the right time frame. I see it takes about 1000 lines of code to describe in an emulator. Compare that to your 6502 core in 1300 lines of Verilog, and the PDP-11 might be about the right size for this challenge. It's a 16-bit machine, but of course accessing memory byte-serially is no big deal - there's precedent for that.


Fri May 05, 2017 9:46 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2153
Location: Canada
I’m thinking something more along the lines of the 6800, but with two 16 bit index registers + 16 bit stack pointer and a single 8 bit accumulator. No indirect addressing. Bound to be similar to some earlier accumulator based machine. I'd be assuming a two cycle read access to synchronous memory.
Porting 6502 programs to it and getting them to run fast might be a challenge.

My other thought is a stack machine of some sort.

_________________
Robert Finch http://www.finitron.ca


Fri May 05, 2017 3:33 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1796
It would be great if a transputer core could fit in 128 slices!


Fri May 05, 2017 3:48 pm

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
My goal is a mixed 8/16 bit design, with a decent set of registers, and some nice addressing modes, somewhat inspired by the 8088, but trimmed down. No idea if it'll fit, though. I'm guessing an 8 bit ALU will be the most compact, at the cost of some extra cycles for 16 bit operations. I think trading cycles for size is a good solution, because you get something that works, with a nice path to future upgrades.
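
To illustrate the cycles-for-size trade mentioned above, here is a minimal Python sketch of a 16 bit add done as two passes through an 8 bit adder, with the carry chained between passes. The function names and the cycle count of 2 are assumptions made for this illustration; they are not taken from the actual design.

```python
def alu8_add(a, b, carry_in):
    """8 bit add with carry in; returns (result, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def add16_via_alu8(x, y):
    """16 bit add using two passes through the 8 bit ALU.

    Returns (result, carry_out, passes). The pass count stands in
    for the extra cycles a narrow ALU costs on wide operands.
    """
    # Pass 1: low bytes, no carry in.
    lo, carry = alu8_add(x & 0xFF, y & 0xFF, 0)
    # Pass 2: high bytes, carry from the low byte chained in.
    hi, carry = alu8_add(x >> 8, y >> 8, carry)
    return (hi << 8) | lo, carry, 2


result, carry, passes = add16_via_alu8(0x12FF, 0x0001)
print(hex(result), carry, passes)
```

The same adder serves both operand widths, so the 16 bit case costs one extra pass rather than a second adder.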

I'm kinda cheating with the register file, because it makes very good use of the FPGA resources. You can fit 32 (byte-wide) registers in 2 slices per read port.

Talking about upgrade paths: the number of 128 slices gives a natural upgrade to 256 slices for an enhanced version.


Fri May 05, 2017 6:10 pm

Joined: Fri May 05, 2017 7:39 pm
Posts: 22
robfinch wrote:
My other thought is a stack machine of some sort.
Thinking about how today's compilers work, an architecture that supports these VM models would probably achieve a very high MIPS/silicon ratio ;)
It would also be easily expandable (wider bus, separate stack buses (one perhaps internal), floating point stack, ...). But programming it in assembler would be entirely different from how a 6502 needs to be treated. So the demand " but you should be able to port typical 6502 programs to it. " might be missed.


Fri May 05, 2017 7:59 pm

Joined: Fri May 05, 2017 7:39 pm
Posts: 22
BigEd wrote:
It would be great if a transputer core could fit in 128 slices!

8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-) 8-)

Although I have no idea whether I could ever get my old occam programming toolchain up & running again - perhaps in a second life with more time for this ;)


Fri May 05, 2017 8:06 pm

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
GaBuZoMeu wrote:
So the demand " but you should be able to port typical 6502 programs to it. " might be missed.

The intention behind that requirement is that the CPU is complete enough that such a task is possible. The task doesn't need to be straightforward.

If you can port the 6502 program to a higher level language, and compile that into efficient bytecode, that would be an acceptable solution.


Fri May 05, 2017 8:52 pm

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
I've started on a design... currently 29 slices, and it can run straight line code consisting of a couple of arithmetic instructions (add/sub/and/or/xor) between registers. I have the same register set as the 8086, and can do similar operations between 8 bit (high or low) portions, or the whole 16 bit register. There's also some preliminary code in there to do memory access.

No immediates, memory operations, stack, interrupts, flags or branches yet, so still a long way to go.

I'm hoping to get a mix of 6502 instructions (with extra add/sub in addition to adc/sbc, but no bit) and 8086 register set/addressing modes before I run out of slices.


Sun May 07, 2017 9:23 am

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
Upgraded the design to include mem + reg -> reg operations. The memory address is given by one of the 8 registers. Slice count jumped up to 47, mostly due to having extra mux + hold register in the address generator. Scary how quickly it goes up!

I can now do stuff like:
Code:
add al, bl
sub sp, dx
xor ax, [sp]
and ah, [cx]
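
As a rough illustration of the register model these instructions imply, the following Python sketch treats the 8086-style register names as 16 bit registers with 8 bit high/low views, and lets a register supply the memory address. The class name `ToyCPU`, the little-endian byte order for 16 bit loads, and the absence of flags are all assumptions made for this sketch, not details of the actual core.

```python
class ToyCPU:
    """Toy model of 8086-style registers with 8 and 16 bit views (illustrative only)."""

    WORD = ("ax", "cx", "dx", "bx", "sp", "bp", "si", "di")
    OPS = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "and": lambda a, b: a & b,
        "or": lambda a, b: a | b,
        "xor": lambda a, b: a ^ b,
    }

    def __init__(self):
        self.regs = dict.fromkeys(self.WORD, 0)
        self.mem = bytearray(65536)

    def _width(self, name):
        return 16 if name in self.WORD else 8

    def read(self, name):
        if name in self.WORD:
            return self.regs[name]
        full = self.regs[name[0] + "x"]          # "al"/"ah" live inside "ax"
        return full >> 8 if name[1] == "h" else full & 0xFF

    def write(self, name, value):
        if name in self.WORD:
            self.regs[name] = value & 0xFFFF
            return
        key = name[0] + "x"
        full = self.regs[key]
        if name[1] == "h":
            self.regs[key] = (full & 0x00FF) | ((value & 0xFF) << 8)
        else:
            self.regs[key] = (full & 0xFF00) | (value & 0xFF)

    def load(self, addr_reg, width):
        addr = self.regs[addr_reg]
        value = self.mem[addr]
        if width == 16:
            # Assumption: little-endian, second byte at addr + 1.
            value |= self.mem[(addr + 1) & 0xFFFF] << 8
        return value

    def execute(self, op, dst, src):
        """Run 'op dst, src', where src is a register name or '[reg]'."""
        if src.startswith("["):
            operand = self.load(src[1:-1], self._width(dst))
        else:
            operand = self.read(src)
        self.write(dst, self.OPS[op](self.read(dst), operand))


cpu = ToyCPU()
cpu.write("ax", 0x1234)
cpu.write("bl", 0x01)
cpu.execute("add", "al", "bl")      # only the low byte of ax changes
cpu.write("cx", 0x0100)
cpu.mem[0x0100] = 0xF0
cpu.execute("and", "ah", "[cx]")    # memory operand addressed by cx
print(hex(cpu.read("ax")))
```

The width of the memory access follows the destination register here, which is one plausible reading of `xor ax, [sp]` versus `and ah, [cx]`.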


Sun May 07, 2017 12:15 pm

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
53 slices, but can now reverse mem/reg operands:
Code:
add [cx], ax
add [sp], ah


Sun May 07, 2017 1:10 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2153
Location: Canada
Did some coding / design for a 6800-like CPU, but it's up to 100 slices already and it's not even half done. Got jumps, calls, and branches coded but no ALU operations. My original 6502 core was about 200 slices (800 LUTs) IIRC.
Tempted to start the 8,400 slice challenge: 1/4 of an xc7a200. (DSD9 fits into this, I think, with an 80 bit FPU.)
Could we have three challenges: big, medium, and small?

_________________
Robert Finch http://www.finitron.ca


Wed May 10, 2017 11:08 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1796
One thing revaldinho noticed in trying to fit a series of CPUs into a CPLD: the tools only make an effort to shrink the design when they need to in order to make it fit, so in a roomier CPLD the design uses more resources than it does in a more constrained CPLD. So a 200 slice CPU might actually have a smaller implementation, if you had a way to find it. For that reason, for this challenge, we might target a large CPLD instead of a small corner of a medium sized FPGA. I think 128 Spartan 6 slices might be somewhere between an XC95216 and an XC95288 in terms of complexity, although of course the technologies are not directly comparable. A CPLD won't give the super-cheap register files which Arlet likes to use.

Oddly enough, I don't see a very very small FPGA offering from Xilinx. Or, is there a way to push an FPGA design into a small corner?


Wed May 10, 2017 11:18 am