AnyCPU - View topic - Arlet's four-CPLD 6502 implementation

BigEd

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1884

Arlet's latest project is well worth a look:

very careful fitting of the HDL to the CPLD's capabilities
decomposition of a 6502 CPU to four devices
many dead cycles eliminated and some 65C02 instructions added
using 4 readily available and cheap CPLDs rather than one or two larger and more expensive parts

Quote:

After a long pause, I decided to get back into 6502 hacking, and implement an idea I've been toying with for a few years: using multiple small CPLDs to implement a 6502.

My CPLD of choice was the Xilinx XC9572XL in a 44-pin TQFP package. My original plan was to use 5 or 6 of them, but somewhat to my own surprise, I was able to fit it into 4. It's a very tight fit, especially for the control logic and the ALU. At first, it seemed completely hopeless watching the tools allocate big chunks of resources for the simplest expressions, but with a lot of experimenting and reading the fitter reports, I gradually gained an understanding of how to write the code so it would match the capabilities of the CPLDs and the tools.

A few years ago I tried something similar, but noticed that the CPLD was a very poor fit for bigger adders (mostly because there's no fast carry chain, and also because the AND-OR structure is not good for XOR operations), and gave up on the idea. But then a while ago, I was going through the datasheets for another project, and I noticed the nice dedicated XOR gate in each macrocell. I spent a few days going over different ways to turn that XOR into the centerpiece of the ALU.
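To make the XOR idea concrete, here is a rough Python model of one way a per-macrocell XOR could help with addition. It is only a sketch and not Arlet's actual design: it assumes the XOR takes a small sum of product terms on one input and a single signal on the other, and it uses a plain ripple carry. The function names are made up for illustration.

```python
def macrocell_sum(a_bit, b_bit, carry_in):
    """Model one sum bit: (a XOR b) built from two product terms,
    then combined with the carry through the macrocell's XOR gate."""
    # Two product terms ORed together: a AND NOT b, NOT a AND b.
    sum_of_products = (a_bit & (b_bit ^ 1)) | ((a_bit ^ 1) & b_bit)
    # The dedicated XOR gate folds in the carry without extra product terms.
    return sum_of_products ^ carry_in


def macrocell_carry(a_bit, b_bit, carry_in):
    """Model the carry out of one bit as a three-term sum of products."""
    return (a_bit & b_bit) | (a_bit & carry_in) | (b_bit & carry_in)


def add8(a, b, carry_in=0):
    """Ripple-carry 8-bit add built from the two bit-level models above.

    Returns (result, carry_out).
    """
    result = 0
    carry = carry_in
    for i in range(8):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        result |= macrocell_sum(a_bit, b_bit, carry) << i
        carry = macrocell_carry(a_bit, b_bit, carry)
    return result, carry
```

Without the XOR gate, a full sum bit a XOR b XOR c written as pure AND-OR needs four three-input product terms, which is the kind of blow-up the quote complains about.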

Quote:

All I do in the fetch cycle is finish the previous operation, and register the DB input in the IR. The core then moves to the decode cycle, where IR is examined. I've tried to give maximum access time for external memories, so AB/WE outputs are all registered, and DB input is registered as much as possible (only exception is ZP,X where DB+X is loaded in ABL). Quite a lot is happening in the decode state, though. It sets up all the control signals for ABL/ABH/ALU to start working right away.

A large part of dead cycle removal is due to the fact that the address arithmetic is all done locally in the ABL/ABH modules, rather than in the ALU. I think this may be a worthwhile approach for my FPGA core as well. I suspect that it will end up smaller. The extra adder is almost free (especially on LUT6), and you save a lot on muxes, which are expensive.
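As a rough illustration of address arithmetic living in the address-bus modules, here is a small Python model of the two indexed cases. It is a sketch under my own assumptions, not taken from Arlet's HDL: the function names are hypothetical, and the only things borrowed from the quote are that ZP,X loads DB+X into ABL and that ABL/ABH do their own arithmetic.

```python
def zp_x_address(db, x):
    """Zero page,X: ABL takes DB + X directly, ABH stays at zero.

    The sum wraps within the zero page, as on a standard 6502.
    """
    abl = (db + x) & 0xFF
    abh = 0x00
    return (abh << 8) | abl


def abs_x_address(lo, hi, x):
    """Absolute,X: ABL adds the index, ABH absorbs the carry.

    Both steps are modelled as local to the address modules,
    so the ALU is never involved.
    """
    total = lo + x
    abl = total & 0xFF
    carry = total >> 8
    abh = (hi + carry) & 0xFF
    return (abh << 8) | abl
```

For example, zp_x_address(0xF0, 0x20) gives 0x0010, while abs_x_address(0xF0, 0x12, 0x20) gives 0x1310 because the carry out of the low byte is added into the high byte.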

Tue Jan 01, 2019 12:33 pm
