 Arlet's four-CPLD 6502 implementation 
Arlet's latest project is well worth a look:
  • very careful fitting of the HDL to the CPLD's capabilities
  • decomposition of a 6502 CPU into four devices
  • many dead cycles eliminated and some 65C02 instructions added
  • using 4 readily available and cheap CPLDs rather than one or two larger and more expensive parts

Arlet writes:

After a long pause, I decided to get back into 6502 hacking, and implement an idea I've been toying with for a few years: using multiple small CPLDs to implement a 6502.

My CPLD of choice was the Xilinx XC9572XL in a 44-pin TQFP package. My original plan was to use 5 or 6 of them, but somewhat to my own surprise, I was able to fit it into 4. It's a very tight fit, especially for the control logic and the ALU. At first, it seemed completely hopeless watching the tools allocate big chunks of resources for the simplest expressions, but with a lot of experimenting and reading the fitter reports, I gradually gained an understanding of how to write the code so it would match the capabilities of the CPLDs and the tools.

A few years ago I tried something similar, but noticed that the CPLD was a very poor fit for bigger adders (mostly because there's no fast carry chain, and also because the AND-OR structure is not good for XOR operations), and gave up on the idea. But then a while ago, I was going through the datasheets for another project, and I noticed that nice dedicated XOR gate in each macrocell. I spent a few days going over different ways to turn that XOR into the centerpiece of the ALU.
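
To make the XOR idea concrete, here is a rough behavioural sketch in Python of one adder bit where the final XOR is the "free" gate and everything else is plain AND-OR logic. This is not Arlet's actual ALU: the function names, the ripple carry, and the way terms are split between the AND-OR array and the XOR gate are all assumptions made for illustration.

```python
def sum_bit(a, b, carry_in):
    """One sum bit: AND-OR logic feeds a dedicated XOR gate (assumed split)."""
    # Sum-of-products part: a XOR b written out as two product terms.
    sop = (a & (b ^ 1)) | ((a ^ 1) & b)
    # Dedicated XOR gate combines the sum-of-products result with the carry.
    return sop ^ carry_in


def carry_bit(a, b, carry_in):
    """Carry out as a three-term sum of products (majority function)."""
    return (a & b) | (a & carry_in) | (b & carry_in)


def add8(a, b, carry_in=0):
    """Ripple the two functions above across 8 bits; returns (sum, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(8):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        result |= sum_bit(ai, bi, carry) << i
        carry = carry_bit(ai, bi, carry)
    return result, carry
```

Without the XOR gate, each sum bit would need four product terms of three inputs each, and the count grows quickly once the carry is expanded instead of rippled, which is presumably why plain AND-OR adders fit so badly.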

All I do in the fetch cycle is finish the previous operation, and register the DB input in the IR. The core then moves to the decode cycle, where IR is examined. I've tried to give maximum access time for external memories, so AB/WE outputs are all registered, and DB input is registered as much as possible (the only exception is ZP,X, where DB+X is loaded into ABL). Quite a lot is happening in the decode state, though. It sets up all the control signals for ABL/ABH/ALU to start working right away.
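
The ZP,X exception mentioned above can be sketched as follows. The function name is made up for this example, and the only 6502 behaviour relied on is that zero-page indexed addresses wrap within page zero; how the CPLD actually wires this path is not shown in the post.

```python
def zp_x_address(db, x):
    """Effective address for ZP,X: DB+X goes to ABL, ABH stays at zero."""
    abl = (db + x) & 0xFF   # 8-bit add, carry is discarded (wraps in page zero)
    abh = 0x00              # zero page, so the high byte is always 0
    return (abh << 8) | abl
```

For example, `zp_x_address(0xF0, 0x20)` gives `0x0010`, not `0x0110`, because the carry out of the low byte is dropped.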

A large part of the dead cycle removal is due to the fact that the address arithmetic is all done locally in the ABL/ABH modules, rather than in the ALU. I think this may be a worthwhile approach for my FPGA core as well. I suspect that it will end up smaller. The extra adder is almost free (especially on LUT6), and you save a lot on muxes, which are expensive.
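
As a sketch of what "address arithmetic done locally" could look like for an absolute,X access: the low-byte module adds the index and hands a carry to the high-byte module, with no trip through the ALU. The names `abl_add` and `abh_adjust` and the exact split between the modules are assumptions for illustration, not taken from the real design.

```python
def abl_add(base_lo, index):
    """Low address byte module: add the index, return (new_lo, carry)."""
    total = base_lo + index
    return total & 0xFF, total >> 8


def abh_adjust(base_hi, carry):
    """High address byte module: just absorb the carry from ABL."""
    return (base_hi + carry) & 0xFF


def absolute_indexed(base, index):
    """16-bit effective address built from the two local adders."""
    lo, carry = abl_add(base & 0xFF, index)
    hi = abh_adjust(base >> 8, carry)
    return (hi << 8) | lo
```

For example, `absolute_indexed(0x12F0, 0x20)` crosses a page boundary and yields `0x1310`. On the original 6502 the page-crossing fixup costs an extra cycle because the high byte goes back through the ALU.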

Tue Jan 01, 2019 12:33 pm