AnyCPU - View topic - Seeking the smallest 68000 implementation...

Page 1 of 1

[ 9 posts ]

Seeking the smallest 68000 implementation...

Author	Message
BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1833	Seeking the smallest 68000 implementation... I've been experimenting with the BlackIce FPGA board (or, to be more accurate, Dave [Hoglet] has been forging ahead and I've been trying to keep up with progress.) Dave has implemented a series of 8-bit retrocomputers on the board, based on 6502 and on Z80. In the case of his Acorn Atom model, he also managed to fit in a SID, which is a great deal bigger than the rest of the machine because it needs multipliers for the signal mixing, and the Lattice FPGA doesn't have multipliers as such, so they use lots of resources. Yesterday at the CCH in Cambridge, Revaldinho and I were chatting with Ken Boak, who was demoing BlackIce, and Ken wondered if a 68000 model could fit into the Lattice FPGA. A good question. I suspect it would be a challenge. Ken wondered if the 68k could be emulated by a simpler engine. Perhaps a picocoded machine to run the nanocode which underlies the 68000's microcode... Anyone have any ideas, or experience with small 68000 implementations?
Mon Sep 18, 2017 11:13 am

MichaelM Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL	Re: Seeking the smallest 68000 implementation... To the best of my understanding, the ROM-based controls in the 68000 instruction sequencer can best be understood as a width-constrained ROM micro-program, i.e. micro-code, and another ROM used to expand the encoded control fields in the micro-program ROM, i.e. nano-code. I can't see a need to expand the definition to include a third level of micro-program ROM, i.e. pico-code. By allowing the micro-code to use a vertical format, which uses a second-level ROM to expand the encoded control fields, Motorola was able to make substantial savings in the total ROM needed to implement the 68000. If you examine the two halves of the micro-program of my M65C02/M65C02A, you will see that both ROMs contain a lot of redundant data. This is due to the nature of the ROM-based control stores used in their implementation. The use of a PLA in the implementation of the actual 6502/65C02 control sequencer allows for a substantially more efficient implementation than the ROM-based micro-programmable approach that I used for these two cores. I am of the opinion that the additional complexity of the 68000 drove Motorola into using a ROM-based micro-programmed implementation. I suspect that the additional complexity represented by the 68000 ISA would have resulted in a PLA that was too large (i.e. wide) to provide the desired operating speed. In order to reduce the total chip area dedicated to the control sequencer, a pipelined vertical micro-program was a good compromise. In the case of my soft-core implementations, I found myself needing to use a PLA, but that is a logic structure not available in FPGAs except as a discrete logic implementation. As many others have demonstrated, a micro-programmable approach to implementing the 6502/65C02 architecture is not required. 
The resulting control logic and state machine are very manageable using modern tool sets, and I suspect that the approach could be used with a modern re-implementation of the 68000. A micro-programmable approach may be easier to debug and update, but will suffer from the speed limitations of the row-column decoder/multiplexer needed to implement the block RAM structures in modern FPGAs. In the final analysis, there is a fundamental speed limit for my soft-cores: the block RAMs used for the micro-program ROMs are an order of magnitude (or more) slower than the LUTs in the FPGAs. _________________ Michael A.
Mon Sep 18, 2017 1:33 pm
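The two-level control store described in the post above can be illustrated with a small sketch. This is not Motorola's actual encoding: the toy microprogram, the 32-bit control width, and the function name `split_control_store` are all hypothetical, and a real micro-word would also carry next-address information that is left out here. It only shows why storing each distinct wide control word once (the "nano" ROM) and referring to it by a narrow index (the "micro" ROM) can save bits when many steps share the same control pattern.

```python
from math import ceil, log2


def split_control_store(horizontal_words, control_width):
    """Split a flat microprogram into a narrow micro ROM of indices and a
    nano ROM that holds each distinct wide control word exactly once."""
    nano_rom = []   # distinct wide control words
    index_of = {}   # control word -> position in nano_rom
    micro_rom = []  # one narrow index per microinstruction
    for word in horizontal_words:
        if word not in index_of:
            index_of[word] = len(nano_rom)
            nano_rom.append(word)
        micro_rom.append(index_of[word])
    # Bits needed to address every nano ROM entry (at least one bit).
    index_width = max(1, ceil(log2(len(nano_rom))))
    flat_bits = len(horizontal_words) * control_width
    two_level_bits = len(micro_rom) * index_width + len(nano_rom) * control_width
    return micro_rom, nano_rom, flat_bits, two_level_bits


# Hypothetical toy microprogram: 16 steps, 32-bit control words, 4 distinct patterns.
program = [0x00000001, 0x000000F0, 0x00000001, 0x80000000] * 3 + [0x00000F00] * 4
micro, nano, flat_bits, two_level_bits = split_control_store(program, 32)

# Expanding the indices through the nano ROM recovers the original control words.
expanded = [nano[i] for i in micro]
print(flat_bits, two_level_bits)
```

The saving depends entirely on how much repetition the microprogram contains; with no repeated control words the two-level form is larger than the flat one.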

hoglet Joined: Tue Feb 10, 2015 7:07 am Posts: 52	Re: Seeking the smallest 68000 implementation... Ed, Here's one concrete data point: the TG68K core in the Matchbox.
Code:
Slice Logic Utilization:
  Number of Slice Registers: 833 out of 11440 7%
  Number of Slice LUTs: 3235 out of 5720 56%
    Number used as Logic: 3163 out of 5720 55%
    Number used as Memory: 72 out of 1440 5%
      Number used as RAM: 72
These numbers are for the whole design, including the Tube, so you can probably knock 10% off. Also, note the LUTs are 6-input. For reference, the iCE40HX8K has 7,680 4-input LUTs. Unfortunately the TG68 core is VHDL, so you can't easily run it through IceStorm. Translation would be possible using vhd2vl with some manual intervention. But I would start by trying the ao68000 core, which is Verilog. https://github.com/alfikpl/ao68000
According to the spec
Quote: Uses about 4810 LE on Altera Cyclone II and about 45600 bits of RAM for microcode.
A Cyclone II LE contains a 4-input LUT and a register, so this should fit in the iCE40HX8K.... Looking at the Verilog, you'll need to replace the ALTSYNCRAM ram blocks with simple behavioural equivalents. Dave
Mon Sep 18, 2017 5:53 pm
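The fit estimate in the post above is simple arithmetic, sketched here. The LE count, microcode size, and LUT count come from the post; the block RAM size of 4096 bits per block is an assumption about the iCE40 family that the post does not state, and treating one Cyclone II LE as one iCE40 LUT4 is the same rough equivalence the post makes, not a synthesis result.

```python
from math import ceil

# Figures quoted in the post.
AO68000_LOGIC_ELEMENTS = 4810    # "about 4810 LE on Altera Cyclone II"
AO68000_MICROCODE_BITS = 45600   # "about 45600 bits of RAM for microcode"
ICE40HX8K_LUT4 = 7680            # "7,680 4-input LUTs"

# Assumption (not from the post): each iCE40 block RAM holds 4096 bits.
BRAM_BITS_PER_BLOCK = 4096

# Rough logic estimate, treating one LE as one LUT4 plus register.
lut_fraction = AO68000_LOGIC_ELEMENTS / ICE40HX8K_LUT4

# Lower bound on block RAMs for the microcode alone, ignoring width constraints.
microcode_blocks = ceil(AO68000_MICROCODE_BITS / BRAM_BITS_PER_BLOCK)

print(f"estimated LUT use: {lut_fraction:.1%}")
print(f"block RAMs for microcode (lower bound): {microcode_blocks}")
```

This is only a first-order check; the mapped design can use noticeably more logic than the estimate, for example when a multiplier has to be built from LUTs.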

hoglet Joined: Tue Feb 10, 2015 7:07 am Posts: 52	Re: Seeking the smallest 68000 implementation... Hmmm, the ao68000 also contains a 17x17 signed multiplier. Dave
Mon Sep 18, 2017 6:22 pm

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1833	Re: Seeking the smallest 68000 implementation... Cheers Dave! Interesting, so the 68k might not be such a monster, modulo the big multiplier. (Oh but also Lattice have no distributed RAM so the register file might be an issue.)
Mon Sep 18, 2017 7:24 pm

hoglet Joined: Tue Feb 10, 2015 7:07 am Posts: 52	Re: Seeking the smallest 68000 implementation...
BigEd wrote: Cheers Dave! Interesting, so the 68k might not be such a monster, modulo the big multiplier.
According to this:
Quote: The multiplication algorithm implemented requires 38+2n clocks, where n is defined as:
MULU: n = the number of ones in the <ea>
MULS: n = concatenate the <ea> with a zero as the LSB; n is the resultant number of 10 or 01 patterns in the 17-bit source; i.e., worst case happens when the source is $5555
So I guess the original uses something like shift-and-add, with a 16-bit adder?
BigEd wrote: (Oh but also Lattice have no distributed RAM so the register file might be an issue.
There should be enough block RAM. Here are some real results for you:
Code:
seed: 1
device: 8k
read_chipdb +/share/arachne-pnr/chipdb-8k.bin...
  supported packages: cb132, cb132:4k, cm121, cm121:4k, cm225, cm225:4k, cm81, cm81:4k, ct256, tq144:4k
read_blif test.blif...
prune...
read_pcf blackice.pcf...
instantiate_io...
pack...
After packing:
IOs          122 / 167
GBs          0 / 8
  GB_IOs     0 / 8
LCs          7191 / 7680
  DFF        832
  CARRY      826
  CARRY, DFF 26
  DFF PASS   220
  CARRY PASS 59
BRAMs        15 / 32
WARMBOOTs    0 / 1
PLLs         0 / 2
place_constraints...
promote_globals...
  promoted CLK_I$2, 877 / 877
  promoted $abc$48572$n4696, 665 / 665
  promoted $abc$48572$n3479, 80 / 80
  promoted $abc$48572$n3805, 49 / 49
  promoted $abc$48572$n2814, 37 / 37
  promoted $abc$48572$n3770, 37 / 37
  promoted $abc$48572$n256, 87 / 87
  promoted 7 nets
    2 sr/we
    4 cen/wclke
    1 clk
  7 globals
    2 sr/we
    4 cen/wclke
    1 clk
realize_constants...
  realized 0, 1
place...
  initial wire length = 131136
  at iteration #50: temp = 12.2702, wire length = 138329
  at iteration #100: temp = 6.29881, wire length = 93060
  at iteration #150: temp = 3.23344, wire length = 60879
  at iteration #200: temp = 1.35197, wire length = 42932
  at iteration #250: temp = 0.00812532, wire length = 32436
  at iteration #300: temp = 1.44961e-07, wire length = 32075
  final wire length = 32068
After placement:
PIOs       78 / 167
PLBs       951 / 960
BRAMs      15 / 32
  place time 149.96s
route...
  pass 1, 947 shared.
  pass 2, 723 shared.
  pass 3, 599 shared.
  pass 4, 555 shared.
  pass 5, 589 shared.
  pass 6, 617 shared.
  pass 7, 640 shared.
  pass 8, 660 shared.
  pass 9, 685 shared.
  pass 10, 727 shared.
  pass 11, 741 shared.
  pass 12, 717 shared.
  pass 13, 750 shared.
  pass 14, 752 shared.
  pass 15, 835 shared.
  pass 16, 765 shared.
  pass 17, 814 shared.
  pass 18, 833 shared.
  pass 19, 849 shared.
  pass 20, 883 shared.
  pass 21, 806 shared.
  pass 22, 822 shared.
  pass 23, 705 shared.
  pass 24, 697 shared.
  pass 25, 853 shared.
  pass 26, 692 shared.
  pass 27, 789 shared.
  pass 28, 765 shared.
  pass 29, 800 shared.
  pass 30, 733 shared.
  pass 31, 768 shared.
  pass 32, 777 shared.
  pass 33, 714 shared.
  pass 34, 679 shared.
  pass 35, 639 shared.
  pass 36, 543 shared.
  pass 37, 479 shared.
  pass 38, 488 shared.
  pass 39, 506 shared.
  pass 40, 493 shared.
  pass 41, 507 shared.
  pass 42, 542 shared.
  pass 43, 493 shared.
  pass 44, 489 shared.
  pass 45, 492 shared.
  pass 46, 497 shared.
  pass 47, 443 shared.
  pass 48, 462 shared.
  pass 49, 423 shared.
  pass 50, 422 shared.
  pass 51, 359 shared.
  pass 52, 328 shared.
  pass 53, 267 shared.
  pass 54, 293 shared.
  pass 55, 282 shared.
  pass 56, 220 shared.
  pass 57, 263 shared.
  pass 58, 247 shared.
  pass 59, 250 shared.
  pass 60, 206 shared.
  pass 61, 201 shared.
  pass 62, 181 shared.
  pass 63, 134 shared.
  pass 64, 162 shared.
  pass 65, 164 shared.
  pass 66, 127 shared.
  pass 67, 112 shared.
  pass 68, 75 shared.
  pass 69, 57 shared.
  pass 70, 44 shared.
  pass 71, 58 shared.
  pass 72, 46 shared.
  pass 73, 54 shared.
  pass 74, 46 shared.
  pass 75, 28 shared.
  pass 76, 19 shared.
  pass 77, 17 shared.
  pass 78, 31 shared.
  pass 79, 32 shared.
  pass 80, 21 shared.
  pass 81, 21 shared.
  pass 82, 21 shared.
  pass 83, 20 shared.
  pass 84, 27 shared.
  pass 85, 18 shared.
  pass 86, 10 shared.
  pass 87, 7 shared.
  pass 88, 6 shared.
  pass 89, 4 shared.
shared net #4625 (demand = 2).
  used by wire $abc$58379$n6279
  used by wire registers_m.pc[23]
shared net #9632 (demand = 2).
  used by wire $abc$58379$n6279
  used by wire DAT_I[10]$2
shared net #102099 (demand = 2).
  used by wire $abc$48572$n250
  used by wire $abc$58379$n3863
shared net #105519 (demand = 2).
  used by wire $abc$48572$n182
  used by wire $abc$58379$n3863
  pass 90, 2 shared.
shared net #125179 (demand = 2).
  used by wire $abc$58379$n3044_1
  used by wire registers_m.ir[7]
shared net #125274 (demand = 2).
  used by wire $abc$58379$n5904
  used by wire registers_m.ir[7]
  pass 91, 0 shared.
After routing:
span_4     20765 / 29696
span_12    3716 / 5632
  route time 1149.62s
write_txt test.txt...
// Reading input .asc file..
// Reading 8k chipdb file..
// Creating timing netlist..
Total number of logic levels: 15
Total path delay: 39.38 ns (25.40 MHz)
So very tight indeed! Dave
Mon Sep 18, 2017 8:10 pm
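The timing rule quoted in the post above (38+2n clocks) can be written out as a short sketch. The function names `mulu_clocks` and `muls_clocks` are made up for illustration, and the code only evaluates the quoted formula for a 16-bit source operand; it says nothing about how the original silicon actually performs the multiply.

```python
def mulu_clocks(src: int) -> int:
    """MULU: n is the number of one bits in the 16-bit source operand."""
    n = bin(src & 0xFFFF).count("1")
    return 38 + 2 * n


def muls_clocks(src: int) -> int:
    """MULS: append a zero as the LSB to form a 17-bit value, then n is the
    number of adjacent bit pairs that differ (the 10 or 01 patterns)."""
    extended = (src & 0xFFFF) << 1
    # Bit i of the XOR is set when bits i and i+1 of the 17-bit value differ;
    # the mask keeps the 16 adjacent pairs.
    transitions = (extended ^ (extended >> 1)) & 0xFFFF
    n = bin(transitions).count("1")
    return 38 + 2 * n


for value in (0x0000, 0x0001, 0x5555, 0xFFFF):
    print(f"${value:04X}: MULU {mulu_clocks(value)}, MULS {muls_clocks(value)}")
```

With this rule the worst case for both instructions is n = 16, which is reached by $FFFF for MULU and by $5555 for MULS, matching the worst case named in the quote.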

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1833	Re: Seeking the smallest 68000 implementation... Wow that struggled! Surprising that the speed isn't too bad. Thanks for running it through.
Mon Sep 18, 2017 8:40 pm

hoglet Joined: Tue Feb 10, 2015 7:07 am Posts: 52	Re: Seeking the smallest 68000 implementation... What about the J68? viewtopic.php?f=13&t=347
Mon Sep 18, 2017 8:42 pm

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1833	Re: Seeking the smallest 68000 implementation... I'd like to say I'd never heard of it, but I don't think I can get away with it - thanks for digging!
Tue Sep 19, 2017 5:45 pm
