 Verilog code review for 8-bit computer 

Joined: Sun Jan 13, 2019 5:03 am
Posts: 7
I introduced myself and my project idea briefly here: viewtopic.php?f=3&t=10&start=15#p4035

I am very new to FPGAs and HDL. I recently got my first board, a Lattice iCEstick with an iCE40-HX1K FPGA. I've been able to implement a working 8-bit CPU on it using the Project IceStorm tools. arachne-pnr reports "PLBs 155 / 160" in its output, so I'm pretty sure I'm close to filling the FPGA.

My second board, an iCE40 UltraPlus UP5K board just arrived, so when I reach the limits of the HX1K, I'll be ready to continue. But I'm wondering if my Verilog could be optimized to make better use of the PLBs.

Would someone with Verilog experience please give me an overall code review on the current state of my design? It would be great to know how I can improve my Verilog going forward. I've cleaned up the project as much as possible and made it publicly available here: https://gitlab.com/fmahnke/vale/tree/pr ... le-verilog

I have been writing testbenches and simulating with Icarus Verilog and GTKWave before programming the FPGA. I left those out of the repository for brevity and clarity.


Tue Jan 29, 2019 12:03 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
A good idea to ask for review! I think I can say there are two ways to try to connect the Verilog-as-written with the physical size of the result:
  • using a pre-trained human brain to see which kinds of constructs or approaches are likely to have area-expensive effects.
  • using all available tools, and some experimentation with the source code, to see what you have.

By "all available tools" I mean that it might help to try different synthesis engines, and see if you can get them to output fine-grained reports, so each of your Verilog modules can be measured for size. Getting the right reports, and learning how to read them, can be very productive. Among the engines you can try, for zero monetary cost:
    Project IceStorm (arachne-pnr)
    Project Trellis (nextpnr)
    ispLEVER by Lattice Semiconductor
    Xilinx ISE
    Xilinx Vivado
    Quartus II Integrated Synthesis by Altera

That is, even if you have no plans to implement on a Xilinx device, or even to buy one, you can use their toolchain to explore the physical effects of your Verilog source. You might find reports from one tool are easier to understand than reports from another.

The first step, I think, if your design covers several Verilog modules, is to try to measure the physical size of each one - otherwise you risk trying to minimise something which isn't the largest thing.


Tue Jan 29, 2019 9:40 am

Joined: Mon Aug 14, 2017 8:23 am
Posts: 157
Hi,

You might wish to have a look at James Bowman's J1 Forth CPU, also written in fewer than 200 lines of compact Verilog, which runs on an HX1K.

http://www.excamera.com/sphinx/fpga-j1.html

Additionally - not a well-known fact, but the HX4K part actually has 7680 LUTs - it's the same die as the HX8K - but Lattice disables half of it in their programming software. The full "8K" can be reclaimed by using the open-source IceStorm toolchain.


Ken


Tue Jan 29, 2019 11:29 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I had a quick look at the code, which looks good. One thing I noticed was a lot of
Code:
if (ir==XXX) <code>
else if (ir==XXY) <code>
else if (ir==XXW) <code>


One trick I’ve found that seems to reduce the size and increase performance (though this may depend on the synthesis tool) is to use case/casez statements instead of an if/else-if tree.
Code:
case(ir)
XXX: <code>
XXY: <code>
XXW: <code>
default: <code>
endcase

An ‘if/else if’ chain implies a priority encoder applied after the instruction is decoded. I believe the ‘case’ statement leaves out re-encoding the decode and acts a bit like:
Code:
if (ir==XXX) <code>
if (ir==XXY) <code>
if (ir==XXW) <code>

without else’s. Checking schematics might help.
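
A minimal sketch of the case form, for anyone who wants to compare the two styles in a synthesis report. The module name, the opcode values and the alu_op output are made up for illustration and are not taken from the project's code; whether this comes out smaller than the if/else-if version will depend on the tool.

```verilog
// Hypothetical decoder: opcode values and alu_op encoding are illustrative only.
module ir_case_decode (
    input  wire [7:0] ir,
    output reg  [1:0] alu_op
);
    localparam OP_ADD = 8'h10;
    localparam OP_SUB = 8'h11;
    localparam OP_AND = 8'h12;

    // Combinational block. The case items are distinct constants, so they
    // are mutually exclusive and no priority between them is needed.
    always @(*) begin
        case (ir)
            OP_ADD:  alu_op = 2'd1;
            OP_SUB:  alu_op = 2'd2;
            OP_AND:  alu_op = 2'd3;
            default: alu_op = 2'd0;  // default assignment avoids an inferred latch
        endcase
    end
endmodule
```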

_________________
Robert Finch http://www.finitron.ca


Tue Jan 29, 2019 11:30 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
One thing to look for in the reports is a count of how many incrementers, adders, comparators have been inferred. Also of interest, how many registers. You should have some idea of what you thought it would take to implement your design, and see how surprising the report is.

I do notice that you've got a lot of comparisons with the IR, including greater-than/less-than comparisons. You might want to look carefully at your instruction encodings. Perhaps even code up a combinational module which digests the IR and outputs the appropriate control signals.
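
As a rough sketch of such a module: if related opcodes share their upper bits, a range test like "ir >= A && ir < B" collapses into an equality test on a couple of bits. The grouping below (ir[7:6] as an instruction class, ir[5] as an immediate flag) is invented for the example and is not the project's actual encoding.

```verilog
// Hypothetical encoding: ir[7:6] selects an instruction class,
// ir[5] flags that an immediate byte follows. Not the real ISA.
module ir_control (
    input  wire [7:0] ir,
    output wire       is_alu,
    output wire       is_load,
    output wire       is_store,
    output wire       is_branch,
    output wire       uses_imm
);
    wire [1:0] grp = ir[7:6];

    // Each control signal is a 2-bit equality test instead of an
    // 8-bit magnitude comparison.
    assign is_alu    = (grp == 2'b00);
    assign is_load   = (grp == 2'b01);
    assign is_store  = (grp == 2'b10);
    assign is_branch = (grp == 2'b11);
    assign uses_imm  = ir[5];
endmodule
```

The sequential part of the CPU can then test these named signals rather than comparing the IR directly in many places.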

On similar lines, it might be that coding up a datapath, with an appropriate number of registers, muxes, arithmetic units, and the necessary control signals, would be a step forward. For example, there is most likely to be an incrementer associated with the PC, and most of the time you need to determine whether to increment or not. That's a single input signal. Whereas, in the code as it stands, you have quite a few "pc <= pc + 1" operations, each one in a different branch of a large decision tree.
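
A minimal sketch of the PC handled as its own datapath element, assuming a synchronous reset and a 16-bit PC. The module name, signal names and the width are assumptions for illustration, not taken from the project.

```verilog
// Hypothetical PC unit: one register, one incrementer, two control inputs.
module pc_unit #(
    parameter WIDTH = 16              // assumed address width
) (
    input  wire             clk,
    input  wire             rst,       // synchronous reset
    input  wire             pc_load,   // control: take a jump or branch
    input  wire             pc_inc,    // control: step to the next byte
    input  wire [WIDTH-1:0] pc_target,
    output reg  [WIDTH-1:0] pc
);
    always @(posedge clk) begin
        if (rst)
            pc <= {WIDTH{1'b0}};
        else if (pc_load)
            pc <= pc_target;
        else if (pc_inc)
            pc <= pc + 1'b1;           // the only place the PC is incremented
    end
endmodule
```

The decision tree then only has to drive pc_inc and pc_load, so the report should show a single incrementer for the PC.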

I suppose there's a continuum of possible HDL descriptions for something complex: on one extreme, they describe an exact microarchitecture with datapath and control. At the other extreme, the HDL describes what needs to be done but has no hints about how it is to be done. Both of these will work, but they won't both come out the same size or speed. And, just possibly, they will need different amounts of inspection and debugging. Somehow you want clarity, and efficiency. Which is a journey - you won't get all the way in one step.


Wed Jan 30, 2019 1:40 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Fritz:
You appear to be coming at this project from a SW perspective. IOW, the coding style appears to assume processing of the HDL in a manner similar to what an HLL compiler may be expected to process and convert into an instruction stream. As BigEd points out, you need to consolidate your processing of the IR, and provide some help to the HDL synthesizer regarding your instruction encoding. For example, you have what is clearly a register immediate instruction format. In your processing of those instructions, you treat each one separately. Instead, consider masking out the opcode field and the register select field separately. The Load Immediate instruction can then be processed in a single block. Separately, the register field can be decoded and used to enable / select the correct register so you don't have to have separate blocks for each individually addressable register.
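
To make that concrete, here is a minimal sketch assuming a made-up format in which ir[7:2] is the opcode field and ir[1:0] selects one of four registers. The field positions, the OP_LDI value and the exec strobe are all hypothetical.

```verilog
// Hypothetical register-immediate format: ir[7:2] = opcode, ir[1:0] = register.
module ldi_regs (
    input  wire       clk,
    input  wire       exec,     // assumed strobe: high in the execute cycle
    input  wire [7:0] ir,
    input  wire [7:0] imm,      // immediate byte fetched after the opcode
    output wire [7:0] r0_out    // register 0 exposed so the block can be observed
);
    localparam OP_LDI = 6'b000001;   // made-up opcode value

    wire [5:0] opcode = ir[7:2];
    wire [1:0] rsel   = ir[1:0];

    reg [7:0] regs [0:3];            // no reset: contents are undefined until written

    always @(posedge clk) begin
        if (exec && opcode == OP_LDI)
            regs[rsel] <= imm;       // one block serves all four registers
    end

    assign r0_out = regs[0];
endmodule
```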

Unlike HLL compilers, HDL synthesizers are composing the desired HW from a coarser set of structures. This is in contrast to an already existing simple set of instructions which a HLL compiler targets. Thus, a considerably different paradigm is required to convert textual descriptions of complex HW structure into the basic structures provided by the target architecture. If you can't easily draw the schematic of the HW design you desire using the target architecture's basic structures yourself from the textual description you're providing the synthesizer, you cannot expect the synthesizer itself to understand your description.

_________________
Michael A.


Wed Jan 30, 2019 2:17 pm

Joined: Sun Jan 13, 2019 5:03 am
Posts: 7
This is really great feedback and I appreciate everyone's attention. This clarifies some things I was thinking about, as well as raising some new questions and giving me more tools to answer my own questions, so it's helpful.

I need a bit of time to think about some of this and will reply with a few follow up questions. Thanks again.


Wed Jan 30, 2019 11:15 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Great - please do come back with further thoughts, questions, or progress!


Thu Jan 31, 2019 6:24 am

Joined: Sun Jan 13, 2019 5:03 am
Posts: 7
Much has changed since the code review request, so I thought an update was warranted. It's become apparent this is likely to be a long-term project for me, and I've licensed the hardware and software source for this computer under GPLv3. The public repository is available here: https://gitlab.com/vale-computer/vale8x64

The readme in the repository gives an overview of my short-term goals for the project and its status.

I'm very slowly getting to know my way around the yosys reports. They seem well-organized; it's just a large amount of information to process. I've also gotten in the habit of reviewing arachne-pnr reports before/after each commit, to note any significant changes in resource utilization.

One exercise I have planned for myself is to start with a blank Verilog file in the text editor, create various logic circuits, and look at the yosys and pnr reports for them in isolation. That should shed more light on the format of the reports and the way different circuits are synthesized.
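
A first test circuit for that exercise might look something like the counter below (the module and file names are only examples). Running it through yosys with something like `yosys -p "read_verilog counter8.v; synth_ice40 -top counter8; stat"` should print a cell count small enough to check by hand against what an 8-bit counter ought to need.

```verilog
// Small standalone circuit for studying synthesis reports in isolation.
module counter8 (
    input  wire       clk,
    input  wire       rst,    // synchronous reset
    input  wire       en,     // count enable
    output reg  [7:0] count
);
    always @(posedge clk) begin
        if (rst)
            count <= 8'd0;
        else if (en)
            count <= count + 8'd1;
    end
endmodule
```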

I'm not sure how well I've simplified IR decoding logic yet. I did try to pull a lot of the logic out of the sequential always block as signals and put them in a combinational always block instead.

Since the IR is 8 bits wide, I have 256 8-bit opcodes and I don't think there's enough room to reserve part of the IR for the affected registers and still have enough opcodes for the total number of instructions. I could put that data in immediate space, but that would require an extra byte and cycle for most instructions. So I have left that if/else structure mostly as it was for now, and will re-evaluate when the ISA is stable.

On that note, arriving at the instruction set has been a challenge, since I'm not verbatim following another ISA. I'm iterating on the instruction set using ideas from other ISAs and by writing assembly code and making changes based on what feels right and what doesn't. Next time I do a custom instruction set, I could see myself writing an emulator before any HDL, writing software in the emulator for a month or two to get the instruction set mostly stable, then moving to the FPGA.

I made a prototype VGA controller on the FPGA last week, so the next big thing is to finalize that and integrate it with the rest of the computer. I'm doing the software side of things first this time, writing all the assembly code to do line-oriented programs: print characters to screen, scroll when it fills up, clear it and all that. When I'm happy with all of it in the C/SDL emulator, I'll finish the hardware.


Sun Feb 17, 2019 6:00 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Sounds like good progress! I support the idea of writing an emulator and then writing some code for the new machine: it's a good quick way to get a grip of how the machine is working, what feels missing and what seems superfluous. The efficiency of the emulator is a non-issue, unlike an HDL model.

With a small instruction word and a large instruction set, you might well not be able to dedicate any bits to registers. But it might be that you can always use the same bits when you do need to specify a register, and this should simplify decode.
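
A small sketch of that idea, assuming the register number always sits in ir[1:0] whenever an instruction names a register, and that some other decode logic says whether it does. The bit positions and the uses_reg signal are assumptions for the example.

```verilog
// Hypothetical: register number is always in ir[1:0] when one is needed.
module reg_select (
    input  wire [7:0] ir,
    input  wire       uses_reg,  // from opcode decode: instruction writes a register
    output wire [3:0] reg_we     // one-hot write enables, all low when unused
);
    wire [1:0] rsel = ir[1:0];

    // Same two bits feed the selector for every register-writing opcode,
    // so no per-opcode register decode is needed.
    assign reg_we = uses_reg ? (4'b0001 << rsel) : 4'b0000;
endmodule
```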

Also worth noting that you never need all the instructions you can think of: a slightly less capable machine might be much more buildable. An instruction set of rather less than 256 opcodes might be much easier to map to a regular decode strategy.

Edit: oh, and thanks for putting your work online and under an open-source license. That's always good to see. I like the project objectives very much:
Quote:
...an effort to create a modern, 8-bit programmable computer powerful enough to inspire creativity and simple enough to be completely understood by the motivated user.


Sun Feb 17, 2019 3:59 pm

Joined: Fri Oct 20, 2017 7:54 pm
Posts: 8
Updates
I add frequent development updates on Twitter (meh) and on Mastodon (preferred).

Will watching the repository (and this thread) be enough?
I use neither Twitter nor Mastodon.

_________________
"Stay OmmmMMMmmmPtimistic!" — yeti
"Logic, my dear Zoe, merely enables one to be wrong with authority." — The 2nd Doctor
"Don't we all wait for SOMETHING-ELSE-1.0?" — yeti


Sun Feb 17, 2019 5:01 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
(As it happens, you can also follow a Mastodon user via RSS. In this case: https://fosstodon.org/users/daremo.rss)


Sun Feb 17, 2019 5:18 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
fmahnke:
BigEd wrote:
Also worth noting that you never need all the instructions you can think of: a slightly less capable machine might be much more buildable. An instruction set of rather less than 256 opcodes might be much easier to map to a regular decode strategy.

BigEd's point here is very much on target. When I decided to extend my 6502/65C02 core to support 16-bit operations and high-level languages like C and Pascal, I decided to use prefix instructions rather than mode settings in the processor status word. With the 8 prefix instructions I selected, I have access to more than 2000 valid instruction combinations; this is besides the 252 single-opcode instructions. The assembler I've created for myself uses a table-driven approach, and the table is created by a program that I wrote specifically to choose only certain instruction sequences from the complete set.

After completing a port of the Ronald Mak Pascal compiler, I decided to generate a histogram of all of the instruction sequences used by its compiler test suite. The aim of the compiler test suite is to utilize all features of the supported subset of the language. Thus, the instruction mix of the compiler test suite is indicative of the operations that the target microprocessor must support for an efficient implementation.

I was very surprised by the results. The histogram contained only 64 instruction sequences from the greater than 2000 instruction sequences my core supports at the assembly language level. Many of the resulting addressing modes that my implementation of the various prefix instructions provide are not used. They could be used, I suppose, but at the expense of a much more complex compiler.

Rather than working out your instruction set from the assembly language programmer's perspective, I would recommend implementing those instructions that a simple compiler like the Mak Pascal compiler will require. Not every operation that an HLL like Pascal requires needs a specific instruction, but it should be fairly easy to implement in a short sequence of instructions. Sequences of fewer than 5 or 6 instructions should be the upper limit for seldom-used constructs, and sequences of fewer than 3 or 4 should be the limit for more commonly used constructs.

I did add some instructions intended to provide reduced cycle counts for certain constructs that I found were occurring very regularly on entry to and exit from subroutines and functions. Otherwise, I tried hard not to implement specific instructions to support HLL operations which most compilers have no chance of discovering.

_________________
Michael A.


Sun Feb 17, 2019 7:53 pm

Joined: Sun Jan 13, 2019 5:03 am
Posts: 7
The point on reducing the instruction set is taken. I also appreciate the rules of thumb on upper/lower limits for instruction sequences.

I have been putting some effort into reducing the instruction set. I have no plans to write code for this computer in any language but assembly, so it seems reasonable to evaluate the instruction set at least in part from the perspective of the assembly language programmer (and the assembler, which will implement pseudoinstructions for convenience). But I guess even if I have no plans to write in higher-level languages, someone else might want to port a compiler or interpreter eventually. It would be educational and interesting to analyze the histograms for a couple of HLLs, maybe Commodore BASIC and Mak Pascal as mentioned. I'll plan to do that.

yeti, I do frequent updates on Mastodon with images and sometimes video. I won't duplicate them here, but I'll definitely update here less frequently and with longer conversation. If anyone tests the Masto RSS feed, it'd be interesting to know if the images and videos seem to work correctly. I just have no RSS reader set up right now, since Firefox removed support for it.


Tue Feb 19, 2019 6:42 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Another quick thought about the richness of the instruction set: once you have a macro assembler you can make up for many deficits by using macros. In the OPC adventures we did this for those CPU models which lacked push and pop. A simple change of macro then supported the CPUs which did have push and pop.


Tue Feb 19, 2019 8:25 am