


Reply to topic  [ 12 posts ] 
 Small FPGA CPU 

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
Here's my small FPGA CPU called X18-32. I've used it in a few commercial products as an embedded controller inside an FPGA.

https://github.com/Arlet/x18-32

It was inspired by Bowman's J1 CPU, but I decided I didn't really like the stack architecture, mainly because of all the extra shuffling that's needed to get data where it needs to be. The X18 CPU has 16 registers, each of which can be used as a pointer, and you can load and store in a single instruction with either post-increment, pre-decrement, or a small (4-bit) offset. This makes it very easy to operate on small structures (such as peripheral registers) or arrays of data.
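For readers who want to see the three addressing modes side by side, here is a rough Python model. It is only a sketch: the function names are made up, the pointer step of 1 assumes word addressing, and the offset is assumed to be an unsigned value from 0 to 15. None of this is taken from the X18-32 sources.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as 32-bit values

def load_postinc(regs, mem, dst, ptr):
    """Model of 'mov dst, (ptr)+': read memory, then bump the pointer."""
    regs[dst] = mem.get(regs[ptr], 0)
    regs[ptr] = (regs[ptr] + 1) & MASK32  # step of 1 is an assumption

def store_predec(regs, mem, ptr, src):
    """Model of a pre-decrement store: lower the pointer, then write."""
    regs[ptr] = (regs[ptr] - 1) & MASK32  # step of 1 is an assumption
    mem[regs[ptr]] = regs[src]

def load_offset(regs, mem, dst, ptr, offset):
    """Model of 'mov dst, (ptr+offset)' with a 4-bit offset."""
    if not 0 <= offset < 16:
        raise ValueError("offset must fit in 4 bits")
    regs[dst] = mem.get((regs[ptr] + offset) & MASK32, 0)

regs = [0] * 16                # 16 registers, any of them usable as a pointer
mem = {0x204: 77, 100: 5}      # sparse memory, addresses chosen arbitrarily
regs[8] = 0x200
load_offset(regs, mem, 11, 8, 4)   # r11 = mem[r8 + 4]
regs[10] = 100
load_postinc(regs, mem, 2, 10)     # r2 = mem[r10]; r10 += 1
```

With a base pointer in one register, the 4-bit offset reaches 16 consecutive locations, which is why a single register can cover a small block of peripheral registers.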


Sun Nov 29, 2015 9:31 am
Profile

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Arlet:

I read through your readme file and the code. It looks well thought out and, as usual, concisely and elegantly written. Have you built an assembler, and if not, what tool(s) do you use to program it for your applications?

_________________
Michael A.


Sun Nov 29, 2015 1:56 pm
Profile

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
I have a very simple assembler written in C. It does the very minimum to write some programs I needed. If I get a chance to clean it up a bit, I'll add it to the github repository.


Sun Nov 29, 2015 2:01 pm
Profile

Joined: Sun Aug 04, 2013 2:19 am
Posts: 8
Arlet, good job. How fast does it run?

The source refers to a file called 'code.v'. Is that where you put the instructions to execute? What does it look like?

Thanks


Tue Dec 01, 2015 4:39 am
Profile

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
Yes, code.v is the block RAM initialization, using a bunch of defparam statements like this:
Code:
defparam code_data.INIT_05=256'h41850000cffd410141850000cff6e2014043c027f001f03ac0301404001ca040;
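As a rough illustration of how such a line could be produced by a tool, the Python sketch below packs a list of words into one 256-bit INIT string. The helper name `format_init`, the 16-bit default word width, and the choice to put word 0 in the least significant bits (the right-hand end of the hex string) are assumptions for this example. This is not the actual assembler's output routine.

```python
def format_init(instance, index, words, width=16):
    """Pack up to 256/width words into one defparam INIT_xx line.

    Word 0 lands in the least significant bits, so it appears at the
    right-hand end of the hex string (assumed ordering).
    """
    per_line = 256 // width
    if len(words) > per_line:
        raise ValueError("too many words for one INIT line")
    value = 0
    for i, word in enumerate(words):
        if not 0 <= word < (1 << width):
            raise ValueError("word does not fit in the given width")
        value |= word << (i * width)
    # 256 bits are always written as 64 hex digits, zero padded
    return f"defparam {instance}.INIT_{index:02X}=256'h{value:064x};"

# Sample data only; unused words are left as zero.
line = format_init("code_data", 5, [0xA040, 0x001C])
print(line)
```

A generator along these lines would emit one such statement per 256 bits of program image.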


It runs at 100 MHz on a Spartan-6 without extra effort.

Here's some source code. You can see how I just allocate some registers (r8, r9, r10) to fixed roles. r8 points to the peripherals, and with the 4-bit offset, I can reach all peripherals in this project.
Code:
reset:
        jmp     init
const:
        .long   0x80000000              // start of sdram

init:   mov     r4, #0x40
        mov     r8, #0x200              // r8 = peripherals
        mov     r9, #const              //
        mov     r10, (r9)               // get sdram start
        mov     r2, #0x1                //
        lsl     r2, #8                  // r2 = 1 << 8
fill:
        mov     (r10)+, r2              //
        sub     r2, #1                  //
        bne     fill                    //
mainloop:
        call    wait_trigger1           // wait for trigger signal
        mov     r0, #1                  //
        mov     (r8+4), r0              // start recording
        call    wait_record1            // wait for record to start
        call    wait_record0            // wait for record to stop
        mov     r11, (r8+4)             // get maximum pixel count
        mov     r0, r11                 //
        call    putlong                 // send to uart
        call    crlf                    //
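To make the `fill` loop in the listing easier to follow, here is a small Python model of it. It assumes that `sub` sets a zero flag that `bne` tests, and that the post-increment advances the pointer by one word. The function name and the dict-based memory are invented for the example.

```python
def fill(mem, start, count):
    """Model of the fill loop: store a down-counting value at ascending addresses."""
    if count < 1:
        raise ValueError("count must be at least 1")
    ptr, r2 = start, count
    while True:
        mem[ptr] = r2      # mov (r10)+, r2  : store, then advance pointer
        ptr += 1
        r2 -= 1            # sub r2, #1
        if r2 == 0:        # bne fill        : loop until the counter hits zero
            break
    return ptr             # final pointer value, one past the last word written

mem = {}
end = fill(mem, 0x80000000, 4)
```

Each pass writes the current counter value, so the first word holds the initial count and the last word written holds 1.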


Tue Dec 01, 2015 5:50 am
Profile

Joined: Sun Aug 04, 2013 2:19 am
Posts: 8
Thanks. That's a good-looking instruction set.


Wed Dec 02, 2015 6:34 am
Profile

Joined: Sun Aug 04, 2013 2:19 am
Posts: 8
Arlet, could you clarify - in your github readme you state:
Code:
...
smode/dmode can be one of the following:
  00 direct register load/store ("mov reg, reg" not supported)
...


In your example code above, you do just that - mov r0,r11

???


Mon Dec 07, 2015 1:56 am
Profile

Joined: Tue Jan 15, 2013 5:43 am
Posts: 189
I noticed the apparent lack of reg-to-reg move, too, but eventually I figured it out (I think). The instruction format called "Move" lets you encode lots of stuff, but as noted the encoding for a reg-to-reg move is not supported. Dunno why that wasn't feasible. However, the instruction format called "ALU" lets you specify an "operation" between registers, and operation 1010 is move. I guess it was just easier to support that than to support the other way??

Overall, this machine's instruction encoding seems a bit sparse, and I suppose from one point of view it might be considered somewhat inefficient. But, paradoxically, I find it charming! And it's hard to argue with an effortless 100 MHz. :)

-- Jeff

_________________
http://LaughtonElectronics.com


Mon Dec 07, 2015 3:34 am
Profile WWW

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
Indeed, the ALU has a move operation because the only path to the register file is through the ALU, which helps to keep the design fast. The move operation is used for immediate moves to registers, for instance.

The separate "move" instruction is intended for load/store, and I figured that any effort to get the reg-reg variant working would be wasted, since there's already a reg-reg move through the ALU operations.
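A toy Python model of that idea, for anyone following along: every register write goes through one ALU function, and both the immediate move and the reg-reg move are just the ALU move operation with a different operand source. The opcode value 1010 is the one quoted earlier in this thread; the function names and everything else here are assumptions, not the actual Verilog.

```python
MASK32 = 0xFFFFFFFF
OP_MOVE = 0b1010  # ALU operation code for move, as quoted in this thread

def alu(op, a, b):
    """Only the move operation is modelled; it passes operand b through."""
    if op == OP_MOVE:
        return b & MASK32
    raise NotImplementedError("other ALU operations are not modelled here")

def write_reg(regs, dst, op, operand):
    """The single write path into the register file, always via the ALU."""
    regs[dst] = alu(op, regs[dst], operand)

def mov_imm(regs, dst, imm):
    write_reg(regs, dst, OP_MOVE, imm)        # operand comes from the instruction

def mov_reg(regs, dst, src):
    write_reg(regs, dst, OP_MOVE, regs[src])  # operand comes from another register

regs = [0] * 16
mov_imm(regs, 11, 0x1234)
mov_reg(regs, 0, 11)   # same ALU operation as the immediate move
```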


Mon Dec 07, 2015 6:14 am
Profile

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
The hardest part of a new CPU core design is making choices, especially at the beginning when you're staring at a blank piece of paper.


Mon Dec 07, 2015 7:58 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Nice work! How many logic cells?

_________________
Robert Finch http://www.finitron.ca


Wed Dec 09, 2015 2:51 am
Profile WWW

Joined: Sat Aug 22, 2015 6:26 am
Posts: 40
Targeting Spartan 6:

Code:
Slice Logic Utilization:
  Number of Slice Registers:                   179 out of  11,440    1%
    Number used as Flip Flops:                 176
    Number used as Latches:                      3
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:                0
  Number of Slice LUTs:                        663 out of   5,720   11%
    Number used as logic:                      609 out of   5,720   10%
      Number using O6 output only:             534
      Number using O5 output only:               9
      Number using O5 and O6:                   66
      Number used as ROM:                        0
    Number used as Memory:                      52 out of   1,440    3%
      Number used as Dual Port RAM:             46
        Number using O6 output only:            22
        Number using O5 output only:             6
        Number using O5 and O6:                 18
      Number used as Single Port RAM:            6
        Number using O6 output only:             2
        Number using O5 output only:             0
        Number using O5 and O6:                  4


Without the 32-bit barrel shifter it drops to 387 LUTs.
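A quick back-of-the-envelope check on those two figures, using only the numbers from the report above:

```python
total_luts = 663        # slice LUTs with the barrel shifter
without_shifter = 387   # slice LUTs without it

shifter_luts = total_luts - without_shifter
share = round(100 * shifter_luts / total_luts, 1)

print(f"barrel shifter: {shifter_luts} LUTs, about {share}% of the total")
```

So the barrel shifter accounts for 276 LUTs, a bit over 40% of the core.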


Wed Dec 09, 2015 4:42 am
Profile