 CS01 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
Started a new project, this time with a focus on it being educational. The CS01 processor has a non-overlapped pipeline and fits in a small FPGA. The system with UART and GPIO takes about 9,000 LUTs out of 10,000 available. The CS01 is a 32-bit machine with 80-bit floating-point.
The entire instruction set fits on one page (minus four supervisor mode instructions in an appendix). It's designed to be simple yet resemble a "real" processor. For instance, there is only one memory access size: 80 bits wide.
Attachment: CS01IS.png (CS01 Instruction Set)

_________________
Robert Finch http://www.finitron.ca


Thu Jul 04, 2019 4:14 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
Revised the floating-point to be 32-bit, with memory being 32 bits wide. Changed the branch instructions to branch on the comparison of two registers. An initial version of the hardware is coded, but there's no software yet. Coding an assembler is underway; there will be no compiler.

_________________
Robert Finch http://www.finitron.ca


Fri Jul 05, 2019 3:01 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1319
Ah, I did wonder how your only-one-memory-size 80-bit idea was going to work out. Mainly because I couldn't quite imagine it, not because I couldn't believe it.


Fri Jul 05, 2019 8:13 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
Quote:
Ah, I did wonder how your only-one-memory-size 80-bit idea was going to work out. Mainly because I couldn't quite imagine it, not because I couldn't believe it.

It didn't work out too badly, as the system RAM is only 8 bits wide. It involved a counter counting up to 10 bytes for each memory transaction. The system is slow as molasses, but that isn't the primary concern.
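
The byte-counter idea can be sketched roughly as follows. This is only an illustrative software model in Python, not the actual hardware: the names `read_word80` and `write_word80` are made up here, and little-endian byte order is an assumption, since the post does not say which byte is transferred first.

```python
BYTES_PER_WORD = 10  # 80 bits moved over an 8-bit wide RAM


def read_word80(ram, addr):
    """Assemble one 80-bit value from ten consecutive byte reads."""
    value = 0
    for count in range(BYTES_PER_WORD):       # the byte counter: 0..9
        byte = ram[addr + count] & 0xFF
        value |= byte << (8 * count)          # assumed: low byte first
    return value


def write_word80(ram, addr, value):
    """Split one 80-bit value into ten consecutive byte writes."""
    for count in range(BYTES_PER_WORD):
        ram[addr + count] = (value >> (8 * count)) & 0xFF


ram = bytearray(32)
write_word80(ram, 4, 0x0123456789ABCDEF0123)
assert read_word80(ram, 4) == 0x0123456789ABCDEF0123
```

Every access costs ten byte transfers in this model, which is consistent with the "slow as molasses" remark.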

I decided to use a RISC-V clone as the processing core. That gives more opportunity to use the variety of tools developed for RISC-V. A partially developed RISC-V assembler was already available. The instruction set is the base RISC-V with some of the FP implemented. I had almost the same selection of instructions in CS01. I've managed to get a little bit of code running in simulation.
Code:
DLFAIL      equ      $04
TXNDX       equ      $08
FLTRES      equ      $100
RXBUF       equ      $FFC0A000
TXBUF       equ      $FFC0C000
IRQSTATUS   equ      $FFDC00B0
LEDS        equ      $FFDC0600

      
      code   18 bits
      org      $FFFC0000

      add      $t0,$x0,#$AA
      add      $t0,$x0,#0
      add      $t1,$x0,#TXBUF
.0003:
      sb      $t1,[$t1]
      add      $t1,$t1,#1
      and      $t1,$t1,#$FFC0CFFF
      bra      .0003
.0002:
      sw      $t0,LEDS
      flw      $f1,flt5
      flw      $f2,flt20
      fadd   $f3,$f1,$f2
      fsqrt   $f4,$f3


_________________
Robert Finch http://www.finitron.ca


Sat Jul 06, 2019 5:29 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1319
Oh, that's good, I think RISC-V should be a good basis (unless a project is specifically about making a CPU from scratch.)

Does your chosen RISC-V implementation come out reasonably compact in FPGA resources?


Sat Jul 06, 2019 7:52 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
Quote:
Does your chosen RISC-V implementation come out reasonably compact in FPGA resources?
It does come out at about 4,500 LUTs for the entire system, including single-precision floats plus a couple of minor additions to the base ISA. The RISC-V core in use was written by me. The base RISC-V core is simple enough for an ace to whip up in a short period of time. The core takes about 10 cycles to execute an instruction, with memory access taking about five cycles. So it's not fast. It's running on a 50 MHz clock. I suspect the float operations use about 2,500 LUTs, the CPU about 1,500, and the rest of the system 500. I'll see if I can find a utilization report.
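
To put rough numbers on "not fast", here is a back-of-the-envelope calculation in Python using only the figures quoted above. The helper name `instructions_per_second` is made up for this sketch, and the 10-cycle figure is the approximate average from the post, not a measured value.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Approximate throughput for a non-pipelined core."""
    return clock_hz / cycles_per_instruction


rate = instructions_per_second(50_000_000, 10)
print(f"{rate / 1e6:.1f} million instructions per second")

# The guessed LUT breakdown from the post, in the order given there.
lut_guess = {"float operations": 2500, "cpu": 1500, "rest of system": 500}
print(sum(lut_guess.values()), "LUTs in total")
```

The guessed breakdown sums to the 4,500 LUT figure quoted for the whole system.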

The floating-point square root function just doesn’t want to work when embedded in the processor core. Test-benched separately it works fine. The bug has been traced to a register that doesn’t load properly in simulation. It's looking like just a simulator issue. The same floating-point square root module is being used in several projects.

_________________
Robert Finch http://www.finitron.ca


Sun Jul 07, 2019 5:27 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
Utilization report for the system:
Code:
+--------------------+---------------------------+--------------+--------------+------------+-----------+--------------+----------+----------+--------------+
|      Instance      |           Module          |  Total LUTs  |  Logic LUTs  |   LUTRAMs  |    SRLs   |      FFs     |  RAMB36  |  RAMB18  | DSP48 Blocks |
+--------------------+---------------------------+--------------+--------------+------------+-----------+--------------+----------+----------+--------------+
| SocCS01            |                     (top) | 4445(42.74%) | 4240(40.77%) | 144(1.50%) | 61(0.64%) | 2351(11.30%) | 0(0.00%) | 0(0.00%) |     4(8.89%) |
|   (SocCS01)        |                     (top) |    92(0.88%) |    92(0.88%) |   0(0.00%) |  0(0.00%) |   259(1.25%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   ucg1             |                cs01clkgen |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     0(0.00%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   ucpu1            |                 cs01riscv | 4273(41.09%) | 4069(39.13%) | 144(1.50%) | 60(0.63%) |  1987(9.55%) | 0(0.00%) | 0(0.00%) |     4(8.89%) |
|     (ucpu1)        |                 cs01riscv | 2141(20.59%) | 1997(19.20%) | 144(1.50%) |  0(0.00%) |   811(3.90%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|     u2             |                       i2f |   177(1.70%) |   177(1.70%) |   0(0.00%) |  0(0.00%) |    42(0.20%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       (u2)         |                       i2f |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u0           |                    delay1 |     5(0.05%) |     5(0.05%) |   0(0.00%) |  0(0.00%) |     3(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u1           | delay1__parameterized0_29 |    10(0.10%) |    10(0.10%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u2           |    delay1__parameterized1 |    78(0.75%) |    78(0.75%) |   0(0.00%) |  0(0.00%) |    31(0.15%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u4           |             cntlz32Reg_30 |    84(0.81%) |    84(0.81%) |   0(0.00%) |  0(0.00%) |     6(0.03%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|     u3             |                       f2i |   308(2.96%) |   308(2.96%) |   0(0.00%) |  0(0.00%) |    33(0.16%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|     u4             |                  fpAddsub |   299(2.88%) |   299(2.88%) |   0(0.00%) |  0(0.00%) |   110(0.53%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d1           |                 delay2_16 |    49(0.47%) |    49(0.47%) |   0(0.00%) |  0(0.00%) |    16(0.08%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d10          | delay1__parameterized0_17 |     8(0.08%) |     8(0.08%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d11          | delay1__parameterized0_18 |     5(0.05%) |     5(0.05%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d14          |    delay1__parameterized3 |     5(0.05%) |     5(0.05%) |   0(0.00%) |  0(0.00%) |    24(0.12%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d15          |    delay1__parameterized2 |    86(0.83%) |    86(0.83%) |   0(0.00%) |  0(0.00%) |     7(0.03%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d16          | delay1__parameterized0_19 |    70(0.67%) |    70(0.67%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d17          | delay1__parameterized0_20 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d2           |    delay2__parameterized0 |    21(0.20%) |    21(0.20%) |   0(0.00%) |  0(0.00%) |     2(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d3           | delay1__parameterized4_21 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |    27(0.13%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d4           | delay1__parameterized0_22 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d5           | delay1__parameterized0_23 |    44(0.42%) |    44(0.42%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d5a          | delay1__parameterized3_24 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d6a          | delay1__parameterized3_25 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |    24(0.12%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d7           | delay1__parameterized0_26 |     1(0.01%) |     1(0.01%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d8           | delay1__parameterized0_27 |     1(0.01%) |     1(0.01%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       d9           | delay1__parameterized0_28 |     9(0.09%) |     9(0.09%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|     u5             |                     fpMul |   114(1.10%) |   113(1.09%) |   0(0.00%) |  1(0.01%) |   177(0.85%) | 0(0.00%) | 0(0.00%) |     4(8.89%) |
|       (u5)         |                     fpMul |    65(0.63%) |    65(0.63%) |   0(0.00%) |  0(0.00%) |   106(0.51%) | 0(0.00%) | 0(0.00%) |     4(8.89%) |
|       u14          |    delay2__parameterized1 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u15          |  delay2__parameterized1_8 |    28(0.27%) |    28(0.27%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u16          |    delay2__parameterized2 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |    23(0.11%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u17          |  delay2__parameterized2_9 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |    23(0.11%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u2a          | delay2__parameterized1_10 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u2b          | delay2__parameterized1_11 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u3           |                 delay2_12 |     1(0.01%) |     1(0.01%) |   0(0.00%) |  0(0.00%) |    16(0.08%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u5           | delay2__parameterized1_13 |     1(0.01%) |     1(0.01%) |   0(0.00%) |  0(0.00%) |     2(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u7           | delay2__parameterized1_14 |    15(0.14%) |    15(0.14%) |   0(0.00%) |  0(0.00%) |     2(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u8           |                 delay3_15 |     4(0.04%) |     3(0.03%) |   0(0.00%) |  1(0.01%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|     u6             |                     fpDiv |   591(5.68%) |   591(5.68%) |   0(0.00%) |  0(0.00%) |   176(0.85%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       (u6)         |                     fpDiv |    64(0.62%) |    64(0.62%) |   0(0.00%) |  0(0.00%) |    58(0.28%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u2           |                  fpdivr16 |   527(5.07%) |   527(5.07%) |   0(0.00%) |  0(0.00%) |   118(0.57%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|     u7             |                    fpSqrt |   357(3.43%) |   357(3.43%) |   0(0.00%) |  0(0.00%) |   279(1.34%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u1           |               fpDecompReg |    67(0.64%) |    67(0.64%) |   0(0.00%) |  0(0.00%) |    33(0.16%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u2           |                     isqrt |   290(2.79%) |   290(2.79%) |   0(0.00%) |  0(0.00%) |   245(1.18%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u3           |  delay1__parameterized0_7 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|     u8             |               fpNormalize |   231(2.22%) |   173(1.66%) |   0(0.00%) | 58(0.60%) |   259(1.25%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       (u8)         |               fpNormalize |     3(0.03%) |     3(0.03%) |   0(0.00%) |  0(0.00%) |   100(0.48%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       genblk1.clz0 |                cntlz32Reg |    34(0.33%) |    34(0.33%) |   0(0.00%) |  0(0.00%) |     6(0.03%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u11          |    delay1__parameterized4 |     2(0.02%) |     2(0.02%) |   0(0.00%) |  0(0.00%) |     2(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u21          |    delay1__parameterized5 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u23          |                    delay2 |     9(0.09%) |     9(0.09%) |   0(0.00%) |  0(0.00%) |    16(0.08%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u31          |  delay1__parameterized5_1 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u32          |  delay1__parameterized5_2 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u33          |    delay3__parameterized0 |    79(0.76%) |    32(0.31%) |   0(0.00%) | 47(0.49%) |    51(0.25%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u51          |                  delay2_3 |     9(0.09%) |     1(0.01%) |   0(0.00%) |  8(0.08%) |     8(0.04%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u52          |  delay3__parameterized1_4 |     1(0.01%) |     0(0.00%) |   0(0.00%) |  1(0.01%) |     0(0.00%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u62          |  delay1__parameterized6_5 |    12(0.12%) |    12(0.12%) |   0(0.00%) |  0(0.00%) |     8(0.04%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u63          |    delay2__parameterized3 |    72(0.69%) |    72(0.69%) |   0(0.00%) |  0(0.00%) |    54(0.26%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u64          |    delay1__parameterized0 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u65          |                    delay3 |     5(0.05%) |     4(0.04%) |   0(0.00%) |  1(0.01%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u81          |                      vtdl |     2(0.02%) |     1(0.01%) |   0(0.00%) |  1(0.01%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u82          |  delay1__parameterized6_6 |     3(0.03%) |     3(0.03%) |   0(0.00%) |  0(0.00%) |     8(0.04%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|     u9             |                   fpRound |    55(0.53%) |    54(0.52%) |   0(0.00%) |  1(0.01%) |   100(0.48%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       (u9)         |                   fpRound |    54(0.52%) |    54(0.52%) |   0(0.00%) |  0(0.00%) |    91(0.44%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u21          |    delay3__parameterized1 |     1(0.01%) |     0(0.00%) |   0(0.00%) |  1(0.01%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|       u22          |    delay1__parameterized6 |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     8(0.04%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   udbd             |               BtnDebounce |    20(0.19%) |    20(0.19%) |   0(0.00%) |  0(0.00%) |    22(0.11%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   udbu             |             BtnDebounce_0 |    21(0.20%) |    21(0.20%) |   0(0.00%) |  0(0.00%) |    22(0.11%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   ued1             |                  edge_det |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     1(0.01%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   urx1             |           rtfSimpleUartRx |    22(0.21%) |    21(0.20%) |   0(0.00%) |  1(0.01%) |    32(0.15%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   urxmem1          |                  rxtx_mem |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     0(0.00%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   utx1             |           rtfSimpleUartTx |    17(0.16%) |    17(0.16%) |   0(0.00%) |  0(0.00%) |    28(0.13%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
|   utxmem1          |                  rxtx_mem |     0(0.00%) |     0(0.00%) |   0(0.00%) |  0(0.00%) |     0(0.00%) | 0(0.00%) | 0(0.00%) |     0(0.00%) |
+--------------------+---------------------------+--------------+--------------+------------+-----------+--------------+----------+----------+--------------+

_________________
Robert Finch http://www.finitron.ca


Sun Jul 07, 2019 5:33 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
Got the square root to work! Processing the load signal needed to be moved to after other processing to guarantee it would cause an override of the operation in progress. Found this while fixing the divide operation to allow restarts.

The average number of clocks per instruction can now be calculated with FP operations. It’s a fairly easy exercise: read two CSRs, convert the ints to floats, and divide.
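
A rough model of that exercise, written in Python for illustration. Standard RISC-V defines `cycle` and `instret` counters, but the post does not say which two CSRs are read, so treating them as cycle and retired-instruction counts is an assumption. The helpers `to_f32` and `average_cpi` are made-up names; `to_f32` rounds to single precision to mimic a 32-bit FPU.

```python
import struct


def to_f32(x):
    """Round a number to IEEE single precision, as a 32-bit FPU would."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def average_cpi(cycles, instructions):
    """Clocks per instruction: int-to-float conversions, then one divide."""
    if instructions == 0:
        raise ZeroDivisionError("no instructions retired yet")
    c = to_f32(cycles)          # models an int-to-float conversion
    i = to_f32(instructions)    # models a second int-to-float conversion
    return to_f32(c / i)        # models a single-precision divide


print(average_cpi(1_000_000, 100_000))  # 10.0
```

Counts above 2^24 no longer convert exactly to single precision, so the result is an approximation for long runs.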

Added a bunch more CSRs (control and status registers).
There’s a slightly non-standard entry to machine level code. A trap vector isn’t used (not required). Instead the machine level code acts like a separate hardware thread and continues on from where it last left off before switching to user level code. Normally a trap vector would set the entry point. But it’s easy enough to just use a branch instruction to go back to the entry point. Another non-standard feature is the ability of user level code to read and set the user level program counter CSR. This CSR is provided so that machine level code can know where an exception or interrupt occurred in user code. However, it doesn’t make a lot of sense to add extra hardware to restrict this to machine level code, since the program counter is readily available just by performing a jump-and-link to the next instruction.
Another brownfield extension to RISC-V is the use of the LUI (load upper immediate) instruction to set the upper immediate bits for the next instruction when the LUI’s target register is x0. This instruction would otherwise be a NOP. For large constants, this saves at least one instruction and one register over building the constant in a register.
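
The LUI-as-prefix idea can be modelled with a toy interpreter. This is a sketch under assumptions, not the CS01 implementation: the tuple instruction format and the `run` function are invented here, and it is assumed that the prefix's 20 bits are simply concatenated with the following instruction's 12-bit immediate (the post does not spell out how the two are combined). Register numbers 6 and 7 are t1 and t2 in the standard RISC-V ABI.

```python
MASK32 = 0xFFFFFFFF
T1, T2 = 6, 7  # x6 = t1, x7 = t2


def sign_extend12(imm12):
    imm12 &= 0xFFF
    return imm12 - 0x1000 if imm12 & 0x800 else imm12


def run(program):
    """Execute ('lui', rd, imm20), ('addi'/'andi', rd, rs1, imm12), ('and', rd, rs1, rs2)."""
    regs = [0] * 32
    prefix = None  # upper 20 bits captured from 'lui x0, imm20'
    for op, rd, *rest in program:
        if op == "lui":
            imm20 = rest[0] & 0xFFFFF
            if rd == 0:
                prefix = imm20          # would be a NOP in standard RISC-V
                continue
            result = imm20 << 12
        elif op in ("addi", "andi"):
            rs1, imm12 = rest
            if prefix is not None:
                imm = (prefix << 12) | (imm12 & 0xFFF)   # assumed: concatenation
            else:
                imm = sign_extend12(imm12) & MASK32
            result = regs[rs1] + imm if op == "addi" else regs[rs1] & imm
        elif op == "and":
            rs1, rs2 = rest
            result = regs[rs1] & regs[rs2]
        else:
            raise ValueError(f"unknown op {op}")
        prefix = None                   # a prefix applies to one instruction only
        if rd != 0:
            regs[rd] = result & MASK32
    return regs


setup = [("lui", T1, 0xFFC0D), ("addi", T1, T1, 0x123)]   # t1 = 0xFFC0D123
# Standard way to mask with 0xFFC0CFFF: build the constant in t2 first.
standard = setup + [("lui", T2, 0xFFC0D), ("addi", T2, T2, 0xFFF), ("and", T1, T1, T2)]
# With the prefix: no temporary register, one instruction fewer.
prefixed = setup + [("lui", 0, 0xFFC0C), ("andi", T1, T1, 0xFFF)]
assert run(standard)[T1] == run(prefixed)[T1] == 0xFFC0C123
```

In the standard sequence the LUI value has to be 0xFFC0D rather than 0xFFC0C because the 12-bit immediate 0xFFF sign-extends to -1. Under the concatenation assumption the prefixed form avoids that adjustment.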

_________________
Robert Finch http://www.finitron.ca


Mon Jul 08, 2019 2:49 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
I’ve been pondering just how complex to make the CS01 core. As an educational tool it shouldn’t be so complex that it’s overwhelming. At the same time, there are a lot of pieces to a modern computer, and leaving out a critical piece could be a disaster for the student. I could make a master file containing all components, with ifdefs including or excluding pieces, but that would clutter up the source code and make it unclear what’s going on. Alternatively, I could make several versions of the core containing different components, perhaps placed in different folders. A third idea is to make a core generator tool and leave it up to the student to choose components.

_________________
Robert Finch http://www.finitron.ca


Tue Jul 09, 2019 2:55 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 179
Location: Huntsville, AL
I think a core generator tool would provide the best result all around.

_________________
Michael A.


Tue Jul 09, 2019 6:42 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1319
Hmm, I'd say it's very difficult to miss a critical piece: almost any machine you make will be Turing-complete. Anything you omit is a chance to explore macros in the assembler.

A core generator tool would be good though: although it expands the amount of testing a lot and increases the necessary level of documentation, it does allow exploration of tradeoffs. A smaller core is always smaller, but might also run faster and cooler. A smaller core leaves room for a larger cache or for a second (independent) core. A smaller core is easier to verify and test.

Making just two sizes of core would also allow teaching those tradeoffs.


Tue Jul 09, 2019 6:43 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
Filled out the floating-point for CS01; it’s almost complete. The only piece missing is the fused multiply-add instructions.

Started working on Coregen. Only a couple of the options work at the moment. One goal is to make it possible to include or exclude any instruction. Another goal is to allow for lots of experimentation.
Including fused multiply-add instructions increases the core size by about 1,000 LUTs. It may be desirable to then exclude FADD, FSUB, and FMUL which would scale the core size back down.
Attachment: Coregen1.png (FriscvCoregen)


A template file is used as a master for generating the desired core. Regions of code to include are denoted by the RISC-V extension name in braces. The same thing is done for individual instructions. Source code in the template looks like:
Code:
{+F}
{+FMA}
      `FMA,`FMS,`FNMA,`FNMS:
         begin
            Rd <= ir[11:7];
            wrfrf <= 1'b1;
         end
{-FMA}
      `FLOAT:
         begin
            Rd <= ir[11:7];
            if (funct5==5'd20 || funct5==5'd24 || funct5==5'd28)
               wrirf <= 1'b1;
            else
               wrfrf <= 1'b1;
         end
{-F}
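
For illustration, a template processor for markers like these could look roughly as follows. This is a guess at the mechanism in Python, not Coregen's actual code: the `generate` function is a made-up name, and it is assumed that `{+NAME}` opens a region, `{-NAME}` closes it, markers sit on lines of their own, and a nested region is emitted only when every enclosing region is enabled.

```python
import re

MARKER = re.compile(r"^\s*\{([+-])(\w+)\}\s*$")


def generate(template, enabled):
    """Return the template text with disabled {+NAME}...{-NAME} regions removed."""
    out = []
    open_regions = []                      # stack of currently open region names
    for line in template.splitlines():
        m = MARKER.match(line)
        if m:
            sign, name = m.groups()
            if sign == "+":
                open_regions.append(name)
            elif open_regions and open_regions[-1] == name:
                open_regions.pop()
            else:
                raise ValueError(f"unbalanced marker {{-{name}}}")
            continue                       # marker lines are never emitted
        if all(name in enabled for name in open_regions):
            out.append(line)
    if open_regions:
        raise ValueError(f"unclosed region {open_regions[-1]}")
    return "\n".join(out)


TEMPLATE = """{+F}
{+FMA}
      `FMA,`FMS,`FNMA,`FNMS:
{-FMA}
      `FLOAT:
{-F}"""

print(generate(TEMPLATE, {"F"}))           # only the `FLOAT: case remains
print(generate(TEMPLATE, {"F", "FMA"}))    # both cases remain
```

With this scheme, selecting FMA without F yields nothing from the fragment, because the FMA region is nested inside the F region.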


The serial transmit port of the SoC is proving challenging. In the terminal program, only two different byte values are received even though a varied character string is being transmitted.

_________________
Robert Finch http://www.finitron.ca


Wed Jul 10, 2019 2:57 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
My eyes are watering again. Allergies.

If Coregen could be modified to accept parameter information from the assembler, then the assembler could analyze which instructions are used in a program and Coregen could generate a CPU containing only those instructions.
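
A sketch of the assembler side of that idea, in Python for illustration only. Everything here is hypothetical: the `used_mnemonics` and `coregen_options` names, the grouping of mnemonics into an "F" option, and the assumption that source lines look like the listing earlier in this thread (labels ending in a colon, `equ` definitions, `code` and `org` directives).

```python
DIRECTIVES = {"code", "org"}   # directive names seen in the earlier listing
FLOAT_MNEMONICS = {"flw", "fsw", "fadd", "fsub", "fmul", "fdiv", "fsqrt"}  # assumed grouping


def used_mnemonics(source):
    """Collect the set of instruction mnemonics appearing in an assembly source."""
    used = set()
    for raw in source.splitlines():
        fields = raw.split()
        if fields and fields[0].endswith(":"):
            fields = fields[1:]                    # drop a leading label
        if not fields:
            continue
        if len(fields) >= 2 and fields[1].lower() == "equ":
            continue                               # symbol definition, not an instruction
        word = fields[0].lower()
        if word not in DIRECTIVES:
            used.add(word)
    return used


def coregen_options(used):
    """Map used mnemonics to generator options (only 'F' is modelled here)."""
    return {"F"} if used & FLOAT_MNEMONICS else set()


SOURCE = """
LEDS   equ   $FFDC0600
      org   $FFFC0000
.0002:
      sw    $t0,LEDS
      flw   $f1,flt5
      fadd  $f3,$f1,$f2
"""
used = used_mnemonics(SOURCE)
print(sorted(used))            # ['fadd', 'flw', 'sw']
print(coregen_options(used))   # {'F'}
```

The resulting set could be written to a file for the generator to read. How the two tools would actually exchange this information is left open in the post.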

Some time was spent researching the AMBA / AXI4 bus specification. The thought was that it might be a good idea to use an industry bus standard. However, the AMBA / AXI4 specification is neither simple nor short. There’s a desire to keep the bus simple to understand, assuming a bus might be a new concept for some students; I guess it depends on the target audience. Ideally it would be something that can be outlined with a couple of diagrams and a few paragraphs of explanation, preferably fitting on a page or two. AMBA is composed of multiple channels for read, write and control access and is geared towards high performance. Currently CS01 is using the WISHBONE bus. I’m reminded of studying the 6502 bus many years ago.

Still haven’t been able to get the transmit to work in the SoC. That led me to the possibility of using a 16550-compatible UART via Vivado coregen, which in turn led me to researching bus standards. It would probably be a good idea to use a 16550-compatible UART core. Things have gotten more sophisticated due to performance and power consumption requirements, and the availability of more transistors. How would one get introduced to all the complexities without taking things for granted?

Back in the day, the 6850 was about the first UART I studied. No FIFOs. Small number of registers. It still took quite a bit of learning to get a handle on start / stop bits and baud rates. Not mentioning USB here.

_________________
Robert Finch http://www.finitron.ca


Thu Jul 11, 2019 2:52 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1319
Indeed, those peripheral chips like the 6850 and 6522 were adequately complex to need a bit of study. They mopped up a lot of complexity which would otherwise have been a boardful of TTL.


Thu Jul 11, 2019 6:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 972
Location: Canada
Quote:
Indeed, those peripheral chips like the 6850 and 6522 were adequately complex to need a bit of study. They mopped up a lot of complexity which would otherwise have been a boardful of TTL.
I think I studied the 6520 first, since I was trying to manage keyboard input for a game; this was on the Commodore PET 2031. Found out a 6520 was very similar to a 6821. When a newer machine came along, it was the 6522.

Limited luck has arrived working with the serial port. A 6551-compatible UART was set up, and the start-up message can be seen to display in the terminal window, with a minor hiccup: the baud rate seems to be off by a factor of 8. When 460800 is programmed, the terminal picks it up when set to 57600.
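
The factor of 8 can be checked with a little arithmetic. The Python sketch below uses the common "clock / (oversample x divisor)" baud generator formula with 16x oversampling, which is an assumption about this UART, and the clock frequencies are hypothetical values chosen only so the numbers come out exact. It shows one way such a mismatch could arise and is not a diagnosis of the actual bug.

```python
def baud_divisor(clock_hz, baud, oversample=16):
    """Divisor software would program for a desired baud rate."""
    return round(clock_hz / (oversample * baud))


def actual_baud(clock_hz, divisor, oversample=16):
    """Baud rate the hardware produces for a given divisor."""
    return clock_hz / (oversample * divisor)


print(460800 / 57600)                      # 8.0, the observed ratio

ASSUMED_CLOCK = 58_982_400                 # hypothetical: what software believes
REAL_CLOCK = ASSUMED_CLOCK // 8            # hypothetical: what the UART really gets

div = baud_divisor(ASSUMED_CLOCK, 460800)  # 8
print(actual_baud(REAL_CLOCK, div))        # 57600.0
```

Any single factor-of-8 disagreement between software and hardware (baud clock, prescaler, or oversampling ratio) would produce the same symptom in this model.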

Receive doesn’t quite work yet. A message can be transmitted to the Cmod and then echoed back, but the echo isn’t correct.

_________________
Robert Finch http://www.finitron.ca


Fri Jul 12, 2019 3:07 am