subOctavo: a 3-operand RISC
Author |
Message |
Myron Plichota
Joined: Mon Jan 22, 2018 2:49 pm Posts: 23
|
Code:
* subOctavo is a single-thread mutation of Eric LaForest's Octavo 3-operand RISC concept at https://fpgacpu.ca.
* All code, data, and IO "exist" within a single 1024x36-bit addressable space.
* asm.cmd assembles <yourchoice.py> and generates x.hex, the 2-port RAMs image.
* ivsim.cmd simulates subOctavo.prj and generates subOctavo.log.
* Prerequisite development tools:
  1) Python3, e.g. https://www.python.org/, to implement the 2-pass assembler that generates x.hex.
  2) Icarus Verilog, e.g. http://bleyer.org/icarus/, to define subOctavo and simulate x.hex.
* The characteristic width of the i instruction field is 4 bits.
* The characteristic width of the program counter, and d, a, and b instruction fields is 10 bits.
* The bit aggregate of a 36-bit instruction is {i, d, a, b, 2'b00}.
* The i, a and b 2-port RAMs are initialized with x.hex.
* The i, a and b 2-port RAMs are read from on negative clock edges.
* The i, a and b 2-port RAMs are written to at the d address on positive clock edges.
* The critical path is between the negative and positive clock edges, suggesting a clock duty cycle of < 50%.
* The address 0 is reserved for a safe destination, namely 'trash'. trash might be used as a scratchpad.
* The address 1 is the program counter reset value, namely 'reset', the first instruction of all programs.
* The address 0x3ff points to the read-write parallel output port.
* The address 0x3fe points to the read-only parallel input port.
* The address 0x3fd points to the read-only free-running timer.
* Non-jump instructions treat the a-field as a pointer like the d and b fields.
* Jump instructions treat the a-field as a 10-bit constant (#a).
* Any non-jump instruction with the destination address of trash suffices as a no operation.
* The "standard" subOctavo nop() is {band, trash, trash, trash, 2'b00}, which encodes to 36'h300000000.
* The instruction register and program counter stack are updated on positive clock edges.
* pc[0] refers to the top of the program counter stack, which is the active program counter.
* With a program counter stack register array of [0:7], calls may be nested up to 7 levels.
* Jumps involve 1 following instruction (which will execute) due to the pipeline latency.
* Jumps write 0 to trash.
* 36-bit instruction summary:
  add   {4'h0, d, a, b, 2'b00}      d = a + b; pc[0]++
  sub   {4'h1, d, a, b, 2'b00}      d = a - b; pc[0]++
  mul   {4'h2, d, a, b, 2'b00}      d = a * b; pc[0]++
  band  {4'h3, d, a, b, 2'b00}      d = a & b; pc[0]++
  bor   {4'h4, d, a, b, 2'b00}      d = a | b; pc[0]++
  bxor  {4'h5, d, a, b, 2'b00}      d = a ^ b; pc[0]++
  srl   {4'h6, d, a, x, 2'b00}      d = {1'b0, a[35:1]}; pc[0]++
  sra   {4'h7, d, a, x, 2'b00}      d = {a[35], a[35:1]}; pc[0]++
  ror   {4'h8, d, a, x, 2'b00}      d = {a[0], a[35:1]}; pc[0]++
  jze   {4'h9, trash, a, b, 2'b00}  d = 0; if (b == 0) pc[0] = #a else pc[0]++
  jnz   {4'ha, trash, a, b, 2'b00}  d = 0; if (b != 0) pc[0] = #a else pc[0]++
  jpo   {4'hb, trash, a, b, 2'b00}  d = 0; if (b >= 0) pc[0] = #a else pc[0]++
  jne   {4'hc, trash, a, b, 2'b00}  d = 0; if (b < 0) pc[0] = #a else pc[0]++
  jmp   {4'hd, trash, a, x, 2'b00}  d = 0; pc[0] = #a
  jsr   {4'he, trash, a, x, 2'b00}  d = 0; pc[7:2] = pc[6:1]; pc[1] = pc[0]+1; pc[0] = #a
  ret   {4'hf, trash, x, x, 2'b00}  d = 0; pc[0:6] = pc[1:7]
* By convention, a don't-care field (x) is encoded as 0. See asm.py.
* Zipped subOctavo source is at https://drive.google.com/file/d/1WQC-dlEbdtbmipySzIj6YDEAGJhdybpq/view?usp=drive_link.
* The x-series of .py files reflects the author's subOctavo programming learning curve.
* x1.py crawls out of the primordial goo.
* x15.py solves integer square roots (with remainders).
* Author contact: MyronPlichota@gmail.com.
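The {i, d, a, b, 2'b00} layout described above can be sketched in Python as a pair of pack/unpack helpers. This is only an illustration of the bit positions implied by the field widths (i in bits 35:32, d in 31:22, a in 21:12, b in 11:2); the names OPCODES, MNEMONICS, encode, and decode are hypothetical and are not taken from asm.py, whose actual interface may differ.

```python
# Hypothetical helper names; this is a sketch, not the author's asm.py.
OPCODES = {
    "add": 0x0, "sub": 0x1, "mul": 0x2, "band": 0x3,
    "bor": 0x4, "bxor": 0x5, "srl": 0x6, "sra": 0x7,
    "ror": 0x8, "jze": 0x9, "jnz": 0xA, "jpo": 0xB,
    "jne": 0xC, "jmp": 0xD, "jsr": 0xE, "ret": 0xF,
}
MNEMONICS = {code: name for name, code in OPCODES.items()}

TRASH = 0           # address 0, the safe destination
FIELD_MASK = 0x3FF  # d, a, and b are 10 bits each


def encode(op, d=TRASH, a=0, b=0):
    """Pack {i, d, a, b, 2'b00} into a 36-bit integer."""
    for name, value in (("d", d), ("a", a), ("b", b)):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{name} field out of 10-bit range: {value}")
    return (OPCODES[op] << 32) | (d << 22) | (a << 12) | (b << 2)


def decode(word):
    """Unpack a 36-bit instruction word into (mnemonic, d, a, b)."""
    return (
        MNEMONICS[(word >> 32) & 0xF],
        (word >> 22) & FIELD_MASK,
        (word >> 12) & FIELD_MASK,
        (word >> 2) & FIELD_MASK,
    )


# The "standard" nop: band with every field pointing at trash.
nop = encode("band", TRASH, TRASH, TRASH)
print(f"{nop:09x}")  # 300000000
```

The printed value matches the 36'h300000000 nop encoding stated in the notes, which is a quick check that the assumed bit positions agree with the stated field widths.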
|
Sat Feb 22, 2025 5:53 am |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1817
|
Quite a remarkable machine!
I suppose the original 8-threaded machine achieved its high performance by getting 8 clock cycles of latency?
|
Sun Feb 23, 2025 7:40 am |
|
 |
Myron Plichota
Joined: Mon Jan 22, 2018 2:49 pm Posts: 23
|
I concur.
Octavo jumps do not involve any delay slots, unlike most other 3-operand RISCs. However, generating 8-threaded code to exploit this strength is a black art.
|
Sun Feb 23, 2025 2:09 pm |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1817
|
Yes, it seems to me the designer did feel that the clever hardware design had forced a need for extremely clever software.
I don't quite understand the A and B parts of memory - is it really true that each of the two operands must come from different parts of memory?
|
Sun Feb 23, 2025 4:57 pm |
|
 |
Myron Plichota
Joined: Mon Jan 22, 2018 2:49 pm Posts: 23
|
The i, a, and b 2-port RAMs are separate instances of the 1024x36-bit ram_2p_nr_pw module. The write addresses of all three RAMs are driven by the instruction d field, so every write lands in all three copies. The i RAM read address is driven by the program counter. The a RAM read address is driven by the instruction a field. The b RAM read address is driven by the instruction b field.
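A rough behavioural sketch of that arrangement, in Python rather than Verilog: three copies of the same image, three independent read addresses, and one shared write address. The class and method names (MirroredMemory, read, write) are hypothetical and do not come from the subOctavo source. The sketch also ignores clock edges, the memory-mapped IO and timer addresses, and the rule that jumps write 0 to trash.

```python
# Hypothetical model; names are illustrative, not from the Verilog source.
DEPTH = 1024                 # 1024 words per RAM
WORD_MASK = (1 << 36) - 1    # 36-bit words


class MirroredMemory:
    """Three identical RAM copies that share one write address."""

    def __init__(self, image):
        if len(image) > DEPTH:
            raise ValueError("image larger than 1024 words")
        padded = [w & WORD_MASK for w in image] + [0] * (DEPTH - len(image))
        # All three RAMs start from the same image (x.hex in the real design).
        self.i = list(padded)
        self.a = list(padded)
        self.b = list(padded)

    def read(self, pc, a_field, b_field):
        """Fetch the instruction and both operands, one from each copy."""
        return self.i[pc], self.a[a_field], self.b[b_field]

    def write(self, d_field, value):
        """Write the result to the d address in every copy."""
        value &= WORD_MASK
        self.i[d_field] = value
        self.a[d_field] = value
        self.b[d_field] = value


mem = MirroredMemory([0, 0x300000000])
mem.write(5, 42)
print(mem.read(5, 5, 5))  # (42, 42, 42)
```

Because every write goes to all three copies, their contents never diverge, and the program sees one logical 1024x36-bit memory with three read ports.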
|
Mon Feb 24, 2025 4:05 am |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1817
|
Aaah - so this is really a single logical memory implemented as three memories with the same contents?
|
Mon Feb 24, 2025 8:20 am |
|
 |
Myron Plichota
Joined: Mon Jan 22, 2018 2:49 pm Posts: 23
|
Just so!
|
Mon Feb 24, 2025 10:22 am |
|