Last visit was: Thu Dec 26, 2024 2:16 pm
|
It is currently Thu Dec 26, 2024 2:16 pm
|
Astorisc : A pipelined Risc-V from scratch ?
Author |
Message |
alrj
Joined: Thu Feb 25, 2021 8:27 am Posts: 38 Location: Belgium
|
Thank you! I knew about Booth's algorithm and quickly rejected it because it would have been impractical to build using discrete components, but I hadn't realized there was this radix-4 variant. I'll definitely try and grok it! At first glance, it looks like I'd need some setup circuitry to encode the multiplicand. A shifter is quite easy and not too expensive to build with only two pairs of 16-bit buffers (one of them being active at a time while the other one is 3-stated), so not a big deal. The adder would need to be expanded to handle subtraction as well, but again, that's not too difficult. The difficulty, at least for me, will be to understand how the encoding of the operands will be affected by their "signedness". There are four different multiplication instructions in RV32, and I intend to implement them all. - MUL returns the lower 32 bits of the result, that's the easy one
- MULH returns the upper 32 bits of the full product for signed × signed,
- MULHU returns the upper 32 bits for unsigned × unsigned
- MULHSU returns the upper 32 bits for signed × unsigned.
The last one, MULHSU, is useful to implement multi-word multiplication. In any case, it looks very promising, thanks again. Now, I haven't even started implementing support for multi-cycle instructions in the Execute stage of the pipeline, who knows what corner-cases I'll discover there
|
Wed Apr 05, 2023 7:36 am |
|
|
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 703
|
Some where in the old BYTES I read about a hardware version for the 6800. The got ya there, was a -1 argument. Some thing like a * b -> h,l as hardware ports. if a was -1 you had to swap a and b. Check that your version does not have that problem. Ben. PS.Byte Magazine Volume 02 Number 07 The old brain cells are not what they used to be. Regular multiply. Booth's version may be software.6809? I do remember it had bug. PPS. Byte Magazine Volume 03 Number 04 software Byte Magazine Volume 03 Number 05 hardware chip
|
Wed Apr 05, 2023 9:16 pm |
|
|
DockLazy
Joined: Sun Mar 27, 2022 12:11 am Posts: 41
|
alrj wrote: The difficulty, at least for me, will be to understand how the encoding of the operands will be affected by their "signedness". There are four different multiplication instructions in RV32, and I intend to implement them all. - MUL returns the lower 32 bits of the result, that's the easy one
- MULH returns the upper 32 bits of the full product for signed × signed,
- MULHU returns the upper 32 bits for unsigned × unsigned
- MULHSU returns the upper 32 bits for signed × unsigned.
I thought I had some information clearly explaining how the radix-4 version of modified Booth worked, but I can't find it now. So I'm not 100% sure, but the unsigned version might need some extra adder and or multiplicand bits. At least I think the unsigned multiply might need to be treated as signed multiply.
|
Thu Apr 06, 2023 3:59 am |
|
|
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 703
|
Perhaps a 16 bit unsigned multiply for address indexing like foo[a,b] and general purpose multiply later be designed.
|
Thu Apr 06, 2023 4:16 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1808
|
I think the usual approach is to record the signs, make things positive, do the multiplication, and then fixup the sign. But be sure to test for edge cases! (I'm a bit worried about MININT, which you can't make positive...)
|
Thu Apr 06, 2023 6:51 am |
|
|
alrj
Joined: Thu Feb 25, 2021 8:27 am Posts: 38 Location: Belgium
|
oldben wrote: Byte Magazine Volume 03 Number 04 software Byte Magazine Volume 03 Number 05 hardware chip I'll definitely have a look at these Byte articles, thank you for looking up the references. oldben wrote: Perhaps a 16 bit unsigned multiply for address indexing like foo[a,b] and general purpose multiply later be designed. Well, in this project, the instruction set is already defined, it's the RISC-V one. Fortunately, it means that I can perfectly leave the multiply out for the moment, as it is an optional extension to the ISA DockLazy wrote: I thought I had some information clearly explaining how the radix-4 version of modified Booth worked, but I can't find it now. So I'm not 100% sure, but the unsigned version might need some extra adder and or multiplicand bits.
At least I think the unsigned multiply might need to be treated as signed multiply. BigEd wrote: I think the usual approach is to record the signs, make things positive, do the multiplication, and then fixup the sign. But be sure to test for edge cases! Perhaps it's only a matter of doing a sign- or a zero-extension of the operands accordingly? Anyway, I'll leave the multiply for later. At the moment I'm a bit overwhelmed by the intricate dependencies between the CSRs (some of which are also memory-mapped!), the interrupts, exceptions, and the pipeline. I'm realizing that the CSRs as specified in RISC-V are not exactly simple to implement for somebody like me with no experience! For instance, the ones that are memory-mapped need two distinct read ports, since one of them could be loaded in the Decode (or Execute) stage on a CSRR* instruction, while also being read by a Load instruction in the Memory stage... Plus all the "magic" that must happen when entering or leaving a trap handler... For a processor built from discrete logic, that won't be a small piece.
|
Thu Apr 06, 2023 7:04 am |
|
|
DockLazy
Joined: Sun Mar 27, 2022 12:11 am Posts: 41
|
BigEd wrote: I think the usual approach is to record the signs, make things positive, do the multiplication, and then fixup the sign. But be sure to test for edge cases! (I'm a bit worried about MININT, which you can't make positive...) Booth's algorithm works natively with signed integers. Odd one out here are unsigned integers. The problem here is I'm unsure if having an unsigned integer stick a bit in the sign slot will break the algorithm or not.
|
Fri Apr 07, 2023 2:27 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1808
|
I stand corrected!
|
Fri Apr 07, 2023 8:04 am |
|
|
MichaelM
Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL
|
I have not been following this thread closely. However, the discussion about multiplication utilizing Booth’s algorithm struck a chord. Its correct signed integer multiplication is natively supported by Booth’s. For this to occur, a sticky guard bit is added on the left. I’ve previously reasoned out the modification required for Booth’s to correctly and fully handled unsigned multiplication. That is, extend the length off the operants by 1 bit on the left, and then apply Booth’s. The resulting product will be correct for all 2n LSBs of the resultant. Assuming a 1-bit Booth’s implementation, the modification to support unsigned operants will require 1 additional cycle compared to the version supporting n-bit signed operants. I think a similar modification, padding / extending the operants on the left, can be used when Booth’s algorithm is computed using 2 or more bits per cycle. My GitHub repo has some of these algorithms implemented in Verilog. The embedded comments may be helpful.
_________________ Michael A.
|
Fri Apr 07, 2023 8:06 am |
|
|
alrj
Joined: Thu Feb 25, 2021 8:27 am Posts: 38 Location: Belgium
|
Well, I'm back on this project. I know it's been more than a year without any update, but during all this time I just couldn't make my mind about the implementation of interrupts and CSRs (Control and Status Registers). I knew I wanted to have interrupts, at the absolute very least to have a timer, but I wasn't ready yet to commit to the whole RISC-V way of doing things. I was afraid it would end up way too complicated for a TTL CPU, so I started to try and implement something that would not need real CSRs. My idea was to use register x0 (which is always equals to 0) to address a known fixed part of the memory and use it as "pseudo-registers". But then I realized that it couldn't just be plain memory, because the content of some of those registers need to be accessible by the CPU at any times (like the Interrupt Enable flag, for instance). That would have then required a good amount of address decoding logic, and in the end I don't think I'd have gained anything compared to actual CSRs. The second thing that was scaring me were the actual CSR manipulation instructions. RISC-V offers three pairs of instructions to access CSRs: - csrrw that reads the current value of the register then writes a new value in it;
- csrrs that reads the current value then sets some bits,
- csrrc that reads the current value then clears some bits.
All these instructions need to be able to atomically read then write the register, and I just couldn't see how to integrate this cleanly in my pipeline. That would just not play nice with the ALU. I also thought I could avoid adding complexity in the instruction decoder if I skipped all these entirely, but my logic was flawed: I would obviously still need to decode the SYSTEM class of instruction at least to implement "mret" to return from the interrupt handler. So I did the only logical thing to do in this situation: nothing Fast-forward one year later, and I'm now ready to do things right: the control and status registers, the instructions, the interrupts, and maybe even the few exceptions that make sense! After all, there's only four of them... I think I understand now that what I thought was too complicated was actually pretty fine, and all my attempts to simplify it while keeping at least a bit of compatibility were in vain. Lesson learned. Now, back to Digital. Let's see if I can get to something.
Last edited by alrj on Thu Aug 01, 2024 11:50 am, edited 1 time in total.
|
Thu Aug 01, 2024 11:45 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1808
|
> I think I understand now that what I thought was too complicated was actually pretty fine
This sounds like progress! Perhaps only possible with the passage of time. Good luck with the re-start.
|
Thu Aug 01, 2024 11:47 am |
|
|
alrj
Joined: Thu Feb 25, 2021 8:27 am Posts: 38 Location: Belgium
|
BigEd wrote: Good luck with the re-start. Thanks! I have now finished updating the Decode stage to account for the new instructions and for the exceptions that could be raised from this stage. The signals that the decoder outputs for the CSR instructions may change when I design the CSR module, but they'll do for now. In the decoder, I had to switch from a lazy decoding to fully decoded opcodes, in order to be able to raise an exception on invalid opcode. Instructions ecall, ebreak and mret all three activate their dedicated Exception output wire. I have also improved the test cases of the Instruction Decoder a lot! All variants of all supported instructions should now be covered. I really like to start with the Instruction Decoder, because it is a part that is fully standalone and purely combinational: instruction comes in, signals and values come out. The next parts will have more dependencies between each other: CSRs module, several new sources to choose from for the PC, more exceptions to raise, then actually handling interrupts and exceptions.
|
Tue Aug 06, 2024 8:25 am |
|
|
alrj
Joined: Thu Feb 25, 2021 8:27 am Posts: 38 Location: Belgium
|
I moved the bypass selection logic one stage earlier, in Decode, to avoid having it in the critical path, at the beginning of the Execute stage.
I'll have to double-check if it is still correct in every situations. For this, I have left the older implementation in Decode, and the circuit compares both results and halts the simulation if they differ.
|
Sun Aug 11, 2024 4:08 pm |
|
|
alrj
Joined: Thu Feb 25, 2021 8:27 am Posts: 38 Location: Belgium
|
Well, that was too good to be true.
It looked right at first, but the bypass selection as computed during Decode becomes incorrect when there is a "use after load" sequence of instructions: the insertion of a bubble in the pipeline changes everything.
End of the interlude, back to the CSRs and exceptions.
|
Fri Aug 16, 2024 9:30 am |
|
|
alrj
Joined: Thu Feb 25, 2021 8:27 am Posts: 38 Location: Belgium
|
It looks like I could not leave the idea alone, I had to get back to it. I misunderstood the comments in some verilog code found on github and I think it gave me the solution I was looking for With the forwarding logic done in the Decode stage, I can actually have that stage send a bubble preemptively, instead of waiting for the instruction to arrive in Execute, where it would then wait for the load to finish. So, not only do I think it will work and shorten the critical path, I have the feeling it will even help make things simpler when I add support for instructions that may spend multiple cycles in the Execute stage (like multiplication). I still need to implement and test it, but I've turned the potential solution in my head for the last three days and I really believe I've got it right.
|
Tue Aug 20, 2024 5:19 pm |
|
Who is online |
Users browsing this forum: Bing [Bot], claudebot and 0 guests |
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot post attachments in this forum
|
|