	 Astorisc : A pipelined Risc-V from scratch ? 
        
        
            | Author | Message |  
			| alrj 
					Joined: Thu Feb 25, 2021 8:27 am
 Posts: 40
 Location: Belgium
   | Hello everyone, I haven't really been active here yet, but I've been lurking a lot, and for a long time already.
 
 So... I have had this crazy idea to build a Risc-V processor mostly from logic circuits ever since Robert Baruch posted videos on YouTube about his LMARV-1 (Learn Me A Risc-V) about four years ago. His project stopped, then got rebooted, then stopped again. There's also Phil Wright's project on Hackaday.io, and then the Pineapple ONE. But since building a Risc-V from scratch has been done already, why not add something to the challenge? So why not try to make it fast? That would mean making it pipelined. And then bypassed, or it wouldn't be so fast, would it? Yeah, I told you right at the beginning that my idea was crazy. I know I still have a lot to learn in many aspects of the project, but this is exactly why I'm doing it.
 
 I'm still at the very early stage of the project. I've just started modeling some parts of the processor in Logisim-evolution to test the logic. I'm also waiting for a few AUC logic gates to arrive, so that I can start testing them for real.
 
 For now, I'm trying to document my thoughts and progress on my website, and I intend to publish all my work under a free/libre license. That is, as soon as I actually do have something to publish. I've been using Logisim-evolution in the past, but I'm now giving Digital a try. I really love the way robfinch posts a bit of his work every day on this forum, and I may try to do something like that, though I don't think I'll ever have enough free time to post so frequently. Anyway, let's see if I can get anywhere...
 
 |  
			| Sun Apr 17, 2022 1:31 pm |   |  
		|  |  
			| BigEd 
					Joined: Wed Jan 09, 2013 6:54 pm
 Posts: 1843
   | Look forward to hearing about progress!  I found this page  on your site with some articles already. Edit: see below for new, working, URL!
 
 |  
			| Sun Apr 17, 2022 3:17 pm |  |  
		|  |  
			| alrj 
					Joined: Thu Feb 25, 2021 8:27 am
 Posts: 40
 Location: Belgium
   | Thanks for posting the link; it made me realize I had not pushed the latest changes to my website! So that link is no more; the correct one is https://www.alrj.org/pages/Astorisc.html No changes other than the URL so far. Feel free to share your thoughts and remarks if you have any.
 
 |  
			| Sun Apr 17, 2022 3:41 pm |   |  
		|  |  
			| BigEd 
					Joined: Wed Jan 09, 2013 6:54 pm
 Posts: 1843
   | No specific remarks - I see you're already familiar with enormous breadboard designs, so I imagine you're ready for the scale of this.  I imagine it will take quite a few chips! 
 
 |  
			| Thu Apr 21, 2022 4:05 pm |  |  
		|  |  
			| alrj 
					Joined: Thu Feb 25, 2021 8:27 am
 Posts: 40
 Location: Belgium
   | Indeed, I'm prepared! That project will definitely call for a backplane and multiple not-so-small boards, which in itself will be a challenge if I want to keep the clock cycle short. Take the register file, for instance. I could not find any dual-ported static RAM that would fit my target clock cycle of 30 ns (from writing to a port to availability of the data on the other port). I have never played with synchronous RAM, but the way I understood the datasheets, although they can achieve a very short clock cycle and can be very good at streaming data, they seem to have big(ish) latency for random access. So, unless I can find a nice chip that fits the bill, I'll have to go with two banks (for two simultaneous outputs) of 32 registers, each of them consisting of two 16+ bit wide flip-flops. That's 128 TSSOP-56 chips just for the registers themselves, not counting the decoding/selecting part. Without even starting, my intuition tells me that this won't fit on a 160x100mm euro board.
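 The chip count in the post above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and parameters are hypothetical, the figures (32 registers, two banks, 16-bit flip-flop packages) are taken from the post, and x0 is counted as a full register, as the post does.

```python
def register_file_chips(registers=32, banks=2, word_bits=32, bits_per_chip=16):
    """Return the number of flip-flop packages needed for the register file.

    Each simultaneous read port gets its own full copy (bank) of the
    registers, and each register is split across several packages when the
    word is wider than one package.
    """
    chips_per_register = -(-word_bits // bits_per_chip)  # ceiling division
    return registers * banks * chips_per_register


# 32 registers * 2 banks * 2 packages per register
print(register_file_chips())
# The same estimate for a 16-register variant
print(register_file_chips(registers=16))
```

 The estimate ignores the decoding and output-selection logic, which the post explicitly leaves out of the count as well.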
 
 |  
			| Fri Apr 22, 2022 7:23 am |   |  
		|  |  
			| drogon 
					Joined: Sun Oct 14, 2018 5:05 pm
 Posts: 62
   | alrj wrote: So, unless I can find a nice chip that fits the bill, I'll have to go with two banks (for two simultaneous outputs) of 32 registers, each of them consisting of two 16+ bit wide flip-flops. That's 128 TSSOP-56 chips just for the registers themselves, not counting the decoding/selecting part. Without even starting, my intuition tells me that this won't fit on a 160x100mm euro board.
 Could you start with the RV32E spec? I.e. "embedded" and just 16 registers rather than the full 32? -Gordon
 
 |  
			| Fri Apr 22, 2022 9:13 am |  |  
		|  |  
			| alrj 
					Joined: Thu Feb 25, 2021 8:27 am
 Posts: 40
 Location: Belgium
   | drogon wrote: Could you start with the RV32E spec? I.e. "embedded" and just 16 registers rather than the full 32?
Not only that, but with careful planning I could probably even use the same board layout twice. The only differences between them would be the high bit in register selection (rs1, rs2, rd), and the special case for r0 that is always zero. That would also make sense considering the smallest quantity of boards at JLCPCB or PCBWay is usually 5.
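 The two-board idea above can be sketched in Python as follows. This is a minimal model under assumptions: the function name and the returned tuple are hypothetical, and it only illustrates that the high bit of a 5-bit register number can pick the board while the low four bits pick the register on that board.

```python
def select_register(reg):
    """Map a 5-bit register number to (board, local_index, is_zero).

    board       -- high bit of the register number (0 or 1)
    local_index -- low four bits, identical wiring on both boards
    is_zero     -- True only for register 0, which always reads as zero
    """
    if not 0 <= reg < 32:
        raise ValueError("register number must fit in 5 bits")
    board = reg >> 4
    local_index = reg & 0xF
    is_zero = reg == 0
    return board, local_index, is_zero


print(select_register(0))   # board 0, slot 0, hardwired zero
print(select_register(16))  # board 1, slot 0, an ordinary register
```

 Slot 0 behaves differently on the two boards, which is the one special case the shared layout has to accommodate.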
 
 |  
			| Fri Apr 22, 2022 9:27 am |   |  
		|  |  
			| drogon 
					Joined: Sun Oct 14, 2018 5:05 pm
 Posts: 62
   | alrj wrote: drogon wrote: Could you start with the RV32E spec? I.e. "embedded" and just 16 registers rather than the full 32?
Not only that, but with careful planning I could probably even use the same board layout twice. The only differences between them would be the high bit in register selection (rs1, rs2, rd), and the special case for r0 that is always zero. That would also make sense considering the smallest quantity of boards at JLCPCB or PCBWay is usually 5.
 Hope it works out - I'm currently experimenting with RISC-V and FPGAs - and even though I'm using others' models, I have a long way to go to get what I want! -Gordon
 
 |  
			| Fri Apr 22, 2022 9:44 am |  |  
		|  |  
			| robfinch 
					Joined: Sat Feb 02, 2013 9:40 am
 Posts: 2405
 Location: Canada
   | Could a CPLD be used as a register file?
 I got some 74LS670's for use as a register file a while ago. It is a 4-entry, 4-bit register file with separate read and write ports. It may require fewer chips than using discrete ff's.  But the 670 may be too sluggish for your design. If you have a largish board with 128 ff's I think it may be slow.
 _________________Robert Finch   http://www.finitron.ca 
 
 |  
			| Sat Apr 23, 2022 3:46 am |   |  
		|  |  
			| alrj 
					Joined: Thu Feb 25, 2021 8:27 am
 Posts: 40
 Location: Belgium
   | Maybe a CPLD could be used, but I'd like to restrict myself to non-programmable logic chips only. 
 I've had a quick look at the 74LS670 datasheet; the chip is too slow (and 5V only anyway). Also, 4 entries of 4 bits means they store 16 bits each, which is only as many as a 16374.
 
 Regarding the 128 flip-flops, their outputs would be in four different groups (low halfword and high halfword, on each of the two output ports) so maybe the fan-in of "only" 32 could work.
 It's the fanout on the write port that would likely need a split/buffering stage because the value to write has to go to both banks at the same time, that's 64 chips. With all that, if I factor in the distance between input / flip-flops / outputs, I can certainly imagine how it would become too slow.
 
 So yeah, going for two boards for the register file might definitely be the best bet.
 
 
 |  
			| Sat Apr 23, 2022 3:21 pm |   |  
		|  |  
			| DockLazy 
					Joined: Sun Mar 27, 2022 12:11 am
 Posts: 59
   | There should be a logic family datasheet (it might be covered under application notes as well) for the logic type you are using that will likely have a graph of capacitive load vs propagation time. From that you could work out how much buffering you will need.
 alrj wrote: Indeed, I'm prepared! That project will definitely call for a backplane and multiple not-so-small boards, which in itself will be a challenge if I want to keep the clock cycle short. Take the register file, for instance. I could not find any dual-ported static RAM that would fit my target clock cycle of 30 ns (from writing to a port to availability of the data on the other port). I have never played with synchronous RAM, but the way I understood the datasheets, although they can achieve a very short clock cycle and can be very good at streaming data, they seem to have big(ish) latency for random access. So, unless I can find a nice chip that fits the bill, I'll have to go with two banks (for two simultaneous outputs) of 32 registers, each of them consisting of two 16+ bit wide flip-flops. That's 128 TSSOP-56 chips just for the registers themselves, not counting the decoding/selecting part. Without even starting, my intuition tells me that this won't fit on a 160x100mm euro board.
 Could you use bypassing logic with the dual port SRAM?
 
 |  
			| Sun Apr 24, 2022 5:49 am |  |  
		|  |  
			| BigEd 
					Joined: Wed Jan 09, 2013 6:54 pm
 Posts: 1843
   | (Welcome, DockLazy!) 
 
 |  
			| Sun Apr 24, 2022 7:54 am |  |  
		|  |  
			| alrj 
					Joined: Thu Feb 25, 2021 8:27 am
 Posts: 40
 Location: Belgium
   | DockLazy wrote: There should be a logic family datasheet (it might be covered under application notes as well) for the logic type you are using that will likely have a graph of capacitive load vs propagation time. From that you could work out how much buffering you will need.
 My idea is to use AUC where possible, at 2.7V. They are quite impressive beasts. The graph in question is there, page 12: https://www.ti.com/lit/an/scea027a/scea027a.pdf and according to the same document, the typical input capacitance is 3.0pF.
 DockLazy wrote: Could you use bypassing logic with the dual port SRAM?
 I don't see any reason why it wouldn't work, and I think I see where this is going. With a pseudo sixth stage added to the pipeline that has all the bypasses to the previous stages, I could work around the long delay between the register write and the data availability inherent to the dual-port SRAM. But I wonder if that would really make things easier anyway. If the register boards need "only" 64 ICs each plus the decoding logic, I'm not convinced they'd even be the biggest or the most expensive ones. After all, the bypasses also add quite a lot of complexity in a physical design, as well as long paths. That's something I'll keep in mind, though.
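 To put rough numbers on the buffering question, here is a small Python sketch of the load on one write-data line. It uses the 3.0pF typical input capacitance quoted above and the 64 chips per data line mentioned earlier in the thread. Everything else is an assumption for illustration: the function names are hypothetical, trace and connector capacitance are ignored, and buffer inputs are assumed to present the same 3.0pF as any other input.

```python
import math

INPUT_CAPACITANCE_PF = 3.0  # typical input capacitance quoted in the post


def write_bus_load_pf(chips_driven=64, input_pf=INPUT_CAPACITANCE_PF):
    """Lumped input capacitance seen by one unbuffered write-data line."""
    return chips_driven * input_pf


def buffered_loads_pf(chips_driven=64, groups=4, input_pf=INPUT_CAPACITANCE_PF):
    """Loads after splitting the chips into equal groups behind buffers.

    Returns (load on the original driver, load on each buffer output).
    Assumes one buffer input per group, with the same input capacitance
    as the flip-flops.
    """
    if groups < 1:
        raise ValueError("need at least one group")
    chips_per_group = math.ceil(chips_driven / groups)
    return groups * input_pf, chips_per_group * input_pf


print(write_bus_load_pf())        # one driver feeding all 64 inputs
print(buffered_loads_pf(64, 4))   # four buffers, 16 inputs each
```

 The resulting figures can then be looked up on the load versus propagation delay graph in the application note to estimate what each extra buffer stage costs or saves.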
 
 |  
			| Sun Apr 24, 2022 4:32 pm |   |  
		|  |  
			| DockLazy 
					Joined: Sun Mar 27, 2022 12:11 am
 Posts: 59
   | BigEd wrote: (Welcome, DockLazy!)
 Hello everyone!
 alrj wrote: I don't see any reason why it wouldn't work, and I think I see where this is going. With a pseudo sixth stage added to the pipeline that has all the bypasses to the previous stages, I could work around the long delay between the register write and the data availability inherent to the dual-port SRAM. But I wonder if that would really make things easier anyway. If the register boards need "only" 64 ICs each plus the decoding logic, I'm not convinced they'd even be the biggest or the most expensive ones. After all, the bypasses also add quite a lot of complexity in a physical design, as well as long paths. That's something I'll keep in mind, though.
 I think it would require a mux selecting between the writeback register and the DP RAM and some logic to detect when RD == RS.
 Yeah, the price of DP SRAM is a bit over the top. I see on Mouser there are some chips that would be fast enough for the bargain price of $300
 
 |  
			| Mon Apr 25, 2022 3:20 am |  |  
		|  |  
			| alrj 
					Joined: Thu Feb 25, 2021 8:27 am
 Posts: 40
 Location: Belgium
   | DockLazy wrote: I think it would require a mux selecting between the writeback register and the DP RAM and some logic to detect when RD == RS.
Like for the other bypasses, it would require one more potential source for each of rs1 and rs2, and the logic to detect when rd == rsN and rd != r0, plus the priority to select the closest stage's bypass if more than one matches. For me, a mux with N input sources is more like N pairs of 74_16244 buffers and the logic that will pull OE low on only one of them at a time. I have absolutely no experience at all with FPGAs, but from what I (think I) understand, muxes are a pretty common abstraction there. Just to give some context, I'm a sysadmin/programmer and I've loved (x86) ASM for as long as I can remember, so I'm a low-level enthusiast. Now that I'm working on this project, I'm often surprised to see how much hardware is needed to perform what would be a super simple if-statement in any programming language.
 Quote: Yeah the price of DP SRAM is a bit over the top. I see on Mouser there are some chips that would be fast enough for the bargain price of $300
 What a deal, and I'd only need four of them! I saw them as well; if we are talking about the same chips, they're out of stock anyway.
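 The bypass selection described in this post (match rd against rsN, ignore r0, prefer the closest stage, then enable exactly one buffer pair) is the "super simple if-statement" in software. Here is a minimal Python sketch of it. The stage names, the `writes` flag and both function names are hypothetical; the flag is an added assumption to cover instructions that do not write a register.

```python
def select_source(rs, stages):
    """Pick where an operand (rs1 or rs2) should be read from.

    rs     -- source register number of the instruction being decoded
    stages -- list of (name, rd, writes) for in-flight instructions,
              ordered from the closest pipeline stage to the farthest
    Returns the name of the first matching stage, or "regfile".
    """
    for name, rd, writes in stages:
        # A bypass applies only if the stage writes a register, that
        # register is not r0, and it is the one being read.
        if writes and rd != 0 and rd == rs:
            return name  # closest stage wins because of the ordering
    return "regfile"


def output_enables(selected, sources):
    """Active-low OE per source: 0 for the selected one, 1 for the rest."""
    return {name: 0 if name == selected else 1 for name in sources}


in_flight = [("EX", 5, True), ("MEM", 5, True), ("WB", 7, True)]
choice = select_source(5, in_flight)
print(choice)
print(output_enables(choice, ["EX", "MEM", "WB", "regfile"]))
```

 In hardware, each comparison in the loop becomes a 5-bit comparator per stage and per operand, and the early return becomes priority logic in front of the OE pins.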
 
 |  
			| Mon Apr 25, 2022 8:05 am |   |  
 
	