 [ 61 posts ]  Go to page Previous  1, 2, 3, 4, 5  Next
 Astorisc : A pipelined Risc-V from scratch ? 

Joined: Mon Oct 07, 2019 2:41 am
Posts: 698
Out-of-stock parts are getting to be a big factor in design. Things like the 74LS00 are out of stock in many places.
74xxx and 74LSxxx have the advantage of being easy to work with, unlike the faster chips. Good luck finding
a fast multiported memory chip.


Mon Apr 25, 2022 8:37 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2231
Location: Canada
I used an IDT7130 (a 1kx8 dual ported ram chip) for a project quite a while ago; I believe it had an access time of about 30ns. I used it to bridge a 68000 to a PC bus.

The IDT7007 32kx8 dual ported ram comes as fast as 15ns for commercial use. Would only need about eight chips then, and loads of registers to allow register shadowing.

_________________
Robert Finch http://www.finitron.ca


Tue Apr 26, 2022 4:12 am WWW

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
robfinch wrote:
The IDT7007 32kx8 dual ported ram comes as fast as 15ns for commercial use. Would only need about eight chips then, and loads of registers to allow register shadowing.


Mouser has only 7 in stock, at 75€ a piece (including VAT).
I know the registers aren't cheap either: in 128 quantity (although I'd definitely need more in the project), the price for one 74LVCH16374 gets close to 1€. But I just realized that's still five times cheaper than the dual ported RAM :o
Plus, if I still have to add a bunch of registers for shadowing, I don't really see the point in not using plain registers in the first place. I guess it also depends on how many is "a bunch".

In other news, I'm making some progress with Digital. I actually find it easier to use than Logisim(-evolution) and I love that it allows me to write thorough test cases! That's already helped me find some mistakes a few times. I've nearly finished the ALU (which is quite simple) and I'm well into the combinatorial logic of the instruction decoder.

_________________
https://www.alrj.org/pages/Astorisc.html


Tue Apr 26, 2022 8:04 am WWW

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
Hello!

I have been making good progress on the base structure of the processor in Digital last week.

No bypass logic yet, no pipeline flushing on jump/branch, no stalling. In other words, almost none of the control logic is present yet, but all the combinatorial logic of the five stages is supposed to work, along with the intermediate pipeline registers. This is already a big step forward because it helps clarify what signals and values are really needed in each stage and thus have to be generated by the decode logic or passed from one stage to the next.
It's actually very close to a point where it could be tested with a simple program like a Fibonacci sequence, as long as I insert enough "nop" instructions manually in every place where I'd otherwise use a bypass, issue a stall, or flush the pipeline.

Also missing for now is the whole memory access bus and logic, so in the meantime I simply use separate program and data memories.

I can't wait to continue working on it, but the garden maintenance is a bit time-consuming in this season :)

_________________
https://www.alrj.org/pages/Astorisc.html


Tue May 03, 2022 9:20 am WWW

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
After some tweaking and debugging, I got a crude but working Fibonacci sequence program running on the Digital design.
I had to resort to multiplexers instead of pairs of drivers in the shift unit, because Digital thought I had multiple outputs driving a wire somewhere.

I had to make sure that an "all zeroes" state in any of the pipeline registers translates to a real no-op: first, these registers will be zero after a reset, but I also intend to flush the stages by clearing them.

_________________
https://www.alrj.org/pages/Astorisc.html


Mon May 09, 2022 3:46 pm WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 698
alrj wrote:
After some tweaking and debugging, I got a crude but working Fibonacci sequence program running on the Digital design.
I had to resort to multiplexers instead of pairs of drivers in the shift unit, because Digital thought I had multiple outputs driving a wire somewhere.

I had to make sure that an "all zeroes" state in any of the pipeline registers translates to a real no-op: first, these registers will be zero after a reset, but I also intend to flush the stages by clearing them.


Does register 0 return 0 on reads?


Mon May 09, 2022 8:19 pm

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
Good point. It's not the case yet in the circuit, but otherwise yes, that's how the Risc-V works, and it helps a lot.
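To illustrate the x0 behaviour, here is a minimal Python model (only a sketch, not the actual circuit; the class and method names are made up for the example). Reads of register 0 always return 0 and writes to it are dropped:

```python
class RegisterFile:
    """32 registers of 32 bits; x0 reads as zero and ignores writes."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, index):
        # Register 0 is hardwired to zero, whatever was "written" to it.
        return 0 if index == 0 else self.regs[index]

    def write(self, index, value):
        # Writes to register 0 are silently discarded.
        if index != 0:
            self.regs[index] = value & 0xFFFFFFFF  # keep 32 bits


rf = RegisterFile()
rf.write(0, 123)   # dropped
rf.write(5, 42)
print(rf.read(0), rf.read(5))  # prints "0 42"
```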

My remark about getting a no-op was actually more for the instruction decoder. Except for the compressed instructions, which I will not implement, all instructions in the Risc-V ISA have their lower two bits set to 1, and I was simply ignoring them. However, the LOAD instruction uses the opcode b0000011, which means I was getting a LOAD action in my pipeline after reset if I were to ignore those two bits. To fix it, I made sure in the instruction decoder that at least one of them was equal to 1.

Otherwise, it would indeed have been a no-op (LOAD from r0+0, to r0), but it would also have prevented an instruction fetch in the same cycle because I can only afford one bus access per cycle.
Additionally, one could argue that in an architecture where the IOs are memory-mapped, a read, even discarded, could have side-effects, but I'm pretty sure I won't map any IO at address 0 :)
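The decoder issue can be sketched in a few lines of Python (an illustration only; the function names are hypothetical and the real decoder is combinatorial logic, not code). Looking only at opcode bits [6:2], an all-zero word looks like a LOAD; requiring at least one of bits [1:0] to be set makes it fall through to "nothing":

```python
OPCODE_LOAD = 0b0000011  # RV32I LOAD opcode, from the ISA

def is_load_ignoring_low_bits(word):
    # Only bits [6:2] of the opcode are examined; bits [1:0] are ignored.
    return ((word >> 2) & 0b11111) == (OPCODE_LOAD >> 2)

def is_load(word):
    # Same check, but at least one of bits [1:0] must be 1.
    return (word & 0b11) != 0 and is_load_ignoring_low_bits(word)

ZERO_WORD = 0x00000000      # pipeline register content after reset or flush
LW_X5_0_X10 = 0x00052283    # lw x5, 0(x10)

print(is_load_ignoring_low_bits(ZERO_WORD))  # True: spurious LOAD
print(is_load(ZERO_WORD))                    # False: treated as a no-op
print(is_load(LW_X5_0_X10))                  # True: a real LOAD still decodes
```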

It may be worth mentioning that I'm not checking for invalid instructions, and I'm not sure I will. That's a decision I'll make after I have designed the exception handling mechanism.

_________________
https://www.alrj.org/pages/Astorisc.html


Tue May 10, 2022 6:50 am WWW

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
Finally got the forwarding working for the EXecute stage. I'm not entirely happy with it, though, and I'm pretty sure the logic could be "optimized". Now, the big question is of course in which direction to optimize... Towards fewer components or a shorter path?
The forwarding logic is in the critical path for the execute stage, which I expect to be the one that will determine the maximum clock speed, so it would make a lot of sense to go for "fast" here.

Also in place is the stall logic between Execute and MEM for "use after load" situations. At a later time, I'll rework it to allow "store after load", with a bypass from the Writeback stage to the MEM stage.
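For reference, the textbook version of these two decisions looks roughly like this in Python (a sketch of the classic five-stage scheme, not necessarily how it is wired in Astorisc; all names and the string labels are assumptions for the example):

```python
def forward_select(rs, ex_mem_rd, ex_mem_writes, mem_wb_rd, mem_wb_writes):
    """Pick the source of one ALU operand that reads register `rs`."""
    # The most recent result wins: EX/MEM is checked before MEM/WB.
    # Register 0 is never forwarded, since it always reads as zero.
    if rs != 0 and ex_mem_writes and ex_mem_rd == rs:
        return "EX/MEM"
    if rs != 0 and mem_wb_writes and mem_wb_rd == rs:
        return "MEM/WB"
    return "REGFILE"

def load_use_stall(rs1, rs2, prev_is_load, prev_rd):
    """True when the previous instruction is a load whose destination
    is read by the current one: the loaded data is not available yet."""
    return prev_is_load and prev_rd != 0 and prev_rd in (rs1, rs2)

# add x3, x1, x2 right after an instruction that wrote x1:
print(forward_select(1, ex_mem_rd=1, ex_mem_writes=True,
                     mem_wb_rd=4, mem_wb_writes=True))   # EX/MEM
# lw x5, 0(x10) followed by add x6, x5, x7:
print(load_use_stall(5, 7, prev_is_load=True, prev_rd=5))  # True
```

The two comparisons per operand are what ends up in the critical path, which is why the "fewer components or shorter path" question matters here.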

At this point, the Execute stage is already quite packed, and I'm really wondering how large that will end up when I start doing it for real!

Next step: flushing the pipeline on control transfer. Naively, I thought I could make use of registers with a Clear pin, but they are all asynchronous, while I would need a synchronous clear. Or I would need to clear them just an instant after the clock has ticked, somehow. Maybe an RS latch to the CLR lines, which is then reset by the inverted clock? I'll need to play with the idea...
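What I mean by a synchronous clear, as a tiny Python model (purely illustrative, the class name is made up): the clear input is only sampled on the clock edge, so the stage keeps its old value until the tick and then holds zero, i.e. a no-op, for the following cycle.

```python
class SyncClearRegister:
    """Pipeline register whose clear input is sampled on the clock edge."""

    def __init__(self):
        self.q = 0  # output value, zero after reset

    def clock_edge(self, d, clear):
        # On the edge, clear takes priority over the data input.
        self.q = 0 if clear else d
        return self.q


reg = SyncClearRegister()
reg.clock_edge(0x1234, clear=False)
print(hex(reg.q))  # 0x1234
reg.clock_edge(0x5678, clear=True)   # flush requested during this cycle
print(hex(reg.q))  # 0x0, the stage now holds a no-op
```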

_________________
https://www.alrj.org/pages/Astorisc.html


Wed Jun 01, 2022 7:48 pm WWW

Joined: Sun Mar 27, 2022 12:11 am
Posts: 41
Would '163 counters be an option? They have synchronous reset, but might not be fast enough.


Thu Jun 02, 2022 2:01 am

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
This is a very good idea, but unfortunately I can't find one that's fast enough.

Note: At first, I thought they would have been really impractical anyway, because they are only 4-bit devices and I would have needed way too many of them. But then I realized I could limit their use to only the relevant signals, the few that really need to be set to zero to make sure the instruction ends up as a no-op.

_________________
https://www.alrj.org/pages/Astorisc.html


Thu Jun 02, 2022 9:54 am WWW

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
All right,

So, it took me much longer than I would like to admit, but I think I finally got the flushing logic right yesterday evening! It was already quite late, though, so I still need to test it a bit more thoroughly before claiming success on that one :)

The good thing is that it made me rework the logic of the Fetch stage entirely, and it ended up being much simpler (and faster)!

Previously, it was always the PC register that would be updated and output the address to fetch from, just like it's shown on every diagram. That meant it would need either an additional clock cycle to ack the jmp/branch target, or I could double-pump it and use the falling edge of the clock, but then that would only leave half a cycle for the actual memory access... That was not going to cut it.

In the current design, the address for the instruction to fetch can come either from the PC register or directly from the Jump/Branch target address from the EX/MEM register. This way, the actual jump is made while the instruction is in the MEM stage. Which is perfect, because this is also the stage where the exception and interrupts will be checked, so that all control transfers will always happen at the same point in the pipeline.
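In Python terms, the fetch address selection is roughly this (a sketch under assumptions: the signal names are invented, and I'm assuming here that the PC register is then loaded with the fetched address plus 4, which is the usual arrangement):

```python
def fetch_address(pc, ex_mem_take_branch, ex_mem_target):
    """Address driven to memory for this cycle's instruction fetch."""
    # A taken jump/branch sitting in EX/MEM bypasses the PC register,
    # so no extra cycle is spent loading the PC first.
    return ex_mem_target if ex_mem_take_branch else pc

def next_pc(fetch_addr):
    """Value loaded into the PC register at the end of the cycle."""
    return (fetch_addr + 4) & 0xFFFFFFFF


pc = 0x100
addr = fetch_address(pc, ex_mem_take_branch=True, ex_mem_target=0x200)
pc = next_pc(addr)
print(hex(addr), hex(pc))  # 0x200 0x204
```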

When I'm confident that I got it right, I'll document it on my blog.

_________________
https://www.alrj.org/pages/Astorisc.html


Wed Jun 22, 2022 8:46 am WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 698
Quote:
In the current design, the address for the instruction to fetch can come either from the PC register or directly from the Jump/Branch target address from the EX/MEM register. This way, the actual jump is made while the instruction is in the MEM stage. Which is perfect, because this is also the stage where the exception and interrupts will be checked, so that all control transfers will always happen at the same point in the pipeline.

When I'm confident that I got it right, I'll document it on my blog.


Are fast IRQs needed any more? Would it be better to design it using I/O channel
devices (1 core per device) and use a more hardware-based message passing system?
IRQs were designed when we had slow I/O like punched cards and magnetic tape, with DMA,
so most processes waited on I/O. The whole system needs to be designed together.


Thu Jun 23, 2022 4:17 pm

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
I'm not really aiming for fast IRQ handling, it's only a by-product of me trying not to make the hardware more complicated than it already is :D
The main use I see for the interrupts in Astorisc are timers, keyboard and mouse, so definitely not a lot of data to move around.

Note that I'm not very familiar with the subject of IO channels, but as I understand it, wouldn't each dedicated IO controller itself be just like another small processor to build?
Please correct me if I'm wrong, I'm always eager to learn and discover. I've just read the Wikipedia page but if you have a pointer to a better document, feel free to share.

I may end up implementing some kind of DMA controller in the future, but it would indeed be something quite basic. It would need to halt the CPU during its operation because I can't give simultaneous access to the memory to both the CPU and the DMA controller. I already need to tristate the Fetch Address and insert a bubble in the pipeline during Load and Store operations.
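The single-bus constraint can be summed up with a small Python sketch (illustrative only: the function name, the string labels and the priority of a future DMA controller over the CPU are my assumptions, nothing is designed yet):

```python
def bus_owner(mem_op, dma_active=False):
    """Return (owner, fetch_bubble) for one clock cycle on the shared bus.

    mem_op is the kind of instruction currently in the MEM stage.
    """
    if dma_active:
        # Hypothetical DMA controller: the CPU is halted while it runs.
        return ("DMA", True)
    if mem_op in ("load", "store"):
        # Fetch Address is tristated and a bubble enters the pipeline.
        return ("MEM", True)
    return ("FETCH", False)


for op in ["alu", "load", "alu", "store"]:
    print(op, bus_owner(op))
```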

In a nutshell, yes, I do believe that dedicated IO channels would be cleaner, but I'm afraid of their complexity in terms of 74 logic chips :lol:

_________________
https://www.alrj.org/pages/Astorisc.html


Fri Jun 24, 2022 7:16 am WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 698
alrj wrote:
I'm not really aiming for fast IRQ handling, it's only a by-product of me trying not to make the hardware more complicated than it already is :D
The main use I see for the interrupts in Astorisc are timers, keyboard and mouse, so definitely not a lot of data to move around.

For some reason I forgot this was a TTL project, rather than an FPGA one.
Disk I/O of some kind is needed.
If you have a bitmapped display then you have a lot of data to move around;
for just a text display, a buffered serial device may be better than a video display.
The Xerox Alto was an early bitmapped B&W TTL computer that did everything in microcode: keyboard,
mouse, disk, network, blitter. https://en.wikipedia.org/wiki/Xerox_Alto

Quote:
Note that I'm not very familiar with the subject of IO channels, but as I understand it, wouldn't each dedicated IO controller itself be just like another small processor to build?
Please correct me if I'm wrong, I'm always eager to learn and discover. I've just read the Wikipedia page but if you have a pointer to a better document, feel free to share.

I may end up implementing some kind of DMA controller in the future, but it would indeed be something quite basic. It would need to halt the CPU during its operation because I can't give simultaneous access to the memory to both the CPU and the DMA controller. I already need to tristate the Fetch Address and insert a bubble in the pipeline during Load and Store operations.

DMA and IRQ's make the software more complex. CP/M (8080) and FLEX (6800) did not use IRQ's.
Not sure if the APPLE II used IRQ's.
Quote:

In a nutshell, yes, I do believe that dedicated IO channels would be cleaner, but I'm afraid of their complexity in terms of 74 logic chips :lol:


I use the 74LS6502 :) http://6502.org/users/dieter/m02/m02.htm
Ben.


Fri Jun 24, 2022 9:43 am

Joined: Thu Feb 25, 2021 8:27 am
Posts: 38
Location: Belgium
oldben wrote:
DMA and IRQ's make the software more complex

Yes, but they also make the hardware much simpler, and my background is definitely more software than hardware ;)

I know it will always be a tradeoff. Hardware is more expensive but faster, software is slower but cheaper and potentially more flexible.
I'm glad I've got the experience of my breadboard 8088 computer, because I know I can use a CompactFlash and a video framebuffer (character based, though) without needing IRQs or DMA, and even with a slow clock and an 8-bit bus, the system is really usable.

In my mind, having IRQs for things like keyboard, mouse and serial connection means that I don't have to build dedicated buffers and logic in hardware; I can just react to the event and push the data into main memory. A 16-byte buffer may not be a big deal inside an FPGA, but it takes space when physically built on a PCB.

Anyways, since the IO are memory-mapped, I can always rework these parts at a later time. The CPU itself will be multiple boards all connected to a backplane, but I think I'll also have slots for peripherals directly connected to the bus, more or less like ISA/VESA. Then, nothing prevents me from using whatever IO interface I want to drive each of them.

oldben wrote:
I use the 74LS6502 :)

But, but, ... I'm already building a CPU! :D

_________________
https://www.alrj.org/pages/Astorisc.html


Fri Jun 24, 2022 3:06 pm WWW