 rj16 - a homebrew 16-bit cpu 

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1782
DiTBho wrote:
I would spend time learning Verilog instead.

We have a topic
Resources for Newbie Verilog and VHDL People
and it would be great to see some excellent recommendations in there!


Sun Jul 04, 2021 7:57 am

Joined: Sun Dec 20, 2020 1:54 pm
Posts: 74
BigEd wrote:
We have a topic Resources for Newbie Verilog and VHDL People and it would be great to see some excellent recommendations in there!


I was not aware of its existence, but it's cool to know :D


Sun Jul 04, 2021 8:50 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
I am really hung up on how to implement memory wait states.

I would like to support async memory, sync memory (block RAM), and slow memory like SPI memory or maybe even a serial link. I would also like the option to share that memory between the data and program buses without the CPU needing to know that's happening; it would be up to the memory subsystem to arbitrate access appropriately.

So I would like the memory to decide if it's busy and when it's done with a request. I can use a ready signal from the CPU to tell the memory the address is valid, and then I can use a valid signal for the memory to tell the CPU the data is valid (or the data was written). Is this the standard way to do this? Is there something more standard?

I am thinking that for the data memory, the best solution is probably to make the microcode more flexible so it can accommodate multi-cycle reads and writes. There is stall logic, but it gets complicated to use because the response may come back from the memory while the CPU is stalled. Accounting for that turned into a complicated state machine, which is exactly what the microcode is supposed to obviate, so it's probably cleaner to just implement it in the microcode.

So a memory cycle would set the address (and data on write) and the ready signal in the first cycle, then in the next cycle it would either wait if valid was low (keeping address / write data stable), or copy the data into a register / move on if the valid line goes high. I think writes would still be a single cycle, though if the memory needs to wait for the write to finish it could. Does that sound sane? Is that how it's typically done?
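A minimal Python sketch of the read cycle described above, under stated assumptions: `SlowMemory`, `cpu_read`, and the fixed `latency` (at least 1 cycle) are hypothetical names invented for illustration, and the signals use the req/ack naming rather than ready/valid. It only models the idea that the CPU keeps the request and address stable until the memory answers.

```python
class SlowMemory:
    """Hypothetical memory that answers a request after `latency` (>= 1) clock cycles."""

    def __init__(self, latency, contents=None):
        self.latency = latency
        self.contents = dict(contents or {})
        self.countdown = None  # cycles left in the current transaction, None when idle
        self.ack = False       # high for the cycle in which the transaction completes
        self.rdata = None

    def tick(self, req, addr, we=False, wdata=None):
        """Advance one clock edge using the values the CPU drove this cycle."""
        self.ack = False
        if not req:
            self.countdown = None
            return
        if self.countdown is None:
            self.countdown = self.latency  # a new transaction starts
        self.countdown -= 1
        if self.countdown == 0:
            if we:
                self.contents[addr] = wdata
            else:
                self.rdata = self.contents.get(addr, 0)
            self.ack = True
            self.countdown = None


def cpu_read(mem, addr, max_cycles=100):
    """Hold req and addr stable until ack; return (data, cycles waited)."""
    for cycle in range(1, max_cycles + 1):
        mem.tick(req=True, addr=addr)
        if mem.ack:
            return mem.rdata, cycle
    raise TimeoutError("memory never acknowledged the request")
```

With `latency=1` this behaves like a single-cycle memory; a larger latency makes the same CPU loop wait longer without any other change.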

For the program memory, if a fetch needs to happen, it would raise the ready signal. If memory isn't ready and so the valid signal doesn't go high, I think it's okay to just stall in that case. If a data read is happening, it won't try to fetch until the read cycle is over. In the case of a write, it might stall one cycle for the write to complete, but I don't think it will interrupt the write. But I would have to ensure the stall doesn't take effect until the next cycle. Does that all sound sane / typical?


Tue Jul 06, 2021 5:38 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1782
I know the Z80 uses MREQ (and IORQ) - might be worth a look at how that works.

I don't like the names 'ready' and 'valid' for this though. 'request' and 'acknowledge' feel a better fit.

(In pipelining, at Inmos, we used to use 'valid' and 'hold' which is very like 'valid' and 'ready'. But the point in pipelining is to send correct data once, and once only, to a consumer which has capacity to take it. It feels to me like memory is different, at least for reads: we send an address, and we get back some data.)


Wed Jul 07, 2021 9:27 am

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
+1 for "request" and "acknowledge", or just "req" and "ack".

Consider whether your req signal is going to be a momentary pulse (in which case can you guarantee that the memory subsystem won't miss a pulse?), or is it going to remain asserted until the ack comes in (in which case can you guarantee that the memory subsystem won't start a second transaction in the time it takes you to remove the req?)

A pattern I've seen in the FPGA world is to use a req signal that simply flips state to initiate a request. A cycle is complete when the acknowledge signal matches the req signal.

Code:
             1st request                   2nd request
               ________________________________
Req: _________|                                |______________________

                      1st ack                              2nd ack
                       ________________________________________
Ack: _________________|                                        |_________


It solves both the problems I just mentioned, and also works when you have split (but integer multiple, so still edge-aligned) clocks, where a momentary pulse on one clock might be too narrow to be seen from another. (It's common to have an SDRAM controller running at a multiple of the system clock speed.)
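As a rough illustration of the toggle scheme in the diagram above, here is a small Python sketch. The class names, the fixed `latency`, and the `completed` counter are assumptions made for the example; the only idea taken from the description is that a transaction is outstanding exactly while req and ack differ.

```python
class ToggleMaster:
    """Starts a request by flipping req; never pulses it."""

    def __init__(self):
        self.req = 0

    def start(self):
        self.req ^= 1


class ToggleSlave:
    """Hypothetical slave that completes a request after `latency` (>= 1) cycles."""

    def __init__(self, latency):
        self.latency = latency
        self.ack = 0
        self.remaining = 0
        self.completed = 0

    def tick(self, req):
        if req == self.ack:
            return  # levels match: nothing outstanding, however long we idle
        if self.remaining == 0:
            self.remaining = self.latency
        self.remaining -= 1
        if self.remaining == 0:
            self.ack = req  # matching the req level signals completion
            self.completed += 1


def run_transaction(master, slave, max_cycles=100):
    """Flip req, then clock the slave until ack matches; return cycles taken."""
    master.start()
    for cycle in range(1, max_cycles + 1):
        slave.tick(master.req)
        if master.req == slave.ack:
            return cycle
    raise TimeoutError("slave never acknowledged")
```

Because the slave compares levels rather than looking for an edge, idling with req unchanged cannot start a second transaction, and a slow observer cannot miss the request.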


Wed Jul 07, 2021 10:09 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
I like the names req and ack, that's much better than ready/valid. Thanks!

I thought I would try out the state flipping idea. The req signal for data reads would likely come out of a microcode rom, so might be unstable. This circuit seems to clean it up okay:

Attachment: state_flipping_req.gif


I am using a 2-bit counter, with the second bit simulating the microcode ROM read request. The ROM should be stable by halfway through the cycle, so the signal is registered then. To do the state flip I need a pulse, so I convert the registered signal into one. I then use a mux to flip the state immediately and have it remember that state at the next clock. I think this should work?
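The pulse-then-flip idea can be sketched in a few lines of Python. This is a simplification and not the circuit in the attachment: it assumes the request has already been sampled once per clock, and `toggle_on_rising_edges` is a hypothetical helper name. Each rising edge in the sampled request becomes a one-sample pulse, and each pulse flips the stored state.

```python
def toggle_on_rising_edges(samples):
    """Return the toggled req level for each sample of a level-style request."""
    prev = 0    # previous sample, used for edge detection
    state = 0   # the state-flipping req output
    out = []
    for s in samples:
        pulse = bool(s) and not prev  # true only on a 0 -> 1 transition
        if pulse:
            state ^= 1
        out.append(state)
        prev = s
    return out
```

For example, two separate requests in the input flip the output up and then back down, one transition per request.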

I think this seems a bit overkill / over-engineered at the moment. I think I understand the usefulness for cross clock-domain requests. I think it would also be useful if more than one request was allowed to be in flight at the same time, and it's important to catch them all.

But I think if there's only a single clock domain, and all signals are synchronized to the clock, and combinatorial loops are avoided, then I don't think it's necessary to flip the signals on state changes. I am willing to be convinced otherwise though. I am pretty green on this sort of stuff.


Wed Jul 07, 2021 12:29 pm

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
rj45 wrote:
I think this seems a bit overkill / over-engineered at the moment. I think I understand the usefulness for cross clock-domain requests. I think it would also be useful if more than one request was allowed to be in flight at the same time, and it's important to catch them all.


Yes, indeed - it's a solution to a problem you haven't had yet! (And just to be clear, this idea doesn't help with crossing unrelated clock domains - at least not without the usual synchronisation - its utility is when the clocks are integer multiples of each other and still edge aligned.)

Quote:
But I think if there's only a single clock domain, and all signals are synchronized to the clock, and combinatorial loops are avoided, then I don't think it's necessary to flip the signals on state changes. I am willing to be convinced otherwise though. I am pretty green on this sort of stuff.


Well for 832 I used a req signal which goes high, and remains high until an ack pulse comes in.
You just have to make sure you guard against this situation:

Code:

             ___________________
Req  _______|                   |___________
              <- Memory transaction starts here
                            ____
Ack  ______________________|    |___________
                             <- Another memory transaction starts here since req is still high


In 832 I have a combinational "and" of my req signal with the inverse of the "ack" signal to make sure it's deasserted immediately.
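To illustrate the hazard in the diagram above, here is a small cycle-by-cycle Python sketch. It is not the 832 logic: the registered one-cycle ack pulse, the fixed `latency`, and the names `count_transactions`, `req_reg`, and `busy` are all assumptions made for illustration. The CPU only wants one access, and the function counts how many transactions the memory actually starts.

```python
def count_transactions(guarded, latency=2, cycles=12):
    """Count memory transactions started for ONE intended CPU access."""
    req_reg = True   # CPU's registered request; cleared the edge after ack is seen
    ack = False      # memory's registered one-cycle ack pulse
    busy = 0         # cycles left in the memory's current transaction (0 = idle)
    started = 0
    for _ in range(cycles):
        # What the memory sees this cycle: either the raw register,
        # or the register gated combinationally with "not ack".
        req_seen = (req_reg and not ack) if guarded else req_reg

        # Next-state values, all computed from this cycle's signals.
        next_req = req_reg and not ack
        next_ack = False
        next_busy = busy
        if busy > 0:
            next_busy = busy - 1
            if next_busy == 0:
                next_ack = True  # transaction finishes, pulse ack next cycle
        elif req_seen:
            started += 1         # idle memory accepts a request
            next_busy = latency

        req_reg, ack, busy = next_req, next_ack, next_busy
    return started
```

In this model the unguarded version starts a second, unwanted transaction during the ack cycle, because the registered req cannot drop until the following edge; gating req with the inverse of ack removes it.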


Wed Jul 07, 2021 10:13 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
In my system another request won't start unless ack is low. That way there is a wait for the slave to be able to respond in addition to the master. There is effectively a two-way waiting system. It does mean that back-to-back accesses cannot be done without a dead space between them. However, burst access may be used, which supports back-to-back accesses with no dead space.
It is interesting because, with the two-way waiting, request and acknowledge become longer signals as they travel through bus bridges.
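A small Python sketch of the two-way wait, as I read the description above. The function name, the slave `latency`, and the exact timing are assumptions for illustration only: the master raises req only when ack is low, and the slave holds ack until it sees req drop.

```python
def simulate_two_way_wait(num_requests, latency=1, max_cycles=200):
    """Return the (req, ack) levels seen in each cycle until the bus is idle again."""
    req, ack = False, False
    remaining = num_requests
    waited = 0
    trace = []
    for _ in range(max_cycles):
        if remaining == 0 and not req and not ack:
            break
        trace.append((int(req), int(ack)))

        # Master: finish when ack arrives, start only when ack is low.
        if req and ack:
            next_req = False
            remaining -= 1
        elif not req and not ack and remaining > 0:
            next_req = True
        else:
            next_req = req

        # Slave: raise ack after `latency` cycles of req, hold it until req drops.
        if req and not ack:
            waited += 1
            next_ack = waited >= latency
        elif not req:
            next_ack = False
            waited = 0
        else:
            next_ack = ack

        req, ack = next_req, next_ack
    return trace
```

With two requests and `latency=1`, the trace shows the dead space between accesses: after req drops there is a cycle where ack is still high, so the second request cannot start until the cycle after that.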

_________________
Robert Finch http://www.finitron.ca


Thu Jul 08, 2021 4:27 am

Joined: Sun Dec 20, 2020 1:54 pm
Posts: 74
robfinch wrote:
In my system another request won't start unless ack is low.
That way there is a wait for the slave to be able to respond in addition to the master.


That's also how things work in my Arise-v2.

I implemented a simple FSM for handling this, and everything is synchronous.
"ack" is sampled on the falling clock edge, and 'ready' changes on the next falling clock edge.


Thu Jul 08, 2021 9:02 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
New video up. This time I recorded it without audio, sped it up, and talked over it. It's a refactor episode where I simplify the instruction set architecture. I keep changing it, but I wanted to make room for the immediate instruction. The previous ISA was also designed to have a register as the program counter so that I didn't need jump instructions, and I changed my mind about that. I also figured out a way to pack the opcode bits better, so there's now a spare bit in case I need it in the future, and I made it so there are exactly 32 instructions.

It's not the highest quality video because I couldn't edit the audio too much without losing sync with the video. One day I will figure out how to do that better. I re-watched it a few days later and I don't think it's too bad, so I might as well just put it up.

[038] Instruction Set Simplified! https://youtu.be/vsskigkJ-oM

I wanted to release a different video this week and put this video as an extra unlisted video, but alas it took me all day yesterday to record it and I didn't have time to edit it.

It was the video where I convert the microcode to multi-cycle and implement the req and ack signals for multi-cycle memory ops. But yeah, I will release that next week I guess. It doesn't have the req fix from the previous couple of posts to prevent duplicate memory requests, but the memory circuit is too simple to need it yet. When I implement a proper memory controller with DMA I will likely need that fix.


Sun Jul 11, 2021 3:59 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
So, I finally had time to edit up this episode and get it ready to release.

This implements a microcode sequencer to allow instructions to take multiple cycles to execute. I went with an implementation that reserves the first 32 microcode ops for the instructions; if an instruction requires more micro-ops, it can jump into the upper 32 ROM addresses and continue executing there. While a multi-cycle instruction is executing, the stall line is inhibited, so stalls only happen between instructions. Also, skip will skip over a multi-cycle instruction properly.

I also implement flags, so the microcode can optionally jump to either of two microcode addresses depending on those flags. One of the flags is the memory ack signal talked about in previous posts. So if that signal is false, the microcode can loop waiting for it to become true.

And so, now the CPU can wait for memory to become ready. When requesting a read or write, a req line goes high, and the microcode can then wait until the ack line goes high. This allows slow devices to be mapped into data memory and waited on.
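Here is a rough Python sketch of the wait loop, just to show the shape of the idea. The ROM layout is invented for the example and is not the actual rj16 microcode: the `MicroOp` fields, the `DONE` marker, opcode 5, and the signal names `mem_req` and `load_reg` are all hypothetical. Each micro-op names an optional flag and two possible next addresses, and the continuation lives in the upper 32 addresses.

```python
from collections import namedtuple

# Hypothetical micro-op format: control signals, a flag to test, and two next addresses.
MicroOp = namedtuple("MicroOp", "signals flag if_false if_true")
DONE = -1  # hypothetical marker meaning "instruction finished, fetch the next one"

ROM = {
    5:  MicroOp({"mem_req"}, None, 32, 32),        # entry slot: raise req, continue at 32
    32: MicroOp({"mem_req"}, "mem_ack", 32, 33),   # loop here while ack is false
    33: MicroOp({"load_reg"}, None, DONE, DONE),   # latch the data and finish
}


def run_instruction(opcode, ack_cycle, max_cycles=100):
    """Return the microcode addresses visited; mem_ack is true from cycle `ack_cycle` on."""
    addr = opcode
    visited = []
    for cycle in range(max_cycles):
        op = ROM[addr]
        visited.append(addr)
        flags = {"mem_ack": cycle >= ack_cycle}
        taken = flags[op.flag] if op.flag else False
        addr = op.if_true if taken else op.if_false
        if addr == DONE:
            return visited
    raise TimeoutError("memory never acknowledged")
```

A fast memory passes through address 32 once, while a slower one simply repeats it until the flag becomes true; no separate stall state machine is involved.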

One interesting use case might be waiting for the VDP to be done with video memory because it's in a vblank.

Also, it should be possible to implement interrupts fairly easily now. I have a signal that tells me that the CPU is about to execute a new instruction, and it loads the fetched opcode. I could instead jump to a special microinstruction if the interrupt line is high and do a jump to an interrupt vector instead.

[039] Multi-Cycle Microcode! https://youtu.be/H41UohXvvlE


Sat Jul 17, 2021 5:16 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
So, I have been taking a bit of a break from recording and editing. But I thought I would at least record a CPU Tour. I just go over everything I have built so far.

[040] CPU Tour! https://youtu.be/XxMg9fByz5A

In the meantime I have been learning how old console systems render tile-based graphics and sprites, which has been pretty fun. The videos I did on the VDP / GPU have been by far my most popular videos (one has over 1000 views already!), so I would like to make a separate series building one and detailing how it works.


Mon Aug 02, 2021 5:41 pm