 [ 237 posts ]  Go to page Previous  1 ... 7, 8, 9, 10, 11, 12, 13 ... 16  Next
 rj16 - a homebrew 16-bit cpu 

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Okay, yeah... for synthesizing a microcode ROM into LUTs, I think the synth tools can do that rather well and simplify the resulting logic. Digital can do that too if I so choose.

IJVM is very interesting, but it's a stack architecture. I am pretty committed to the RISC architecture at this point. But it has good ideas for future steps, and the Mic-2 in particular is partially pipelined in the same way I want to partially pipeline. I will see if I can find more details on the fetch unit and study that. Thanks for the pointer!

And yeah, I guess I will stick to a more traditional approach to microprogramming. The clipper idea is interesting but probably complicated to build.

Also, oldben, I am definitely using some GALs if I build this in discrete logic. For my 8-bit Ben Eater CPU I used some old GAL16V8s I found in my stash for the shift and logic functions. I could only do two bits at a time per GAL, so there were four of them, but it worked great. It just doubled the current draw of the CPU, so I upgraded to some newer ATF16V8s, and that helped a bit... still pretty power hungry though.

Okay, another question for you all: skipping the next instruction vs conditional branches -- which one would be better? I only have 32 instructions and I would really like a conditional move instruction but it just can't fit anywhere. And I have jump, jal (jsr), brt (branch if T) and brf (branch if !T) for my branching instructions... I could remove brt and brf if I make the comparison instructions skip the next instruction. And then I get conditional move, plus a whole lot more, I could have conditional adds or anything else too. And I free up two instructions for something else.

What do you think? Are there any problems with an architecture that skips the next instruction instead of having conditional branches? I know some architectures predicate every instruction (like the OPCs or ARM) but I don't have enough bits in the encoding to do that.


Tue Apr 13, 2021 12:25 pm

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 84
rj45 wrote:
What do you think? Are there any problems with an architecture that skips the next instruction instead of having conditional branches? I know some architectures predicate every instruction (like the OPCs or ARM) but I don't have enough bits in the encoding to do that.


I implemented something similar for 832 - because I only have 8-bit opcodes I didn't have enough encoding space for a full set of conditional branches (or any branches at all, as it happens - I manipulate r7, the program counter, instead.)
I added a "cond" instruction, which causes instructions to be ignored until either the next cond instruction or until program flow changes. I used this instead of conditional branches, but with care you can use it for "if / else if" scenarios too:

Code:
  li 16
  sub r1
  cond SLT // was the result strictly less than zero?
    .lipcrel branchtarget1
    add r7  // r7 is the program counter, so this is a branch
  cond SGT // was the result greater than zero?
    li 24
    mr r2  // move 24 to register 2
  cond EX // return to unconditional execution.


I have EQ, NEQ, SLT (strictly less than zero), LE (less than or equal to zero), SGT, GE, EX (always execute) and NEX (never execute - pauses the CPU until an interrupt, or in dual-thread mode, the other thread pokes it with a "sig" instruction.)
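
To make the gating rule concrete, here is a minimal Python sketch of the "cond" behaviour described above. It is not the real 832 encoding or pipeline: the tuple format, the `run` helper and the `PROGRAM` list are made up for illustration, the tested result is held fixed for the whole run, and NEX is left out because it pauses the CPU rather than gating instructions.

```python
# Hypothetical model of block predication; not the real 832 implementation.
CONDITIONS = {
    "EQ":  lambda v: v == 0,
    "NEQ": lambda v: v != 0,
    "SLT": lambda v: v < 0,
    "LE":  lambda v: v <= 0,
    "SGT": lambda v: v > 0,
    "GE":  lambda v: v >= 0,
    "EX":  lambda v: True,
}

def run(program, result):
    """Return the instructions that execute, given the last ALU result."""
    executing = True
    executed = []
    for op, arg in program:
        if op == "cond":
            # A cond always takes effect, even inside an ignored region.
            executing = CONDITIONS[arg](result)
        elif executing:
            executed.append(arg)
            if op == "branch":
                break  # program flow changes, so this straight-line run ends
    return executed

# Same shape as the listing above: branch if negative, load r2 if positive.
PROGRAM = [
    ("cond", "SLT"), ("branch", "branchtarget1"),
    ("cond", "SGT"), ("op", "li 24"), ("op", "mr r2"),
    ("cond", "EX"), ("op", "next"),
]
```

With a negative result only the branch runs, with a positive result the two loads and the following instruction run, and with zero both conditional regions are ignored.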

This didn't cause me huge amounts of grief - I just needed to account for it in the hazard logic. To keep things simple, instead of attempting to do anything fancy in the fetch logic to step over skipped instructions, I convert them to NOPs but otherwise let them keep their place in the instruction stream. Where it did trip me up was that the NOPing happens after the instruction enters the ALU, so when I implemented results forwarding I was occasionally forwarding the result of a skipped instruction back into the ALU's input when I shouldn't have been. That took a while to figure out!
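
The forwarding pitfall can be sketched in a few lines of Python. This is only an illustration under assumed names (`AluStage`, `read_operand` and the `squashed` flag are hypothetical, not taken from the 832 source): the forwarded value is used only when the producing instruction was not turned into a NOP.

```python
from dataclasses import dataclass

@dataclass
class AluStage:
    dest: int       # destination register number of the instruction in the ALU
    result: int     # value the ALU computed for it
    squashed: bool  # True if the instruction was converted to a NOP

def read_operand(reg, regfile, alu_stage):
    """Forward the ALU result only when its producer really executes."""
    if (alu_stage is not None
            and alu_stage.dest == reg
            and not alu_stage.squashed):
        return alu_stage.result
    # Otherwise fall back to the architectural register value.
    return regfile[reg]
```

Leaving out the `squashed` check reproduces the bug: a skipped instruction's result leaks into the next instruction's input.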


Tue Apr 13, 2021 5:05 pm

Joined: Sun Dec 20, 2020 1:54 pm
Posts: 74
I don't know, but that's for sure a CPU I wouldn't like to program!


Tue Apr 13, 2021 5:21 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1690
Nice one @robinsonb5 - I like the way that various quite different ideas can be used to make something which works. The two things to worry about, I was thinking, are the performance (in terms of cycle efficiency) and difficulty of implementation (including validation, of course.) But @DiTBho reminds me that difficulty of use is also of interest: whether difficulty for the assembly language programmer or the compiler writer. I suspect it's difficult to measure difficulty until you've had some experience with the machine: otherwise you might be measuring unfamiliarity instead.

So, @rj45, it might well be worth looking into skip instructions, to see how it works out. As ever, I'd think it's easiest to explore in an emulator written in a suitably productive high level language.

(Commonly, early programmable calculators would offer a skip instruction. So this isn't a new idea for me.)


Tue Apr 13, 2021 5:45 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Here is the current conditional implementation. Note that brt and brf are the only conditional instructions, and eq simply sets the T bit:

Code:
fibonacci:
  move r0, 0
  move r1, 0
  move r2, 1
  .loop:
    add r2, r1
    move r3, r2

    eq r3, 10946
    brt fibonacci

    add r1, r2
    move r3, r1

    eq r3, 10946
    brt fibonacci

    jump .loop


If I switch to skip instructions, this is what it would look like, and if.eq simply sets the "skip" bit instead:

Code:
fibonacci:
  move r0, 0
  move r1, 0
  move r2, 1
  .loop:
    add r2, r1
    move r3, r2

    if.eq r3, 10946
    jump fibonacci

    add r1, r2
    move r3, r1

    if.eq r3, 10946
    jump fibonacci

    jump .loop


So for efficiency, it's the same. But now any instruction can be conditional with an 'if' instruction before it.
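
A throwaway emulator is one way to check the efficiency claim. The Python sketch below is a rough model, not rj16 itself: it assumes every fetched slot costs one cycle (a skipped instruction still occupies its slot, and taken jumps have no penalty), registers wrap at 16 bits, and immediates fit in one instruction. The `movi`/`mov` split and the tuple encoding are invented for the example.

```python
LABELS = {"fibonacci": 0, ".loop": 3}

def build(compare, branch):
    """Build the loop with either (eq, brt) or (if.eq, jump)."""
    return [
        ("movi", 0, 0), ("movi", 1, 0), ("movi", 2, 1),
        ("add", 2, 1), ("mov", 3, 2),                 # .loop
        (compare, 3, 10946), (branch, "fibonacci", None),
        ("add", 1, 2), ("mov", 3, 1),
        (compare, 3, 10946), (branch, "fibonacci", None),
        ("jump", ".loop", None),
    ]

def run(program, max_slots=100_000):
    """Run until control returns to 'fibonacci'; return (slots, regs)."""
    regs, t_flag, skip, pc, slots = [0] * 4, False, False, 0, 0
    while slots < max_slots:
        op, a, b = program[pc]
        pc += 1
        slots += 1
        if skip:              # skipped slot still costs a fetch
            skip = False
            continue
        if op == "movi":
            regs[a] = b
        elif op == "mov":
            regs[a] = regs[b]
        elif op == "add":
            regs[a] = (regs[a] + regs[b]) & 0xFFFF
        elif op == "eq":
            t_flag = regs[a] == b
        elif op == "if.eq":
            skip = regs[a] != b   # skip the next instruction if not equal
        elif op == "jump" or (op == "brt" and t_flag):
            if a == "fibonacci":
                return slots, regs
            pc = LABELS[a]
    raise RuntimeError("program did not finish")

branch_slots, branch_regs = run(build("eq", "brt"))
skip_slots, skip_regs = run(build("if.eq", "jump"))
```

Under these assumptions both variants use the same number of slots to reach 10946, since an untaken brt and a skipped jump each cost one slot.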

I am not sure I like the idea of making whole blocks conditional... that sounds like it could cause trouble. But I think skipping only the next instruction would be okay. I would, though, need to skip any prefix instructions too, so there is a bit of complexity there.

Also, currently because the T bit is also the carry flag, you can use addc and branch on carry easily. With this change, you'd need an extra instruction to test if the result carried. But the compiler I am thinking of using (LCC) won't make use of addc into a branch anyway, and I don't think it would generate code that tests for carry in this way either. But I think it can make use of skips.


Tue Apr 13, 2021 6:23 pm

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 84
DiTBho wrote:
I don't know, but that's for sure a CPU I wouldn't like to program!


LOL - it's certainly "quirky", and sometimes feels as though I'm directly writing in microcode - but luckily I can program it in C when I use it in real projects.

rj45 wrote:
I am not sure I like the idea of making whole blocks conditional... that sounds like it could cause trouble. But I think skipping only the next instruction would be okay. I would, though, need to skip any prefix instructions too, so there is a bit of complexity there.


I'll be honest, skipping blocks was less useful than I thought it would be, and in practice the compiler backend only ever uses it to implement conditional branches. As long as you can implement branches in a single instruction then a single-skip is fine. (In my case, I needed to be able to load an immediate - six bits at a time - and then add, so skipping one instruction wasn't enough.)


Tue Apr 13, 2021 8:05 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
So that did it!

Two changes made all the instructions I want fit in 32: the PC is now an accessible register, so jump and add merge into one instruction. And the if.cc instructions remove the conditional branches, and I could pull the condition code out of the opcode and into the instruction, replacing five instructions with two.

The not and neg instructions aren’t necessary, so if there are better candidates I am all ears. It feels nice to only have 30 instructions and be unsure what to add to make 32! I suppose I could add multiply and/or divide, but that would be tricky to implement in microcode. The FPGA has a DSP block, but I don’t want to rely on that.

Code:
 
load, store
imm, jal
move, add
if.cc, ifc.cc
shl, shr, asr, ror
and, or, xor, sub
addc, subc
loadb, storeb, loadp, storep
rcsr, wcsr
zext, sext, not, neg
nop, break, halt, ret


So the imm prefix is not interruptible, hopefully that will make it easier to skip both the prefix and the prefixed instruction as I will already have a control line to suppress an interrupt. So I shouldn’t need to skip blocks, just don’t clear the skip flag if the instruction is uninterruptible. I suppose the skip won’t be interruptible either, which means you could chain if.cc instructions which might be interesting.
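
The skip-flag rule described here can be modelled in a few lines of Python. This is a sketch under assumptions, not the actual rj16 control logic: the `UNINTERRUPTIBLE` set, the `("if", cc)` tuple form and the `executed` helper are hypothetical, and condition results are passed in as a plain dictionary.

```python
# Assumption: imm prefixes and if.cc are the uninterruptible instructions.
UNINTERRUPTIBLE = {"imm", "if"}

def executed(program, conds):
    """Return the instructions that actually execute.

    program: list of (op, arg) tuples; ("if", cc) models an if.cc instruction.
    conds:   dict mapping condition names to True/False.
    """
    skip = False
    out = []
    for op, arg in program:
        if skip:
            # Suppress this instruction. Only clear the flag once an
            # interruptible instruction has been swallowed, so a prefix
            # and the instruction it prefixes are skipped together.
            if op not in UNINTERRUPTIBLE:
                skip = False
            continue
        if op == "if":
            skip = not conds[arg]
        else:
            out.append((op, arg))
    return out

PREFIXED = [("if", "eq"), ("imm", 0x123), ("add", "r1"), ("sub", "r2")]
CHAINED = [("if", "a"), ("if", "b"), ("add", "r1"), ("sub", "r2")]
```

In this model a chain of two if.cc instructions behaves like a logical AND: the add in `CHAINED` only executes when both conditions hold.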

And yeah I can fit a load/store of program memory now. Byte access can only be done on data memory though, program memory will be word addressed. I plan to have strings and constants copied into data memory by the boot loader/kernel.

What do you think, am I missing anything vital for compiling from C?


Thu Apr 15, 2021 3:51 am

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 84
rj45 wrote:
So that did it!


Nice one!

Quote:
So the imm prefix is not interruptible, hopefully that will make it easier to skip both the prefix and the prefixed instruction as I will already have a control line to suppress an interrupt. So I shouldn’t need to skip blocks, just don’t clear the skip flag if the instruction is uninterruptible. I suppose the skip won’t be interruptible either, which means you could chain if.cc instructions which might be interesting.


Indeed - where things get mind-melting is when the instructions which might or might not be skipped can themselves affect the condition flags!

Quote:
And yeah I can fit a load/store of program memory now. Byte access can only be done on data memory though, program memory will be word addressed. I plan to have strings and constants copied into data memory by the boot loader/kernel.


That should work nicely.

Quote:
What do you think, am I missing anything vital for compiling from C?


For C code you're going to be accessing the stack quite a bit, so efficient stack manipulation - while not vital - is nice to have.
Therefore, if you've got encoding space, I'd recommend store-with-pre-decrement, load-with-post-increment, and load-indexed.
Store-indexed is also nice to have, but I wasn't able to add it easily to 832 because it means having three values "in flight" at once, and I have no provision for more than two.
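
For reference, here is roughly what those three addressing modes do, as a small Python model. It assumes a descending, word-addressed stack; the class and method names are invented for illustration and do not correspond to 832 or rj16 mnemonics.

```python
class StackModel:
    """Word-addressed memory with a descending stack pointer."""

    def __init__(self, size_words):
        self.mem = [0] * size_words
        self.sp = size_words          # empty stack: SP just past the top

    def store_predec(self, value):
        """Push: decrement SP first, then store."""
        self.sp -= 1
        self.mem[self.sp] = value & 0xFFFF

    def load_postinc(self):
        """Pop: load from SP, then increment."""
        value = self.mem[self.sp]
        self.sp += 1
        return value

    def load_indexed(self, offset):
        """Read a local or argument at SP + offset without moving SP."""
        return self.mem[self.sp + offset]

stack = StackModel(16)
for word in (1, 2, 3):
    stack.store_predec(word)
```

After the three pushes, `load_indexed(0)` reads the most recent value and `load_indexed(2)` the oldest, which is the access pattern a C compiler uses for locals and spilled registers.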


Thu Apr 15, 2021 7:40 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Next video is up. I implement the changes from a few posts ago where I figured out how to easily get FPGA block RAM working by inverting the clock. While I am at it, I pipelined the fetch and injected a nop into the branch delay slot / shadow. Looks pretty easy, it's just a multiplexer and a register.
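
The "multiplexer and a register" idea can be sketched as a two-stage model in Python. This is an illustration only, with a made-up tuple encoding: `ir` stands for the register between fetch and execute, and the mux picks a nop instead of the fetched instruction whenever the executing instruction is a jump.

```python
NOP = ("nop", None)

def run(program, max_cycles=50):
    """Return the sequence of ops seen by the execute stage."""
    pc = 0
    ir = NOP                  # pipeline register between fetch and execute
    trace = []
    for _ in range(max_cycles):
        op, arg = ir
        trace.append(op)
        if op == "halt":
            break
        # Fetch happens in parallel with execute.
        fetched = program[pc] if pc < len(program) else NOP
        if op == "jump":
            # The instruction just fetched sits in the branch shadow,
            # so the mux replaces it with a nop.
            ir = NOP
            pc = arg
        else:
            ir = fetched
            pc += 1
    return trace

PROGRAM = [("add", None), ("jump", 3), ("sub", None), ("xor", None), ("halt", None)]
trace = run(PROGRAM)
```

Here the sub after the jump is fetched but never executed; a nop takes its place in the trace.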

I am not sure about the timing aspect, since now memory addresses have to be calculated by mid-cycle for both memory access and jumps, and that may mean there's nothing to do but twiddle thumbs in the second half of the cycle, but I can fix that later.

Another thing I do near the end is extract the CPU out of the front panel and into a separate embedded circuit, which can be more easily converted to Verilog for running in the FPGA. Instead of keeping that footage in the main video (which is already too long) I just extracted it out mostly unedited into a separate unlisted video which is linked in the description. I doubt anyone wants to watch that, but it's there if you want to.

[025] FPGA Block RAM! https://youtu.be/B9fruZw2bLE

Next steps: I would like to implement the new instruction set, but I just did 3 videos on doing exactly that not long ago. I might do the same thing where I summarize the work in the main video and fork off an unlisted video with the busywork of refactoring everything to make that work. There may be some other odds and ends for making it work in an FPGA I need to do. And then on to the second try at implementing microcode.

This time I think I am not going to programmatically convert the existing logic to microcode, I think I am going to manually / incrementally convert the logic piece by piece into microcode instead, and explain better what I am doing as I go. I think that will work out a lot better.

Still interested in thoughts on the instruction set from my previous post. I do have indexed memory access, so stack access should be covered. I might try to fix up the LCC port I have just to make sure there's nothing I forgot (if I have time, since I won't record that and hobby time is limited). I think it should be good though.


Thu Apr 15, 2021 4:28 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Another video is up, this one is about the changes to the instruction set I have talked about in the last couple posts. I don't do the rework on camera, because well, it took me three hours, and I just spent three episodes implementing the instruction set. So I just talk about what I am going to change and why, then talk about how that turned out and what I actually changed.

[026] Instruction Set Upgrades! https://youtu.be/a-NgJahHiZ0

The next video is also recorded where I implement the skip functionality. It turned out much simpler than I thought it would, but mind you, it doesn't yet skip the immediate prefixes. But I don't think it will add that much complication to implement that.

After that I need to make the program counter a "general purpose" register. I am guessing the best way to implement that is to move the PC into the register file? That way it can be selected by the muxes there? It would, of course, need a separate write port for the next PC value, and writes will have to simultaneously also go to the program memory address. But I think this is how it's implemented in the ARM. Does anyone have a better way to do it?


Sun Apr 18, 2021 10:13 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1690
I think it's normal enough for the PC to be physically a separate register, with its own private incrementer, but for it to be addressed logically as if it were part of the register file. I think both ARM and OPC did this.


Mon Apr 19, 2021 7:47 am

Joined: Sun Dec 20, 2020 1:54 pm
Posts: 74
GPs are involved with the ALU unit
SP, EA and DR are involved with the load/store unit
PC is involved with the BR unit

In my opinion, and for my personal taste, a design that aims to be elegant can be hugely rewarding if it stays pure and rigorous by keeping circuits and registers separated.

It makes programming like sliding your fingers on the silk. I love it :D


Mon Apr 19, 2021 10:03 am

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Okay, so I guess, then, I could detect reads and writes to the PC in the decode unit and convert writes into jumps, and intercept reads with a mux that injects the PC value from the fetch unit then? I suppose that just makes it part of the instruction decoding.

As for programming purity, my goal is to have a C compiler do all my assembly programming for me. But LCC, at least, does treat the PC, return address (aka link register) and SP completely separately, with entirely separate code generation for them. In that sense, then, it doesn't make sense to have them in the register file, as it just causes more register spills because that's three fewer registers for general-purpose use.

But well... with only 32 instructions as my self-imposed limit, I don't have the opcode space to have separate instructions for all those classes, and encoding them as registers gives me more expressive power for the 32 instructions I have. ISA design involves a lot of annoying, difficult-to-stomach compromises to make everything fit.


Mon Apr 19, 2021 12:31 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1690
If I have the right picture, in the OPC, branches and jumps are expressed and implemented as arithmetic with the PC as destination, but the register file has the PC as a discrete register beside it, and it's a matter of decoding the read and write accesses accordingly.

In other words, the special nature of the PC isn't visible in the instruction decoder, but in the augmented register file.

The trick is that a special register can be addressed as if it is in the register file but implemented separately.
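
As a sketch of that trick, the Python model below keeps the PC inside a separate fetch unit while the register file decodes one register number to it. The index 7, the class names and the 16-bit width are assumptions for illustration, not details confirmed for OPC, ARM or rj16.

```python
PC_INDEX = 7  # hypothetical register number that maps to the PC

class FetchUnit:
    """Owns the physical PC and its private incrementer."""

    def __init__(self):
        self.pc = 0

    def step(self):
        self.pc = (self.pc + 1) & 0xFFFF

    def jump(self, target):
        self.pc = target & 0xFFFF

class AugmentedRegisterFile:
    """Looks like a plain register file, but routes PC accesses elsewhere."""

    def __init__(self, fetch, count=8):
        self.fetch = fetch
        self.regs = [0] * count

    def read(self, index):
        # Read mux: the PC value comes from the fetch unit.
        return self.fetch.pc if index == PC_INDEX else self.regs[index]

    def write(self, index, value):
        # Write enable for the PC slot becomes a jump in the fetch unit.
        if index == PC_INDEX:
            self.fetch.jump(value)
        else:
            self.regs[index] = value & 0xFFFF

fetch = FetchUnit()
rf = AugmentedRegisterFile(fetch)
fetch.step()
fetch.step()
rf.write(PC_INDEX, rf.read(PC_INDEX) + 10)   # "add" to the PC acts as a relative jump
rf.write(1, 5)
```

The decoder never needs to know the PC is special: an add with the PC as destination simply becomes a relative jump.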


Mon Apr 19, 2021 12:35 pm

Joined: Sat Nov 28, 2020 4:18 pm
Posts: 123
Ah okay, so the muxes in the register file select and enable write of the PC, but those control signals go out to the fetch unit where the PC register actually lives. That’s kinda what I thought but for some reason I was thinking I needed to move the register. This makes more sense, thanks :)


Mon Apr 19, 2021 7:32 pm