View unanswered posts | View active topics It is currently Thu Mar 28, 2024 4:54 pm



Reply to topic  [ 108 posts ]  Go to page Previous  1 ... 3, 4, 5, 6, 7, 8  Next
 One Page Computing - roll your own challenge 
Author Message

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
I notice that revaldinho has used CMPC as part of a multi-word compare, in at least five routines:


Mon Sep 23, 2019 12:54 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
BigEd wrote:
joanlluch wrote:
... my question is particularly about the CMP and CMPC instructions of the OPC6 and OPC5. Are they supposed to be chained to perform longer-than one word comparisons?

I think there might be some subtlety, or some clever tricks, so I posted a related query over on 6502.org. (Bytes on 6502 will behave very like words on the OPC machines. But the 6502 lacks predication, and only has CMP with no carry in. The 6502 does have a V flag. So, not exactly the same situation!)

For processors without a CMPC instruction, based on what I learned from the LLVM compiler, the usual procedure is to compare the higher words first, and depending on the result and the actual condition being tested, the lower words are compared next (or not). So there's actually no carry or chained subtraction, but just normal comparisons performed on the higher and lower words. For example for equality, if the first comparison returns non equal then the second comparison is not performed, and so on.


Mon Sep 23, 2019 8:08 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
BigEd wrote:
I notice that revaldinho has used CMPC as part of a multi-word compare, in at least five routines:


Thanks for that.

The usual combination of flags to detect all conditions after a comparison (or subtraction) for processors featuring all the N, Z, C, V flags is this: (Taken from the ARM processor)

Code:
eq   Equal.   Z==1
ne   Not equal.   Z==0
cs or hs   Unsigned higher or same (or carry set).   C==1
cc or lo   Unsigned lower (or carry clear).   C==0
mi   Negative. The mnemonic stands for "minus".   N==1
pl   Positive or zero. The mnemonic stands for "plus".   N==0
vs   Signed overflow. The mnemonic stands for "V set".   V==1
vc   No signed overflow. The mnemonic stands for "V clear".   V==0
hi   Unsigned higher.   (C==1) && (Z==0)
ls   Unsigned lower or same.   (C==0) || (Z==1)
ge   Signed greater than or equal.   N==V
lt   Signed less than.   N!=V
gt   Signed greater than.   (Z==0) && (N==V)
le   Signed less than or equal.   (Z==1) || (N!=V)



In the case of the OPC processor, I think that as long as only the C flag is looked at, the result is always good without using predication. All the other cases are trickier (at least for me), as must be solved with predication. I suppose that snippets of code could be put in some table, covering all the cases above, so that this only needs to be thought once...

About practical examples, when testing for equality, the CMPC I think that is predicated with z, so that the second comparison is only performed if the first one returned equal (as both must be equal). This particular one seems easy enough. However, cases where both the Z and C must be tested are complicated.

On the "Sieve" example after label L3, there's a comparison without predicates that seems to check for non-equality. However, it looks to me that the second comparison overwrites the result of the first one, so I'm unsure about what's the intended purpose of that particular one. I mean, what if the CMPC gives 'equal', but the CMP was not equal? In such case the global comparision should return non-equal, but the actual result on the code is equal (??)

On the other hand, I wonder if a clearer procedure would be to compare first the high words instead of the low ones, then check the lower ones only if needed in accordance to the actual comparison. For instance, if we are checking for 'unsigned higher or same', we only need to test the lower words if the higher words are equal.


Mon Sep 23, 2019 9:14 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Btw,
I think that, in the OPC processor, just by adding the following code to handle the Z flag by the CMPC instruction, comparisons would get simplified a lot.

Code:
Z = inZ and opZ
Where:

inZ, is the input Z before the CMPC instruction,
opZ, is the actual Z produced by the instruction.
Z, is the resulting Z that goes to the status register.

By implementing this, I think that a CMP, CMPC pair (or a CMP followed by any number of CMPC) could be chained together and the correct Z status flag will result at the end. The C flag will be already correct as the result of the last CMPC, same for N. The V (if it was available) would be correct too.


Mon Sep 23, 2019 9:38 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
I quite like the idea of chaining the Z flag. I'm not aware of any other machine offering that - perhaps we need to write up a patent! Oh wait, we've already disclosed the invention...


Tue Sep 24, 2019 8:55 am
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
BigEd wrote:
I quite like the idea of chaining the Z flag. I'm not aware of any other machine offering that - perhaps we need to write up a patent! Oh wait, we've already disclosed the invention...

I think, the AVR processors must do that already, or something similar. At least based on the description and example of the CPC (compare with carry) instruction from here https://www.microchip.com/webdoc/avrassembler/avrassembler.wb_CPC.html with excerpts quoted below:

Quote:
This instruction performs a compare between two registers Rd and Rr and also takes into account the previous carry. None of the registers are changed. All conditional branches can be used after this instruction.

Code:
; Compare r3:r2 with r1:r0
cp r2,r0 ; Compare low byte
cpc r3,r1 ; Compare high byte
brne noteq ; Branch if not equal
...
noteq:nop ; Branch destination (do nothing)


The code example shows a "BRNE" instruction just after the "CPC". The BRNE instruction checks for the Z flag and that's supposed to work (I understand) after the chained comparison instructions. I have seen similar patterns being generated by the LLVM compiler for the AVR architecture, and not only with a single CPC but multiple ones. So the CPC must have some special treatment of Z flag, possibly along the lines of what I proposed earlier. However, I didn't find any docs or explanation about this particular subject (or feature), either in the context of the AVR processors or as a general concept. So who knows what's really going on here....


Tue Sep 24, 2019 12:02 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Interesting!


Tue Sep 24, 2019 12:14 pm
Profile

Joined: Tue Apr 25, 2017 7:33 pm
Posts: 32
I have updated OPC7 (the 32 bit machine) to add an overflow flag.

The docs are updated now, but basically the new flag is added in the Processor Status Register as the MSB, making it nominally a 32 bit word now (actually many bits always return zero).

There's not enough space in the predication bits to allow predication directly on this new flag. To inspect it you need to load the PSR into a register and then check the sign flag, which will have been updated by the state of the PSR's MSB.

In OPC7 you can do this non-destructively by loading the PSR into the R0 register - ie the result is thrown away but the sign flag will be updated correctly.

So something like this:

Code:
 
     sub    r1, r2
  nc.mov    pc, r0, exit_borrow_set
     getpsr r0, PSR
  mi.mov    pc, r0, exit_overflow


I ran the regressions and this change didn't break anything else. I will probably retrofit it now to OPC6 (16-bit) and OPC8 (24-bit) machines. The amount of code was small and I seem to have already broken the 1 page (66 line) rule slightly in a few places anyway.


Tue Oct 01, 2019 6:35 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
This is a nice way to extend the machine without a radical overhaul! Even though checking the V flag isn't trivial, it's a lot easier than the workarounds needed otherwise.


Tue Oct 01, 2019 6:43 pm
Profile

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Revaldinho:

I have been intrigued by your OPC processor's use of predication. One thing that I've wondered about recently in regards to my own processor in the "Roll your own challenge" is the simple, single flag branch instructions. It seems to me that the flags themselves are not as important as the result of a test / comparison. In other words, like the RSRE's Viper microprocessor, I am considering using a single flag which is set according to a select code. The code defines the standard conditions: {EQ, NE, LT, LE, GT, GE, LO, LOS, HI, HIS}. These 10 conditions would set the flag based on the ALU results and would require multiple signals: {N, V, Z, C}. It can also be set by the arithmetic and shift/rotate instructions. Therefore, the single flag could be represent the result of a comparison, or an ALU operation result.

I may have missed something in your implementation, but you appear to have a number of simple flags, and a matching set of selects in your predicate fields in your OPC processors. Perhaps instead of predicating on the basis of the state of the simple ALU flags, {NVZC}, maybe your predicates could select combinations of flags. That would let your predicate codes select complex combinations like those required to implement some of the signed and unsigned tests that require complex combinations of two or three of the standard ALU flags.

_________________
Michael A.


Tue Oct 01, 2019 11:06 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Regarding flags and ways to perform conditional execution, I found there are basically two approaches:

(Approach 1) The compare instruction is quite generic. It performs a subtraction without setting a result, and it updates condition flags in a special Status Register. Conditional instructions instructions such as conditional jumps, or conditional moves, test a particular condition against the Status Register, in order to perform conditional execution. This concept can be extended further with predication. In this case, all instructions become conditional instructions.

(Approach 2) The compare instruction incorporates the actual condition to be tested. There is no Status Register. The result of the comparison is set as a True or False value in a general purpose register. Conditional execution instructions check for the True or False value contained in a register to perform conditional execution. Given enough encoding width, the compare instruction can even be avoided by folding the condition and the testing registers in the conditional instruction.

Most real processors use approach 1, I assume for historical reasons. Academically designed or experimental processors tend to use approach 2. The RiscV and Mips processors do not even have a compare instruction.

One advantage of approach 1, is that it takes less encoding space and that a single comparison instruction or a flags setting arithmetic instruction, can be used by more than one conditional instruction even with different conditions. My compiler for the CPU74 architecture takes advantage of this.

Now, I can think on a third approach that is a mix of the two, which may be suitable for the OPC processor. Let's suppose that we have a special bit in the status register that stores the result of a comparison instead of the flags. In other words, the compare instruction specifies the condition to be tested, and it sets whether the result was True or False instead of setting 4 different flags. Maybe adding the carry as an independent flag might be useful too. So in total we would have Two flags in the Status Register. (Condition flag, and Carry flag).

Since the OPC processor sets flags for all instructions, including moves and arithmetic instructions, the Condition flag can be interpreted as a Zero for these instructions, and still the Carry will be available for the instructions requiring it.

In this scenario, predication can be reduced to just 1 bit, and still keep full flexibility. The only constraint is that the cmp instruction needs to be encoded in a way that incorporates the condition to be tested, but maybe this is still possible by splitting all possible conditions in two cmp instructions (lets say comparesinged and compareunsigned) given that we are freeing 2 encoding bits in ALL instructions. Note that with this approach only the "equal", "greater than", and "greater than or equal", conditions are really needed for the new compare instructions, because the operators can be reversed for the reversed conditions.

Joan


Wed Oct 02, 2019 7:46 am
Profile

Joined: Tue Apr 25, 2017 7:33 pm
Posts: 32
MichaelM wrote:
Revaldinho:

I have been intrigued by your OPC processor's use of predication. One thing that I've wondered about recently in regards to my own processor in the "Roll your own challenge" is the simple, single flag branch instructions. It seems to me that the flags themselves are not as important as the result of a test / comparison. In other words, like the RSRE's Viper microprocessor, I am considering using a single flag which is set according to a select code. The code defines the standard conditions: {EQ, NE, LT, LE, GT, GE, LO, LOS, HI, HIS}. These 10 conditions would set the flag based on the ALU results and would require multiple signals: {N, V, Z, C}. It can also be set by the arithmetic and shift/rotate instructions. Therefore, the single flag could be represent the result of a comparison, or an ALU operation result.

I may have missed something in your implementation, but you appear to have a number of simple flags, and a matching set of selects in your predicate fields in your OPC processors. Perhaps instead of predicating on the basis of the state of the simple ALU flags, {NVZC}, maybe your predicates could select combinations of flags. That would let your predicate codes select complex combinations like those required to implement some of the signed and unsigned tests that require complex combinations of two or three of the standard ALU flags.


Most of the odd decisions in the OPC machines come down to codesize - it's a challenge to get assembler/emulator/verilog and spec in a single page of code, which was the original plan. I thought that keeping things regular was the key to this and that's what you see in the instruction tables. Predication is part of that because all instructions are predicated, always. I did only allocate 3 bits for the predication field which doesn't allow for many combinations especially since an 'always execute' code is required too. Moving to 4 bits would allow the same combinations that ARM uses.

joanlluch wrote:
Now, I can think on a third approach that is a mix of the two, which may be suitable for the OPC processor. Let's suppose that we have a special bit in the status register that stores the result of a comparison instead of the flags. In other words, the compare instruction specifies the condition to be tested, and it sets whether the result was True or False instead of setting 4 different flags. Maybe adding the carry as an independent flag might be useful too. So in total we would have Two flags in the Status Register. (Condition flag, and Carry flag).

Since the OPC processor sets flags for all instructions, including moves and arithmetic instructions, the Condition flag can be interpreted as a Zero for these instructions, and still the Carry will be available for the instructions requiring it.

In this scenario, predication can be reduced to just 1 bit, and still keep full flexibility. The only constraint is that the cmp instruction needs to be encoded in a way that incorporates the condition to be tested, but maybe this is still possible by splitting all possible conditions in two cmp instructions (lets say comparesinged and compareunsigned) given that we are freeing 2 encoding bits in ALL instructions. Note that with this approach only the "equal", "greater than", and "greater than or equal", conditions are really needed for the new compare instructions, because the operators can be reversed for the reversed conditions.

Joan


I like this idea, but looking through the test cases, I think I still find immediate access to the Z and C flags to be handy. So, without generating a load of stats for a moment, I think that they would be good to keep, but that the S flag/predicate could be replaced with the CMP result. Also if making a specific flag to be set by a cmp instruction I think it should be sticky. One of the problems with the cheap predication in OPC is that you can't execute a short stream of instructions based on a flag result without using a branch, because the flags update through that stream. If cmp set a sticky flag then you could continue to predicate instruction execution on that flag until getting to another cmp instruction.

OPC8 (the 24-bit machine) is still a work in progress, so maybe an experimental branch to try this out is the thing to do.

Rev.


Wed Oct 02, 2019 8:16 am
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
(Just to note that ARM allows instructions not to update the flags, which does allow for a stream of predicated responses.)

Also I think RISC-V will be notable not because it's academic, but because it has been carefully thought about and is the 5th attempt at RISC. It does, hopefully, embody choices which are good for implementation and good for compilers. Ideally it's good for small simple implementations as well as large complex high performance implementations. I think this is why it lacks a status register.

But as Rich notes, the one page aspect to OPC blocks off certain avenues of exploration.


Wed Oct 02, 2019 10:34 am
Profile

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Thanks for the reply.

I forgot about one of your driving objectives: a one page description. On the other hand, using a 6 point font, or even a 4 point font, may help you fit more on a page. Another alternative is too redefine the page size to something like that of a C / D page. :D

In any event, the OPC designs provide great food for thought. Thanks for sharing. :)

_________________
Michael A.


Thu Oct 03, 2019 1:06 am
Profile

Joined: Tue Apr 25, 2017 7:33 pm
Posts: 32
Thanks Michael, I know the one-page limit can seem rather strange but this constraint has been interesting to work with. The need for simple regularity in the instruction set is a feature in all the machines and just keeping a lid on the complexity has let us build a number of machines all fully supported with docs, macro-assembler, emulator and synthesizable RTL in very short order. That has been a lot of fun, and especially seeing them physically running code either implemented on FPGA boards or in the PiTubeDirect emulators on the BBC Micro. They are nothing like being the best open-source CPUs around of course (and Ed's example of RISC-V is clearly a winning choice for serious applications), but they are immediately accessible and open to experimentation in a way that more powerful but also larger and more complex designs just aren't.

Back to the predicates then and condition code checking for a mo'.

I added the 'V' overflow flag to OPC7 as described above, and then I wondered how much OPC7 assembler do you really have to write to be able to have branch-on-condition for each of the conditions quoted in Joan's table (from the ARM).

The first 6 entries in the table are already covered by OPC7's built-in predication on C, S and Z flags, so nothing to do there.

Macros for all the others are presented below

Code:
MACRO BRA_SV ( _target_ )
        #   p <= V
        getpsr  r0,psr
     mi.mov     pc, r0, _target_
ENDMACRO

MACRO BRA_SNV ( _target_ )
        #  p <= !V
        getpsr  r0,psr
     pl.mov     pc, r0, _target_
ENDMACRO

MACRO BRA_UGT ( _target_ )
        #  p <= C & !Z
      z.add     r0, r0            # clear carry if zero set
      c.mov     pc, r0, _target_ 
ENDMACRO
   
MACRO BRA_ULTE ( _target_ )
        #  p <= !C | Z
      z.mov     pc, r0, _target_
     nc.mov     pc, r0, _target_
ENDMACRO

MACRO BRA_SGTE ( _target_)
        # p = !(N ^ V)   
     mi.getpsr  r0, psr
     mi.mov     pc, r0, _target_ #  exit_n=1_v=1
     pl.getpsr  r0, psr
     pl.mov     pc, r0, _target_ # exit_n=0_v=0
ENDMACRO

MACRO BRA_SLT ( _target_ )
        # p = (N ^ V)
     mi.getpsr  r0, psr
     mi.mov     pc, r0, exit_not_SLT # exit_n=1_v=1
     pl.getpsr  r0, psr
     pl.mov     pc, r0, exit_not_SLT # exit n=0_v=0
     mov        pc, r0, _target_
exit_not_SLT:
ENDMACRO

MACRO BRA_SGT ( _target_ )
        #  p = !Z & !(N ^ V)
      z.mov     pc, r0, exit_not_SGT
     mi.getpsr  r0, psr
     mi.mov     pc, r0, _target_ # exit_n=1_v=1
     pl.getpsr  r0, psr
     pl.mov     pc, r0, _target_ # exit_n=0_v=0
exit_not_SGT:       
ENDMACRO
     
MACRO BRA_SLTE ( _target_ )
        #  p = Z | (N ^ V) 
      z.mov     pc, r0, _target_
     mi.getpsr  r0, psr
     mi.mov     pc, r0, exit_not_SLTE # exit_n=1_v=1
     pl.getpsr  r0, psr
     pl.mov     pc, r0, exit_not_SLTE # exit_n=0 v=0
        mov     pc, r0, _target_
exit_not_SLTE:       
ENDMACRO


Use these like any other macro, e.g.

Code:
    cmp         r4,r5
    BRA_SLT     ( branch_target)
    ...

branch_target:
    ...


Even though the PSR has to be explicitly read to check the new overflow flag, this addition has simplified the macros a lot. It wasn't so expensive and is a good addition to the CPU, so thanks to Joan for triggering that.

With that, many of the checks are only a couple of instructions. Others look large in terms of code size with 4-6 instructions, but depending on the way the flags fall out may only need to execute just 1 or 2 of those.

So, in the spirit of One Page Computing approach to hardware that still seems pretty good to me.

I will retro-fit the 'V' flag into the OPC6 (16-bit) and OPC8 (24-bit), but I still like the idea Michael and Joan put forward of computing a single predicate for condition checks, so I'd like to have a go at seeing if that might fit into one of the machines without breaking the OPC limits too badly.


Thu Oct 03, 2019 9:07 am
Profile
Display posts from previous:  Sort by  
Reply to topic   [ 108 posts ]  Go to page Previous  1 ... 3, 4, 5, 6, 7, 8  Next

Who is online

Users browsing this forum: Amazonbot and 7 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software