


Reply to topic  [ 775 posts ]  Go to page Previous  1 ... 11, 12, 13, 14, 15, 16, 17 ... 52  Next
 Thor Core / FT64 
Author Message

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
MichaelM wrote:
Exception handling ... divide by 0 ... I resolved the issue by implementing saturation logic and letting the computations continue.


Garth wrote:
... In the case of a /0, the inputs remain unaffected, so you can take a different course of action with the same inputs for example to return the maximum representable number with the correct sign and keep going.


(Emphasis added.) One thing about dividing by a small number is that the sign matters. Do you want to saturate towards a large positive or a large negative value? I vaguely wonder if the use of ones-complement or sign-and-magnitude would help, or some other way to offer both positive and negative zero.
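For what it’s worth, a minimal C sketch of saturating division where the sign comes from the dividend; sat_div is just an illustrative name, not an FT64 operation:

```c
#include <stdint.h>

/* Saturation on divide by zero: return the maximum representable
 * value with the sign of the dividend and keep going. */
int32_t sat_div(int32_t a, int32_t b)
{
    if (b == 0)
        return a < 0 ? INT32_MIN : INT32_MAX;  /* saturate toward the dividend's sign */
    if (a == INT32_MIN && b == -1)
        return INT32_MAX;                      /* the overflow case saturates too */
    return a / b;
}
```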


Mon Jul 30, 2018 4:20 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Didn’t realize there were more posts when I went to post.

I don’t like how complicated exception handling is either. I don’t like how lists have to be traversed to find the correct handler. But I don’t know how else it could be done.
Right now, causing an exception on divide by zero is optional, controlled by a bit in the arithmetic exception control register. However, if dividing by an immediate constant, no exceptions are generated.
The divide instruction could certainly be improved. It should be straightforward to set the quotient to the max magnitude on a divide by zero. It could even abort the divide early in that case.

It sounds like there might be a language element missing for retrying operations. I suppose one approach would be to put the tried statements in a loop which aborts if successful (or after too many attempts at the calc). A ‘retry’ statement for use in a catch clause might be nice; it would allow putting the retry logic in the catch clause. ‘retry’ would act like a ‘goto’ back to the start of the try block.
Code:
for (n = 0; n < 5; n++) {
   try {
      < some complicated calc>
      break;
   }
   catch (DBZ) {
      < possibly fix operand and>
      // retry;
   }
}

In CC64, throw acts a lot like a goto to the nearest handler. Throw doesn't normally raise an exception, so it should be fast.
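A sketch of how a ‘retry’ could be lowered in plain C, with a label standing in for the start of the try block; divisor, fix_operand() and the limit of 5 tries are hypothetical names for illustration, not anything from CC64:

```c
/* 'retry' lowered as a goto back to the start of the try block.
 * divisor, fix_operand() and the 5-try limit are hypothetical. */
static int divisor = 0;

static void fix_operand(void)
{
    divisor = 1;            /* pretend the handler repairs the bad input */
}

int compute(int n)
{
    int tries = 0;
retry_point:                /* start of the "try" block */
    if (divisor == 0) {     /* stands in for the DBZ exception */
        if (++tries < 5) {
            fix_operand();  /* catch clause: fix the operand, then ... */
            goto retry_point;   /* ... 'retry' = goto back to the try */
        }
        return -1;          /* too many attempts: give up */
    }
    return n / divisor;
}
```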

_________________
Robert Finch http://www.finitron.ca


Mon Jul 30, 2018 5:15 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Been playing with the check instruction. A typical CHK instruction will cause a bounds exception if a register is outside of the bounds determined by the instruction. But I really want to use the instruction as a branch, not an exception generator, like this:
Code:
int FMTK_CheckMsg(hMBX hMbx, int *d1, int *d2, int *d3, int qrmv)
{
   MBX *mbx;
   MSG *msg;

   if (hMbx < 0 || hMbx >= NR_MBX)   // <- compiles to check instruction
      return (E_Arg);

Here the mailbox handle is checked to see if it’s in a valid range, and an error is returned if it is out of range. Used as a branch, there is more control available. Rather than return an error code, the code could also throw() an exception. Throw() could handle additional parameters required for raising an exception beyond what CHK supplies.
The assembler code would look something like:
Code:
   ldi      r3,#NR_MBX
   chk      r1,r0,r3,.goodtogo
   ldi      r1,#E_ARG
   ret
.goodtogo:
   < continue with function >

CHK is really just another compare and branch, except this time there are two registers to compare against rather than one. (A 3R branch instead of a 2R branch). This takes up more bits in the instruction. The test is Ra >= Rb and Ra < Rc rather than a simple test like Ra > Rb.
In addition to being a branch, the CHK instruction could also be made to generate an exception rather than branch, by checking the branch displacement. If the branch displacement is -1 then CHK could act like a regular exception generating CHK. (A displacement of -1 would branch back to the CHK instruction in an infinite loop).
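The proposed dual behaviour could be modelled in C something like this; the function names and the chk_result encoding are illustrative only:

```c
#include <stdbool.h>

/* Models the CHK range test: Ra >= Rb && Ra < Rc. */
bool in_bounds(long ra, long rb, long rc)
{
    return ra >= rb && ra < rc;
}

/* Models the dual behaviour: a displacement of -1 (a branch back to
 * the CHK itself) is repurposed to mean "trap on a bounds violation". */
typedef enum { FALL_THROUGH, BRANCH, TRAP } chk_result;

chk_result chk(long ra, long rb, long rc, int disp)
{
    if (disp == -1)                       /* exception-generating form */
        return in_bounds(ra, rb, rc) ? FALL_THROUGH : TRAP;
    return in_bounds(ra, rb, rc) ? BRANCH : FALL_THROUGH;
}
```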

Just looking at the messaging code, it occurs to me that maybe an error should be thrown rather than returning an error code. Returning an error code is such an ancient means of error processing. These are operating system functions. What if the app doesn’t check for error codes? It may be better to pass an exception back to the app, because that would ultimately at least be caught by the OS.

_________________
Robert Finch http://www.finitron.ca


Tue Jul 31, 2018 4:49 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Bounds-checking does feel to me like a good case for exceptions. As does divide by zero, in fact - because an application should not be doing such things as a matter of course, and if it's not well enough written to deal with an exception, it's not well enough written to deal with such an outlier case. Better that an application should crash than that it proceed with incorrect computation.

That said, I've never written important code which might need to deal with a divide by zero (arguably, I've never written important code!) so I don't really know the territory.


Tue Jul 31, 2018 8:00 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Bounds checking and divide by zero are definitely good causes for exceptions. The issue I see is that as instructions they try to do too much work in a single instruction and end up raising an exception too readily. One could argue that any branch could be treated as an exception, and have all branches generate exceptions instead of a simple control transfer. But that’s getting a bit ridiculous.
I’d like to see at least an exception object and exception type passed to an exception handler. Using a throw is a better way to cause an exception. It’s a bit much to expect the CHK or DIV instruction to set up these values, so instead I propose they branch around code that does. There are sometimes other operations that need to be done in exception processing. For instance, a little later in CheckMsg() there’s this code, which unlocks a semaphore before returning an error.
Code:
      if (LockSysSemaphore(-1)) {
         // check for a mailbox owner which indicates the mailbox
         // is active.
         if (mbx->owner < 0 || mbx->owner >= NR_ACB) {   // <- compiles to CHK
            UnlockSysSemaphore();
            return (E_NotAlloc);
         }

The code needs to unlock the system semaphore before throwing an exception. I suppose the exception handler could unlock the system semaphore. But relying on user code to unlock the semaphore is a bit dubious.
I was going to try to shoehorn a branch displacement into the divide instruction to catch divide by zero errors.

I guess part of the issue is the common use of CHK as an exception generator. In order not to confuse people used to it working that way, it may be better to leave it alone and use a different mnemonic that performs almost the same operation but branches instead of raising an exception. BNBV? (Branch on no bounds violation). For divide it’s always possible to branch if the operand is zero before the divide instruction. The branch could be statically predicted to get more consistent timing.

I haven’t seen much code that actually uses the CHK instruction. It isn’t directly available in most HLLs.
It’s available in CC64 as __check().

_________________
Robert Finch http://www.finitron.ca


Tue Jul 31, 2018 3:02 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added the BCHK instruction which branches if operands are in range. Added capability in the compiler to recognize when it can use the BCHK instruction.
Found a dereferencing error in the compiler in the way named constants were being handled. Sometimes they would be output as immediate loads instead of direct-mode loads.
Added a branch-on-bit optimization. The optimization removes an ‘and’ instruction when testing individual bits in an ‘if’ statement, as in the following code:
Code:
;    if ((*thrd)->status & 1)
            lw      $v2,[$r12]
            lb      $v2,911[$v2]
            bbc     $v2,#0,FMTKmsg_133


Had to update the register bypassing in the RTL code. It was only working on whole words: if two instructions updated the same register, the most recent instruction’s result would be bypassed to the output. However, since registers can be updated on a byte-lane basis, there could be two instructions, one updating more byte lanes than the other, so the value to bypass could be a combination of the results of two instructions. The bypass logic now works on a byte-by-byte basis. Eight times as many LOC, but very little additional logic.
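The byte-lane merge could be sketched in C as follows; bypass_merge and the lane-mask encoding are illustrative, not the actual RTL:

```c
#include <stdint.h>

/* Byte-lane bypass sketch: each byte of the forwarded value comes
 * from the most recent instruction that wrote that lane. a_result
 * is the older write, b_result the newer one, and b_lanes has one
 * bit per byte lane that b updated. */
uint64_t bypass_merge(uint64_t a_result, uint64_t b_result, uint8_t b_lanes)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t lane = 0xFFull << (8 * i);     /* mask for byte lane i */
        out |= ((b_lanes >> i) & 1) ? (b_result & lane)
                                    : (a_result & lane);
    }
    return out;
}
```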

_________________
Robert Finch http://www.finitron.ca


Wed Aug 01, 2018 4:01 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Finally got around to trying to optimize the size of address constants. Global variables can be referenced as an offset from the global pointer register rather than by using an absolute address. The offset is usually shorter than the full address. Variables coming from the read-only data segment can be addressed using program-counter-relative addressing. But wait, the processor doesn’t support program-counter-relative addressing in loads and stores. RISC-V handles this with an AUIPC instruction, which adds a constant to the program counter and stores the result in a register. Something similar to program-counter-relative addressing can be emulated by loading the program counter into a register at the start of the program, then using the register in the same manner as the global pointer register. Call this register the program pointer or program address register.
Code:
   ; The following code must run shortly after the org statement determining
   ; where code is located.
   ; Get the high order bits of the program address into the program
   ; address pointer register r55.
      jal      $r55,.st3
.st3:
      and      $r55,$r55,#$FFFC0000   ; mask off the low order bits

The assembler can assume offsets from r55 have the value in r55 subtracted from them. This value should be the same value as specified in the org directive.

Program counter relative addressing for loads and stores could be added to the processor by using one of the register codes to represent the program counter. However, this adds a set of multiplexors into the register read path. Alternately, a separate set of instructions could be defined, which places the multiplexor at the issue stage rather than the queue stage. This would avoid multiplexors in the register read path. Is the extra hardware worth it? In my opinion, no.

Spent some time trying to get the compiler to expand return blocks in a manner similar to loop unrolling, if the return block is small enough. This is rather than taking a branch to the return block. For many small functions the return block is only four or five lines of code. Rather than branch to the return block at every return() statement, the return block is simply replicated. This creates code bloat but saves executing a branch instruction, making code slightly faster. I intended to add it as a speed optimization.

_________________
Robert Finch http://www.finitron.ca


Thu Aug 02, 2018 12:02 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Musing over the utility of branch prediction bits in branch instructions. It might be better to drop the bits and support a larger displacement field instead. Assuming all branches could be predicted with greater than 90% accuracy, just how useful is it to be able to specify the prediction? I note many ISAs don’t make use of branch prediction bits.

Okay, now started on version v5. Starting from the original (v1) again. I really wanted the compressed instruction set capability and am not fond of the 40 bit instructions of v4. So instructions for this version are going to be 16/32/48 bits. 48 bit instructions are to provide long constants and room for vector instructions. Like v3 a single bit in the instruction (bit 6 in this case) identifies a 16 bit instruction. Bits to represent SIMD precision are not being used in this version. Separate instructions will have to be used for SIMD operations.
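A rough C model of the length decode, assuming bit 6 marks a 16-bit instruction; the marker chosen here for the 48-bit long form (bit 7) is purely a guess for illustration:

```c
/* Rough v5 length decode. Bit 6 set marks a 16-bit compressed
 * instruction; the bit-7 long-form marker is hypothetical. */
int insn_length_bytes(unsigned insn)
{
    if (insn & 0x40)
        return 2;   /* bit 6: compressed, 16-bit */
    if (insn & 0x80)
        return 6;   /* hypothetical marker: 48-bit long form */
    return 4;       /* default: 32-bit */
}
```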

_________________
Robert Finch http://www.finitron.ca


Fri Aug 03, 2018 6:06 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Dropped support for branching to an address contained in a register in order to reclaim the opcode. This was supported primarily to handle circumstances where the branch displacement was too large for the instruction. With the support of 48 bit instructions a 28 bit branch displacement is possible in regular branch instructions.
There are nine unused bits in the vector instructions which are 48 bits total.


I’m wondering how to clean up this little bit of code that determines what the next pc is. It’s got to be quite a few logic levels.
Code:
assign  fcu_misspc =
    IsRTI(fcu_instr) ? fcu_argB :
    (fcu_instr[`INSTRUCTION_OP] == `REX) ? fcu_bus :
    (IsBrk(fcu_instr)) ? {tvec[0][31:8], ol[fcu_thrd], 5'h0} :
    (IsRet(fcu_instr)) ? fcu_argB :
    (IsJAL(fcu_instr)) ? fcu_argA + fcu_argI :
    (fcu_instr[`INSTRUCTION_OP] == `CHK) ? (fcu_pc + fcu_insln + fcu_argI) :
    (fcu_instr[`INSTRUCTION_OP] == `BCHK) ?
        (~fcu_takb ? fcu_pc + fcu_insln : fcu_pc + fcu_insln +
            ((fcu_instr[7:6]==2'b01) ? {{38{fcu_instr[47]}},fcu_instr[47:23],1'b0} :
                                       {{54{fcu_instr[31]}},fcu_instr[31:23],1'b0})) :
    // Else branch
        (~fcu_takb ? fcu_pc + fcu_insln : fcu_pc + fcu_insln +
            ((fcu_instr[7:6]==2'b01) ? {{36{fcu_instr[47]}},fcu_instr[47:19],1'b0} :
                                       {{52{fcu_instr[31]}},fcu_instr[31:19],1'b0}));

A couple of things I can think of are to a) eliminate the BCHK instruction, which has a different branch displacement length resulting in an additional set of multiplexors and adders; b) pre-compute the fcu_pc + fcu_insln value and the sign-extended displacement value; and c) make use of a case statement to select on the instruction, rather than the IsXXX functions.
Cleaned-up code:
Code:
always @*
case(fcu_instr[`INSTRUCTION_OP])
`R2:   fcu_misspc = fcu_argB;   // RTI (we don't bother fully decoding this as it's the only R2)
`RET:   fcu_misspc = fcu_argB;
`REX:   fcu_misspc = fcu_bus;
`BRK:   fcu_misspc = {tvec[0][31:8], 1'b0, olm, 5'h0};
`JAL:   fcu_misspc = fcu_argA + fcu_argI;
// Default: branch
default:   fcu_misspc = fcu_takb ? fcu_nextpc + fcu_brdisp : fcu_nextpc;
endcase

_________________
Robert Finch http://www.finitron.ca


Sun Aug 05, 2018 3:07 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
That certainly looks a lot cleaner - is it smaller and faster?


Sun Aug 05, 2018 10:52 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
It should be.
I trimmed a little bit of logic at the same time, so it should be both smaller and faster. The original code is just a cascade of multiplexors, and I’ve got to believe the toolset would resolve it down into something similar to the new code (depending how much it optimizes), but it doesn’t hurt to be more concise where possible. It makes the code a little easier to read as well.
The new code is expressed as roughly a seven-to-one mux on the inputs rather than a cascade of two-to-ones. Having a wider multiplexor is probably faster than having them cascade.
I moved some of the calculations out to prior pipeline stages (at the cost of more hardware), so that should help with speed too.

I’ve since done a similar thing with branch target calculations in the fetch stage.

Moved on to working on bitfield ops. Made them work more like the 68k’s in that it is now possible to use registers to specify the offset and width. Previously the bitfield was coded as constants in the instruction. Using registers may make it easier to manipulate bitfields in arrays. I also added a find-first-one function.
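A C model of the register-specified bitfield extract and the find-first-one operation; the names are illustrative, not the actual FT64 semantics:

```c
#include <stdint.h>

/* Bitfield extract where offset and width come from registers
 * rather than from constants encoded in the instruction. */
uint64_t bf_extract(uint64_t v, unsigned offset, unsigned width)
{
    if (width == 0 || width > 64)
        return 0;                               /* guard illegal widths */
    uint64_t mask = (width == 64) ? ~0ull : ((1ull << width) - 1);
    return (v >> offset) & mask;
}

/* Find first one: index of the lowest set bit, or -1 if none. */
int ffo(uint64_t v)
{
    for (int i = 0; i < 64; i++)
        if ((v >> i) & 1)
            return i;
    return -1;
}
```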

_________________
Robert Finch http://www.finitron.ca


Mon Aug 06, 2018 3:22 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I just noticed that the branch pc is computed in the fetch stage, and it’s also recomputed in the flow control unit in the execute stage. It looks to me like it’s being computed twice. I will have to think upon whether this is a good idea or not. If it’s computed once and passed forward via registers, then there’s an 8-to-1 multiplexor involved at the issue stage. It’s maybe smaller/faster to just recompute the value.

This is also buried in the original RiSC-16.

_________________
Robert Finch http://www.finitron.ca


Mon Aug 06, 2018 3:53 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quickly hacked together an assembler/compiler for FT64v5. Got some stats. Coding for compressed instructions isn’t complete yet, but the most commonly compressed instructions are included below.
Code:
number of bytes: 69060.000000
number of instructions: 22200
number of compressed instructions: 12710
3.110811 bytes (24 bits) per instruction
Compression ratio: 26.905164%

The code is about 40% smaller than the v4 code, and executes fewer instructions to do the same work.
However, what was lost was the precision field of the instruction, which just wouldn’t fit into 32 bits, meaning instructions that include precision will have to be 48-bit. Also, there are fewer registers available, meaning the compiler may have to do a better job allocating them.
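The stats above work out if the compression ratio is taken as bytes saved over the pre-compression size, with each compressed instruction saving two bytes versus a 32-bit encoding. A quick C check (this interpretation of the ratio is my assumption):

```c
/* Reproduce the reported stats from the raw counts. */
double bytes_per_insn(int bytes, int insns)
{
    return (double)bytes / insns;               /* 69060/22200 = 3.110811 */
}

double compression_pct(int bytes, int compressed)
{
    int orig = bytes + 2 * compressed;          /* assumed pre-compression size */
    return 100.0 * (2.0 * compressed) / orig;   /* bytes saved / original */
}
```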

_________________
Robert Finch http://www.finitron.ca


Tue Aug 07, 2018 3:57 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
CC64 relies on a garbage collector. The garbage collector runs periodically when triggered by a garbage collect interrupt. One issue is how long to allow the garbage collector to run. If the garbage list has a large number of items on it, it may be better to leave items remaining on the list and move on to the list for the next app.
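A budgeted sweep might look something like this in C; the node type and list layout are hypothetical, not CC64’s actual garbage list:

```c
#include <stdlib.h>

/* Budgeted sweep sketch: free at most 'budget' items per GC
 * interrupt and hand the remainder back for the next pass. */
typedef struct node { struct node *next; } node;

node *gc_sweep(node *list, int budget)
{
    while (list != NULL && budget-- > 0) {
        node *next = list->next;
        free(list);                 /* reclaim this item */
        list = next;
    }
    return list;                    /* leftovers for the next interrupt */
}
```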

I want to manipulate linked lists using the mapped addresses of an application within the OS. Why? Heaps for applications are managed by the app. But garbage collection is done by the system.

Addresses are mapped when the operating level is at the user level. So, the operating level could be switched to the user level for access. However, that includes all memory references, so the local variables of the operating system routine would be inaccessible. I want local vars and global vars to be at the machine level while the remaining heap accesses are at the user level.
It looks like I need separate operating levels for data, stack and code. If separate operating levels are provided and the data operating level is changed in a function, the only drawback I can see is this: if the address of a variable located on the stack is taken and placed in a register, then references to the variable would go to the wrong address, because the reference would be made via a temporary register and not indexed by the stack or frame pointer. As long as the address of a stack variable isn’t being manipulated, I think it should be okay.

_________________
Robert Finch http://www.finitron.ca


Thu Aug 09, 2018 1:46 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
(OT, but there's an excellent talk on Go's garbage collector here. Some ideas for algorithms and tactics, perhaps.)


Thu Aug 09, 2018 6:08 am