 RTF64 processor 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
I like the old-style memory; you knew just how fast your 6502 was.
An if statement always has a branch component. Can you have a prefetch instruction
to fetch the new IP address into a small tagged cache?
if a & 3 {
prefetch offset tag
load a
load 3
and
beq tag
...
This is valid for simple logic. if *i++=foobar() is going to be slow no matter what you do.

Hi Ben, I do not see how prefetching would help. It is not known which direction the branch will take, so it is about 50/50 in one direction or the other. Prefetching in one direction would mean losing out on the other. Instructions are fetched 32 bits at a time, and ‘a’ would already be residing in a register. Branch instructions go right back to ifetch, so there's no way to hide a fetch cycle.
The code looks more like:
Code:
and $a1,$a0,#3
beqz $a1,.target

My PC, which is a fairly recent model (it's four years old), seems to run at about 4.0GHz all the time. It is supposed to be a 3.6GHz machine, but it varies the clock rate according to workload and other factors. One needs good luck to figure out what it is doing. Variable clock rate machines have got to be fun to debug. I tried adding a register to control the clock rate in an FPGA cpu by varying a pattern that turns the clock on and off, but the thing never worked very reliably.

Decided to keep the results merging. With the compiler’s use of set instructions, it is useful all over the place. It is not that useful for compares, but sets produce Boolean results, so combinations are easier. I forgot I invented “safe” logical operators to deal with branch-and and branch-or instructions. They come in handy with the set operations. The operators are ‘&&&’ for a “safe” and, and ‘|||’ for a “safe” or. The safe operators are for the use of the programmer who knows when it is safe. Safe operations are ones that do not require branches. The compiler has also been modified to track the presence of pointer references and to use the safe and/or operations if none are present. So, it is not necessary to write &&& and ||| explicitly.
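
As a rough illustration of the difference in standard C (which has no &&& or ||| operators), the sketch below contrasts a short-circuit test with a branch-free one that merges two Boolean results. The function names are made up for this example, and the bitwise `&` only stands in for what a safe operator might compile to.

```c
#include <stdbool.h>

/* Short-circuit form: the second compare runs only if the first succeeds,
   which normally costs a conditional branch. */
static bool in_range_branchy(int x, int lo, int hi)
{
    return x >= lo && x <= hi;
}

/* "Safe" form: both compares are always evaluated and the two Boolean
   results are merged with a bitwise and, so no branch is required. */
static bool in_range_safe(int x, int lo, int hi)
{
    return (x >= lo) & (x <= hi);
}

/* Not safe to merge: (p != NULL && *p == 0) relies on the short circuit
   to avoid dereferencing a null pointer, which is why pointer references
   have to be tracked before substituting the safe form. */
```

Both functions return the same answer for every input; only the evaluation order guarantee differs.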
I was wondering about the utility of the bytndx and wydndx instructions borrowed from another machine language. I tried to find out where I spotted the instructions, but no luck. They do turn out to be useful in string manipulation functions when strings are processed eight bytes at a time, for example in a function like strchr().
Code:
public _strchr:
.again:
  ldo     $t0,[$a0]     ; grab eight bytes
  bytndx. $t1,$t0,$a1   ; is the char present? 
  bpl     .charFound
  bytndx. $t1,$t0,$x0   ; is a null present?
  bpl     .charNotFound
  add     $a0,$a0,#8    ; increment pointer
  jmp     .again
.charFound:
  add     $a0,$a0,$t1   ; add byte index to pointer
  rtl
.charNotFound:
  mov     $a0,$x0       ; return nullptr if not found
  rtl
endpublic
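
For readers who do not have the core handy, here is a C sketch of the same idea. `byte_index` is a hypothetical software model of what bytndx appears to do (return the index of the first matching byte in a 64-bit word, or a negative value if there is none); it is not taken from the RTF64 sources. Unlike the listing, the sketch also checks whether a null comes before the match within the same eight bytes. It assumes the buffer can be read in whole 8-byte chunks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of bytndx: index (0..7) of the first byte of `word`
   equal to `c`, counting from the lowest address, or -1 if none matches. */
static int byte_index(uint64_t word, uint8_t c)
{
    for (int i = 0; i < 8; i++) {
        if (((word >> (8 * i)) & 0xFF) == c)
            return i;
    }
    return -1;
}

/* strchr() that examines eight bytes per iteration.
   Assumption: `s` is readable in whole 8-byte chunks up to and including
   the chunk that holds the terminating null. */
static const char *strchr8(const char *s, char c)
{
    for (;; s += 8) {
        unsigned char chunk[8];
        uint64_t w = 0;
        memcpy(chunk, s, 8);                 /* grab eight bytes */
        for (int i = 0; i < 8; i++)          /* byte 0 goes in the low bits */
            w |= (uint64_t)chunk[i] << (8 * i);

        int hit = byte_index(w, (uint8_t)c); /* is the char present? */
        int nul = byte_index(w, 0);          /* is a null present? */
        if (hit >= 0 && (nul < 0 || hit <= nul))
            return s + hit;                  /* match at or before the null */
        if (nul >= 0)
            return NULL;                     /* string ended first */
    }
}
```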

A pair of compare instructions (cmp, cmpu) were added that store the result in a general register rather than in a compare results register. It was found that transferring a +1, 0, or -1 status from the compare results to a register took about five instructions. Too many! So many ways to do things. Instructions are inexpensive these days.
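
In C terms, the value these instructions leave in the register corresponds to the usual three-way compare idiom. The sketch below is only an illustration of that result; the names `cmp3` and `cmp3u` are invented here and the actual instruction semantics may differ in detail.

```c
/* Signed three-way compare: +1 if a > b, 0 if equal, -1 if a < b. */
static int cmp3(long a, long b)
{
    return (a > b) - (a < b);
}

/* Unsigned variant, matching the idea of a separate cmpu instruction. */
static int cmp3u(unsigned long a, unsigned long b)
{
    return (a > b) - (a < b);
}
```

The signed and unsigned forms disagree exactly when the top bit is set: -1 compares below 0 as a signed value, but as an unsigned value it is the largest possible number.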

In the compiler typedefs were found to be allocating storage for the type defined by the typedef. This led to extra data bytes unnecessarily being output when header files were included. I finally got around to wondering what all the extra data output was.

_________________
Robert Finch http://www.finitron.ca


Sat Oct 03, 2020 4:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The bit and cmp instructions are gone from the instruction set. Bit was about 95% redundant with AND and rarely used. All the cmp operations were replaced with set operations (also called compare operations), each of which tests one specific condition. The ‘C’ standard library was compiled and included to get some stats.
Code:
Instruction Statistics
Loads:       5801 (18.935858%)
Stores:      5187 (16.931614%)
  Indexed:    995 (3.247919%)
Branches:    3534 (11.535825%)
Calls:       1436 (4.687449%)
Returns:      517 (1.687612%)
Adds:        3953 (12.903542%)
Ands:         332 (1.083728%)
Ors:         1236 (4.034601%)
Xors:          24 (0.078342%)
Tsts:          68 (0.221968%)
Lshifts:      465 (1.517872%)
shifts:       573 (1.870410%)
Luis:        1230 (4.015016%)
Moves:       2661 (8.686143%)
CMoves:        63 (0.205647%)
Sets:        1770 (5.777705%)
  Mops:       106 (0.346009%)
Ptrdif:        67 (0.218704%)
Bitfield:      45 (0.146891%)
Csr:          198 (0.646320%)
Floatops:     330 (1.077199%)
others:      1456 (4.752734%)
Total:      30635

number of bytes: 122540.000000
number of instructions: 30635

Conditional move was added to the instruction set. The compiler already expected it to be present for the hook operator.
The compiler was able to combine set conditions merging the result bit 106 times. This is probably often enough.

_________________
Robert Finch http://www.finitron.ca


Sun Oct 04, 2020 3:34 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
More work on the compiler today. It insisted on outputting a signed load operation when it should have been unsigned. There was also an issue in type-casting causing some type casts to fail.

A branch on equal immediate instruction was added to the instruction set. It gets used often enough to be worth adding in the author’s opinion. It is currently the only branch that uses relative addressing due to encoding limitations.

_________________
Robert Finch http://www.finitron.ca


Mon Oct 05, 2020 4:13 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I worked my way up to version ‘f’ of RTF64 in the docs then decided to start a major new revision called version 2 (v2).
v2 uses a variable-length instruction set. This gives a code density savings of about 17% over a flat 32-bit instruction set. Not too bad without compressed instructions. So, I decided to go ahead and experiment with compressed instructions. The assembler figures out a list of the best instructions to compress based on a simple static analysis. The top 256 instructions are then provided in a table that can be loaded at run-time. A single two-byte instruction, $70xx, is dedicated to the top 256 compressed instructions. A table internal to the processor is updated to contain the expanded versions of the compressed instructions. Then, whenever the processor sees a $70xx instruction, it expands it to the full instruction. Using this mechanism, overall compression of 26% was reached (for the current system). For a simple Fibonacci test program, compression of 46% was achieved. The entire program, with the exception of one branch, is converted to two-byte opcodes.
Code:
                           ; Fibonacci calculator RTF64 asm
                           ; r1 in the end will hold the Nth fibonacci number
                           .file "fibonacci.r64",4
                             code 24 bits
                              org   $FFFFFFFFFFFC0100
                           {+
                           start:
FFFFFFFFFFFC0100 09 03 00 05 52 03 7C             ldi   $t0,#ci_tbl
FFFFFFFFFFFC0107 84 74 00 00                      ldt   $a0,[$t0]     ; get number of entries in table
                           .mv2cit:
FFFFFFFFFFFC010B 8F 75 10 1A                      ldt   $a1,[$t0+$a0*4]
FFFFFFFFFFFC010F 50 94 FE                         sub.  $a0,$a0,#1
FFFFFFFFFFFC0112 7A 80 56 7C                      mvci  $x0,$a0,$a1
FFFFFFFFFFFC0116 37 D4 FF                         bne   .mv2cit
                           +}
FFFFFFFFFFFC0119 70 06                             ldi     $t2,#$FD
FFFFFFFFFFFC011B 70 05                             ldi     $t2,#$01   ; x = 1
FFFFFFFFFFFC011D 70 00                             stt     $t2,$00
                           
FFFFFFFFFFFC011F 70 02                             ldi     $t3,#32      ; calculates 16th fibonacci number (13 = D in hex) (CHANGE HERE IF YOU WANT TO CALCULATE ANOTHER NUMBER)
FFFFFFFFFFFC0121 70 03                             or     $t1,$t3,$x0   ; transfer y register to accumulator
                           ;   sub     $t3,$t3,#3   ; handles the algorithm iteration counting
                           
FFFFFFFFFFFC0123 70 04                             ldi     $t1,#2        ; a = 2
FFFFFFFFFFFC0125 70 01                             stt     $t1,$08        ; stores a
                           
FFFFFFFFFFFC0127 EA EA EA EA EA EA EA EA          align 16          ; fit the loop into 1 cache line
FFFFFFFFFFFC012F EA                         
                          loopx:              ; avoid potential conflict with loop token
FFFFFFFFFFFC0130 70 07                             add     $t1,$t1,$t2   ; a += x
FFFFFFFFFFFC0132 70 08                             ldt     $t2,$08        ; x = a
FFFFFFFFFFFC0134 70 01                             stt     $t1,$08        ; stores a
FFFFFFFFFFFC0136 70 00                             stt     $t2,$00        ; stores x
FFFFFFFFFFFC0138 70 09                             sub.  $t3,$t3,#1   ; y -= 1
FFFFFFFFFFFC013A 37 D8 FF                         bne   $cr0,loopx  ; jumps back to loop while the result is nonzero (y has not reached zero yet)
FFFFFFFFFFFC013D 70 0A                            wai
FFFFFFFFFFFC013F EA                               nop
                           
                             align 4
                           ci_tbl:
FFFFFFFFFFFC0140 0B 00 00 00 A2 00 14 00          dh_htbl
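
The table lookup behind the $70xx instruction can be modelled in a few lines of C. This is only a sketch of the mechanism described above, not the processor's rtl. It assumes 32-bit table entries (the listing loads them with ldt) and that the byte following $70 is the table index; `ci_table`, `ci_load` and `ci_expand` are names invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define CI_OPCODE 0x70u          /* first byte of a compressed instruction */

static uint32_t ci_table[256];   /* expanded forms, loaded at run-time */

/* Copy up to 256 expanded instructions into the table, much as the
   mvci loop at the start of the listing does. */
static void ci_load(const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count && i < 256; i++)
        ci_table[i] = src[i];
}

/* If `code` points at a compressed instruction, store its expanded form
   in *expanded and return 1; otherwise leave *expanded alone, return 0. */
static int ci_expand(const uint8_t *code, uint32_t *expanded)
{
    if (code[0] != CI_OPCODE)
        return 0;
    *expanded = ci_table[code[1]];   /* second byte selects the entry */
    return 1;
}
```

In the hardware the expanded instruction is then run back through the decoder; the sketch stops at the lookup.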

_________________
Robert Finch http://www.finitron.ca


Tue Oct 06, 2020 4:26 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The issue I am wondering about today is the use of the global pointer register to access read-only data in rom. The data segment containing system variables and the read-only data have vastly different addresses. The data segment is located in ram low in the address range, the read-only rom data is located high in the address range. It makes it a challenge then to access the data using a single data pointer. I think the solution is to use two pointers. $gp0 and $gp1. For system routines that access I/O it might be handy to have yet another global pointer $gp2. Thirty-two registers are starting to look cramped.

The compiler was modified to make better use of the global pointer register. The compiler uses $gp0 for data references and $gp1 for read-only data references.

With the latest round of instruction changes, the code is about 30% smaller than the flat 32-bit version.

The push instruction was added along with branch on bit set/clear. Load / store instructions are now encoded differently. There is now a mode bit to indicate either the register indirect with displacement or the scaled indexed address mode. The core looks more like the nvio core now. I seem to be converging on a favorite instruction set.

_________________
Robert Finch http://www.finitron.ca


Wed Oct 07, 2020 4:19 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added the cmp instruction back in, then removed it again from the ISA. I had thought it may be possible to use it with functions like memcmp() or strcmp(), but it turns out not to be needed. It is possible that hand-coded assembly might be able to make effective use of the cmp instruction but it is difficult for the compiler to use.

Added extended left and right shift operations which include the carry bit from a previous operation in the shift. Also added byte, wyde, and tetra shifting. The compiler output some of the shift operations as sub-word operations when the C standard library was compiled. Rather than modify the library or use a complicated sequence of instructions to mask operations to the correct size, the operations were simply directly included in the core.

Did a lot of refactoring in the compiler today. Broke up a couple of methods into smaller pieces and moved a little bit of the code generation logic out to a class specific to RTF64. The hope is to eventually have all the code generation logic specific to the processor contained in a single class. That way different processors may be used by providing a code generation class for them.

I increased the number of times that the compiler can merge the compare results together. It now merges results 432 times, eliminating 432 branches from the code. This accounts for about 1.5% of instructions. According to the stats branches occur about every nine instructions. This is a relatively low frequency. This is good, but it could also mean the compiler doesn't generate efficient instruction sequences for other code.

_________________
Robert Finch http://www.finitron.ca


Thu Oct 08, 2020 3:30 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
The only other case for general branches is I/O devices and kernel busy loops.
This could also be a special module, for code optimization due to hardware or device-specific
instructions.
Ben.


Thu Oct 08, 2020 7:09 pm

Joined: Tue Dec 11, 2012 8:03 am
Posts: 285
Location: California
robfinch wrote:
[I] added the cmp instruction back in, then removed it again from the ISA. I had thought it may be possible to use it with functions like memcmp() or strcmp(), but it turns out not to be needed. It is possible that hand-coded assembly might be able to make effective use of the cmp instruction but it is difficult for the compiler to use.

Is this the same CMP-type instruction that the 6502 has? I've worked with the PIC16F processors a lot which do not have such an instruction, so you have to subtract, which destroys the original contents. It's a pain when you want to do a series of comparisons, as in a CASE structure. What I did in my structure macros was to make the assembler remember what the last compare was, and then add it into the number to compare for the next one, as in
Code:
        ADDLW   LOW (H'200'-compNr+PrevCompDif)

This is in the macro, so you don't have to look at the ugly internal details every time you use it. The operands are calculated at assembly time, so there's no runtime penalty. Here's an example usage:
Code:
                                                        ; When we come here the first time after selection, NEW_KEY_FLAG was
TREATMENT:                                              ; cleared in MENU_CASE_PRELUDE, and MENU_ITEM_STATE was cleared in
        MOVF    MENU_ITEM_STATE, W                      ; MENU_TASK case 0.  If TREATMENT was already selected, then NEW_KEY_FLAG
        CASE                                            ; will remain set if a new keypress was made.
            CASE_OF_  5                                 ; Case 5 is for running the treatment.
                GOTO  TREATING                          ; (CALL, RETURN)  I'm separating out this routine.  Case 5 is first, to
            END_OF_                                     ; reduce the number of instructions taken each time the task is called up
                                                        ; after a treatment is actually started.

            CASE_OF_  0                                 ; Case 0 is for when we first arrive here from the main menu.  Display
                DISP_ROM_STR  CLR,  Sel_a_Rx_STR        ; "Rx #1  (Use < > )"  (but replace < and > with the arrow characters).
                INCF  MENU_ITEM_STATE, F                ; Transition to state 1, offering the choice of Rx #1.
                CLRF  MENU_ITEM_STATE_L2
                GOTO  trm1                              ; Go about 100 lines down, to test the validity of the Rx.
            END_OF_                                     ; (The _ after the END_OF prevents wasting memory on a GOTO <END_CASE>.)


            CASE_OF_  6                                 ; Case 6 is for exiting, to time the "Exiting" message.
                MOVF  MENU_ITEM_STATE_L2, W
                IF_ZERO                                 ; This is kind of like a "CASE_OF 0", and, further down, "CASE_OF 1".
                    DISP_ROM_STR  CLR,  Exiting_STR     ; Display, "Exiting".
                    CALL  _1_SEC_DISP                   ; Put the current time plus one second in LCD_TARGET_TM.
                    INCF  MENU_ITEM_STATE_L2, F
                ELSE_
                    IF_FLAG_VAR  NEW_KEY_FLAG, IS_CLEAR
                        CALL  TM_2_LCD_TARGET_TM?       ; How long left to see "Exiting"?  Take current time minus LCD_TARGET_TM.
                        RETURN_IF_ACCb_NEG              ; If it's negative, we must still give more time to see msg, so just exit.
                    END_IF
                                                        ; If we've reached the target time, or if a key was pressed,
                    CLRF  LCD_TARGET_TM                 ; make sure nothing else could think LCD_TARGET_TM is in use,
                    CLRF  MENU_STATE                    ; set _main_ menu state to 0, (MENU_TASK will clear MENU_ITEM_STATE)
                END_IF                                  ; and proceed to clear key status and exit.

                CLR_FLAG  NEW_KEY_FLAG
                RETURN
            END_OF_                                     ; The _ at the end of END_OF_ eliminates the jump to END_CASE, since it was
        END_CASE                                        ; immediately preceded by an unconditional RETURN, & END_CASE is next anyway.

                                                        ; We're waiting for the 1st keypress to select a Rx.
        RETURN_IF_FLAG_VAR  NEW_KEY_FLAG, IS_CLEAR      ; If no key was pressed (which is the usual situation), just exit.
        CLR_FLAG  NEW_KEY_FLAG                          ; A key has been pressed.

        MOVF    NEW_KEY, W                              ; We only watch for <--, -->, HOME, END, NO, and YES.  Other keys ignored.
        CASE
            CASE_OF_   RAW_LEFT_KEY
                DECF   MENU_ITEM_STATE, F               ; Decrement with left_arrow key, but
                BTFSC  STATUS, Z                        ; don't let it get below 1.  If it did,
                INCF   MENU_ITEM_STATE, F               ; put it back to 1.
            END_OF


            CASE_OF_   RAW_RIGHT_KEY
                MOVF   MENU_ITEM_STATE, W               ; Check MENU_ITEM_STATE
                SUBLW  4                                ; against 4 (the max allowable).
                BTFSS  STATUS, Z                        ; If it's not already there,
                INCF   MENU_ITEM_STATE, F               ; you can increment it.
            END_OF


            CASE_OF_   RAW_HOME_KEY
                PUT  1, IN, MENU_ITEM_STATE
            END_OF


            CASE_OF_  RAW_END_KEY
                PUT  4, IN, MENU_ITEM_STATE
            END_OF


            CASE_OF_  RAW_NO_KEY
                PUT  6, IN, MENU_ITEM_STATE             ; This gets trapped above, to time the "Exiting" message.
                RETURN
            END_OF_                                     ; The _ at the end of END_OF_ eliminates the jump to END_CASE, to save a
                                                        ; program-memory word since there's an unconditional RETURN right before it.

            CASE_OF_  RAW_YES_KEY
                IF_FLAG_VAR  Rx_VALID_FLAG, IS_SET      ; If not valid Rx, YES key gets ignored. For checking, the decryp-
                    CALL   INSTALL_DECRYPT_ARRAY        ; tion was already done when the left or right arrow was pressed.

                    DISP_ROM_STR  CLR,  Treating_STR    ; Display "Treating" briefly before asking for AGL.
                    CALL   _4_SEC_DISP                  ; Tell it to display for 4 seconds before asking for initial AGL.
                                                        ; (No delay loops, but it puts current time + 4 sec in LCD_TARGET_TM.)
                    CALL   XFER_HI_TM_2_ACCb            ; We'll ask for first AGL right away (but it will wait until the
                    MOVF   ACCbLO, W                    ; "Treatment begun" message has shown for a few seconds first).
                    MOVWF  NEXT_AGL_TM                  ; We'll forgo the macro usage here since we're copying the time
                    MOVWF  TREATMENT_BEG_TM             ; to two variables, and we can avoid re-loading bytes this way.

                    MOVF   ACCbHI, W
                    MOVWF  NEXT_AGL_TM+1
                    MOVWF  TREATMENT_BEG_TM+1

                    _8V_UP                              ; Make the 8V pulse output high for 5 or 10ms.  (DELAY's parameter is the
                        DELAY  1                        ; number of ms if interrupts are off, with a 2ms resolution so input is
                    _8V_DN                              ; rounded up.  Here, interrupts are on though, and I'm seeing about 7ms.)

                    CLRF   AGL_IMG_TO_STORE             ; (This will get changed 3 lines down to reflect Rx number.)
                    CALL   SET_TREATMENT_BOUNDARY       ; Store a fake AGL with Rx#=0 to mark the beginning of another treatment.
                                                        ; Only the first byte of the fake AGL is significant; rest are random.
                    CLRF   MENU_ITEM_STATE_L2
                    COPY   MENU_ITEM_STATE, TO, AGL_IMG_TO_STORE     ; Before overwriting MENU_ITEM_STATE, store Rx# for STORE_AGL.
                    PUT    5, IN, MENU_ITEM_STATE       ; This will make the rest get handled (or re-routed) above.  The "ELSE_"
                END_IF                                  ; would be that YES was pressed on an invalid Rx; but that does nothing,
                RETURN                          ; so we don't have to specify.  (With an unconditional RETURN right before the
            END_OF_                             ; END_OF, we can put the _ after END_OF to keep it from assembling GOTO END_CASE.)
            RETURN                              ; All other keys are ignored.
        END_CASE                                ; But if <--, -->, HOME, or END were pressed, we will continue here to display the
                                                ; Rx number, followed by either "Rx is invalid", "Rx is blank", or the Rx itself.
                                                ; Invalid ones won't be stored here, but a crook may try to store by different way.
 < snip >

(This is from the semi-medical realtime multitasking project I told a little bit about at viewtopic.php?p=563#p563 . And yes, that's what assembly language looks like when a macro junkie does it. :D )
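
For anyone unfamiliar with the compare-chaining trick in the ADDLW operand above, here is a small C model of it. The idea is that each comparison destroys W, so the next constant added folds the previous case value back in. The function and variable names are made up for the illustration; in the real macros the addend is computed at assembly time, not at run time.

```c
#include <stdint.h>

/* Model of a CASE chain on a machine with no non-destructive compare.
   Returns the index of the first matching case value, or -1 if none. */
static int case_chain(uint8_t value, const uint8_t *cases, int n)
{
    uint8_t w = value;   /* W register, overwritten by every compare */
    uint8_t prev = 0;    /* PrevCompDif: what the last compare subtracted */

    for (int i = 0; i < n; i++) {
        /* LOW(H'200' - compNr + PrevCompDif): undo the previous
           subtraction and apply the new one in a single add. */
        uint8_t addend = (uint8_t)(0x200 - cases[i] + prev);
        w = (uint8_t)(w + addend);   /* ADDLW; w is now value - cases[i] */
        if (w == 0)                  /* Z flag set: this case matched */
            return i;
        prev = cases[i];
    }
    return -1;
}
```

With case values 5, 0 and 6 (the order used in the listing), an input of 6 falls through the first two compares and matches the third, with only one add per case.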

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Fri Oct 09, 2020 3:20 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Is this the same CMP-type instruction that the 6502 has?
Sort of. It would set the flags the same way. But the subtract instruction could do the same thing and the result can be thrown away keeping the flags, by storing the result to x0. CMP also allowed choosing the flags register to update and merging flags results which the subtract can't do. Many of the ALU instructions can optionally set the flags register. It is optional as there are enough bits in instructions to support the option and it is handy to be able to pick and choose which instructions set the flags. Stores never set the flags register however.
Quote:
It's a pain when you want to do a series of comparisons, as in a CASE structure.
There is a branch-equal to immediate instruction in the RTF64 instruction set to handle CASE structures. The compiler makes use of it.
Quote:
make the assembler remember what the last compare was, and then add it into the number to compare for the next one, as in
Talk about jumping through hoops. It looks like a good work-around. It looks like you've practically defined a higher-level language through macros. 'C' came about from this sort of thing.

*****

More compiler refactoring. Some global variables managed to get moved into classes. There were a few hiccups to debug, but things are back to working.

Finally, the processor made it all the way through the Fibonacci test program successfully in simulation. This is while decompressing compressed instructions on the fly. The average CPI was 15 though. It takes two clock cycles to decompress an instruction and run it back through the decoder again. This is the Mickey Mouse version of the processor though, with a non-overlapped pipeline. The instruction pointer increment must be disabled for the second run through the decoder. The code density is awesome.

Started working on an overlapped pipelined version of the processor. It may use multiple clock cycles per pipeline stage, however. I would like to get the CPI down to five or less.

_________________
Robert Finch http://www.finitron.ca


Fri Oct 09, 2020 7:15 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The load and store formats were modified to be the same. This is to simplify the address generator and merge two constant formats into a single format. This is in preparation for more sophisticated versions of the core. It means reading the destination register to get store data. It turns out the destination register is readable by several other instructions (DEP, FDP) so there is no reason not to use it for store operations as well. I had to go back and modify the assembler and rtl source code.

_________________
Robert Finch http://www.finitron.ca


Sat Oct 10, 2020 4:11 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
More compiler refactoring, mostly in the expression parsing. The code is starting to look a little cleaner now that some monolithic functions have been broken up. I have decided to add posit arithmetic to the compiler. Not a trivial addition. The compiler needs to be able to load posit constants and perform some basic posit arithmetic. So, I must come up with a soft-posit library. 64-bit posits have been chosen as a base to work with, but there might be a need to support transferring posits of 32 bits or fewer to / from memory. The cpu itself is going to support posit arithmetic. The rtl for basic posit operations (add, sub, multiply) is already present in a library. If only I could remember where the library was….
To mark a constant as a posit and not a floating-point constant, a ‘p’ is appended to it, as in “123.45p”. This is like the compiler’s interpretation of double or quad constants, which have a ‘d’ or ‘q’ appended.


The compiler has been modified to allow bit selects from scalar types by using the index operator [] on a variable. The following example extracts bits 2 to 5 of the variable ab, adds 20 to the extracted field, and returns the result.
Code:
int main(int a, int b)
{
  int ab;
  return (ab[5:2]+20);
}

It is possible also to select just a single bit as in ab[4]. It should allow writing code a little more efficiently. It is sometimes desirable to get at the high order bit of a var. It’s easy with the index operator, otherwise a large constant has to be built and used as a mask. The compiler will compile the select to a bit extract EXT operation. The field selection does not have to be a constant value, any expression will be accepted as in ab[x:y] to select bits between y and x. The compiler currently only does extract this way. It does not do deposits yet. The compiler does fully support bitfields, however.
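
For comparison, this is roughly what has to be written in standard C to get the same field without the index operator. It is a sketch only: the helper name `bits` is invented, and it assumes the extracted field is zero-extended, which may or may not match what the EXT operation does.

```c
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of v, zero-extended.
   Requires lo <= hi <= 63. */
static uint64_t bits(uint64_t v, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    /* A shift by 64 is undefined in C, so the full-width mask is a
       special case. */
    uint64_t mask = (width >= 64) ? ~0ull : ((1ull << width) - 1);
    return (v >> lo) & mask;
}
```

Under those assumptions, `ab[5:2] + 20` corresponds to `bits(ab, 5, 2) + 20`, and the high-order bit of a 64-bit value is `bits(v, 63, 63)`.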

_________________
Robert Finch http://www.finitron.ca


Mon Oct 12, 2020 3:00 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quite a busy day with the compiler. Many, many bug fixes. It almost made me cry. External functions that were not prototyped were treated as integer constants. Structs less than a word in size were being allocated on the stack and calling the memory copy routine. Bitfield offsets and sizes did not work correctly in all cases.

Worked on exception handling today. I had a chunk of it coded, then I realized it was not very memory efficient. Every try/catch/throw statement manipulated the exception handler address and the registration record for the handler. This was half a dozen instructions at every instance. I changed it to just invoke the break handler via the BRK instruction, which only takes two bytes. The break handler then takes care of jumping back to the exception handler. This costs some runtime performance, but only in exceptional conditions. One of my favorite links:
https://www.codeproject.com/articles/21 ... n-handling
__xhandler_head is a global variable locating the head of the list of exception handlers. For MSVC offset zero of the FS segment is used as the storage location for the same purpose.
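
To make the linked list of handlers concrete, here is a portable C sketch of a handler chain with a global head pointer. It substitutes setjmp/longjmp for the BRK instruction and break handler, so it shows the bookkeeping only, not how the compiler actually implements it. All names (`xhandler`, `xthrow`, `try_risky` and so on) are invented for the illustration.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* One registration record per active try block. */
typedef struct xhandler {
    struct xhandler *prev;   /* next outer handler */
    jmp_buf env;             /* where to resume when something is thrown */
} xhandler;

static xhandler *xhandler_head = NULL;  /* plays the role of __xhandler_head */
static int xpending_code;               /* exception code being delivered */

/* Throw: unlink the innermost handler and jump to it. */
static void xthrow(int code)
{
    xhandler *h = xhandler_head;
    if (h == NULL)
        abort();                        /* nobody is listening */
    xhandler_head = h->prev;
    xpending_code = code;
    longjmp(h->env, 1);
}

static int risky(int x)
{
    if (x < 0)
        xthrow(42);
    return x * 2;
}

/* try { return risky(x); } catch (code) { return -code; } */
static int try_risky(int x)
{
    xhandler h;
    h.prev = xhandler_head;             /* register the handler */
    xhandler_head = &h;
    if (setjmp(h.env) == 0) {
        int r = risky(x);
        xhandler_head = h.prev;         /* normal exit: unregister */
        return r;
    }
    return -xpending_code;              /* arrived here via xthrow */
}
```

The register and unregister steps on every try are the per-instance cost mentioned above; routing the throw through a single handler moves most of that work out of line.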

The compiler now supports substituting an incoming parameter with the frame pointer reference in functions with inline assembler code. One can code using the argument names:
Code:
 void asmtest(int ara, int bargggy, int cflag)
{
  __asm {
    ldo   $x1,ara
    ldo   $x2,bargggy
    ldo   $x3,cflag
    add   $x4,$x1,$x2
  }
}

In the output the name references will be offsets from the frame pointer.

_________________
Robert Finch http://www.finitron.ca


Wed Oct 14, 2020 5:34 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
More compiler improvements. It was found that bitfield references were not being optimized at all. This occurred because they use special fields called bit_offset and bit_width to store those expressions as opposed to the regular field. This issue resulted in many lines of output code being generated to handle constant values for the offset and width.

Register variables were being incorrectly initialized from stack locations at the start of a function. This was just a performance / code size issue. The values would later be overwritten by proper code. The register variables should not have been initialized in that manner.

The sbrk() function did not get generated correctly. Only about ¼ of the function had code generated. So, I moved the function to the start of the file to make it easier to debug, and pow, the entire function is generated. Debugging is not going to be so easy.

_________________
Robert Finch http://www.finitron.ca


Thu Oct 15, 2020 2:45 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
More work on the compiler, mainly breaking down the dereferencing function into several smaller functions. Never did figure out why the sbrk() function did not compile correctly, but it does now. The cpu now has PUSH, POP, LINK, and UNLINK instructions.

Worked a bit on the memory management software. The page table entry is currently a 20-byte (160-bit) structure. It includes virtually everything that might be needed. The actual number of bits used in the structure is somewhere around 140, but a 160-bit structure works better. To get a whole number of pages and a decent number of PTE records, five 4kB pages are allocated to store 1024 PTE records. This could be called a cluster. Each depth of cluster then absorbs 10 bits of the address space. PTEs reference groups of five 4kB pages, except at the lowest level, where a single 4kB page is represented. It works out that to cover an entire 64-bit address space the top level needs only four entries. These four entries are stored directly in a table of root pointers for address spaces. The table depth required for an app is recorded at the root level. The entire 64-bit address space need not be covered. Table depth may vary from six to one.
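
The arithmetic works out as follows: 5 x 4096 / 20 = 1024 PTEs per cluster, a 12-bit page offset plus five 10-bit levels covers 62 bits, and the remaining 2 bits give the four top-level entries. The C sketch below just restates that calculation. The level numbering (0 for the lowest level) and the function names are assumptions made for the example.

```c
#include <stdint.h>

enum {
    PAGE_BYTES    = 4096,  /* 4kB page */
    PAGE_BITS     = 12,    /* address bits covered by the page offset */
    PTE_BYTES     = 20,    /* 160-bit page table entry */
    CLUSTER_PAGES = 5,     /* pages allocated per cluster of PTEs */
    INDEX_BITS    = 10     /* address bits absorbed per table level */
};

/* Number of PTE records that fit in one cluster. */
static unsigned ptes_per_cluster(void)
{
    return CLUSTER_PAGES * PAGE_BYTES / PTE_BYTES;
}

/* Index of the PTE for vaddr at a given level (0 = lowest, 5 = top).
   At level 5 only two address bits remain, so the index is 0..3. */
static unsigned pte_index(uint64_t vaddr, unsigned level)
{
    unsigned shift = PAGE_BITS + INDEX_BITS * level;
    return (unsigned)((vaddr >> shift) & ((1u << INDEX_BITS) - 1));
}

/* Byte offset of that PTE within its cluster. */
static unsigned pte_byte_offset(uint64_t vaddr, unsigned level)
{
    return pte_index(vaddr, level) * PTE_BYTES;
}
```

Note that locating a PTE needs a multiply by 20 rather than a shift, since the entry size is not a power of two.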

_________________
Robert Finch http://www.finitron.ca


Fri Oct 16, 2020 3:24 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1783
oh, five is an interesting number. Do you at any point need to divide by 5?


Fri Oct 16, 2020 8:46 am