View unanswered posts | View active topics It is currently Tue Mar 19, 2024 2:03 pm



Reply to topic  [ 82 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6  Next
 CC64 / ARPL Compiler 
Author Message

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Interesting... in my mental model, if we did stick with the r1-r4 as they are, it doesn't mean a subroutine can't use any other registers for whatever purpose, it just means that the subroutine needs to preserve them for its return.

I suppose we're looking for the minimum amount of stack activity, over all codebases, measured dynamically - because that's how this decision affects performance.

It becomes clear than 32 is much larger than 16!


Tue Aug 01, 2017 6:20 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
An optimization of binary operands was added. If both the operands to a binary operation (eg. add) are the same, then the operand is only loaded once into a register and used. This saves a extra temporary register, and possibly instructions.

Example (with optimization):
Code:
   code
_BinaryMergeTest:
            sub     r14,r0,1
            sto     r12,r14,0
            mov     r12,r14,0
            sub     r14,r0,0
   #    return a + a;
            ld      r1,r12,1
            add     r1,r1,0
BinaryMergeTest_4:
            mov     r14,r12,0
            ld      r12,r14,0
            add     r14,r0,1
            mov     r15,r13,0
   rodata
   extern   _BinaryMergeTest


Without optimization:
Code:
   code
_BinaryMergeTest:
            sub     r14,r0,1
            sto     r12,r14,0
            mov     r12,r14,0
            sub     r14,r0,0
   #    return a + a;
            ld      r1,r12,1
            ld      r2,r12,1
            add     r1,r2,0
BinaryMergeTest_4:
            mov     r14,r12,0
            ld      r12,r14,0
            add     r14,r0,1
            mov     r15,r13,0
   rodata
   extern   _BinaryMergeTest


_________________
Robert Finch http://www.finitron.ca


Wed Aug 02, 2017 3:08 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Adding support for 32 bit integer type to OPC6 version of the compiler.
The compiler routines it calls to do the math have an extra underscore in them now to avoid name conflicts.

_________________
Robert Finch http://www.finitron.ca


Mon Aug 07, 2017 2:50 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I got to wondering about why register vars weren't being allocated in a very good fashion. It turns out the table used was sorted from worst to best choices instead of from best to worst. I finally dumped the table to the debug file to see what was going on. The analyzer now detects when a loop is active and gives more weight to variables inside of loops. The analyzer also now considers constants to be placed in a register if they are used enough times.

_________________
Robert Finch http://www.finitron.ca


Mon Aug 07, 2017 8:51 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Support for a number of triple input instructions was added for the latest project. Compiled code for the triple input logical and operation (calls sand for set and) is shown below. There is also a triple input bitwise and. There are three read ports available in the instruction set which allows more operations including indexed stores. In the compiler optimization phase it looks for and's and or's that are chained together and converts them to triple input operations where possible.

Code:
;    x = a && b && c;
            lw      $t1,12[$fp]
            lw      $t2,16[$fp]
            lw      $t3,20[$fp]
            sand    $t0,$t1,$t2,$t3
            sw      $t0,-4[$fp]


The author modified the compiler to output $ in front of register names to distinguish them from other identifiers. The '%' and '$' are used fairly commonly by other compilers for this purpose.

An example of C code that uses multiple bitwise or's to build a constant is shown below.
Code:
   emit_insn(((disp >> 3) & 0x3FF) << 22 |
      (Rc << 16) |
      (Rb << 11) |
      (Ra << 6) |
      ((disp >> 2) & 1) |
      opcode6
   );

This should end up being just a couple of three-input or's.

_________________
Robert Finch http://www.finitron.ca


Mon Sep 04, 2017 4:17 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
A while ago parameters passed in registers were added to the compiler. Code was added to save register parameters used by a called routine before calling a routine. The code has been improved to only saving the register parameters if the current routine also uses them. If the current routine doesn’t use any register parameters there’s no reason to save them off.

A case was just found where the order of parameters passed interfered with setting up registers to be passed to a subroutine because nested routines taking register parameters were in use. In the code below r19 is a value passed into the current routine as a register parameter. r19 is also a register argument for the next lower subroutine. The problem is the value of r19 needs to be placed into r18 *before* it is overwritten with r3. There is something wrong with the order parameters are setup.
Code:
push     $r18
push     $r19
ldi   $r3,#$20
mov   $r19,$r3
mov   $r18,$r19

The problem is that the register parameters for the caller need to be saved in temporaries before the values are used to setup registers for a called routine. Code to do this looks something like:
Code:
push $r18   ; save register params
push $r19
mov  $r3,$r18  ; assign register params to temps
mov $r4,$r19
ldi    $r5,#$20
mov $r19,$r5   ; assign register params from temps
mov $r18,$r4
call <subroutine>

The issue with this code is that it’s actually probably faster to not pass parameters in registers. Since the temporaries must be pushed onto the stack anyway, just passing parameters on the stack doesn’t take up any more clock cycles or code space. In fact it may consume fewer clock cycles and code space.
Code:
ldi $r3,#$30
push $r3
push $r19
call <subroutine>

The above is what the code looks like if parameters are pushed onto the stack. It looks simpler, but the drawback is that in the called routine the parameters have to be loaded off the stack for use.

A smart compiler would be able to figure out what to do with the register parameters. But it’s desired to stick to a compiler that’s somewhat simpler. Register parameters exist because they can offer a performance improvement in some circumstances.

So the simple solution is not to use register parameters in cases where there could be a nested routine also using register parameters. Using nested routines with registered parameters probably will not compile the correct code for this compiler. To be fixed at some point.

_________________
Robert Finch http://www.finitron.ca


Sat Dec 30, 2017 1:42 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The compiler emits a useless operand size conversion operation ‘bfextu’ in the code below. A half-word is being stored to memory so the compiler outputs code to convert a word value to a half-word. But it isn’t actually needed as the store operation will already store only the half-word. Getting rid of this extra operation turns out to be not simple and easy do.
The bitfield extract operation is used to convert between different operand sizes because it requires only a single instruction word compared to a bitwise ‘and’ which might require multiple word to represent the masking constant. Bitfield extract also works with oddball sized operands that might come from bitfields in a structure.
Code:
;    pAVIC[492] = pAVIC[492] | (1 << spriteno);
            lh      r2,1968[r11]
            ldi     r4,#1
            lw      r5,16[bp]
            asl.h   r3,r4,r5
            or      r1,r2,r3
            bfextu   r1,r1,#0,#31
            sh      r1,1968[r11]

I’ve been trying to get the compiler to do a lifetime analysis of register r1 and remove the bitfield extract that isn’t needed. The order of the store operation and the bitfield extract can be switched around. That would possibly make the extract the last update to r1. Since r1 isn’t used subsequently then the instruction could be removed from the instruction stream.

I've also been busy porting the compiler to the 68000.

_________________
Robert Finch http://www.finitron.ca


Thu Jan 04, 2018 5:54 am
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
(just an idea: call it cc64 instead, because c64 is a widely used abbreviation for the Commodore 64 computer, and that throws me every time!)


Thu Jan 04, 2018 8:33 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
(just an idea: call it cc64 instead, because c64 is a widely used abbreviation for the Commodore 64 computer, and that throws me every time!)

Ed, that's a good idea. I shall endeavor to change the project over to cc64.

The compiler now removes the extra 'bfext' in some circumstances. It searches backwards through the generated code for instructions targeting registers where the register isn't subsequently used and removes those instructions sometimes. The problem is if there is a backwards loop in play the compiler doesn't follow all the loops to see if the register is used, it just assumes the register is used then.

_________________
Robert Finch http://www.finitron.ca


Thu Jan 04, 2018 11:16 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Changing the name of the compiler to CC64 to avoid confusion.

_________________
Robert Finch http://www.finitron.ca


Thu Jan 04, 2018 11:18 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I'm changing all the tools now to two letters instead of just a single letter. I guess I'm getting more sophisticated.
So AS64 instead of A64, LN64 instead of L64 and EM64 instead of E64. For the assembler, linker, and emulator respectively.

Trying to figure out now how to change the directory structure without having to delete and readd all the files to the project.

_________________
Robert Finch http://www.finitron.ca


Fri Jan 05, 2018 7:23 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added new and delete operators to the CC64 language which simply make run-time calls to __new() and __delete(). Adding new and delete operators caused the size of the return block to be increased by a word so a pointer to the function’s object list could be included. That means my enormous code base had to be re-compiled and several assembly language routines need to have the argument offsets changed.

Added an option to specify strides with the CC64’s enum keyword. The stride is the amount that the enum increments by. So
Code:
enum(-1) {
   OKAY = 0,
   BadError
}

will assign BadError = -1.

I needed the error constants returned by a routine to be negative.
This can be useful for bit-field constants too where the value needs to be shifted over a number of bits.
Leaving out the stride option assumes a stride of one.

_________________
Robert Finch http://www.finitron.ca


Sat Feb 17, 2018 11:19 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
An option allowing the use of short (32-bit) pointers was added to the compiler. Many apps don’t need an address space over 4GB and storage of long pointers wastes memory. When used with an MMU the entire address range of an app can be less than 4GB. Short pointers can be enabled using the phrase “using short pointers;” in code.

_________________
Robert Finch http://www.finitron.ca


Tue Feb 20, 2018 12:47 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Forcefit() is a function used to coerce types from one to another by the compiler, sometimes by outputting type conversion code. It’s 250 lines of code. It hasn’t undergone that much debugging because usually the types in use match. The basic idea behind forcefit() is simple. Fit the operands to the largest operand type with a few exceptions.

Reduced the amount of branching the compiler generates by using the reduction ‘or’ instruction. The compiler used to implement logical (not bitwise) ‘and’ and ‘or’ operations by loading a one or zero into a register using branches after testing for zero / non-zero. The reduction ‘or’ converts a value to a Boolean 1 or 0 making it possible to use regular bitwise ‘and’ and ‘or’ operations without testing values for zero / non-zero.

Added a peephole optimization to turn a bitwise operation followed by a complement into a single complementing instruction.

Currently trying to get the compiler to use a floating-point size other than a double. The type was hard-coded as a double in several places. Fixing the compiler up so that it recognizes several different float formats is not so simple. For instance if temporaries for floating point are in use, what should be saved and restored ? Should the compiler assume the worst, that 128 bit floats are in use and provide associated loads and stores ? It sounds like a max precision option specified to the compiler would be helpful. If only float singles are in use there could be a lot of wasted space if the compiler assumes it needs 128 bit loads and stores.

_________________
Robert Finch http://www.finitron.ca


Sat Aug 25, 2018 2:47 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Some more code was added to the color graphing aspect of the compiler. While incomplete yet, the author continues to gain a greater understanding of the register allocation method.
I’m thinking about modifying the compiler to differentiate between register and immediate mode instructions. The problem is that the instructions should be different for the optimizer. There is no interference graph conflicts with an immediate value, but there is with registers. Right now the compiler outputs the same mnemonic for either register or immediate values and leaves it up to the assembler to sort out. Using the same mnemonic means the optimizer can’t be as effective as if they were distinct.
Currently the compiler coalesces the variable ranges using a brute force approach which takes n^2 or worse time. Work is being done to convert this to using an interference graph.

_________________
Robert Finch http://www.finitron.ca


Sun Aug 26, 2018 3:25 am
Profile WWW
Display posts from previous:  Sort by  
Reply to topic   [ 82 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6  Next

Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software