 CC64 / ARPL Compiler 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The C64 compiler accepts a 'C' derivative language and should be able to compile almost any C program. It has a few extra features not found in standard C.
It has evolved over a few years to support different processing cores, most recently the FT64 core.
The most recent improvement is that the compiler now recognizes when it can use r1 and r2 as temporaries in a leaf function. This shrinks a number of smaller routines that use r1 to return values.

Example: an abs() function which has just a single line of code (r18 holds the i parameter):
Code:
_abs:
;    return ((i < 0) ? -i : i);
            bge     r18,r0,stdlib_3
            neg     r1,r18
            bra     stdlib_4
stdlib_3:
            mov     r1,r18
stdlib_4:
stdlib_5:
            ret     #8

_________________
Robert Finch http://www.finitron.ca


Last edited by robfinch on Fri Jan 26, 2024 3:43 pm, edited 2 times in total.



Thu Jul 20, 2017 7:25 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Updated FPP to search the ‘FPPINC’ directory for files before the normal ‘INCLUDE’ directory. The problem was that MS include files were being read unintentionally when building the system software, so a private include directory path was required. FPP is the preprocessor for the compiler; it handles all the '#' directives.
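
A minimal sketch of that lookup order, not FPP's actual code. It assumes both variables hold ';'-separated directory lists (the Windows convention), and the helper names `split_paths` and `build_search_order` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a ';'-separated path list into dirs[], appending from index n.
   Modifies `list` in place, so it must be a writable buffer. */
static size_t split_paths(char *list, const char **dirs, size_t n, size_t max)
{
    assert(list != NULL && dirs != NULL);
    for (char *p = strtok(list, ";"); p != NULL && n < max; p = strtok(NULL, ";"))
        dirs[n++] = p;
    return n;
}

/* Build the search order: every FPPINC directory first, then INCLUDE.
   Either list may be NULL if the variable is not set. */
size_t build_search_order(char *fppinc, char *include,
                          const char **dirs, size_t max)
{
    size_t n = 0;
    if (fppinc != NULL)
        n = split_paths(fppinc, dirs, n, max);
    if (include != NULL)
        n = split_paths(include, dirs, n, max);
    return n;
}
```

In a real preprocessor the two strings would presumably come from `getenv("FPPINC")` and `getenv("INCLUDE")`, copied into writable buffers first. The first directory in which the file opens wins, so a private header shadows a same-named MS header.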

_________________
Robert Finch http://www.finitron.ca


Fri Jul 21, 2017 3:17 am
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Hi Rob
I take it that's this code
https://github.com/robfinch/Cores/tree/ ... ftware/C64
but it looks like you have several versions of both C64 and FPP in your repo - one for each core? Looks like C64 is in Visual C++ and FPP is in C. I suppose A64 is the assembler for each core - is that needed by your C compiler?

And I see E64 in some cases, but I'm not sure what it might be.

Interestingly, we've just added a preprocessor to our OPC series, and we've used filepp which is a simple preprocessor written in perl. Since picking that, we've noticed gpp which is a rather general preprocessor capable of acting somewhat like cpp but with many different optional behaviours.


Fri Jul 21, 2017 3:33 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
but it looks like you have several versions of both C64 and FPP in your repo - one for each core? Looks like C64 is in Visual C++ and FPP is in C. I suppose A64 is the assembler for each core - is that needed by your C compiler?
Yes, there is a separate version for each core. At one time I was trying to maintain a single version with support for multiple cores but it's a lot of work. The most recent set of tools is under the FT64 folder. They only work properly for FT64 however. A64 might still work with other cores. I keep the backend of the assembler in a separate file for each core. I used to do the same for the compiler but I found there were switch statements all over the place to accommodate different cores. I'm using replication in part as a means for backup of tools, borrowing an idea from nature. I don't want to destroy the existing working software. So I replicate the toolset for a new core then modify it to suit without going backwards and updating older software. But it results in a lot of duplicates and outdated software.
A64 is needed to assemble the output of the compiler.
E64 is a software emulator for FT64 (not working yet).
L64 is a simple linker (but it's not been used in a while - seriously out of date).
I studied how gcc works with its backends for different processors but I'm not sure that it'd be any less work to develop for a new core than simply replicating an existing compiler. I'm not fond of the giant master control program idea. gcc does have other benefits.

Quote:
Interestingly, we've just added a preprocessor to our OPC series, and we've used filepp which is a simple preprocessor written in perl. Since picking that, we've noticed gpp which is a rather general preprocessor capable of acting somewhat like cpp but with many different optional behaviours.
To each his own. It's good to be able to reuse existing software.

I haven't been able to get a Unix-like interface working reliably on my Windows workstation. That's one reason I've avoided working with some toolsets.

_________________
Robert Finch http://www.finitron.ca


Fri Jul 21, 2017 10:14 am
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Thanks for the detail. I don't think I'd spotted E64. I do see the problem with trying to support multiple core designs. In fact this is exactly why we've felt the need for a preprocessor, but time will tell whether we have a maintainable approach (we have three somewhat similar cores at present.)


Fri Jul 21, 2017 10:29 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The compiler has been modified to accept branch hints in 'if' statements. The 'if' expression can now optionally take a second constant expression to specify the branch prediction. So an 'if' statement with a statically predicted taken branch would look like:
Code:
if (a < 10; 1)
    ... <other code>

1 is predicted taken, 0 is predicted not-taken. Leaving the second expression out results in a dynamically determined branch prediction.
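
For comparison only: GCC and Clang express a similar static hint with the `__builtin_expect` builtin rather than a second expression in the `if`. The sketch below is not C64 syntax, and the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang builtin: states which value the condition usually has. */
#define PREDICT_TAKEN(cond)     __builtin_expect(!!(cond), 1)
#define PREDICT_NOT_TAKEN(cond) __builtin_expect(!!(cond), 0)

/* Counts values below 10; the hint says most values are expected
   to be small. */
int count_below_ten(const int *v, int n)
{
    assert(v != NULL || n == 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (PREDICT_TAKEN(v[i] < 10))   /* C64 form: if (v[i] < 10; 1) */
            count++;
    }
    return count;
}
```

The hint never changes the result, only how the branch is expected to behave. In GCC it mainly steers code layout, whereas the C64 form described above selects the static prediction for the branch.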

_________________
Robert Finch http://www.finitron.ca


Fri Jul 28, 2017 4:17 pm
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
A C64 compiler for OPC5 is in the works. It is stored under the software/C64 - OPC5 folder on my GitHub account.

_________________
Robert Finch http://www.finitron.ca


Sun Jul 30, 2017 12:05 am
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Very exciting! And thanks of course for finding OPC5 sufficiently intriguing.


Sun Jul 30, 2017 5:55 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I just wrote a long post and it timed out when I went to post.

Compiler output for aggregate assignments (not quite working yet):
Code:
typedef struct A { int a,b; } A;
int*f1(){
  A x[2]={{1,2},{3,4}};
  return g(&x[1].a); // { dg-warning "returns address of local variable" }
}


Compiler outputs:
Code:
public code _f1:
            sub     r14,r0,4
            sto     r13,r14,0
            sto     r12,r14,2
            mov     r12,r14,0
            sub     r14,r0,12
   #   A x[2]={{1,2},{3,4}};
            lea     r5,r12,-12
            mov     r6,r5
            mov     r7,r0,3
            sto     r7,r6,0
   #   return g(&x[1].a); // { dg-warning "returns address of local variable" }
            mov     r5,r0,6
            lea     r6,r12,-12
            add     r5,r6,0
            sub     r14,r0,2
            sto     r5,r14,0
            mov     r13,r15,2
            mov     r15,r0,_g
            add     r14,r0,2
addrtmp2_4:
            mov     r14,r12,0
            ld      r13,r14,0
            ld      r12,r14,2
            add     r14,r0,4
            mov     r15,r13,0
endpublic

_________________
Robert Finch http://www.finitron.ca


Mon Jul 31, 2017 7:39 am
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Hi Rob
Amazing to see compiler output for OPC5!

I don't think I've fully digested the code, but I notice a few things:
- our preference, or convention, is to use r1-r4 for three purposes: passing parameters in, as scratch during a routine, and returning results. With this convention, simpler subroutines would be able to avoid any stack allocation.
- I see 'lea' which we don't have - probably this is a 'mov'?
- I think the trailing zero can be omitted, and depending on the assembler it might be that you need to do that, to get the one-word form which will be smaller and maybe faster.
- If you move up to the OPC6 instruction set, some or all of your 'sub' will become 'dec' which again may be smaller and faster.
- I think possibly your stack adjustments are assuming a byte addressed stack? If so, they should be halved.


Mon Jul 31, 2017 4:05 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I've fixed a number of things since the last post.
I fixed the 'lea' to an add/mov operation.
I also fixed the stack adjustments for a word oriented machine.
As for the trailing zeros, I was assuming the assembler might not like it if they were omitted.

It works much better now at aggregate assignments.
Code:
_f1:
            sub     r14,r0,2
            sto     r13,r14,0
            sto     r12,r14,1
            mov     r12,r14,0
            sub     r14,r0,6
   #   A x[2]={{c*8,2},{3,4*c}};
            mov     r5,r0,0
            add     r5,r12,-6
            mov     r6,r5
            ld      r7,r12,2
            sto     r7,r12,2
            ldb     r7,r12,2
            mov     r8,r0,8
            mov     r1,r7,0
            mov     r2,r8,0
            mov     r13,r15,2
            mov     r15,r0,_mul
            sto     r1,r6,0
            mov     r7,r0,2
            sto     r7,r6,1
            mov     r7,r0,3
            sto     r7,r6,3
            ld      r7,r12,2
            sto     r7,r12,2
            ldb     r7,r12,2
            mov     r8,r0,4
            mov     r1,r7,0
            mov     r2,r8,0
            mov     r13,r15,2
            mov     r15,r0,_mul
            sto     r1,r6,4
   #   return g(&x[1].a); // { dg-warning "returns address of local variable" }
            mov     r5,r0,3
            mov     r6,r0,0
            add     r6,r12,-6
            add     r5,r6,0
            sub     r14,r0,1
            sto     r5,r14,0
            mov     r13,r15,2
            mov     r15,r0,_g
            add     r14,r0,1
addrtmp2_4:
            mov     r14,r12,0
            ld      r13,r14,0
            ld      r12,r14,1
            add     r14,r0,2
            mov     r15,r13,0


Still a couple of bugs, but closer. The compiler croaked on another gcc test for a complicated assignment, but I'm not going to worry about it because the MSC compiler also croaked in the same way :)
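
Comparing the two listings, the stack fix amounts to scaling byte counts to words: the register save area went from 4 to 2, the locals from 12 to 6, and the save slots from offsets 0 and 2 to 0 and 1. A small sketch of that conversion, assuming a 16-bit word-addressed target; `bytes_to_words` is a hypothetical helper, not a function from the compiler.

```c
#include <assert.h>
#include <limits.h>

/* Bytes per machine word on the target; 2 is assumed here (16-bit words). */
enum { BYTES_PER_WORD = 2 };

/* Convert a byte-based size or offset to words, rounding up so that an
   odd byte count still gets a whole word. */
unsigned bytes_to_words(unsigned bytes)
{
    assert(bytes <= UINT_MAX - (BYTES_PER_WORD - 1));  /* no wrap-around */
    return (bytes + BYTES_PER_WORD - 1) / BYTES_PER_WORD;
}
```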

_________________
Robert Finch http://www.finitron.ca


Tue Aug 01, 2017 4:04 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Adding a vector type to C64 (but not for OPC5). Vectors are assumed to be 512 bytes in size (64 elements of 64 bits each), so they eat up memory space really fast.
A variable can be declared as an 'int vector' or 'float vector' and the compiler will use vector registers and operations with the var.

_________________
Robert Finch http://www.finitron.ca


Tue Aug 01, 2017 4:08 am
Profile WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The last example used stack space for variables. This example shows that the compiler doesn't allocate and link the stack if it doesn't need to. Note that the register keyword has to be used in order to get parameters passed in registers.
Code:
int abs(register int a)
{
   return a < 0 ? -a : a;
}

int min(register int a, register int b)
{
   return a < b ? a : b;
}

int max(register int a, register int b)
{
   return a > b ? a : b;
}

unsigned int minu(register unsigned int a, register unsigned int b)
{
   return a < b ? a : b;
}

Code:
   code
_abs:
   #    return a < 0 ? -a : a;
            cmp     r8,r0,0
            pl.mov     r15,r0,TestAbs_4
TestAbs_6:
            not     r1,r8,0
            add     r1,r0,1
            mov     r15,r0,TestAbs_5
TestAbs_4:
            mov     r1,r8
TestAbs_5:
TestAbs_7:
            mov     r15,r13,0
_min:
   #    return a < b ? a : b;
            cmp     r8,r9,0
            pl.mov     r15,r0,TestAbs_11
TestAbs_13:
            mov     r1,r8
            mov     r15,r0,TestAbs_12
TestAbs_11:
            mov     r1,r9
TestAbs_12:
TestAbs_14:
            mov     r15,r13,0
_max:
   #    return a > b ? a : b;
            cmp     r8,r9,0
            mi.mov     r15,r0,TestAbs_18
            z.mov     r15,r0,TestAbs_18
            mov     r1,r8
            mov     r15,r0,TestAbs_19
TestAbs_18:
            mov     r1,r9
TestAbs_19:
TestAbs_20:
            mov     r15,r13,0
_minu:
   #    return a < b ? a : b;
            cmp     r8,r9,0
            nc.mov     r15,r0,TestAbs_24
TestAbs_26:
            mov     r1,r8
            mov     r15,r0,TestAbs_25
TestAbs_24:
            mov     r1,r9
TestAbs_25:
TestAbs_27:
            mov     r15,r13,0
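
The `not` followed by `add ... 1` pair in `_abs` is the usual two's-complement negation (invert the bits, then add one). A small C sketch of the same arithmetic on 16-bit values; the width is an assumption about the target, and `neg16` and `abs16` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Two's-complement negate as the listing does it: invert, then add 1.
   Unsigned arithmetic is used so the wrap-around is well defined in C. */
uint16_t neg16(uint16_t x)
{
    return (uint16_t)(~(unsigned)x + 1u);
}

/* abs() on a 16-bit pattern: negate only when the sign bit is set. */
uint16_t abs16(uint16_t x)
{
    return (x & 0x8000u) ? neg16(x) : x;
}
```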

I still have to figure out how to use r1-r4 as parameters, temporaries and return values at the same time. If a register is used as a parameter it has to be flagged as not available as a temporary because the value might be needed later in the function. It's easy to code by hand in assembler but not so simple for the compiler.
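
The bookkeeping described here can be sketched with a bitmask of busy registers: a parameter's register is flagged on entry, and temporaries are handed out only from registers that are still free. This is an illustration under assumed names (`regmask`, `reserve_reg`, `alloc_temp`), not code from C64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit n set means register rn is currently holding a live value. */
typedef uint16_t regmask;

/* Flag a register (e.g. an incoming parameter) as unavailable. */
regmask reserve_reg(regmask busy, int reg)
{
    assert(reg >= 0 && reg < 16);
    return (regmask)(busy | (1u << reg));
}

/* Pick the lowest free register in [first, last], or -1 if none is free. */
int alloc_temp(regmask *busy, int first, int last)
{
    assert(busy != NULL && first >= 0 && last < 16 && first <= last);
    for (int r = first; r <= last; r++) {
        if (!(*busy & (1u << r))) {
            *busy = reserve_reg(*busy, r);
            return r;
        }
    }
    return -1;
}
```

The hard part is not shown: knowing when a parameter has been used for the last time, so its register can be released for use as a temporary.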

_________________
Robert Finch http://www.finitron.ca


Tue Aug 01, 2017 4:57 am
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
robfinch wrote:
I still have to figure out how to use r1-r4 as parameters, temporaries and return values at the same time. If a register is used as a parameter it has to be flagged as not available as a temporary because the value might be needed later in the function. It's easy to code by hand in assembler but not so simple for the compiler.

Ah, of course - an interesting one - I know almost nothing of the innards of a compiler. Perhaps condensing the three uses into fewer registers is something which can be done after the function is fully captured in a suitable data structure.


Tue Aug 01, 2017 5:31 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
- our preference, or convention, is to use r1-r4 for three purposes: passing parameters in, as scratch during a routine, and returning results. With this convention, simpler subroutines would be able to avoid any stack allocation.


I think it's going to be too difficult for the compiler to manage the use of r1 to r4. If one looks at something like RISC-V, it assigns return values, function arguments and temporaries to separate register ranges. From the RISC-V spec:
Quote:
Register  ABI name  Description         Saver
x16–17    v0–1      Return values       Caller
x18–25    a0–7      Function arguments  Caller
x26–30    t0–4      Temporaries         Caller

I don't know if gcc can handle it.
It's still possible for the compiler to avoid stack allocations by using registers.
I'd suggest using only r1, r2 as return values or temporaries (for compiled code) and using two other ranges of registers for parameters and temporaries. Right now C64 uses r8 to r10 as parameters and r5 to r7 (+r1, r2 sometimes) as temporaries. It might be desirable to leave a couple of registers unassigned, unused by the compiler (r3, r4 for scratch use).
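
Written out as a lookup, the split suggested above might look like the sketch below. The enum and function names are hypothetical, and registers the post does not mention are lumped together as "other".

```c
#include <assert.h>

typedef enum {
    ROLE_OTHER,            /* not covered by the suggestion above */
    ROLE_RETURN_OR_TEMP,   /* r1, r2 */
    ROLE_SCRATCH,          /* r3, r4: left unused by the compiler */
    ROLE_TEMP,             /* r5 to r7 */
    ROLE_PARAM             /* r8 to r10 */
} reg_role;

/* Role of register rn under the suggested convention. */
reg_role role_of(int reg)
{
    assert(reg >= 0 && reg <= 15);
    if (reg == 1 || reg == 2)  return ROLE_RETURN_OR_TEMP;
    if (reg == 3 || reg == 4)  return ROLE_SCRATCH;
    if (reg >= 5 && reg <= 7)  return ROLE_TEMP;
    if (reg >= 8 && reg <= 10) return ROLE_PARAM;
    return ROLE_OTHER;
}
```

With fixed roles like these, hand-written assembler can rely on where parameters arrive without knowing what the compiler optimized.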

Quote:
Ah, of course - an interesting one - I know almost nothing of the innards of a compiler. Perhaps condensing the three uses into fewer registers is something which can be done after the function is fully captured in a suitable data structure.
It probably could be done, but then one wouldn't know for sure which registers the compiler chose to optimize. Suppose it can't optimize the use of a parameter register. Then which register serves as a parameter register would depend on the function called. That'd make it difficult to interface with hand-written assembler code.

_________________
Robert Finch http://www.finitron.ca


Tue Aug 01, 2017 5:48 am
Profile WWW