 CC64 / ARPL Compiler 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Another power working session on the compiler, and I got it working better. Accessing variables in an outer scope was not working properly. I think it works now, but it is a really bad idea for performance, as the frame pointer chain is dereferenced once for each level of nesting between the current function and the function that owns the variable. The compiler might assign the variable to a register though.
Code:
_main__sub1__sub2HAAA:
  sub      sp,sp,64
  sth      fp,[sp]
  mov      fp,sp
  sub      sp,sp,96
  bsr      lr2,store_s0s1
  ldh      s0,80[fp]
  ldh      s1,64[fp]
# c = c + g + i;
  ldh      t2,[fp]
  ldh      t2,-16[t2]
  add      t1,s1,t2
  ldh      t2,[fp]            ; i is dereferenced here
  ldh      t2,[t2]            ; and here
  ldh      t2,-16[t2]         ; finally accessed here
  add      s1,t1,t2
# d = d + h;
  ldh      t1,[fp]
  ldh      t1,-32[t1]
  add      s0,s0,t1
# return (c*d);
  mul      a0,s1,s0
.00058:
  bsr      lr2,load_s0s1
  mov      sp,fp
  ldh      fp,[sp]
  rtd      sp,sp,80
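
The three loads marked in the listing follow a chain of saved frame pointers. A minimal C sketch of that idea, assuming each frame stores a link to the enclosing function's frame; the struct and function names are hypothetical and are not cc64 internals:

```c
#include <stddef.h>

/* Hypothetical frame layout: a link to the enclosing frame plus one local. */
struct frame {
    struct frame *outer;  /* link to the enclosing function's frame */
    long value;           /* stand-in for a local such as g or i */
};

/* Walk 'levels' links outward, then read the local: one load per level,
   plus the final load of the variable itself. */
static long load_outer(const struct frame *fp, int levels)
{
    while (levels-- > 0)
        fp = fp->outer;
    return fp->value;
}
```

In terms of the listing, g would be one level out from sub2 and i two levels out, which is why i costs one more load than g.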

_________________
Robert Finch http://www.finitron.ca


Fri Apr 14, 2023 4:34 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added generic switches to the compiler. These work like the _Generic keyword in C. Same idea, but the implementation is a little different. It is a switch on the type of an expression, with type cases that must be resolved at compile time. A generic switch acts like a cast expression. Generic switches do not require adding more keywords.
Code:
This_var = switch(my_value) {
case int: do_this_expression;
case float: do_this_float_expression;
default: do_this_expression_here;
}
OR
+switch(my_value) {
case int: do_this_expression;
case float: do_this_float_expression;
default: do_this_expression_here;
}

If no default is specified, and none of the types match, the default action is to return an int value of zero. Note that type matching is exact, not relaxed. Note that the syntax is simpler than a regular switch. Block statements are not supported, and the keyword ‘break’ is not required. More complex code can be executed via functions which can be part of expressions.
The typenum() keyword can also be used with an ordinary switch statement to achieve much the same result.

Code:
int
main()
{
   int i;
   long double qf;

   i = 47;
   +switch(158.0Q) {
   case int: i=printf("hello world");
   case long: printf("long");
   case float: printf("float");
   case double: printf("double");
   case long double: qf=167.25Q;
   default: printf("default");
   }
}
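
For comparison, the same kind of selection can be written in standard C11 with _Generic. This is only a sketch of the equivalent idea in plain C, not cc64's implementation; the macro names TYPE_NAME and INT_OR_ZERO are made up for the example, and the explicit default of 0 imitates what the generic switch is described as doing when nothing matches.

```c
/* Select a string at compile time from the type of x.
   Matching is exact: float and double are different cases. */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    long: "long",                  \
    float: "float",                \
    double: "double",              \
    long double: "long double",    \
    default: "default")

/* Imitate "no default given": fall back to an int zero. */
#define INT_OR_ZERO(x) _Generic((x), int: (x), default: 0)
```

With these, TYPE_NAME(158.0L) selects the long double case, while a short value falls through to the default because no case names that exact type.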

_________________
Robert Finch http://www.finitron.ca


Sat Apr 15, 2023 3:02 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Decided to support the _Generic() keyword.

Code:
int
foo()
{
   int i;
   long double qf;

   i = 47;
   printf("%0d",
       _Generic(15.5D,
          int: 0,
          long: 1,
          float: 2,
          double: 3,
          long double: 4,
         default: printf("default")
      )
   );
   return (i);
}
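
One detail about the default: printf("default") entry above: in standard C11 only the selected association of _Generic is evaluated, so a call in an unselected branch never runs. A small sketch in plain C; the helper bump and the counter calls are invented for the example, and whether cc64 behaves identically is an assumption.

```c
static int calls = 0;

/* Side-effecting helper so we can see whether a branch was evaluated. */
static int bump(int v)
{
    ++calls;
    return v;
}

/* 15.5 is a double, so only the 'double' branch is evaluated. */
static int pick(void)
{
    return _Generic(15.5,
        int: bump(0),
        double: bump(3),
        default: bump(-1));
}
```

Calling pick() once returns 3 and leaves calls at 1, since neither the int branch nor the default branch is executed.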

_________________
Robert Finch http://www.finitron.ca


Sun Apr 16, 2023 5:33 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
When FPP was first written about 30 years ago, memory space was limited. Consequently, buffers in FPP were quite small, and macros were limited to 10 arguments. When compiling the standard C library, a macro with more than 10 arguments was encountered, causing FPP to error out. A simple fix was to increase the number of allowed arguments to 100, which seems to have worked.
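
The limit amounts to a fixed-size argument table. A rough sketch of counting a macro invocation's arguments against such a limit, in plain C; MAX_MACRO_ARGS and count_macro_args are hypothetical names, not FPP's actual code.

```c
#define MAX_MACRO_ARGS 100  /* raised from 10, per the post */

/* Count top-level, comma-separated arguments in text such as "(a, f(b,c), d)".
   Returns the count, or -1 if the text is malformed or exceeds the limit. */
static int count_macro_args(const char *s)
{
    int depth = 0, count = 0, seen_token = 0;

    if (*s != '(')
        return -1;
    for (; *s; ++s) {
        if (*s == '(') {
            if (depth++ > 0)
                seen_token = 1;       /* nested paren belongs to an argument */
        } else if (*s == ')') {
            if (--depth == 0)
                break;                /* end of the argument list */
            seen_token = 1;
        } else if (*s == ',' && depth == 1) {
            if (++count >= MAX_MACRO_ARGS)
                return -1;            /* too many arguments */
        } else if (*s != ' ') {
            seen_token = 1;
        }
    }
    if (depth != 0)
        return -1;                    /* unbalanced parentheses */
    return (count || seen_token) ? count + 1 : 0;
}
```

Commas inside nested parentheses are not counted, so "(a, f(b,c), d)" has three arguments.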

I have begun customizing the standard C library for Thor and the cc64 compiler. Found a number of compiler output errors by inspection. But code generation is looking pretty good. If only I could get the hardware to work.
Code:
/* memset function */
#include <string.h>

// Consists of an inner loop and outer loop. The outer loop sets one byte at a
// time. When the address is aligned and there are more than 16 bytes to set,
// the inner loop is triggered which sets 16 bytes at a time.

void *(memset)(void *s, integer c, size_t n)
begin   /* store c throughout unsigned char s[n] */
   const unsigned byte uc = c;
   unsigned byte *su = (unsigned byte *)s;
   unsigned long m;

   // Source all bytes of m from byte zero, broadcast
   m = __bmap(c,0);
   for (; n > 0; ++su, --n) begin
      if ((su & 0xf)==0) begin
         for (; n >= 16; su += 16, n -= 16)
            *(unsigned long *)su = m;
         // Backup by one because the outer for will increment these.
         --su,++n;
      end
      *su = uc;
   end      
   return (s);
end

Code:
   .sdreg   61
#====================================================
# Basic Block 0
#====================================================
_memsetQAAA:
  sub      sp,sp,64
  sth      fp,[sp]
  mov      fp,sp
  sub      sp,sp,160
  bsr      lr2,store_s0s5
  ldh      s0,96[fp]
  ldh      s1,-17[fp]
  ldh      s2,64[fp]
  ldh      s3,-33[fp]
  ldh      s4,80[fp]
# const unsigned byte uc = c;
  mov      s5,s4
  mov      s1,s2
# m = __bmap(c,0);
  bmap     t0,s4,r0
  mov      s3,t0
# for (; n > 0; ++su, --n) begin
  ble      s0,0,.00020
.00019:
# if ((su & 0xf)==0) begin
  and      t0,s1,15
  bnez     t0,.00022
# for (; n >= 16; su += 16, n -= 16)
  blt      s0,16,.00025
.00024:
# *(unsigned long *)su = m;
  sth      s3,[s1]
.00026:
  add      s1,s1,16
  sub      s0,s0,16
  bge      s0,16,.00024
.00025:
# --su,++n;
  sub      s1,s1,1
  add      s0,s0,1
.00022:
# *su = uc;
  stb      s5,[s1]
  add      s1,s1,1
  sub      s0,s0,1
  bgt      s0,0,.00019
.00020:
# return (s);
  mov      a0,s2
.00018:
  bsr      lr2,load_s0s5
  mov      sp,fp
  ldh      fp,[sp]
  rtd      sp,sp,80
   .type   _memsetQAAA,@function
   .size   _memsetQAAA,$-_memsetQAAA
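
Reading the listing, when su is 16-byte aligned but fewer than 16 bytes remain, the back-up step still runs and the outer loop does not appear to make progress. Below is a portable C sketch of the same byte-then-wide idea, restructured as head, body and tail loops so the wide loop only runs while n >= 16. It uses memcpy of a 16-byte pattern in place of the __bmap broadcast and the 128-bit store, and the name memset_sketch is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void *memset_sketch(void *s, int c, size_t n)
{
    const unsigned char uc = (unsigned char)c;
    unsigned char *su = (unsigned char *)s;
    unsigned char pattern[16];
    size_t i;

    /* Broadcast the byte into a 16-byte pattern (stand-in for __bmap). */
    for (i = 0; i < sizeof pattern; ++i)
        pattern[i] = uc;

    /* Byte stores until the address is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)su & 0xf) != 0) {
        *su++ = uc;
        --n;
    }
    /* Wide stores while at least 16 bytes remain. */
    for (; n >= 16; su += 16, n -= 16)
        memcpy(su, pattern, sizeof pattern);
    /* Byte stores for the tail. */
    for (; n > 0; ++su, --n)
        *su = uc;
    return s;
}
```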

_________________
Robert Finch http://www.finitron.ca


Mon Apr 17, 2023 4:09 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The compiler was inserting typedefs into the global symbol table instead of the tag table. This meant the compiler could not find typedef’d types properly.

Function prototypes were showing up as zero-length functions, which caused them to be inlined, and then no code was emitted to call the function.

The compiler localizes local function names by prepending the names of all the enclosing functions. The routine foo(), which is local to main(), gets called main_foo() by the compiler. I am thinking this may not be the best approach. If one knows the convention, then it is possible to call local routines non-locally. It may be better to give local functions a name based on a hash. That would make it less likely they would be called non-locally. With name mangling turned on, the resulting name can be quite long if the function is several levels deep.
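
A sketch of the hash-based naming idea, assuming a 64-bit FNV-1a hash over the nested path. The function names, the dotted path format and the _L prefix are invented for illustration and are not what the compiler does today.

```c
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a hash of a NUL-terminated string. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Build a fixed-length label ("_L" plus 16 hex digits) from a nested
   path like "main.sub1.sub2". Returns the number of characters written. */
static int local_label(char *out, size_t outsz, const char *path)
{
    return snprintf(out, outsz, "_L%016llx", (unsigned long long)fnv1a64(path));
}
```

The label length stays at 18 characters no matter how deep the nesting is. A real implementation would still need to detect the rare case of two paths hashing to the same label.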

Currently the compiler is referring to float variables indirectly by storing a pointer in a register and then referencing the pointed-to value, rather than simply storing the value directly in a register. I have not figured this issue out yet. The code should work, but it affects performance. Floats and scalars are handled the same way, and it works for scalars, so it must be close.

There is another issue with the compiler generating multiple copies of switch cases. This leads to a later issue in the compiler with the number of cases processed. One small switch generated 4,000 LOC.

_________________
Robert Finch http://www.finitron.ca


Wed Apr 19, 2023 3:23 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Lots of work on the compiler. It was necessary to have the compiler re-arrange output using temporary files. Data tables which were encountered first in the source file needed to be placed after the code. The issue was the placement of local labels in data tables. To get the assembler to recognize the labels as local all labels had to be made local. The alternative was to use non-local labels for everything. The one exception to local labels is the name of the function itself. It needs to be non-local so it may be called externally. Hence the function label had to be placed first in the output file, meaning the code is first.
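
A minimal sketch of the re-arrangement, assuming one temporary file per kind of output. It uses the standard C tmpfile() call; the function names and the placeholder output text are hypothetical and do not come from the compiler.

```c
#include <stdio.h>

/* Append everything in 'src' (rewound first) to 'dst'. */
static void append_stream(FILE *dst, FILE *src)
{
    int ch;
    rewind(src);
    while ((ch = fgetc(src)) != EOF)
        fputc(ch, dst);
}

/* Emit a function whose data table appears first in the source, but write
   the code (with the non-local function label) ahead of the table.
   Returns 0 on success, -1 if a temporary file could not be created. */
static int emit_reordered(FILE *out)
{
    FILE *code = tmpfile();
    FILE *data = tmpfile();

    if (!code || !data) {
        if (code) fclose(code);
        if (data) fclose(data);
        return -1;
    }
    /* Encountered first in the source: a table behind a local label. */
    fputs(".00030: (data table)\n", data);
    /* Encountered second: the function itself. */
    fputs("_foo: (code)\n", code);

    append_stream(out, code);   /* function label comes first */
    append_stream(out, data);   /* data table follows the code */
    fclose(code);
    fclose(data);
    return 0;
}
```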

_________________
Robert Finch http://www.finitron.ca


Fri Apr 21, 2023 5:36 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Got the compiler to compile the entire Standard C Library.

Adding support for vectors to the compiler. The vectors are specific to Thor. A vector type is a 64 byte bucket that may contain a small array of other primitive types. One challenge is the stack alignment. The stack potentially could be 64-byte aligned but that would waste a lot of space. Not having the stack 64-byte aligned makes it more difficult to handle register spills and reloads. One alternative is to have the processor able to load unaligned vectors.

Ran into issues passing arguments on the stack. A vector is equivalent to four long-integer words.

To support vector masking special global variables were added to the compiler, one for each of eight mask registers. The variables are treated like any other integer variables, except that they may be specified to govern over an expression. Syntax is vmn(<expr>). So, vm0(a+b) applies vector mask 0 to restrict which elements of the vector are processed. Establishing masks is easy. vm0 = 0x3f; would set the low order six bits of the mask.

To declare a vector variable the keyword ‘vector’ is used. So, “vector double dbl;” declares a vector variable containing doubles for elements. The number of vector elements is calculated automatically based on the size of the type specified.
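
A sketch in plain C of the element-count rule and of what a mask such as vm0 = 0x3f does to an expression like vm0(a+b). The names are invented, and leaving masked-off elements unchanged is an assumption about the semantics, not something the hardware description above confirms.

```c
#include <stddef.h>
#include <stdint.h>

#define VECTOR_BYTES 64  /* size of one vector "bucket", per the post */

/* Number of elements of a given type that fit in one vector. */
#define VECTOR_ELEMS(type) (VECTOR_BYTES / sizeof(type))

/* Emulate vm(a + b) for a vector of doubles: only elements whose mask
   bit is set are written; the others keep their previous value in dst. */
static void masked_add(double *dst, const double *a, const double *b,
                       uint64_t mask)
{
    size_t i;
    for (i = 0; i < VECTOR_ELEMS(double); ++i)
        if (mask & ((uint64_t)1 << i))
            dst[i] = a[i] + b[i];
}
```

With an 8-byte double this gives eight elements per vector, and a mask of 0x3f processes the first six.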

_________________
Robert Finch http://www.finitron.ca


Sun Apr 23, 2023 6:43 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Got rid of the global vector mask variables. This was a bad late-night idea. It is better to allow the compiler to assign a register to use. It turns out I had already coded things this way, but did not find the code when reviewing. A variable can be declared as a vector_mask variable and used in the same way as the global mask variables were. This is slightly more flexible.

_________________
Robert Finch http://www.finitron.ca


Mon Apr 24, 2023 3:57 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Worked on the interrupt keyword. It accepts a parameter indicating which registers to save or load on entry and exit. The compiler does its best to use group register load and stores where possible. Group loads and stores are up to four times as fast as loading or storing individual registers, and occupy ¼ of the memory.

Code for a simple interrupt routine and compiler output:
Code:
 integer tick;

interrupt(0x7FFFFFFFFFFFFFFFL)
foo()
begin
   tick = tick + 1;   
end

Code:
   .sdreg   61
   .sd2reg   60
#====================================================
# Basic Block 0
#====================================================
_foo:
  sub sp,sp,1008
  storeg g0,0[sp]
  storeg g1,64[sp]
  storeg g2,128[sp]
  storeg g3,192[sp]
  storeg g4,256[sp]
  storeg g5,320[sp]
  storeg g6,384[sp]
  storeg g7,448[sp]
  storeg g8,512[sp]
  storeg g9,576[sp]
  storeg g10,640[sp]
  storeg g11,704[sp]
  storeg g12,768[sp]
  storeg g13,832[sp]
  storeg g14,896[sp]
  store gp1,960[sp]
  store gp,976[sp]
  store fp,992[sp]
  sub sp,sp,64
  store fp,[sp]
  mov fp,sp
  mov lr1,32[fp]
  sub sp,sp,64
  lea gp,_bss_start
  load t1,_tick[gp]
  add t0,t1,1
  store t0,_tick[gp]
.00010:
  mov sp,fp
  load fp,[sp]
  loadg g0,0[sp]
  loadg g1,64[sp]
  loadg g2,128[sp]
  loadg g3,192[sp]
  loadg g4,256[sp]
  loadg g5,320[sp]
  loadg g6,384[sp]
  loadg g7,448[sp]
  loadg g8,512[sp]
  loadg g9,576[sp]
  loadg g10,640[sp]
  loadg g11,704[sp]
  loadg g12,768[sp]
  loadg g13,832[sp]
  loadg g14,896[sp]
  load gp1,960[sp]
  load gp,976[sp]
  load fp,992[sp]
  add sp,sp,1008
  rti
   .type   _foo,@function
   .size   _foo,$-_foo
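
The listing is consistent with a mask in which bit n selects register n: 15 full groups of four registers are saved with storeg, and the remaining three selected registers are saved individually, at 16 bytes per register. A C sketch of that planning step under those assumptions; the struct and function names are hypothetical, and the real compiler may lay out the frame differently.

```c
#include <stdint.h>

struct save_plan {
    int group_ops;   /* storeg/loadg instructions (4 registers each) */
    int single_ops;  /* individual store/load instructions */
    int frame_bytes; /* stack space reserved for the saved registers */
};

/* Plan register saves from a mask where bit n means "save register n".
   Assumes registers 0..59 form 15 groups of four and each register
   occupies 16 bytes, as the listing suggests. */
static struct save_plan plan_saves(uint64_t mask)
{
    struct save_plan p = { 0, 0, 0 };
    int g, r;

    for (g = 0; g < 16; ++g) {
        unsigned bits = (unsigned)(mask >> (g * 4)) & 0xfu;
        if (g < 15 && bits == 0xfu) {
            p.group_ops++;            /* whole group in one instruction */
            p.frame_bytes += 64;
        } else {
            for (r = 0; r < 4; ++r)
                if (bits & (1u << r)) {
                    p.single_ops++;
                    p.frame_bytes += 16;
                }
        }
    }
    return p;
}
```

For the mask 0x7FFFFFFFFFFFFFFF used in foo(), this plan gives 15 group operations, 3 single operations and 1008 bytes, matching the instruction counts and the sub sp,sp,1008 in the listing.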

_________________
Robert Finch http://www.finitron.ca


Tue Apr 25, 2023 3:59 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
Looking at all that code, are hardware interrupts really needed as a low-level construct?
Have all I/O done by I/O processors, and just have hardware message passing for the IRQ software service.
Any real IRQs would just start and stop processes in real time, like read a mouse every 1/10 of a second, or sleep
until the next video frame.
Ben.


Tue Apr 25, 2023 5:19 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
That is code to save and restore all 64 registers. For most IRQ routines probably saving one or two groups of registers would be enough, especially if written in assembler.
If written in a high level language pretty much all the registers need to be saved and restored, because who knows what register the compiler might choose to use.

The code kind of looks like a butterfly turned sideways.

I/O processors are a great idea. I just have to get around to them.

Have you looked at the latest open PowerPC processor?

_________________
Robert Finch http://www.finitron.ca


Sat Apr 29, 2023 9:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Output generated by the compiler looks better all the time. There are fewer crashes and better recognition of valid programs.

I recently copied the test suite, about 200 files, to GitHub. Most of the files will at least compile with no errors, although the generated code remains to be tested.

The assembler code files are simply called .asm files, so they can be mixed up with .asm files of other architectures. I have been wondering how to compare output for different architectures in the color syntax highlighting editor.

_________________
Robert Finch http://www.finitron.ca


Sun Apr 30, 2023 2:41 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Considering adding a section block declaration to the language. It would allow variables to be assigned to specific sections also identified in the linker script. This may be useful for non-cacheable variables and I/O related variables. “C” has a way of specifying the section a variable is in through the __attribute__ mechanism.
A section declaration would work like:
section <section name>
{
Variable declarations…
}

Example:
section bss
{
integer my_var;
}

The compiler currently assigns sections automatically. It places variables without initializers in the bss section and constants in the rodata section. One might want a constant in the text section though or a variable in the data section instead of bss. Or one might want to place large arrays in a non-cached data section. The section declaration would be able to override the compiler’s default assignments.
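
For reference, this is roughly what the __attribute__ route looks like in C with GCC or Clang on an ELF target. The section names are examples only, and a linker script would still have to map them to the desired memory region.

```c
/* GCC/Clang on an ELF target: place variables in named sections.
   The section names below are made up for the example. */
__attribute__((section(".noncached_data")))
int io_buffer[4] = { 1, 2, 3, 4 };

__attribute__((section(".my_bss")))
int my_var;
```

The proposed section block would express the same placement once for a whole group of declarations instead of once per variable.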

_________________
Robert Finch http://www.finitron.ca


Mon Jul 10, 2023 9:30 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Rebranded the cc64 compiler as 'arpl' and gave it its own repository on GitHub.

_________________
Robert Finch http://www.finitron.ca


Fri Jan 26, 2024 3:42 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
In the process of updating the compiler, found out that references to stack arguments were not calculated correctly when using the ENTER instruction. The issue was that a while ago the compiler was switched to using 128-bit integers and the argument offset was calculated using only 64-bit integers. So, the argument offset calculation was updated.
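
A tiny illustration of why the slot size matters, in plain C. The function name and the base offset of 64 are assumptions taken from the earlier listings on this page, which load arguments from 64[fp], 80[fp] and 96[fp]; the actual calculation used with ENTER is not shown here.

```c
/* Offset of the n-th stack argument from the frame pointer, assuming the
   arguments start 'base' bytes above fp and each occupies one slot. */
static long arg_offset(long base, int index, int slot_bytes)
{
    return base + (long)index * slot_bytes;
}
```

With 16-byte slots the second argument is at offset 80, whereas the same calculation with 8-byte slots yields 72, which lands in the middle of the first argument.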

_________________
Robert Finch http://www.finitron.ca


Sat Jan 27, 2024 5:51 am