Author |
Message |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Power working session on the compiler again, and I got it working better. Accessing variables in an outer scope was not working properly. I think it works now, but it is a really bad idea for performance: the variable is dereferenced multiple times, once for each level of nesting between the current function and the scope that owns the variable. The compiler might assign it to a register though.
Code:
_main__sub1__sub2HAAA:
  sub sp,sp,64
  sth fp,[sp]
  mov fp,sp
  sub sp,sp,96
  bsr lr2,store_s0s1
  ldh s0,80[fp]
  ldh s1,64[fp]
# c = c + g + i;
  ldh t2,[fp]
  ldh t2,-16[t2]
  add t1,s1,t2
  ldh t2,[fp]       ; i is dereferenced here
  ldh t2,[t2]       ; and here
  ldh t2,-16[t2]    ; finally accessed here
  add s1,t1,t2
# d = d + h;
  ldh t1,[fp]
  ldh t1,-32[t1]
  add s0,s0,t1
# return (c*d);
  mul a0,s1,s0
.00058:
  bsr lr2,load_s0s1
  mov sp,fp
  ldh fp,[sp]
  rtd sp,sp,80
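The repeated loads in the listing follow what is often called a static-link chain. Here is a minimal C sketch of the idea, assuming each frame stores a pointer to the enclosing function's frame. The struct names, field names and sample values are hypothetical and do not reflect the compiler's actual frame layout.

```c
/* Hypothetical frames for main() > sub1() > sub2(); not the compiler's real layout. */
struct main_frame { long i; };
struct sub1_frame { struct main_frame *up; long g, h; };
struct sub2_frame { struct sub1_frame *up; long c, d; };

/* Body of sub2: every level of nesting costs one extra pointer load. */
static long sub2_body(struct sub2_frame *fp)
{
    long g = fp->up->g;        /* one hop: sub1's frame */
    long i = fp->up->up->i;    /* two hops: sub1's frame, then main's */
    long h = fp->up->h;        /* one hop again */

    fp->c = fp->c + g + i;     /* c = c + g + i; */
    fp->d = fp->d + h;         /* d = d + h;     */
    return fp->c * fp->d;      /* return (c*d);  */
}

/* Builds a three-level chain with sample values and runs the body once. */
static long run_example(void)
{
    struct main_frame m  = { 6 };
    struct sub1_frame s1 = { &m, 4, 5 };
    struct sub2_frame s2 = { &s1, 2, 3 };
    return sub2_body(&s2);     /* (2 + 4 + 6) * (3 + 5) */
}
```

Caching the outer frame pointer in a register, as the post suggests the compiler might do, would remove the repeated hops for variables that are used more than once.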
_________________Robert Finch http://www.finitron.ca
|
Fri Apr 14, 2023 4:34 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Added generic switches to the compiler. These work like the _Generic keyword in C. Same idea, but the implementation is a little different. It is a switch based on a type and type cases that must be resolved at compile time. A generic switch acts like a cast expression. Generic switches do not require adding more keywords.
Code:
This_var = switch(my_value) {
  case int: do_this_expression;
  case float: do_this_float_expression;
  default: do_this_expression_here;
}
OR
+switch(my_value) {
  case int: do_this_expression;
  case float: do_this_float_expression;
  default: do_this_expression_here;
}
If no default is specified, and none of the types match, the default action is to return an int value of zero. Note that type matching is exact, not relaxed. Note that the syntax is simpler than a regular switch. Block statements are not supported, and the keyword ‘break’ is not required. More complex code can be executed via functions which can be part of expressions. The typenum() keyword can also be used with an ordinary switch statement to achieve much the same result.
Code:
int main()
{
  int i;
  long double qf;

  i = 47;
  +switch(158.0Q) {
    case int: i=printf("hello world");
    case long: printf("long");
    case float: printf("float");
    case double: printf("double");
    case long double: qf=167.25Q;
    default: printf("default");
  }
}
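For comparison, the "type number plus ordinary switch" route can be approximated in standard C11. This is only a sketch: TYPE_CODE, the enum and describe() are hypothetical names, not the compiler's typenum() keyword, and the zero-valued fallback simply mirrors the "int value of zero" default described above.

```c
/* Hypothetical type codes; TC_OTHER is zero to mirror the default result. */
enum type_code { TC_OTHER = 0, TC_INT, TC_LONG, TC_FLOAT, TC_DOUBLE, TC_LDOUBLE };

/* Standard C11 generic selection: resolved at compile time, exact type match. */
#define TYPE_CODE(x) _Generic((x), \
    int: TC_INT,                   \
    long: TC_LONG,                 \
    float: TC_FLOAT,               \
    double: TC_DOUBLE,             \
    long double: TC_LDOUBLE,       \
    default: TC_OTHER)

/* An ordinary switch on the code reaches much the same result. */
static const char *describe(enum type_code tc)
{
    switch (tc) {
    case TC_INT:     return "int";
    case TC_LONG:    return "long";
    case TC_FLOAT:   return "float";
    case TC_DOUBLE:  return "double";
    case TC_LDOUBLE: return "long double";
    default:         return "default";
    }
}
```

Because matching is exact, a short value falls through to the default here rather than being treated as an int.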
_________________Robert Finch http://www.finitron.ca
|
Sat Apr 15, 2023 3:02 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Decided to support the _Generic() keyword.
Code:
int foo()
{
  int i;
  long double qf;

  i = 47;
  printf("%0d", _Generic(15.5D,
    int: 0,
    long: 1,
    float: 2,
    double: 3,
    long double: 4,
    default: printf("default")
  ));
  return (i);
}
_________________Robert Finch http://www.finitron.ca
|
Sun Apr 16, 2023 5:33 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
When FPP was first written about 30 years ago, memory space was limited. Consequently, buffers in FPP were quite small, and macros were limited to 10 arguments. When compiling the standard C library, a macro with more than 10 arguments was encountered, causing FPP to error out. A simple fix was to increase the number of allowed arguments to 100, which seems to have worked. I have begun customizing the standard C library for Thor and the cc64 compiler. Found a number of compiler output errors by inspection. But code generation is looking pretty good. If only I could get the hardware to work.
Code:
/* memset function */
#include <string.h>

// Consists of an inner loop and outer loop. The outer loop sets one byte at a
// time. When the address is aligned and there are more than 16 bytes to set,
// the inner loop is triggered which sets 16 bytes at a time.
void *(memset)(void *s, integer c, size_t n)
begin
  /* store c throughout unsigned char s[n] */
  const unsigned byte uc = c;
  unsigned byte *su = (unsigned byte *)s;
  unsigned long m;

  // Source all bytes of m from byte zero, broadcast
  m = __bmap(c,0);
  for (; n > 0; ++su, --n) begin
    if ((su & 0xf)==0) begin
      for (; n >= 16; su += 16, n -= 16)
        *(unsigned long *)su = m;
      // Backup by one because the outer for will increment these.
      --su,++n;
    end
    *su = uc;
  end
  return (s);
end
Code:
  .sdreg 61
#====================================================
# Basic Block 0
#====================================================
_memsetQAAA:
  sub sp,sp,64
  sth fp,[sp]
  mov fp,sp
  sub sp,sp,160
  bsr lr2,store_s0s5
  ldh s0,96[fp]
  ldh s1,-17[fp]
  ldh s2,64[fp]
  ldh s3,-33[fp]
  ldh s4,80[fp]
# const unsigned byte uc = c;
  mov s5,s4
  mov s1,s2
# m = __bmap(c,0);
  bmap t0,s4,r0
  mov s3,t0
# for (; n > 0; ++su, --n) begin
  ble s0,0,.00020
.00019:
# if ((su & 0xf)==0) begin
  and t0,s1,15
  bnez t0,.00022
# for (; n >= 16; su += 16, n -= 16)
  blt s0,16,.00025
.00024:
# *(unsigned long *)su = m;
  sth s3,[s1]
.00026:
  add s1,s1,16
  sub s0,s0,16
  bge s0,16,.00024
.00025:
# --su,++n;
  sub s1,s1,1
  add s0,s0,1
.00022:
# *su = uc;
  stb s5,[s1]
  add s1,s1,1
  sub s0,s0,1
  bgt s0,0,.00019
.00020:
# return (s);
  mov a0,s2
.00018:
  bsr lr2,load_s0s5
  mov sp,fp
  ldh fp,[sp]
  rtd sp,sp,80
  .type _memsetQAAA,@function
  .size _memsetQAAA,$-_memsetQAAA
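The same byte-broadcast idea can be written in portable C as three separate phases. This is a sketch under assumptions, not the library routine above: it uses 8-byte words instead of Thor's 16-byte stores, a multiply in place of __bmap, and the name memset_sketch is hypothetical. Tracing the posted loop by hand, the backup step (--su,++n) also runs when the inner loop body never executes, that is when su is 16-byte aligned and fewer than 16 bytes remain, and in that case the outer loop appears to make no progress. Splitting the work into head, body and tail avoids needing a backup step at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable sketch: byte head, aligned 8-byte body, byte tail. */
void *memset_sketch(void *s, int c, size_t n)
{
    unsigned char *su = s;
    const unsigned char uc = (unsigned char)c;
    /* Broadcast the byte into all eight lanes of a 64-bit word. */
    const uint64_t m = (uint64_t)uc * UINT64_C(0x0101010101010101);

    /* Head: single bytes until the pointer is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)su & 7u) != 0) {
        *su++ = uc;
        --n;
    }
    /* Body: one 8-byte store per iteration (memcpy avoids aliasing issues). */
    for (; n >= 8; su += 8, n -= 8)
        memcpy(su, &m, sizeof m);
    /* Tail: whatever is left, one byte at a time. */
    while (n > 0) {
        *su++ = uc;
        --n;
    }
    return s;
}
```

Starting at an odd address with a length that is not a multiple of eight exercises all three phases.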
_________________Robert Finch http://www.finitron.ca
|
Mon Apr 17, 2023 4:09 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
The compiler was inserting typedefs into the global symbol table instead of the tag table. This meant the compiler could not find typedef’d types properly.
Function prototypes were showing up as zero-length functions, which caused them to be inlined, and then no code was emitted to call the function.
The compiler localizes local function names by prepending the name with all the names of the higher level functions. The routine foo() which is local in main() gets called main_foo() by the compiler. I am thinking this may not be the best approach. If one knows the convention, it is possible to call local routines non-locally. It may be better to give local functions a name based on a hash, which would make it less likely they would be called non-locally. With name mangling turned on, the resulting name can also be quite long if the function is several levels deep.
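The two naming schemes can be compared with a small sketch. FNV-1a is used here only as an example hash; the post does not say which hash would be chosen, and the "_L" prefix and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a, chosen purely for illustration. */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* Path-style name: grows with every level of nesting. */
static void path_name(char *out, size_t cap, const char *outer, const char *inner)
{
    snprintf(out, cap, "%s_%s", outer, inner);
}

/* Hash-style name: fixed length, and hard to guess from the source. */
static void hashed_name(char *out, size_t cap, const char *qualified)
{
    snprintf(out, cap, "_L%08x", (unsigned)fnv1a32(qualified));
}
```

A hashed label is always ten characters here regardless of nesting depth, at the cost of being harder to read in a listing and of needing some policy for the rare collision.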
Currently the compiler is referring to float variables indirectly by storing a pointer in a register then referencing the pointed to value, rather than simply storing the value directly in a register. I have not figured this issue out yet. The code should work, but it affects performance. Floats and scalars are handled the same way, and it works for scalars, so it must be close.
There is another issue with the compiler generating multiple copies of switch cases. This leads to a later issue in the compiler with the number of cases processed. One small switch generated 4,000 LOC.
_________________Robert Finch http://www.finitron.ca
|
Wed Apr 19, 2023 3:23 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Lots of work on the compiler. It was necessary to have the compiler re-arrange output using temporary files. Data tables which were encountered first in the source file needed to be placed after the code. The issue was the placement of local labels in data tables. To get the assembler to recognize the labels as local all labels had to be made local. The alternative was to use non-local labels for everything. The one exception to local labels is the name of the function itself. It needs to be non-local so it may be called externally. Hence the function label had to be placed first in the output file, meaning the code is first.
_________________Robert Finch http://www.finitron.ca
|
Fri Apr 21, 2023 5:36 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Got the compiler to compile the entire Standard C Library.
Adding support for vectors to the compiler. The vectors are specific to Thor. A vector type is a 64 byte bucket that may contain a small array of other primitive types. One challenge is the stack alignment. The stack potentially could be 64-byte aligned but that would waste a lot of space. Not having the stack 64-byte aligned makes it more difficult to handle register spills and reloads. One alternative is to have the processor able to load unaligned vectors.
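The space cost of aligning the stack can be made concrete with a little arithmetic. This is a minimal sketch, assuming a stack that grows downward; the function names are hypothetical.

```c
#include <stdint.h>

#define VECTOR_ALIGN 64u

/* Round a descending stack pointer down to the nearest 64-byte boundary. */
static uint64_t align_sp_down(uint64_t sp)
{
    return sp & ~(uint64_t)(VECTOR_ALIGN - 1u);
}

/* Bytes lost to padding when a spill slot must start on that boundary. */
static uint64_t padding_bytes(uint64_t sp)
{
    return sp - align_sp_down(sp);
}
```

If the stack is otherwise kept 16-byte aligned, each aligned spill area would cost 0, 16, 32 or 48 bytes of padding depending on where the stack pointer happens to be.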
Ran into issues passing arguments on the stack. A vector is equivalent to four long-integer words.
To support vector masking special global variables were added to the compiler, one for each of eight mask registers. The variables are treated like any other integer variables, except that they may be specified to govern over an expression. Syntax is vmn(<expr>). So, vm0(a+b) applies vector mask 0 to restrict which elements of the vector are processed. Establishing masks is easy. vm0 = 0x3f; would set the low order six bits of the mask.
To declare a vector variable the keyword ‘vector’ is used. So, “vector double dbl;” declares a vector variable containing doubles for elements. The number of vector elements is calculated automatically based on the size of the type specified.
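The element-count rule and the effect of a mask can be modelled in plain C. This is a sketch only: the macro and type names are hypothetical, it assumes an 8-byte double, and leaving unselected lanes unchanged is an assumption, since the post does not say what happens to them.

```c
#include <stddef.h>
#include <stdint.h>

#define VECTOR_BYTES 64
/* Number of elements follows from the element size. */
#define VECTOR_LANES(T) (VECTOR_BYTES / sizeof(T))

/* Model of "vector double": a 64-byte bucket of doubles. */
typedef struct { double lane[VECTOR_LANES(double)]; } vec_double;

/* Model of vm0(a+b): only lanes whose mask bit is set are processed.
   Unselected lanes of dst are left as they were (assumption). */
static void masked_add(vec_double *dst, const vec_double *a,
                       const vec_double *b, uint64_t mask)
{
    for (size_t k = 0; k < VECTOR_LANES(double); ++k)
        if ((mask >> k) & 1u)
            dst->lane[k] = a->lane[k] + b->lane[k];
}
```

With a mask of 0x3f, as in the vm0 = 0x3f; example, the low six lanes are added and the remaining two are skipped.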
_________________Robert Finch http://www.finitron.ca
|
Sun Apr 23, 2023 6:43 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Got rid of the global vector mask variables. This was a bad late-night idea. It is better to allow the compiler to assign a register to use. It turns out I had already coded things this way, but did not find the code when reviewing. A variable can be declared as a vector_mask variable and used in the same way as the global mask variables were, which is slightly more flexible.
_________________Robert Finch http://www.finitron.ca
|
Mon Apr 24, 2023 3:57 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Worked on the interrupt keyword. It accepts a parameter indicating which registers to save or load on entry and exit. The compiler does its best to use group register load and stores where possible. Group loads and stores are up to four times as fast as loading or storing individual registers, and occupy ¼ of the memory. Code for a simple interrupt routine and compiler output:
Code:
integer tick;

interrupt(0x7FFFFFFFFFFFFFFFL) foo()
begin
  tick = tick + 1;
end
Code:
  .sdreg 61
  .sd2reg 60
#====================================================
# Basic Block 0
#====================================================
_foo:
  sub sp,sp,1008
  storeg g0,0[sp]
  storeg g1,64[sp]
  storeg g2,128[sp]
  storeg g3,192[sp]
  storeg g4,256[sp]
  storeg g5,320[sp]
  storeg g6,384[sp]
  storeg g7,448[sp]
  storeg g8,512[sp]
  storeg g9,576[sp]
  storeg g10,640[sp]
  storeg g11,704[sp]
  storeg g12,768[sp]
  storeg g13,832[sp]
  storeg g14,896[sp]
  store gp1,960[sp]
  store gp,976[sp]
  store fp,992[sp]
  sub sp,sp,64
  store fp,[sp]
  mov fp,sp
  mov lr1,32[fp]
  sub sp,sp,64
  lea gp,_bss_start
  load t1,_tick[gp]
  add t0,t1,1
  store t0,_tick[gp]
.00010:
  mov sp,fp
  load fp,[sp]
  loadg g0,0[sp]
  loadg g1,64[sp]
  loadg g2,128[sp]
  loadg g3,192[sp]
  loadg g4,256[sp]
  loadg g5,320[sp]
  loadg g6,384[sp]
  loadg g7,448[sp]
  loadg g8,512[sp]
  loadg g9,576[sp]
  loadg g10,640[sp]
  loadg g11,704[sp]
  loadg g12,768[sp]
  loadg g13,832[sp]
  loadg g14,896[sp]
  load gp1,960[sp]
  load gp,976[sp]
  load fp,992[sp]
  add sp,sp,1008
  rti
  .type _foo,@function
  .size _foo,$-_foo
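One way to picture how a save mask could be turned into group and single-register stores is sketched below. The bit-to-register mapping (bit n selects register n, four consecutive registers per group) is an assumption made for illustration, as are all the names. The 64-byte and 16-byte sizes are read off the offsets in the listing.

```c
#include <stdint.h>

typedef struct {
    int group_ops;    /* storeg/loadg: four registers at once */
    int single_ops;   /* store/load: one register */
    int frame_bytes;  /* stack space needed for the saved registers */
} save_plan;

/* Assumed mapping: bit n of the mask selects register n. */
static save_plan plan_saves(uint64_t mask)
{
    save_plan p = { 0, 0, 0 };
    for (int g = 0; g < 16; ++g) {
        unsigned bits = (unsigned)((mask >> (4 * g)) & 0xFu);
        if (bits == 0xFu) {
            p.group_ops += 1;           /* whole group selected */
            p.frame_bytes += 64;
        } else {
            for (; bits != 0; bits &= bits - 1) {
                p.single_ops += 1;      /* one op per selected register */
                p.frame_bytes += 16;
            }
        }
    }
    return p;
}
```

Under that assumed mapping, the mask 0x7FFFFFFFFFFFFFFF yields 15 group operations, 3 single operations and a 1008-byte frame, which lines up with the storeg/store counts and the sub sp,sp,1008 in the listing.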
_________________Robert Finch http://www.finitron.ca
|
Tue Apr 25, 2023 3:59 am |
|
|
oldben
Joined: Mon Oct 07, 2019 2:41 am Posts: 627
|
Looking at all that code, are hardware interrupts really needed as a low-level construct? Have all I/O done by I/O processors, and just have hardware message passing for the IRQ software service. Any real IRQs would just start and stop processes in real time, like reading a mouse every 1/10 of a second, or sleeping until the next video frame. Ben.
|
Tue Apr 25, 2023 5:19 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
That is code to save and restore all 64 registers. For most IRQ routines probably saving one or two groups of registers would be enough, especially if written in assembler. If written in a high level language pretty much all the registers need to be saved and restored, because who knows what register the compiler might choose to use.
The code kind of looks like a butterfly turned sideways.
I/O processors are a great idea. I just have to get around to them.
Have you looked at the latest open PowerPC processor?
_________________Robert Finch http://www.finitron.ca
|
Sat Apr 29, 2023 9:03 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Output generated by the compiler looks better all the time. There are fewer crashes and better recognition of valid programs.
I recently copied the test suite, about 200 files, to GitHub. Most of the files will at least compile with no errors, although the generated code remains to be tested.
The assembler code files are simply called .asm files, so they are liable to be mixed up with .asm files of other architectures. I have been wondering how to compare output for different architectures in the color syntax highlighting editor.
_________________Robert Finch http://www.finitron.ca
|
Sun Apr 30, 2023 2:41 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Considering adding a section block declaration to the language. It would allow variables to be assigned to specific sections also identified in the linker script. This may be useful for non-cacheable variables and I/O related variables. “C” has a way of specifying the section a variable is in through the __attribute__ mechanism. A section declaration would work like:
section <section name> {
  Variable declarations…
}
Example:
section bss {
  integer my_var;
}
The compiler currently assigns sections automatically. It places variables without initializers in the bss section and constants in the rodata section. One might want a constant in the text section though or a variable in the data section instead of bss. Or one might want to place large arrays in a non-cached data section. The section declaration would be able to override the compiler’s default assignments.
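For reference, this is roughly what the existing __attribute__ route looks like with GCC or Clang on an ELF target. The section name ".uncached" and the variable names are hypothetical, and the section would only get special treatment (for example non-cached mapping) if the linker script and memory setup provide it.

```c
/* GCC/Clang extension: place one variable in a named section.
   ".uncached" is a made-up name for illustration. */
__attribute__((section(".uncached"))) volatile int io_status = 0;

/* No attribute: the compiler chooses the section itself. */
int my_var;

/* Both variables are used the same way regardless of placement. */
static int touch_both(void)
{
    io_status = 1;
    my_var = 2;
    return io_status + my_var;
}
```

The attribute has to be repeated on every declaration, which is where a block-style section declaration covering several variables at once would be more convenient.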
_________________Robert Finch http://www.finitron.ca
|
Mon Jul 10, 2023 9:30 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
Rebranded the cc64 compiler as 'arpl' and gave it its own repository on GitHub.
_________________Robert Finch http://www.finitron.ca
|
Fri Jan 26, 2024 3:42 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2153 Location: Canada
|
In the process of updating the compiler, found out that references to stack arguments were not calculated correctly when using the ENTER instruction. The issue was that a while ago the compiler was switched to using 128-bit integers and the argument offset was calculated using only 64-bit integers. So, the argument offset calculation was updated.
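One reading of this bug is a slot-size mismatch, sketched below. The model is hypothetical (consecutive fixed-size argument slots above the frame pointer) and the function name is made up. The offsets 64 and 80 are taken from the ldh s1,64[fp] and ldh s0,80[fp] pair in the first listing of this thread.

```c
/* Hypothetical model: arguments occupy consecutive fixed-size slots above fp. */
static long arg_offset(long first_arg_offset, int index, int slot_bytes)
{
    return first_arg_offset + (long)index * slot_bytes;
}
```

With 16-byte (128-bit) slots the second argument lands at offset 80, matching the listing, whereas an 8-byte (64-bit) slot size would put it at 72 and read the wrong bytes.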
_________________Robert Finch http://www.finitron.ca
|
Sat Jan 27, 2024 5:51 am |
|