 [ 77 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6  Next
 Porting the C64 compiler to target OPC5 (or OPC6) 

Joined: Tue Feb 10, 2015 7:07 am
Posts: 52
robfinch wrote:
There's a fix for bugs 16,17 now. The fix caused a fair amount of change to the code generated in my test files. Generates smaller code with the fix to boot. I hope I didn't break something else with the fix. I've just been comparing output from one set of test files to the next generation and reviewing the changes. Usually it's only a couple of lines that change.

Well, the writing to the wrong address is gone, but something is amiss...
Code:
*go 100
@ABCD
1234
5678
3.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
*

(unless the value of Pi has actually changed recently)

I'll look into this tomorrow, as we have some friends coming round to supper soon.

Dave


Sat Aug 12, 2017 6:01 pm

Joined: Tue Feb 10, 2015 7:07 am
Posts: 52
OK, the fun continues :D

18. Result of multiply being overwritten before it is used

I don't have a small example for this, but here's a fragment from pi.c
Code:
      for (i = len; i > 0; i--)
      {
         x = 10 * pi[i]+ q * i;
         pi[i] = x % (2 * i - 1);
         q = x / (2 * i - 1);
      }

The bit that is going wrong is evaluation of the expression for x:
Code:
0190  10c7 fea8                            mov     r7,r12,-344
0192  0036                                 mov     r6,r3
0193  0476                                 add     r6,r7
0194  0766                                 ld      r6,r6
0195  1007 000a                            mov     r7,r0,10
0197  0061                                 mov     r1,r6
0198  0072                                 mov     r2,r7
0199  190d 0329                            jsr     r13,r0,__mul
019b  17c6 fff9                            ld      r6,r12,-7
019d  0061                                 mov     r1,r6                # the result of the multiply (r1) is being overwritten before it has been used
019e  0032                                 mov     r2,r3
019f  190d 0329                            jsr     r13,r0,__mul
01a1  0015                                 mov     r5,r1
01a2  0415                                 add     r5,r1
01a3  0054                                 mov     r4,r5
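
In the absence of a small example, a reduced test along these lines might help isolate bug 18. It is only a sketch: the names `bug18` and `bug18_check` and the test values are invented for illustration, and it is not confirmed that this reduced form triggers the same overwrite.

```c
#include <assert.h>

/* Hypothetical reduced case for bug 18: two multiplies whose results
   must both survive until the final add. Names and values are
   illustrative and are not taken from pi.c. */
int bug18(int *pi, int q, int i) {
    /* The first product has to be kept while q * i is evaluated. */
    int x = 10 * pi[i] + q * i;
    return x;
}

/* Expected value worked by hand: 10 * 7 + 5 * 3 = 85. */
void bug18_check(void) {
    int pi[4] = { 0, 0, 0, 7 };
    assert(bug18(pi, 5, 3) == 85);
}
```

Both products are live at the final add, which is the situation the listing above gets wrong.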

Dave


Sun Aug 13, 2017 9:33 am

Joined: Tue Feb 10, 2015 7:07 am
Posts: 52
One more that I noticed, not (yet) affecting the Pi program.

19. long maths seems completely broken now
Code:
void bug19(long a, long b) {
   long c = a * b;
}

Code:
_bug19:
               push    r13,r14
               push    r12,r14
               mov     r12,r14
               dec     r14,2
               ld      r5,r12,2
               ld      r6,r12,3
               ld      r7,r12,4
               ld      r3,r12,5
               mov     r1,r5               # wrong number of parameters
               mov     r2,r7
               jsr     r13,r0,__mul        # should be calling __mul32
               sto     r1,r12,-2
               sto                         # missing registers
               mov     r14,r12
               pop     r12,r14
               pop     r13,r14
               mov     r15,r13


Sun Aug 13, 2017 9:41 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2153
Location: Canada
Yes, I broke the long math.

When the compiler doesn’t have enough temporaries available it attempts to re-use an existing temporary by pushing it on the stack. Unfortunately the compiler is trying to use the same temporary register for two different purposes at the same time. It looks like I may have to re-write the temporaries section of the compiler to use pseudo registers located on the stack when it runs out of temporaries. Some expressions may work for now by adding two more registers to the available pool (r8, r9), but it means the function can’t use register arguments because the args are used as temporaries.

Working on it has been like working on a finger puzzle. I thought I had it solved a couple of times. But it’s taken me longer than anticipated.
Because the compiler is using pushes and pops there must be a strict order adhered to when it comes to accessing temporaries in compiler code. I hope to re-write this so it doesn’t matter.
It may help to break up expressions into simpler pieces. I believe the compiler reinitializes the temporary pool for each expression.
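
The idea of falling back to pseudo registers on the stack can be sketched roughly as follows. This is a toy model and not the compiler's code: the names `TempPool`, `temp_alloc` and `temp_free`, the pool of five registers starting at r3, and the slot numbering are all assumptions made for illustration.

```c
#include <assert.h>

#define NUM_TEMP_REGS  5         /* assumed pool size, e.g. r3..r7 */
#define FIRST_TEMP_REG 3         /* assumed first temporary register */

typedef struct {
    int in_stack;                /* 0: lives in a register, 1: lives in a stack slot */
    int index;                   /* register number or stack slot number */
} Temp;

typedef struct {
    int reg_busy[NUM_TEMP_REGS]; /* 1 if the register is currently allocated */
    int next_slot;               /* next unused stack slot */
} TempPool;

/* Reset the pool, as would happen at the start of each expression. */
void pool_init(TempPool *p) {
    for (int i = 0; i < NUM_TEMP_REGS; i++) p->reg_busy[i] = 0;
    p->next_slot = 0;
}

/* Hand out a free register if one exists, otherwise a fresh stack slot. */
Temp temp_alloc(TempPool *p) {
    Temp t;
    for (int i = 0; i < NUM_TEMP_REGS; i++) {
        if (!p->reg_busy[i]) {
            p->reg_busy[i] = 1;
            t.in_stack = 0;
            t.index = FIRST_TEMP_REG + i;
            return t;
        }
    }
    t.in_stack = 1;
    t.index = p->next_slot++;
    return t;
}

/* Release a register temporary; stack slots are simply abandoned in this sketch. */
void temp_free(TempPool *p, Temp t) {
    if (!t.in_stack) p->reg_busy[t.index - FIRST_TEMP_REG] = 0;
}

/* Small self-check: the sixth request spills to stack slot 0. */
void temp_pool_demo(void) {
    TempPool p;
    pool_init(&p);
    for (int i = 0; i < NUM_TEMP_REGS; i++) (void)temp_alloc(&p);
    Temp spilled = temp_alloc(&p);
    assert(spilled.in_stack && spilled.index == 0);
}
```

Because every temporary has its own home in this model, the order in which temporaries are released does not matter, unlike a scheme built on pushes and pops.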

_________________
Robert Finch http://www.finitron.ca


Mon Aug 14, 2017 1:29 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2153
Location: Canada
Here is a sample of x = 10 * pi[i] + q * i in long math. It looks like it worked using r8 and r9 as additional temporaries. However I've no doubt that a complex expression won't work properly (yet).

Code:
#   code
_main:
   # long main()
               push    r13,r14
               push    r12,r14
               mov     r12,r14
   #    long pi[333];
               sub     r14,r0,672
   #    x = 10 * pi[i] + q * i;
               ld      r7,r12,-669
               ld      r6,r12,-670
               mov     r3,r12,-666
               mov     r5,r6
               add     r5,r3
               ld      r6,r5,1
               ld      r5,r5
               mov     r1,r5
               mov     r2,r6
               push    r3,r14
               push    r4,r14
               mov     r3,r0,10
               mov     r4,r0,r0
               jsr     r13,r0,__mul32
               pop     r4,r14
               pop     r3,r14
               mov     r5,r1
               mov     r6,r2
               ld      r3,r12,-671
               ld      r7,r12,-672
               ld      r8,r12,-669
               ld      r4,r12,-670
               mov     r1,r7
               mov     r2,r3
               push    r3,r14
               push    r4,r14
               mov     r3,r4
               mov     r4,r8
               jsr     r13,r0,__mul32
               pop     r4,r14
               pop     r3,r14
               mov     r7,r1
               mov     r3,r2
               mov     r1,r5
               mov     r2,r6
               add     r1,r7
               adc     r2,r3
               mov     r5,r1
               mov     r6,r2
               sto     r5,r12,-668
               sto     r6,r12,-667
               mov     r14,r12
               pop     r12,r14
               pop     r13,r14
               mov     r15,r13


#   rodata
#   global   _main
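
The add/adc pair near the end of the listing adds two 32-bit values held as pairs of 16-bit words. A rough C model of that step is sketched below; the names `Long32` and `add32` are made up for illustration, and it assumes the low word is the one added first (r1) and the high word the one added with carry (r2).

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value split into two 16-bit words, as on a 16-bit machine. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} Long32;

Long32 add32(Long32 a, Long32 b) {
    Long32 r;
    uint32_t low = (uint32_t)a.lo + b.lo;             /* add: low words */
    r.lo = (uint16_t)low;
    r.hi = (uint16_t)(a.hi + b.hi + (low >> 16));     /* adc: high words plus carry out */
    return r;
}

/* Self-check: 0x0000FFFF + 0x00000001 = 0x00010000. */
void add32_demo(void) {
    Long32 a = { 0xFFFF, 0x0000 };
    Long32 b = { 0x0001, 0x0000 };
    Long32 r = add32(a, b);
    assert(r.lo == 0x0000 && r.hi == 0x0001);
}
```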

_________________
Robert Finch http://www.finitron.ca


Mon Aug 14, 2017 1:50 am

Joined: Tue Feb 10, 2015 7:07 am
Posts: 52
Hi Rob,

Just a quick note to say that with the temporaries change, the Pi Spigot test program now works correctly (at 100 digits):
Code:
*go 100
@ABCD
1234
5678
3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067

This is excellent progress.

Do keep us posted as you continue to chip away at this. I appreciate how tricky this is getting, and I'm more than happy to keep testing builds if that helps.

Dave


Mon Aug 14, 2017 12:19 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2153
Location: Canada
The compiler now makes two passes at code generation in order to determine whether it has to use temporaries located on the stack instead of in registers.
An option, “-fno_regs”, was added to tell the compiler not to use any registers for temporary values. It can also be passed as a function attribute to force the use of stack variables.
I’m actually using the OPC6 compiler as a reference for another compiler I’m developing. The OPC6 compiler is somewhat simpler because it only supports word size operands.
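
One way a first pass could decide whether register temporaries will suffice is to count how many an expression tree needs before any code is emitted. The sketch below uses a simplified Sethi-Ullman-style labelling, a textbook technique swapped in purely for illustration; the thread does not say how the compiler's first pass actually decides. The `Expr` type and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Binary expression tree; both pointers are NULL for a leaf
   (a variable or a constant). */
typedef struct Expr {
    struct Expr *left;
    struct Expr *right;
} Expr;

/* Pass 1: number of registers needed to evaluate the tree without spilling. */
int regs_needed(const Expr *e) {
    if (e->left == NULL && e->right == NULL) return 1;
    int l = regs_needed(e->left);
    int r = regs_needed(e->right);
    if (l == r) return l + 1;    /* both sides equally hungry: one extra register */
    return l > r ? l : r;        /* otherwise evaluate the hungrier side first */
}

/* Decision taken before pass 2: nonzero means fall back to stack temporaries. */
int use_stack_temps(const Expr *e, int pool_size) {
    return regs_needed(e) > pool_size;
}

/* Self-check on the tree for q * i. */
void regs_needed_demo(void) {
    Expr q = { NULL, NULL }, i = { NULL, NULL };
    Expr mul = { &q, &i };
    assert(regs_needed(&mul) == 2);
}
```

For 10 * pi[i] + q * i, with pi[i] treated as an index node over two leaves, this labelling gives 3, so a pool of two registers would force stack temporaries while a pool of five would not.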

_________________
Robert Finch http://www.finitron.ca


Mon Aug 14, 2017 7:04 pm

Joined: Tue Feb 10, 2015 7:07 am
Posts: 52
Hi Rob,
robfinch wrote:
The compiler now makes two passes at code generation in order to determine whether it has to use temporaries located on the stack instead of in registers.
An option, “-fno_regs”, was added to tell the compiler not to use any registers for temporary values. It can also be passed as a function attribute to force the use of stack variables.
I’m actually using the OPC6 compiler as a reference for another compiler I’m developing. The OPC6 compiler is somewhat simpler because it only supports word size operands.

I've just tried the latest compiler code with the Pi Spigot. It compiles without any errors, but the generated code is missing certain labels. For example the _putchar and _main labels are not output (but the code is there). In fact, the only labels present are the pi_<number> ones.

Here is the code I'm testing with:
https://github.com/hoglet67/opc/blob/c_ ... got-c/pi.c

Dave


Mon Aug 14, 2017 7:52 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2153
Location: Canada
Okay, the labels should be back now.
I added a couple more files to the compiler and moved the function prototypes out to a file called proto.h.

_________________
Robert Finch http://www.finitron.ca


Tue Aug 15, 2017 1:45 am

Joined: Tue Feb 10, 2015 7:07 am
Posts: 52
Hi Rob,

Not much luck with the Pi program at the moment (C64 compiler commit 8c8874e).

Here's a smaller test case that also fails.

21. Math16.c issues
Code:
void init() {
   asm  {
      ORG 0x100
      mov  r15, r0, _main
   };
}

void putchar(char c) {
   asm __leafs {
      ld  r1, r14, 1
      jsr r13, r0, 0xffee
   };
}

void main() {
   int a,b,c,d,e;
   a = 234;
   b = 56;
   c = a * b;
   if (c == 13104) {
      putchar('.');
   } else {
      putchar('x');
   }
   d = a / b;
   if (d == 4) {
      putchar('.');
   } else {
      putchar('x');
   }
   e = a % b;
   if (e == 10) {
      putchar('.');
   } else {
      putchar('x');
   }
   putchar(10);
   putchar(13);
}

Code:
0000  100f 010a                mov   r15,r0,_main
0002                        #   code
0002                        _init:
0002                       
0002                                 ORG 0x100
0100  100f 010a                      mov r15, r0, _main
0102  00df                                 mov     r15,r13
0103                       
0103                       
0103                        _putchar:
0103  28ed                                 push    r13,r14
0104                       
0104  17e1 0001                      ld r1, r14, 1
0106  190d ffee                      jsr r13, r0, 0xffee
0108  29ed                                 pop     r13,r14
0109  00df                                 mov     r15,r13
010a                        _main:
010a  28ed                                 push    r13,r14
010b  28ec                                 push    r12,r14
010c  00ec                                 mov     r12,r14
010d  0e6e                                 dec     r14,6
010e  1004 00ea                            mov     r4,r0,234
0110  1005 0038                            mov     r5,r0,56
0112  16c5 fffe                            sto     r5,r12,-2
0114  0041                                 mov     r1,r4
0115  17c2 fffe                            ld      r2,r12,-2
0117  190d 01da                            jsr     r13,r0,__mul
0119  0015                                 mov     r5,r1
011a  16c5 fffd                            sto     r5,r12,-3
011c  17c5 fffd                            ld      r5,r12,-3
011e  3a05 3330                            cmp     r5,r0,13104
0120  6c6f                              nz.inc     r15,math16_30-PC
0121  1005 002e                            mov     r5,r0,46
0123  28e5                                 push    r5,r14
0124  093d                                 jsr     r13,r3        # r3 undefined at this point, should be r0, _putchar
0125  0c1e                                 inc     r14,1
0126  0c5f                                 inc     r15,math16_31-PC
0127                        math16_30:
0127  1005 0078                            mov     r5,r0,120
0129  28e5                                 push    r5,r14
012a  093d                                 jsr     r13,r3        # r3 undefined at this point, should be r0, _putchar
012b  0c1e                                 inc     r14,1
012c                        math16_31:
012c  0041                                 mov     r1,r4
012d  10c2 fffe                            mov     r2,r12,-2     # should be ld r2,r12,-2
012f  190d 01f2                            jsr     r13,r0,__div
0131  0015                                 mov     r5,r1
0132  16c5 fffc                            sto     r5,r12,-4
0134  17c5 fffc                            ld      r5,r12,-4
0136  3a05 0004                            cmp     r5,r0,4
0138  6c6f                              nz.inc     r15,math16_32-PC
0139  1005 002e                            mov     r5,r0,46
013b  28e5                                 push    r5,r14
013c  093d                                 jsr     r13,r3        # r3 undefined at this point, should be r0, _putchar
013d  0c1e                                 inc     r14,1
013e  0c5f                                 inc     r15,math16_33-PC
013f                        math16_32:
013f  1005 0078                            mov     r5,r0,120
0141  28e5                                 push    r5,r14
0142  093d                                 jsr     r13,r3        # r3 undefined at this point, should be r0, _putchar
0143  0c1e                                 inc     r14,1
0144                        math16_33:
0144  0041                                 mov     r1,r4
0145  10c2 fffe                            mov     r2,r12,-2     # should be ld r2,r12,-2
0147  190d 020a                            jsr     r13,r0,__mod
0149  0015                                 mov     r5,r1
014a  16c5 fffb                            sto     r5,r12,-5
014c  17c5 fffb                            ld      r5,r12,-5
014e  3a05 000a                            cmp     r5,r0,10
0150  6c6f                              nz.inc     r15,math16_34-PC
0151  1005 002e                            mov     r5,r0,46
0153  28e5                                 push    r5,r14
0154  093d                                 jsr     r13,r3        # r3 undefined at this point, should be r0, _putchar
0155  0c1e                                 inc     r14,1
0156  0c5f                                 inc     r15,math16_35-PC
0157                        math16_34:
0157  1005 0078                            mov     r5,r0,120
0159  28e5                                 push    r5,r14
015a  093d                                 jsr     r13,r3        # r3 undefined at this point, should be r0, _putchar
015b  0c1e                                 inc     r14,1
015c                        math16_35:
015c  1005 000a                            mov     r5,r0,10
015e  28e5                                 push    r5,r14
015f  093d                                 jsr     r13,r3        # r3 undefined at this point, should be r0, _putchar
0160  0c1e                                 inc     r14,1
0161  1005 000d                            mov     r5,r0,13
0163  28e5                                 push    r5,r14
0164  093d                                 jsr     r13,r3        # r3 undefined at this point, should be r0, _putchar
0165  0c1e                                 inc     r14,1
0166  29e4                                 pop     r4,r14
0167  29e3                                 pop     r3,r14
0168  00ce                                 mov     r14,r12
0169  29ec                                 pop     r12,r14
016a  29ed                                 pop     r13,r14
016b  00df                                 mov     r15,r13

Dave


Tue Aug 15, 2017 9:18 am

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
hoglet wrote:
... mov r2,r12,-2 # should be ld r2,r12,-2

Hi Dave,

Do you have a moment to explain the difference between these two instructions? I glanced through the OPC5 specs, but the explanation didn't jump out at me.

Mike B.


Tue Aug 15, 2017 3:22 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1796
If I can jump in usefully - this is one of our common coding errors. Well, mine.
mov r2, r12, -2 is going to compute r12-2 and put the value in r2
ld r2, r12, -2 is going to compute r12-2 and then load from that address into r2

So, the difference is an indirection - the difference between a value and the data at an address.
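
The same distinction can be written out as a small C model. This is only a sketch: `opc_mov`, `opc_ld` and the `mem` array are hypothetical names, and flags and predication are ignored.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t mem[65536];      /* word-addressed memory, 16-bit addresses */

/* mov rdest, rsrc, imm : the result is the sum itself, no memory access. */
uint16_t opc_mov(uint16_t rsrc, uint16_t imm) {
    return (uint16_t)(rsrc + imm);
}

/* ld rdest, rsrc, imm : the sum is used as an address and the word there is loaded. */
uint16_t opc_ld(uint16_t rsrc, uint16_t imm) {
    return mem[(uint16_t)(rsrc + imm)];
}

/* Self-check with a frame pointer of 0x1000 and the value 56 stored at r12-2. */
void mov_vs_ld_demo(void) {
    uint16_t r12 = 0x1000;
    mem[0x0FFE] = 56;
    assert(opc_mov(r12, (uint16_t)-2) == 0x0FFE);   /* the address */
    assert(opc_ld(r12, (uint16_t)-2) == 56);        /* the data at that address */
}
```

In the listing with the miscompiled divide, the mov form would hand the address of b to __div where the ld form would hand over its value.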


Tue Aug 15, 2017 3:26 pm

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
Ahh ... thanks Ed. So it might be helpful for someone with an extensive 6xxx background to think of "mov" as "lea", right?

Mike B.


Tue Aug 15, 2017 4:53 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1796
Well, possibly, but that wouldn't work for me! I've never been 100% comfortable with the lea nomenclature. It works when it is an address, but of course it also gets used just to do simple arithmetic.

If the assembler weren't restricted to being one page of Python, it could have more flexible parsing, and allow for a syntax like
Code:
ld r2, r12 - 2    ; presently mov r2, r12, -2
ld r2, [r12, -2]  ; presently ld r2, r12, -2

and maybe also, for other nearby cases:
Code:
mv r2, r12   ; presently mov r2, r12
ld r2, #65   ; presently mov r2, r0, 65

although it's another question whether all register operations should be mv, all should be ld, or if different actions should take different verbs.

It's another way in which the one-page constraint has helped - we don't need to argue over syntax choices!

This particular typo that Rob and I have committed is just something to get used to, I think. You could perhaps create macros to make it easier not to get confused?


Tue Aug 15, 2017 5:13 pm

Joined: Tue Apr 25, 2017 7:33 pm
Posts: 32
It's very true that the lines-of-code constraint drove some of these decisions, but that's not a bad thing. After all, the same constraint defined OPC5/6's key feature, which is the regularity of the effective address/data calculation (rsrc+imm) for very nearly all instructions in the hardware. Still, it is a little confusing that regular register mov(e) instructions can actually also accomplish some arithmetic between a source and an immediate, and without setting any flags.

This is the simple summary for the various register loading instructions (a slightly more expansive entry is in the table of instructions in the OPC6 spec):

Code:
register move -  mov rdest, rsrc, imm   -   rdest <- rsrc + imm
memory load   -  ld  rdest, rsrc, imm   -   rdest <- mem(rsrc + imm)
memory store  -  sto rdest, rsrc, imm   -   mem(rsrc + imm) <- rdest
IO read       -  in  rdest, rsrc, imm   -   rdest <- IO(rsrc + imm)
IO write      -  out rdest, rsrc, imm   -   IO(rsrc + imm) <- rdest


IO is treated separately from memory as on the Z80, so there is a 64K word IO space as well as the 64K word memory space.
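
The table can be read as one shared rsrc + imm calculation feeding five different actions. A toy C model of that is sketched below. The `Machine` type, the `Op` enum and `execute` are hypothetical names; the model assumes r0 always reads as zero, treats the IO space as a plain array, and ignores flags, predication and the program counter.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { OP_MOV, OP_LD, OP_STO, OP_IN, OP_OUT } Op;

typedef struct {
    uint16_t reg[16];
    uint16_t mem[65536];         /* 64K-word memory space */
    uint16_t io[65536];          /* separate 64K-word IO space */
} Machine;

void execute(Machine *m, Op op, int rdest, int rsrc, uint16_t imm) {
    /* The same effective address/data calculation serves every instruction. */
    uint16_t ea = (uint16_t)(m->reg[rsrc] + imm);
    switch (op) {
    case OP_MOV: m->reg[rdest] = ea;          break;
    case OP_LD:  m->reg[rdest] = m->mem[ea];  break;
    case OP_STO: m->mem[ea] = m->reg[rdest];  break;
    case OP_IN:  m->reg[rdest] = m->io[ea];   break;
    case OP_OUT: m->io[ea] = m->reg[rdest];   break;
    }
    m->reg[0] = 0;               /* assumption: r0 is hard-wired to zero */
}

/* Self-check: a store to memory does not show up in the IO space. */
void machine_demo(void) {
    static Machine m;            /* static so the large arrays start zeroed */
    execute(&m, OP_MOV, 1, 0, 0x1234);
    execute(&m, OP_STO, 1, 0, 0x0200);
    assert(m.mem[0x0200] == 0x1234);
    assert(m.io[0x0200] == 0);
}
```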

R


Tue Aug 15, 2017 5:43 pm