Porting the C64 compiler to target OPC5 (or OPC6)
Author |
Message |
hoglet
Joined: Tue Feb 10, 2015 7:07 am Posts: 52
|
robfinch wrote: There's a fix for bugs 16,17 now. The fix caused a fair amount of change to the code generated in my test files. Generates smaller code with the fix to boot. I hope I didn't break something else with the fix. I've just been comparing output from one set of test files to the next generation and reviewing the changes. Usually it's only a couple of lines that change.
Well, the writing to the wrong address is gone, but something is amiss...
Code:
*go 100
@ABCD 1234 5678 3.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
*
(unless the value of Pi has actually changed recently)
I'll look into this tomorrow, as we have some friends coming round to supper soon.
Dave
|
Sat Aug 12, 2017 6:01 pm |
|
|
hoglet
Joined: Tue Feb 10, 2015 7:07 am Posts: 52
|
OK, the fun continues
18. Result of multiply being overwritten before it is used
I don't have a small example for this, but here's a fragment from pi.c
Code:
for (i = len; i > 0; i--) {
    x = 10 * pi[i] + q * i;
    pi[i] = x % (2 * i - 1);
    q = x / (2 * i - 1);
}
The bit that is going wrong is evaluation of the expression for x:
Code:
0190  10c7 fea8  mov r7,r12,-344
0192  0036       mov r6,r3
0193  0476       add r6,r7
0194  0766       ld r6,r6
0195  1007 000a  mov r7,r0,10
0197  0061       mov r1,r6
0198  0072       mov r2,r7
0199  190d 0329  jsr r13,r0,__mul
019b  17c6 fff9  ld r6,r12,-7
019d  0061       mov r1,r6          # the result of the multiply (r1) is being overwritten before it has been used
019e  0032       mov r2,r3
019f  190d 0329  jsr r13,r0,__mul
01a1  0015       mov r5,r1
01a2  0415       add r5,r1
01a3  0054       mov r4,r5
Dave
|
Sun Aug 13, 2017 9:33 am |
|
|
hoglet
Joined: Tue Feb 10, 2015 7:07 am Posts: 52
|
One more that I noticed, not (yet) affecting the Pi program.
19. long maths seems completely broken now
Code:
void bug19(long a, long b) {
    long c = a * b;
}
Code:
_bug19:
    push r13,r14
    push r12,r14
    mov r12,r14
    dec r14,2
    ld r5,r12,2
    ld r6,r12,3
    ld r7,r12,4
    ld r3,r12,5
    mov r1,r5           # wrong number of parameters
    mov r2,r7
    jsr r13,r0,__mul    # should be calling __mul32
    sto r1,r12,-2
    sto                 # missing registers
    mov r14,r12
    pop r12,r14
    pop r13,r14
    mov r15,r13
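For reference, a long on this target spans two 16-bit words, so a long multiply has four words of input, which is why a single-word __mul call with two arguments cannot be right. Below is a minimal C sketch of the arithmetic only. The name mul32_words and the (low, high) argument order are assumptions made for illustration; this is not the actual __mul32 routine or its calling convention.

```c
#include <stdint.h>

/* Hypothetical helper: multiply two 32-bit values, each passed as a
   (low word, high word) pair, and return the low 32 bits of the product. */
uint32_t mul32_words(uint16_t a_lo, uint16_t a_hi,
                     uint16_t b_lo, uint16_t b_hi)
{
    /* low x low gives a full 32-bit partial product */
    uint32_t low = (uint32_t)a_lo * b_lo;

    /* the two cross products are shifted up by 16 bits, so only their
       low 16 bits survive in a 32-bit result */
    uint32_t cross = (uint32_t)a_lo * b_hi + (uint32_t)a_hi * b_lo;

    /* a_hi * b_hi lands entirely above bit 31 and is dropped */
    return low + (cross << 16);
}
```

With the bug 19 test values this would be called as mul32_words(a_lo, a_hi, b_lo, b_hi), i.e. all four loaded words take part.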
|
Sun Aug 13, 2017 9:41 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2157 Location: Canada
|
Yes, I broke the long math.
When the compiler doesn’t have enough temporaries available it attempts to re-use an existing temporary by pushing it on the stack. Unfortunately the compiler is trying to use the same temporary register for two different purposes at the same time. It looks like I may have to re-write the temporaries section of the compiler to use pseudo registers located on the stack when it runs out of temporaries. Some expressions may work for now by adding two more registers to the available pool (r8, r9), but it means the function can’t use register arguments because the args are used as temporaries.
Working on it has been like working on a finger puzzle. I thought I had it solved a couple of times. But it’s taken me longer than anticipated. Because the compiler is using pushes and pops there must be a strict order adhered to when it comes to accessing temporaries in compiler code. I hope to re-write this so it doesn’t matter. It may help to break up expressions into simpler pieces. I believe the compiler reinitializes the temporary pool for each expression.
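One way to read "break up expressions into simpler pieces" is to rewrite the pi.c inner loop so that each multiply result is assigned to a named local before the next call. The sketch below assumes int arithmetic as in the posted fragment; the wrapper spigot_step and the locals t1, t2 and d are made up here and are not in pi.c. Whether this actually sidesteps the temporaries problem depends on the compiler build.

```c
/* Hypothetical rewrite of the pi.c inner loop, one operation per
   statement. pi must have at least len + 1 elements because the loop
   indexes pi[len] down to pi[1]. Returns the final carry q. */
int spigot_step(int *pi, int len)
{
    int i;
    int q = 0;

    for (i = len; i > 0; i--) {
        int t1 = 10 * pi[i];   /* first product kept in its own local */
        int t2 = q * i;        /* second product kept separately */
        int x  = t1 + t2;
        int d  = 2 * i - 1;    /* divisor computed once */

        pi[i] = x % d;
        q     = x / d;
    }
    return q;
}
```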
_________________Robert Finch http://www.finitron.ca
|
Mon Aug 14, 2017 1:29 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2157 Location: Canada
|
Here is a sample of x = 10 * pi[i] + q * i in long math. It looks like it worked using r8 and r9 as additional temporaries. However I've no doubt that a complex expression won't work properly (yet).
Code:
# code
_main:
# long main()
    push r13,r14
    push r12,r14
    mov r12,r14
# long pi[333];
    sub r14,r0,672
# x = 10 * pi[i] + q * i;
    ld r7,r12,-669
    ld r6,r12,-670
    mov r3,r12,-666
    mov r5,r6
    add r5,r3
    ld r6,r5,1
    ld r5,r5
    mov r1,r5
    mov r2,r6
    push r3,r14
    push r4,r14
    mov r3,r0,10
    mov r4,r0,r0
    jsr r13,r0,__mul32
    pop r4,r14
    pop r3,r14
    mov r5,r1
    mov r6,r2
    ld r3,r12,-671
    ld r7,r12,-672
    ld r8,r12,-669
    ld r4,r12,-670
    mov r1,r7
    mov r2,r3
    push r3,r14
    push r4,r14
    mov r3,r4
    mov r4,r8
    jsr r13,r0,__mul32
    pop r4,r14
    pop r3,r14
    mov r7,r1
    mov r3,r2
    mov r1,r5
    mov r2,r6
    add r1,r7
    adc r2,r3
    mov r5,r1
    mov r6,r2
    sto r5,r12,-668
    sto r6,r12,-667
    mov r14,r12
    pop r12,r14
    pop r13,r14
    mov r15,r13
# rodata
# global _main
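The add/adc pair near the end of that listing is the usual two-word addition: add the low words, then add the high words plus the carry out of the low add. Here is a small C sketch of the same arithmetic. The struct and function names are invented for illustration, and it assumes (from the order of the add and adc) that r1/r7 carry the low words and r2/r3 the high words.

```c
#include <stdint.h>

/* Hypothetical two-word value: a 32-bit long held as two 16-bit words. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} long16x2;

long16x2 add32_words(long16x2 a, long16x2 b)
{
    long16x2 r;

    /* "add": low words, computed wide enough to see the carry */
    uint32_t lo = (uint32_t)a.lo + b.lo;
    uint32_t carry = lo >> 16;              /* 0 or 1 */

    r.lo = (uint16_t)lo;
    /* "adc": high words plus the carry from the low add */
    r.hi = (uint16_t)(a.hi + b.hi + carry);
    return r;
}
```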
_________________Robert Finch http://www.finitron.ca
|
Mon Aug 14, 2017 1:50 am |
|
|
hoglet
Joined: Tue Feb 10, 2015 7:07 am Posts: 52
|
Hi Rob,
Just a quick note to say that with the temporaries change, the Pi Spigot test program now works correctly (at 100 digits):
Code:
*go 100
@ABCD 1234 5678 3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067
This is excellent progress. Do keep us posted as you continue to chip away at this. I appreciate how tricky this is getting, and I'm more than happy to keep testing builds if that helps. Dave
|
Mon Aug 14, 2017 12:19 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2157 Location: Canada
|
The compiler now makes two passes at code generation in order to determine whether it has to use temporaries located on the stack instead of in registers. An option was added “-fno_regs” to specify not to use any registers for temporary values. It can also be passed as a function attribute to force the use of stack variables. I’m actually using the OPC6 compiler as a reference for another compiler I’m developing. The OPC6 compiler is somewhat simpler because it only supports word size operands.
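To illustrate the idea of a first pass that decides between register and stack temporaries, here is a rough C sketch. It uses Sethi-Ullman style counting as a stand-in: the real compiler's first pass may well count differently, and the Expr type and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical expression node: a leaf has both children NULL. */
typedef struct Expr {
    const struct Expr *left;
    const struct Expr *right;
} Expr;

/* Pass 1: how many temporaries are needed to evaluate e without
   spilling (Sethi-Ullman style labelling). */
int temps_needed(const Expr *e)
{
    int l, r;

    if (e->left == NULL && e->right == NULL)
        return 1;                        /* a leaf needs one temporary */
    if (e->right == NULL)
        return temps_needed(e->left);    /* unary: reuse the operand's temp */
    if (e->left == NULL)
        return temps_needed(e->right);

    l = temps_needed(e->left);
    r = temps_needed(e->right);
    if (l == r)
        return l + 1;                    /* one result is held while the other side runs */
    return l > r ? l : r;
}

/* Decision made before pass 2: nonzero means the pool is too small
   and temporaries should live on the stack. */
int must_use_stack_temps(const Expr *e, int pool_size)
{
    return temps_needed(e) > pool_size;
}
```

For x = 10 * pi[i] + q * i, treated as a sum of two products of leaves, this count comes to 3, so a pool of two registers would trigger the stack path in this model.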
_________________Robert Finch http://www.finitron.ca
|
Mon Aug 14, 2017 7:04 pm |
|
|
hoglet
Joined: Tue Feb 10, 2015 7:07 am Posts: 52
|
Hi Rob,
robfinch wrote: The compiler now makes two passes at code generation in order to determine whether it has to use temporaries located on the stack instead of in registers. An option was added “-fno_regs” to specify not to use any registers for temporary values. It can also be passed as a function attribute to force the use of stack variables. I’m actually using the OPC6 compiler as a reference for another compiler I’m developing. The OPC6 compiler is somewhat simpler because it only supports word size operands.
I've just tried the latest compiler code with the Pi Spigot. It compiles without any errors, but the generated code is missing certain labels. For example the _putchar and _main labels are not output (but the code is there). In fact, the only labels present are the pi_<number> ones.
Here is the code I'm testing with: https://github.com/hoglet67/opc/blob/c_ ... got-c/pi.c
Dave
|
Mon Aug 14, 2017 7:52 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2157 Location: Canada
|
Okay, the labels should be back now. I added a couple more files to the compiler and moved the function prototypes out to a file called proto.h.
_________________Robert Finch http://www.finitron.ca
|
Tue Aug 15, 2017 1:45 am |
|
|
hoglet
Joined: Tue Feb 10, 2015 7:07 am Posts: 52
|
Hi Rob,
Not much luck with the Pi program at the moment (C64 compiler commit 8c8874e). Here's a smaller test case that also fails.
21. Math16.c issues
Code:
void init() {
    asm {
        ORG 0x100
        mov r15, r0, _main
    };
}

void putchar(char c) {
    asm __leafs {
        ld r1, r14, 1
        jsr r13, r0, 0xffee
    };
}

void main() {
    int a,b,c,d,e;
    a = 234;
    b = 56;
    c = a * b;
    if (c == 13104) {
        putchar('.');
    } else {
        putchar('x');
    }
    d = a / b;
    if (d == 4) {
        putchar('.');
    } else {
        putchar('x');
    }
    e = a % b;
    if (e == 10) {
        putchar('.');
    } else {
        putchar('x');
    }
    putchar(10);
    putchar(13);
}
Code:
0000  100f 010a  mov r15,r0,_main
0002             # code
0002             _init:
0002
0002             ORG 0x100
0100  100f 010a  mov r15, r0, _main
0102  00df       mov r15,r13
0103
0103
0103             _putchar:
0103  28ed       push r13,r14
0104
0104  17e1 0001  ld r1, r14, 1
0106  190d ffee  jsr r13, r0, 0xffee
0108  29ed       pop r13,r14
0109  00df       mov r15,r13
010a             _main:
010a  28ed       push r13,r14
010b  28ec       push r12,r14
010c  00ec       mov r12,r14
010d  0e6e       dec r14,6
010e  1004 00ea  mov r4,r0,234
0110  1005 0038  mov r5,r0,56
0112  16c5 fffe  sto r5,r12,-2
0114  0041       mov r1,r4
0115  17c2 fffe  ld r2,r12,-2
0117  190d 01da  jsr r13,r0,__mul
0119  0015       mov r5,r1
011a  16c5 fffd  sto r5,r12,-3
011c  17c5 fffd  ld r5,r12,-3
011e  3a05 3330  cmp r5,r0,13104
0120  6c6f       nz.inc r15,math16_30-PC
0121  1005 002e  mov r5,r0,46
0123  28e5       push r5,r14
0124  093d       jsr r13,r3         # r3 undefined at this point, should be r0, _putchar
0125  0c1e       inc r14,1
0126  0c5f       inc r15,math16_31-PC
0127             math16_30:
0127  1005 0078  mov r5,r0,120
0129  28e5       push r5,r14
012a  093d       jsr r13,r3         # r3 undefined at this point, should be r0, _putchar
012b  0c1e       inc r14,1
012c             math16_31:
012c  0041       mov r1,r4
012d  10c2 fffe  mov r2,r12,-2      # should be ld r2,r12,-2
012f  190d 01f2  jsr r13,r0,__div
0131  0015       mov r5,r1
0132  16c5 fffc  sto r5,r12,-4
0134  17c5 fffc  ld r5,r12,-4
0136  3a05 0004  cmp r5,r0,4
0138  6c6f       nz.inc r15,math16_32-PC
0139  1005 002e  mov r5,r0,46
013b  28e5       push r5,r14
013c  093d       jsr r13,r3         # r3 undefined at this point, should be r0, _putchar
013d  0c1e       inc r14,1
013e  0c5f       inc r15,math16_33-PC
013f             math16_32:
013f  1005 0078  mov r5,r0,120
0141  28e5       push r5,r14
0142  093d       jsr r13,r3         # r3 undefined at this point, should be r0, _putchar
0143  0c1e       inc r14,1
0144             math16_33:
0144  0041       mov r1,r4
0145  10c2 fffe  mov r2,r12,-2      # should be ld r2,r12,-2
0147  190d 020a  jsr r13,r0,__mod
0149  0015       mov r5,r1
014a  16c5 fffb  sto r5,r12,-5
014c  17c5 fffb  ld r5,r12,-5
014e  3a05 000a  cmp r5,r0,10
0150  6c6f       nz.inc r15,math16_34-PC
0151  1005 002e  mov r5,r0,46
0153  28e5       push r5,r14
0154  093d       jsr r13,r3         # r3 undefined at this point, should be r0, _putchar
0155  0c1e       inc r14,1
0156  0c5f       inc r15,math16_35-PC
0157             math16_34:
0157  1005 0078  mov r5,r0,120
0159  28e5       push r5,r14
015a  093d       jsr r13,r3         # r3 undefined at this point, should be r0, _putchar
015b  0c1e       inc r14,1
015c             math16_35:
015c  1005 000a  mov r5,r0,10
015e  28e5       push r5,r14
015f  093d       jsr r13,r3         # r3 undefined at this point, should be r0, _putchar
0160  0c1e       inc r14,1
0161  1005 000d  mov r5,r0,13
0163  28e5       push r5,r14
0164  093d       jsr r13,r3         # r3 undefined at this point, should be r0, _putchar
0165  0c1e       inc r14,1
0166  29e4       pop r4,r14
0167  29e3       pop r3,r14
0168  00ce       mov r14,r12
0169  29ec       pop r12,r14
016a  29ed       pop r13,r14
016b  00df       mov r15,r13
Dave
|
Tue Aug 15, 2017 9:18 am |
|
|
barrym95838
Joined: Tue Dec 31, 2013 2:01 am Posts: 116 Location: Sacramento, CA, United States
|
hoglet wrote: ... mov r2,r12,-2 # should be ld r2,r12,-2
Hi Dave,
Do you have a moment to explain the difference between these two instructions? I glanced through the OPC5 specs, but the explanation didn't jump out at me.
Mike B.
|
Tue Aug 15, 2017 3:22 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1796
|
If I can jump in usefully - this is one of our common coding errors. Well, mine.
mov r2, r12, -2 is going to compute r12-2 and put the value in r2
ld r2, r12, -2 is going to compute r12-2 and then load from that address into r2
So, the difference is an indirection - the difference between a value and the data at an address.
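The same distinction written out as a small C model, in case that reads more easily. The Machine struct and the op_mov/op_ld names are invented for this sketch; it ignores flags, predicates and the fact that r0 always reads as zero.

```c
#include <stdint.h>

/* Hypothetical machine state: 16 registers and 64K words of memory. */
typedef struct {
    uint16_t r[16];
    uint16_t mem[65536];
} Machine;

/* mov rd, rs, imm : rd <- rs + imm   (the sum itself is the result) */
void op_mov(Machine *m, int rd, int rs, int16_t imm)
{
    m->r[rd] = (uint16_t)(m->r[rs] + imm);
}

/* ld rd, rs, imm : rd <- mem[rs + imm]   (the sum is used as an address) */
void op_ld(Machine *m, int rd, int rs, int16_t imm)
{
    uint16_t ea = (uint16_t)(m->r[rs] + imm);
    m->r[rd] = m->mem[ea];
}
```

So in the Math16 listing, mov r2,r12,-2 hands the address of b to __div where ld r2,r12,-2 would hand over its value, 56.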
|
Tue Aug 15, 2017 3:26 pm |
|
|
barrym95838
Joined: Tue Dec 31, 2013 2:01 am Posts: 116 Location: Sacramento, CA, United States
|
Ahh ... thanks Ed. So it might be helpful for someone with an extensive 6xxx background to think of "mov" as "lea", right?
Mike B.
|
Tue Aug 15, 2017 4:53 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1796
|
Well, possibly, but that wouldn't work for me! I've never been 100% comfortable with the lea nomenclature. It works when it is an address, but of course it also gets used just to do simple arithmetic.
If the assembler wasn't restricted to being one page of python, it could have more flexible parsing, and allow for a syntax like
Code:
ld r2, r12 - 2     ; presently mov r2, r12, -2
ld r2, [r12, -2]   ; presently ld r2, r12, -2
and maybe also, for other nearby cases:
Code:
mv r2, r12   ; presently mov r2, r12
ld r2, #65   ; presently mov r2, r0, 65
although it's another question whether all register operations should be mv, all should be ld, or if different actions should take different verbs. It's another way in which the one-page constraint has helped - we don't need to argue over syntax choices!
This particular typo that Rob and I have committed is just something to get used to, I think. You could perhaps create macros to make it easier not to get confused?
|
Tue Aug 15, 2017 5:13 pm |
|
|
Revaldinho
Joined: Tue Apr 25, 2017 7:33 pm Posts: 32
|
It's very true that the lines-of-code constraint drove some of these decisions, but that's not a bad thing. After all the same constraint defined OPC5/6's key feature which is the regularity of the effective address/data calculation (rsrc+imm) for very nearly all instructions in the hardware. Still, it is a little confusing that regular register mov(e) instructions can actually also accomplish some arithmetic between a source and an immediate, and without setting any flags.
This is the simple summary for the various register loading instructions (a slightly more expansive entry is in the table of instructions in the OPC6 spec)
Code:
register move - mov rdest, rsrc, imm - rdest <- rsrc + imm
memory load   - ld  rdest, rsrc, imm - rdest <- mem(rsrc + imm)
memory store  - sto rdest, rsrc, imm - mem(rsrc + imm) <- rdest
IO read       - in  rdest, rsrc, imm - rdest <- IO(rsrc + imm)
IO write      - out rdest, rsrc, imm - IO(rsrc + imm) <- rdest
IO is treated separately from memory as on the Z80, so there is a 64K word IO space as well as the 64K word memory space. R
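As a sketch of how regular that table is, all five instructions can be modelled with one shared rsrc + imm sum and a switch on what is done with it. The Cpu struct, the Op enum and step are names made up for this example, and treating the IO space as a plain array is a simplification (real IO ports are devices, not storage).

```c
#include <stdint.h>

typedef enum { OP_MOV, OP_LD, OP_STO, OP_IN, OP_OUT } Op;

/* Hypothetical CPU model with separate memory and IO address spaces. */
typedef struct {
    uint16_t r[16];
    uint16_t mem[65536];   /* 64K word memory space */
    uint16_t io[65536];    /* separate 64K word IO space */
} Cpu;

void step(Cpu *c, Op op, int rdest, int rsrc, uint16_t imm)
{
    /* the same effective address/data calculation for every instruction */
    uint16_t ea = (uint16_t)(c->r[rsrc] + imm);

    switch (op) {
    case OP_MOV: c->r[rdest] = ea;          break;  /* rdest <- rsrc + imm */
    case OP_LD:  c->r[rdest] = c->mem[ea];  break;  /* rdest <- mem(rsrc + imm) */
    case OP_STO: c->mem[ea]  = c->r[rdest]; break;  /* mem(rsrc + imm) <- rdest */
    case OP_IN:  c->r[rdest] = c->io[ea];   break;  /* rdest <- IO(rsrc + imm) */
    case OP_OUT: c->io[ea]   = c->r[rdest]; break;  /* IO(rsrc + imm) <- rdest */
    }
}
```

A store and an out to the same numeric address end up in different arrays here, which is the Z80-style separation described above.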
|
Tue Aug 15, 2017 5:43 pm |
|
|
|