Reply to topic  [ 64 posts ]  Go to page Previous  1, 2, 3, 4, 5  Next
 nPower 
Author Message

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
After several hardware fixes and several software work-arounds the demo can at least clear the screen.
Indexed addressing using pointers does not seem to compile properly. The generated code leaves out both the load of the index register and the scaling of the index.
Code like the following does not work.
Code:
   for (n = 0; n < 56 *31; n = n + 1)
      pScreen[n] = DBGAttr|' ';

Changing the code to use pointer incrementation seems to work:
Code:
      pScreen = 0xFFD00010;
      for (n = 0; n < 56*31; n = n + 1)
         *pScreen++ = my_rand();
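
For reference, the two loop forms are meant to be equivalent. The sketch below fills an ordinary array rather than the screen at 0xFFD00010, and it assumes a 16-bit screen cell since the type of pScreen is not shown here. The helper names fill_indexed, fill_pointer and scaled_offset are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed form: every store goes to base + n * sizeof(uint16_t),
   so the compiler has to load the index and scale it by the element size. */
static void fill_indexed(uint16_t *screen, size_t count, uint16_t value)
{
    assert(screen != NULL);
    for (size_t n = 0; n < count; n = n + 1)
        screen[n] = value;
}

/* Pointer form: the pointer itself advances by one element per store,
   so no index register or scaling is involved. */
static void fill_pointer(uint16_t *screen, size_t count, uint16_t value)
{
    assert(screen != NULL);
    for (size_t n = 0; n < count; n = n + 1)
        *screen++ = value;
}

/* Byte offset the indexed form must compute for element n
   (the 16-bit cell size is an assumption). */
static size_t scaled_offset(size_t n)
{
    return n * sizeof(uint16_t);
}
```

If the scaling step is dropped, the indexed form writes to base + n instead of base + 2n, which would be consistent with the first loop failing while the second works.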

_________________
Robert Finch http://www.finitron.ca


Sun Sep 19, 2021 4:59 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I started working on my own PowerPC compiler.
The PowerPC compiler is coming along. It does not yet generate code good enough to execute, but it is getting there. The compiler is being modified to output a syntax that vasm and vlink can digest. While the compiler is being written, the plan is to use the existing assembler and linker from vbcc.

I’ve been playing with the sprite controller lately.

_________________
Robert Finch http://www.finitron.ca


Fri Sep 24, 2021 4:44 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The compiler code output looks almost good enough to try executing. While code generation appears good, there is still an issue with generation of variables in the data, bss and rodata segments.
There is also an issue with code generation quitting too soon unless optimization is turned on, which seems a bit strange to me. With optimization on, the entire code is generated; with it off, generation quits partway through.
Following is a simple example of code generation.

Code:
int my_abs(register int a)
{
   if (a < 0) a = -a;
   return (a);
}

Code:
   .text
   .align   4

   .global _my_abs
   .align 4

#====================================================
# Basic Block 0
#====================================================
_my_abs:
  cmpwi    cr0,r3,0
  bge      cr0,.C00013
  neg      r3,r3
.C00013:
.C00012:
  blr   
   .type _my_abs,@function
   .size _my_abs,$-_my_abs
# stacksize=48
   .set ___stack_my_abs,48

_________________
Robert Finch http://www.finitron.ca


Sat Sep 25, 2021 4:53 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest Fixes:
Hardware: the target and source register were swapped around for the SRAWI instruction which led to an infinite loop as the wrong register got updated. For this version of the PowerPC for some instructions the Ra source register and Rt target register fields are swapped. This has led to some confusion in the past.
The mask begin and mask end fields were processed in the wrong bit order. PowerPC numbers the most significant bit as bit 0 and the least significant as bit 31. This led to incorrect masking of shifts. Mask generation was also incorrect for some cases.
Shift and rotate operations were using the wrong pipeline ir to determine the shift and mask amounts.
Software: Loading and storing the link register needed to be done indirectly through another register as a direct load or store of the link register is not supported on PowerPC. This led to a hang on a return from a non-leaf routine.
Additions: pipeline loop mode was added. Up to seven instructions in the pipeline may be circulated without instruction fetches.
Modifications: renamed the project to rfPower.
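
As a reference for the mask begin / mask end fix, here is a small C model of the mask used by the rotate-and-mask instructions as defined for standard 32-bit PowerPC, with bit 0 as the most significant bit. It is only a sketch and says nothing about how the rfPower RTL builds the mask. The function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with ones from bit mb through bit me, using PowerPC numbering
   (bit 0 = MSB, bit 31 = LSB). When mb > me the mask wraps around. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb < 32 && me < 32);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;        /* ones in bits mb..31 */
    uint32_t to_me = 0xFFFFFFFFu << (31 - me);   /* ones in bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Rotate left by n (0..31) without shifting by 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* rlwinm: rotate left by sh, then AND with the mb..me mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* srwi rA,rS,n is the extended mnemonic for rlwinm rA,rS,32-n,n,31. */
static uint32_t srwi(uint32_t rs, unsigned n)
{
    assert(n < 32);
    return rlwinm(rs, (32 - n) & 31, n, 31);
}
```

Reading mb and me with bit 31 as the most significant bit flips which end of the word the mask lands on, which matches the kind of incorrect shift masking described above.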

_________________
Robert Finch http://www.finitron.ca


Wed Sep 29, 2021 5:13 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Issues with the link register bypass network were resolved.
The stack is messed up. The return address is stored at $FFFCFF8C but readback of the address is done at $FFFCFF84. I cannot find where the difference of 8 is coming from. Function entry and exit code looks good. This is an issue because it is causing a two-up return to the program exit point. This happens with non-leaf functions. I am getting the compiler to insert a couple of extra NOP instructions around the load and store of the link register. My thinking is that it may be a link register forwarding issue and the extra NOPs should bypass the forwarding.

The compiler was not outputting any storage directive for .bss variables. This led to all the variables being stacked up at address zero, causing software issues. It really needs an .lcomm directive, but for now it just outputs a .byte, which works apart from a warning message from the linker.

_________________
Robert Finch http://www.finitron.ca


Thu Sep 30, 2021 4:01 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Had lots of fun getting switch cases to work properly. They were outputting multiple copies of code for cases. This led to thousands of lines of code being output for a simple switch.
Hardware fixes: the bits in the condition register were reversed in order. Software worked only because decoding was also in the reverse order. The bits have now been set to the order of the PowerPC.
Software fixes: for small constants the compiler was always outputting an ADDI with the positive constant, even for a subtract, when it should have output an ADDI with a negative constant. This led to the sprite demo always moving sprites down towards the right instead of in a variety of directions.

Conditional logic in if statements does not appear to be working correctly. The sprites bounced back and forth about the same location as the direction toggled, because an if was always evaluating to true; the branch condition is always false. Conditional logic for ‘for’ loops seems to work, so it is looking like a compile error. However, for loops compare a register against an immediate value, whereas if statements compare a value loaded from memory against an immediate, so there could be an issue with the forwarding logic for the value from memory. I have studied both the compiler and the Verilog RTL code for potential issues. Nothing obvious appears. So, I put a couple of NOP instructions before the CMPW, removing the need for bypassing. If that causes things to work then it is known to be a bypass issue.

_________________
Robert Finch http://www.finitron.ca


Sun Oct 03, 2021 5:32 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Found and fixed a software issue that was causing a two-up return from subroutine. The compiler was outputting the stack unlink code in the wrong order. After the fix, subroutine call and return seem to work fine.
I am looking for a fast way to emulate the SET instructions using PowerPC code. A simple set equal to zero takes about five instructions. A sequence of instructions like the following is required:
Code:
cmpwi cr0,r11,0
mfcr r11
srwi r11,r11,29
andi. r11,r11,1
xori r11,r11,1
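
For the equal-to-zero case specifically, a commonly used branch-free idiom on standard PowerPC is the two-instruction pair cntlzw rD,rS followed by srwi rD,rD,5. The C model below only shows why that works; it assumes the core implements cntlzw, which is not confirmed here, and the function names are made up for the illustration.

```c
#include <stdint.h>

/* Model of cntlzw: count of leading zero bits, which is 32 when x is 0. */
static unsigned cntlzw(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* cntlzw rD,rS ; srwi rD,rD,5
   The count is in 0..32, and only 32 has bit 5 set, so the shift
   leaves 1 when the input was zero and 0 otherwise. */
static uint32_t set_eq_zero(uint32_t x)
{
    return (uint32_t)cntlzw(x) >> 5;
}
```

The same pair can test two registers for equality if a subtract or xor of the operands comes first, at the cost of one more instruction.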

_________________
Robert Finch http://www.finitron.ca


Tue Oct 05, 2021 5:20 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The core works well enough now to display single characters with the DBGDisplayChar() routine. Displaying strings does not work yet. There is an issue with getting the correct endianness for stored data: the assembler outputs data in big-endian order, while the default operation of the core is little-endian. PowerPC is great because the endianness is controlled by a bit in the machine state register. When operating in big-endian mode it takes an extra clock cycle on a load to swap the bytes around.
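
To make the mismatch concrete, here is a small C sketch of reading the same four bytes in both byte orders. It illustrates the swap in software only and is not a description of what the core does in hardware. The function names are made up for the illustration.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Big-endian read: the byte at the lowest address is the most significant. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Little-endian read: the byte at the lowest address is the least significant. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

A word the assembler stored as 12 34 56 78 reads back as 0x12345678 with a big-endian load and as 0x78563412 with a little-endian load. Character strings are not affected when read byte by byte, but pointers and word constants emitted by the assembler are.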

_________________
Robert Finch http://www.finitron.ca


Wed Oct 06, 2021 8:07 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Found yet another software bug. When passing arguments to a function, the stack pointer was being decremented with a ‘sub r1,r1,4’ instruction to push the argument. The problem is that the PowerPC assembler does not automatically assume immediate values, so it treated the instruction as a register-register subtract. The instruction needed to be ‘addi r1,r1,-4’. A quick fix to the compiler was made.
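
As a cross-check on what the assembler should produce, the sketch below encodes addi in the standard PowerPC D-form (primary opcode 14, then RT, RA and a 16-bit signed immediate). It assumes the core uses the standard field layout for this instruction. The names encode_addi and decode_simm are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Encode addi rt,ra,simm in the standard D-form:
   opcode 14 in the top 6 bits, then RT, RA, and a 16-bit signed immediate. */
static uint32_t encode_addi(unsigned rt, unsigned ra, int simm)
{
    assert(rt < 32 && ra < 32);
    assert(simm >= -32768 && simm <= 32767);
    return (14u << 26) | ((uint32_t)rt << 21) | ((uint32_t)ra << 16) |
           ((uint32_t)simm & 0xFFFFu);
}

/* Recover the sign-extended immediate from an encoded D-form instruction. */
static int32_t decode_simm(uint32_t insn)
{
    int32_t v = (int32_t)(insn & 0xFFFFu);
    return (v ^ 0x8000) - 0x8000;   /* sign-extend from 16 bits */
}
```

With these definitions, addi r1,r1,-4 comes out as 0x3821FFFC, with the immediate field holding 0xFFFC.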

Using arguments passed in registers is a bit overrated. It turns out that if the function with register arguments needs to call another function then the arguments need to be saved on the stack before the call and restored afterwards. In other words, register based arguments end up being stored on the stack anyway, so why not just use stack based arguments? The only case where register arguments are better occurs when calling a leaf routine.

_________________
Robert Finch http://www.finitron.ca


Thu Oct 07, 2021 4:50 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 589
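Most leaf functions I can think of are like a block move, which needs 5 registers:
2 pointer registers, 1 return register, 1 counter register and a temp register.
char *move(char *d, const char *s, int c)
{
    char *r = d;        /* return register: the original destination */
    while (c--)         /* counter register */
        *d++ = *s++;    /* two pointer registers, plus a temp for the byte in flight */
    return r;
}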

Anything bigger is a real function that needs a full stack.
What I can see being more useful is bounds-check registers for
stacks and memory functions, as a software check.

big recursive program(stuff)
... if (stack >= toobig) clean up and exit
else big recursive program(more-stuff)
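
The software check sketched above could look something like the following in C. A depth counter stands in for a real stack-pointer comparison, and guarded_sum and MAX_DEPTH are hypothetical names and limits chosen for the illustration.

```c
/* Hypothetical recursion limit standing in for a stack bound. */
enum { MAX_DEPTH = 64 };

/* Recursive sum of 1..n that gives up cleanly instead of overrunning the stack.
   On overflow it clears *ok and unwinds; the returned value is then meaningless. */
static long guarded_sum(unsigned n, unsigned depth, int *ok)
{
    if (depth >= MAX_DEPTH) {   /* "stack >= toobig": clean up and exit */
        *ok = 0;
        return 0;
    }
    if (n == 0)
        return 0;
    return (long)n + guarded_sum(n - 1, depth + 1, ok);
}
```

The caller sets the flag to 1 before the call and checks it afterwards. With bounds-check registers, the depth test would be done by hardware on every stack adjustment instead of by an explicit compare in each function.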


Sun Oct 10, 2021 2:06 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Kinda stuck on this project at the moment. The core or software is not managing the link register correctly, causing an infinite loop due to a return to the wrong point in a program. This has led to a continuous stream of 'A0' characters while trying to dump values to the screen. Things almost work, but not quite fully yet. I am unable to run tests in SIM as LLVM keeps running out of memory. So I decided to move on to the Thor2021 project for now.

_________________
Robert Finch http://www.finitron.ca


Tue Oct 12, 2021 7:17 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
I remember one time at work the commercial synthesis program had some bug where memory usage blew up - we commissioned a very large machine, possibly a 1024G machine, so the program would at least run to completion so we could get support on a test case...


Thu Oct 14, 2021 3:53 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Another case where the ‘just get a bigger machine’ paradigm worked. Something I have thought about.
1024GB? That’s 40+ bit addressing.

I am not sure that LLVM is really running out of memory. It says it is, but SIM can run a larger, more complex project just fine and then dies on a project ¼ the size. So it smells like something else is wrong.
I have a machine with 16GB but Vivado does not seem to use more than 8GB. I wonder if there is an 8GB limit? I have thought about upgrading the memory, but if there is a limit on how much memory Vivado will use it may not do any good.

_________________
Robert Finch http://www.finitron.ca


Fri Oct 15, 2021 5:52 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Hmm, yes, 40 bits. This was in the days of x86 and Xeon, dual or quad socket computers, as far as I recall. I think PAE might be the term whereby x86/Pentium got beyond 32 bit addressing? But then AMD64 came along, and this must have been past that point, so perhaps PAE was a distant memory.

Sometimes of course you can just add enormous amounts of swap space and the program can then allocate what it likes - it might not actually use it heavily.


Fri Oct 15, 2021 8:00 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 589
Video memory could be a hidden factor.
40 bits virtual memory is more memory than all the C-64's ever sold.
Ben.


Sun Oct 17, 2021 4:33 am