View unanswered posts | View active topics It is currently Tue Mar 19, 2024 10:02 am



Reply to topic  [ 45 posts ]  Go to page Previous  1, 2, 3  Next
 Some C compilers 
Author Message

Joined: Mon Oct 07, 2019 1:26 pm
Posts: 46
Hi Joan, Robinson, thank you for your replies.

Joan, yes my code generator does not do any optimizing yet. Everything simply gets allocated in the workspace area.
A pity that all code is optimized away, this leaves almost nothing to compare.
Perhaps I must use another demo code that can not be optimized so much.


Robinsonb5, your assembly code takes some practice to understand. I read through your postings and I understand that it is this way because you can assemble this to machine code without having a dedicated assembler. Personally I think PDP-11 or 68000 code is easier to read, that's why I choose pdp-11 lookalike instructions for the Kobold assembly. In your case I would rename temp to ACC and have explicit instructions like MOV (R2),ACC

It is surprising to see that your CPU uses up to 2 times more instructions as Kobold to do the same thing. And the Kobold code has not even been optimized. But since you're using an FPGA there is plenty of speed so I guess it's not a problem for you.
I like your endevour to get performance out of 8-bit instructions. But the single 'temp' accumulator seems to be a bottleneck. Did you think about using a small stack of accumulators (8,16 or 32 of them) ?
Good that your C compiler doesn't optimize (as Joan's does), because that would also optimize everything away and leave nothing to compare.


Sun Dec 29, 2019 9:44 am
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
roelh wrote:
Hi Joan, Robinson, thank you for your replies.
Joan, yes my code generator does not do any optimizing yet. Everything simply gets allocated in the workspace area.
A pity that all code is optimized away, this leaves almost nothing to compare.
Perhaps I must use another demo code that can not be optimized so much.

Hi Roelh, LLVM optimises away load/stores to local variables that have not an effect outside the function, or that can be evaluated as constant expressions. The trick to prevent this is to use function arguments that are actually used to compute return values, or to update global vars, or to update arguments passed by reference. Also, small functions within the visibility scope of the compiler (namely on the same C source file, or with known bodies in included header files) tend to be inlined so they are not called at all, but this can be explicitly prevented if required by applying the 'noinline' attribute to these functions. I made some changes to your code to show this, and this is the result:

Modified C code
Code:
__attribute__((noinline))
int sum (int num1, int num2)
{
    return num1+num2;
}
int (*f2p) (int, int);
int n;

int f(int op1, int *op2)
{
    *op2 = sum(op1,n);
    f2p = sum;
    //Calling a function using function pointer
    op1 = (*f2p)(10, 13);
    return op1;
}

// demo structure member, array element, and if statement

char ar[8];

struct tt {
  int r1    ;
  int r2;
};

int i;

void test(struct tt* p){
if (i==5)
  { i = p-> r2; }
else ar[6] = i;
}


CPU74

Code:
# ---------------------------------------------
# sum
# ---------------------------------------------
   .globl   sum
sum:
   add   r1, r0, r0
   ret

# ---------------------------------------------
# f
# ---------------------------------------------
   .globl   f
f:                        // function enters with arguments in r0 and r1
   add   SP, -2, SP          // make room on the stack for callee register saving (this is equivalent to 'push r4', but I removed push/pop instructions from the set)
   st.w   r4, [SP, 0]       // save r4 to the stack (required as per the calling conventions)
   mov   r1, r4              // second argument (it's a pointer, but this does not matter at this point) goes to r4
   ld.w   [&n], r1          // load r1 with the contents of global 'n'
   call   @sum              // call sum with r0 and r1 as arguments
   st.w   r0, [r4, 0]       // store the sum retun value to the address pointed to by r4
   mov   &sum, r0            // load the address of sum in r0
   st.w   r0, [&f2p]        // store the address of sum into global var f2p
   mov   10, r0              // set r0 to 10
   mov   13, r1              // set r1 to 13
   call   @sum              // call sum with arguments in r0 and r1, result goes to r0
   ld.w   [SP, 0], r4       // restore r4 ( this, along with the next instruction is equivalent to 'pop r4')
   add   SP, 2, SP           // restore the previous frame
   ret                     // return with the result in r0

# ---------------------------------------------
# test
# ---------------------------------------------
   .globl   test
test:
   ld.w   [&i], r1
   cmp.ne   r1, 5
   brcc   .LBB2_2
   ld.w   [r0, 2], r0
   st.w   r0, [&i]
   ret
.LBB2_2:
   st.b   r1, [&ar+6]
   ret

# ---------------------------------------------
# Global Data
# ---------------------------------------------
   .comm   n,2,2
   .comm   f2p,2,2
   .comm   i,2,2
   .comm   ar,8,2





If I do not apply the noinline attribute to the sum function then I get what it would be more natural for LLVM, which is this:

Code:
# ---------------------------------------------
# sum
# ---------------------------------------------
   .globl   sum
sum:
   add   r1, r0, r0
   ret

# ---------------------------------------------
# f
# ---------------------------------------------
   .globl   f
f:                        // function enters with arguments in r0 and r1 (r1 is a pointer)
   ld.w   [&n], r2          // load variable n in r2
   add   r2, r0, r0          // add to first argument (this is the inlined sum function)
   st.w   r0, [r1, 0]       // store the sum retun value to the address pointed to by r1
   mov   &sum, r0            // load the address of sum in r0
   st.w   r0, [&f2p]        // store the address of sum into global var f2p
   mov   23, r0              // this again is the inlined call to sum, this time resulting in a constant value
   ret                     // return r0

# ---------------------------------------------
# test
# ---------------------------------------------
   .globl   test
test:
   ld.w   [&i], r1
   cmp.ne   r1, 5
   brcc   .LBB2_2
   ld.w   [r0, 2], r0
   st.w   r0, [&i]
   ret
.LBB2_2:
   st.b   r1, [&ar+6]
   ret

# ---------------------------------------------
# Global Data
# ---------------------------------------------
   .comm   n,2,2
   .comm   f2p,2,2
   .comm   i,2,2
   .comm   ar,8,2



Now, the sum function is inlined and the result is much faster and shorter code.

In any of the examples, the function pointer is still not seen to actually being used to call a function. This is because the compiler knows which function the pointer points to at compile time, so it just generates code to call the function directly. For the compiler to actually use a function pointer, it must be dereferenced in a scope where the target function is not known. For example if the pointer is set on a function, and used on a different one, or if the pointer is passed as an argument.

I hope this is of interest.

Joan


Sun Dec 29, 2019 12:18 pm
Profile

Joined: Mon Oct 07, 2019 1:26 pm
Posts: 46
Hi Joan, I took your modified code as the new demo code. Also modified the array access to show some real indexing (with I).

For function f, the difference is not very great, with 18 for Kobold and 14 for CPU74. Other functions will also improve when I put some optimizations in the generated code.

It is tempting to work on the code generator, but I have an empty Kobold PCB waiting to be populated...


Sun Dec 29, 2019 1:06 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
roelh wrote:
Hi Joan, I took your modified code as the new demo code. Also modified the array access to show some real indexing (with I).

For function f, the difference is not very great, with 18 for Kobold and 14 for CPU74. Other functions will also improve when I put some optimizations in the generated code.

It is tempting to work on the code generator, but I have an empty Kobold PCB waiting to be populated...


Indeed, the difference is not that great. One aspect that slightly hurts the CPU74 ISA is the lack of push/pop instructions. The ISA defines that up to 4 function arguments are passed on registers (r0 to r3) and the remaining ones (or any varargs) are passed on the stack. Functions are not meant to preserve registers r0 to r3, but must preserve r4 to r7 (The SP and PC are not part of the general purpose set). Any function can potentially modify r0...r3, so these registers can not be used across calls, instead r4 is used in your code example. But r4 is one of the 'must preserve' registers, so that's why it must be saved/restored from the stack.

The lack of push/pop means 4 instructions in total just for saving/restoring one register, and that's why inlining greatly benefits this particular case. On the more general case, the penalty is not that big because a stack frame must be generally allocated anyway for local variables or register spills, or a greater number of registers must be saved. In some cases, implementing the 'tail call' optimisation would also help, but I have not gone that far...

I think you will have to consider the same kind of balance when you define what to do with registers across function calls, should them all be preserved?, only some of them?, or none of them? In all cases this adds more code generated in order to adhere with the chosen convention. Your current approach seems to be to store all arguments to the local function frame, which is fine, so you can access arguments across function calls, and probably you assume that registers are not preserved by called functions. Depending on the available memory addressing modes, this may or may not reduce code as compared with making a more extensive use of registers. My gut feeling is that pure load/store architectures benefit more from saved/restored registers performed once per function call, than a potentially bigger number of memory accesses. But I guess this is highly debatable and dependent on the actual architecture.


Sun Dec 29, 2019 2:18 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
(Is this something one could explore and tune? Recompile the same benchmarks with the compiler set up to honour different calling conventions and see how many registers it's best to use for scratch, how many for parameters, how many to preserve?)


Sun Dec 29, 2019 3:01 pm
Profile

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
roelh wrote:
Robinsonb5, your assembly code takes some practice to understand. I read through your postings and I understand that it is this way because you can assemble this to machine code without having a dedicated assembler. Personally I think PDP-11 or 68000 code is easier to read, that's why I choose pdp-11 lookalike instructions for the Kobold assembly.


68000 was the first assembly language I learned, so it's naturally the one I consider easiest to read!

Quote:
In your case I would rename temp to ACC and have explicit instructions like MOV (R2),ACC


I originally intended that register to be an accumulator, i.e. the target of ALU instructions, but actually I ended up with ALU instructions writing their results to the nominated register instead, with tmp simply providing the implicit second operand, so I'm not sure ACC is strictly appropriate. It would certainly be clearer to be specify the second operand explicitly, though.

Quote:
It is surprising to see that your CPU uses up to 2 times more instructions as Kobold to do the same thing. And the Kobold code has not even been optimized.


Yes, though the code currently emitted by the compiler is functional but far from optimal - I've found that by hand-coding I can reduce code size by anything between a quarter and two thirds - for exactly the reasons that prompted you to start this project. So I shall be watching its progress with interest!

Quote:
But since you're using an FPGA there is plenty of speed so I guess it's not a problem for you.


True - an Altera DE2 board can run this comfortably at 133MHz.

Quote:
I like your endevour to get performance out of 8-bit instructions. But the single 'temp' accumulator seems to be a bottleneck.


Yes, it is - especially since I haven't yet implemented any kind of result forwarding.

Quote:
Did you think about using a small stack of accumulators (8,16 or 32 of them)?


Yes, I considered that, and it would be interesting to explore. I haven't yet because this project doesn't exist purely for its own sake - I have an application in mind for it: a project where there's next to no block RAM left, so I need a CPU with small logic footprint, no block RAM used for a register file and good code density so the ROM doesn't take up too much block RAM either. I'm hoping that I can improve the code generator to the point where the last requirement is met!


Sun Dec 29, 2019 8:52 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I figured I'd share the output from my compiler. The compiler isn't smart enough to use registers automatically, if one wants to use registers the register keyword must be used. Register arguments can generally be used only for leaf routines.
I modified the test program slightly:
Code:
int sum (register int num1, register int num2)
{
    return num1+num2;
}
int (*f2p) (register int, register int);
int n;

int f(int op1, int *op2)
{
    *op2 = sum(op1,n);
    f2p = sum;
    //Calling a function using function pointer
    op1 = (*f2p)(10, 13);
    return op1;
}

// demo structure member, array element, and if statement

char ar[8];

struct tt {
  int r1    ;
  int r2;
};

int i;

void test(register struct tt* p){
if (i==5)
  { i = p-> r2; }
else ar[6] = i;
}

Compiler output isn't too bad, but it's no match for LLVM.
Code:
   code
   align   8
;====================================================
; Basic Block 0
;====================================================
public code _sum:
            add         $v0,$a0,$a1
TestC74_8:
            ret         $lk1
endpublic



   bss
   align   2
public bss _n:
   fill.b   4,0x00

endpublic
   code
   align   8
;====================================================
; Basic Block 0
;====================================================
public code _f:
            sub         $sp,$sp,#16
            st          $fp,[$sp]
            st          $r0,4[$sp]
            mov         $r23,$lk1
            st          $r23,12[$sp]
            mov         $fp,$sp
            sub         $sp,$sp,#16
            st          $r11,0[$sp]
            ld          $r11,16[$fp]
;     *op2 = sum(op1,n);
            ld          $t0,20[$fp]
            st          $t0,-8[$fp]
            ld          $a1,_n
            mov         $a0,$r11
            jal         $lk1,_sum
            ld          $t0,-8[$fp]
            mov         $t1,$v0
            st          $t1,[$t0]
;     f2p = sum;
            ldi         $t0,#_sum
            st          $t0,_f2p
;     op1 = (*f2p)(10, 13);
            ld          $t0,_f2p
            st          $t0,-8[$fp]
            ldi         $a1,#13
            ldi         $a0,#10
            jal         $lk1,[$t0]
            mov         $t0,$v0
            mov         $r11,$t0
;     return op1;
            mov         $v0,$r11
TestC74_13:
TestC74_16:
            ld          $r11,0[$sp]
            mov         $sp,$fp
            ld          $fp,[$sp]
            ld          $r23,12[$sp]
            mov         $lk1,$r23
            add         $sp,$sp,#24
            ret         $lk1
endpublic



   bss
   align   8
   align   8
   dw   $FFF0200000000001
public bss _ar:
   fill.b   8,0x00

endpublic
   align   2
public bss _i:
   fill.b   4,0x00

endpublic
   code
   align   8
;====================================================
; Basic Block 0
;====================================================
public code _test:
; if (i==5)
            ld          $v0,_i
            cmp         $cr2,$v0,#5
            bne         $cr2,TestC74_27
;   { i = p-> r2; }
            ld          $v0,4[$a0]
            st          $v0,_i
            bra         TestC74_28
TestC74_27:
; else ar[6] = i;
            ldb         $v0,_i
            stb         $v0,_ar+6
TestC74_28:
TestC74_23:
TestC74_26:
            ret         $lk1
endpublic



   rodata
   align   16
;   global   _test
;   global   _ar
;   global   _f2p
;   global   _sum
;   global   _f
;   global   _i
;   global   _n

_________________
Robert Finch http://www.finitron.ca


Mon Dec 30, 2019 6:57 am
Profile WWW

Joined: Mon Oct 07, 2019 1:26 pm
Posts: 46
Hi Rob, thanks for sharing you compiler output.

Is this the C64 compiler that I found on your website ? I tried to look at C64.zip but I got a 404 error. (on this page: http://www.finitron.ca/Projects/C64.html ).

For which processor is this compiled ?

$lk1, $t0, $t1, $v0 did puzzle me. Are these alias names for certain registers ?

I saw that when a stack frame is needed, the entry and exit code is quite large...


Mon Dec 30, 2019 9:34 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Is this the C64 compiler that I found on your website ? I tried to look at C64.zip but I got a 404 error. (on this page: http://www.finitron.ca/Projects/C64.html ).

No, it's a much more recent version with a lot of fixes. I should really update the website.

Quote:
For which processor is this compiled ?

The compile is for a custom processor I'm working on. It has an oddball number of bits per byte, to make things interesting.

Quote:
$lk1, $t0, $t1, $v0 did puzzle me. Are these alias names for certain registers ?
Yes, those are aliases for certain registers. $tx is a temporary, $vx is a return value, $lkx is a link register. $crx is a compare result register, for which there is a separate register file (8 entries). There are four link registers available as a separate register file.
Quote:
I saw that when a stack frame is needed, the entry and exit code is quite large...
Yes, 16 bytes. It's a fixed size to keep the compiler simpler. There's the return address, exception return address and frame pointer to store, plus one extra. I wasn't too worried about the frame size because there's lots of ram and stacks generally aren't very deep.

_________________
Robert Finch http://www.finitron.ca


Mon Dec 30, 2019 1:52 pm
Profile WWW

Joined: Mon Oct 07, 2019 1:26 pm
Posts: 46
Hi all, I had some progress on the C compiler.

As an example, I took the 8-queens problem (as posted in the article https://www.sigbus.info/how-i-wrote-a-self-hosting-c-compiler-in-40-days ).

The program is compiled by the C compiler, that delivers Kobold Assembly. The c compiler has no I/O yet, so in the assemby output I added instructions to write to screen. In the simulator, everything that is written to 0xF000 is going to screen, so I added in the printf function:
- mov 0xF000,A2
- mov D3,(A2)

The resulting Kobold machine code was successfully executed by the simulator on the webpage. The simulator did about 100 instructions per second, and the first solution took about 20 minutes.


Here is the (slightly modified) C program:

Code:
void printf(char c)
{
   
}


void print_board(int board[8][8]) {
  int i,j;
  for (i = 0; i < 8; i++) {
    for (j = 0; j < 8; j++) {
      if (board[i][j])
        printf('Q');
      else
        printf('.');
    }
    printf('\n');
  }
  printf('\n');
  printf('\n');
}

int conflict(int board[8][8], int row, int col) {
  int i;   
  for (i = 0; i < row; i++) {
    if (board[i][col])
      return 1;
    int j;
    j = row - i;
    if (0 < col - j + 1)
      if (board[i][col - j])
        return 1;
    if (col + j < 8)
      if (board[i][col + j])
        return 1;
  }
  return 0;
}

void solve(int board[8][8], int row) {
  if (row == 8) {
    print_board(board);
    return ;
  }
  int i;
  for ( i = 0; i < 8; i++) {
    if (conflict(board, row, i)==0) {
      board[row][i] = 1;
      solve(board, row + 1);
      board[row][i] = 0;
    }
  }
}
int board[8][8];

int main() {
 
  int i,j;
  for (i = 0; i < 8; i++)
      for (j = 0; j < 8; j++)
        board[i][j] = 0;
  solve(board, 0);
}



And here is the resulting Kobold K2 assembly listing (modified so it will give character output):

Code:
            0 ;--- list of types ---
            1 ;  int basic 2
            2 ;  void basic 0
            3 ;  char basic 1
            4 ;  float basic 4
            5 ;  bool basic 1
            6 ;  array-of-chars array 0
            7 ;  -none- basic 2
            8 ;  _tt1 array 16
            9 ;  _tt2 array 128
            10 ;  _tt3 array 16
            11 ;  _tt4 array 128
            12 ;  _tt5 array 16
            13 ;  _tt6 array 128
            14 ;  _tt7 array 16
            15 ;  _tt8 array 128
            16  code section
00000 211Er 17  MOVP 1,WP;  wp page init
00002 611Cs 18  MOV 256,WP;  wp init
00004 3A01  19  MOVP 1,A2;  A2 page init
00006 3B01  20  MOVP 1,A3;  A3 page init
00008 601At 21  JMP main
0000A AAAA  22  data section
0000C AAAA
0000E AAAA
00010 AAAA
00012 AAAA
00014 AAAA
00016 AAAA
00018 AAAA
0001A 01EAt
0001C 0100s
0001E 0001r
            23 Z_0: EQU 32
            24 Z_2: EQU 34
            25 Z_4: EQU 36
            26 Z_6: EQU 38
            27 Z_8: EQU 40
            28 Z_10: EQU 42
            29 Z_12: EQU 44
            30 Z_14: EQU 46
            31 Z_16: EQU 48
            32 Z_18: EQU 50
            33 Z_20: EQU 52
            34 Z_22: EQU 54
            35 Z_24: EQU 56
            36 Z_26: EQU 58
            37 Z_28: EQU 60
            38 Z_30: EQU 62
            39 board: EQU 0x1000
            40 ;--- list of global vars ---
            41 ;  board _tt8 4096
            42  code section
            43
            ;---> Removed lines that contain debugging output of the optimizer
            219
            220 ;*****************************************
            221 ; (1)  function: printf
            222 ;*****************************************
            223 ; This is a leaf function
            224
            225 ;--- function entry ---
            226 printf:
            227 ; MOV D3,(Z_0) // parameter 1 c
            228 ;--- list of local vars ---
            229 ;  c char  (Z_0)
            230 ;--- function body ---
00020 621Eu 231  mov 0xF000,A2
00022 9740  232  mov D3,(A2)
            233 ;--- function exit ---
00024 5800  234  MOV D0,PC; return
            235
            236 ;*****************************************
            237 ; (2)  function: print_board
            238 ;*****************************************
            239
            240 ;--- function entry ---
            241 print_board:
00026 943E  242  MOV D0,(WP+30); save return address
00028 7D20  243  MOV 32,D1;  wp increment
0002A 5120  244  ADD D1,WP;
0002C 9720  245  MOV D3,(WP+0) // parameter 1 board
            246 ;--- list of local vars ---
            247 ;  board _tt2  (WP+0)
            248 ;  i int  (WP+2)
            249 ;  j int  (WP+4)
            250 ;--- function body ---
            251 ; for [[ i = 0] [bool i < 8] [int i = [int
            252 ;     i + 1]]]
            253 ;  [ i = 0]
0002E 7F00  254  MOV 0,D3;  load 0
00030 9722  255  MOV D3,(WP+2); store at i
00032 700E+ 256 L_1:
00034 3F08  257  MOVC 8,D3;  compare: mov_compl
00036 4723 b258  ADDI (WP+2),D3;
00038 A008v 259  BRC L_2
            260 ; for [[ j = 0] [bool j < 8] [int j = [int
            261 ;     j + 1]]]
            262 ;  [ j = 0]
0003A 700C/ 263  MOV 0,D3;  load 0
0003C 008Av
0003E F000u
00040 7F00
00042 9724  264  MOV D3,(WP+4); store at j
00044 7006+ 265 L_3:
00046 3F08  266  MOVC 8,D3;  compare: mov_compl
00048 4725 b267  ADDI (WP+4),D3;
0004A A018g 268  BRC L_4
            269 ; if [bool [int board : [int [int 8 * i] +
            270 ;     j]] != 0]
0004C 6722  271  MOV (WP+2),D3;
            272 ; multiply by  8
0004E 9FFE  273  SHL D3
00050 9FFE  274  SHL D3
00052 9FFE  275  SHL D3
00054 4724  276  ADD (WP+4),D3;
00056 9FFE  277  SHL D3; multiply by 2
00058 4320  278  ADD (WP+0),D3,A3; add local array base
0005A 6760  279  MOV (A3+0),D3;
0005C 701A/b280  ADD 65535,D3; 
0005E 007Ag
00060 471Er
00062 F00C. 281  BRNC L_5
            282 ;  [void printf [81]]
00064 7F51  283  MOV 81,D3;  'Q'  load arg 0
00066 7C6A  284  MOV L_7,D0
00068 7820  285  CALL printf;
            286 L_7:
0006A 7874  287  JMP L_6
            288 L_5:
            289 ;  [void printf [46]]
0006C 7F2E  290  MOV 46,D3;  '.'  load arg 0
0006E 7C72  291  MOV L_8,D0
00070 7820  292  CALL printf;
            293 L_8:
00072 7002+ 294 L_6:
            295 ;  [int j = [int j + 1]]
00074 6725  296  MOVI (WP+4),D3; load int,j,+,1
00076 9724  297  MOV D3,(WP+4); store at j
00078 7846  298  JMP L_3
            299 L_4:
            300 ;  [void printf [10]]
0007A 7F0A  301  MOV 10,D3;   load arg 0
0007C 7006/ 302  MOV L_9,D0
0007E FFFFr
00080 7C84
00082 7820  303  CALL printf;
            304 L_9:
            305 ;  [int i = [int i + 1]]
00084 6723  306  MOVI (WP+2),D3; load int,i,+,1
00086 9722  307  MOV D3,(WP+2); store at i
00088 7834  308  JMP L_1
            309 L_2:
            310 ;  [void printf [10]]
0008A 7F0A  311  MOV 10,D3;   load arg 0
0008C 7C90  312  MOV L_10,D0
0008E 7820  313  CALL printf;
            314 L_10:
            315 ;  [void printf [10]]
00090 7F0A  316  MOV 10,D3;   load arg 0
00092 7C96  317  MOV L_11,D0
00094 7820  318  CALL printf;
            319 L_11:
            320 ;--- function exit ---
00096 3D20  321  MOVC 32,D1;  wp decrement
00098 5121  322  ADDI D1,WP;
0009A 603E  323  MOV (WP+30),PC; return
            324
            325 ;*****************************************
            326 ; (3)  function: conflict
            327 ;*****************************************
            328 ; This is a leaf function
            329
            330 ;--- function entry ---
            331 conflict:
0009C 9F20  332  MOV D3,(Z_0) // parameter 1 board
0009E 7004/ 333  MOV D2,(Z_2) // parameter 2 row
000A0 9E22
000A2 7760  334  MOV A3,D3
000A4 9F24  335  MOV D3,(Z_4) // parameter 3 col
            336 ;--- list of local vars ---
            337 ;  board _tt4  (Z_0)
            338 ;  row int  (Z_2)
            339 ;  col int  (Z_4)
            340 ;  i int  (Z_6)
            341 ;  j int  (Z_8)
            342 ;--- function body ---
            343 ; for [[ i = 0] [bool i < row] [int i =
            344 ;     [int i + 1]]]
            345 ;  [ i = 0]
000A6 7F00  346  MOV 0,D3;  load 0
000A8 9F26  347  MOV D3,(Z_6); store at i
000AA 700C+ 348 L_12:
000AC 2F22  349  MOVC (Z_2),D3; compare: mov_compl
000AE 4F27 b350  ADDI (Z_6),D3;
000B0 A012r 351  BRC L_13
            352 ; if [bool [int board : [int [int 8 * i] +
            353 ;     col]] != 0]
000B2 6F26  354  MOV (Z_6),D3;
            355 ; multiply by  8
000B4 9FFE  356  SHL D3
000B6 9FFE  357  SHL D3
000B8 9FFE  358  SHL D3
000BA 4F24  359  ADD (Z_4),D3;
000BC 7014/ 360  SHL D3; multiply by 2
000BE 0150r
000C0 9FFE
000C2 4B20  361  ADD (Z_0),D3,A3; add local array base
000C4 6760  362  MOV (A3+0),D3;
000C6 471Egb363  ADD 65535,D3; 
000C8 E01Ch 364  BRNC L_14
            365 ;  {return:1}
000CA 7F01  366  MOV 1,D3;  return value
000CC 5800  367  MOV D0,PC; return
            368 L_14:
            369 ;  [ j = [int row - i]]
000CE 2F26  370  MOVC (Z_6),D3; load int,row,-,i
000D0 4F23  371  ADDI (Z_2),D3;
000D2 9F28  372  MOV D3,(Z_8); store at j
            373 ; if [bool 0 < [bool [bool col - j] + 1]]
000D4 2F28  374  MOVC (Z_8),D3; compare: mov_compl
000D6 4F25  375  ADDI (Z_4),D3;
000D8 5F01  376  ADD 1,D3; 
000DA 7012/ 377  COM D3;  complement
000DC 00CEh
000DE FFFFg
000E0 1F00
000E2 5F01 b378  ADD 1,D3; +1
000E4 A01Er 379  BRC L_15
            380 ; if [bool [int board : [int [int 8 * i] +
            381 ;     [int col - j]]] != 0]
000E6 6F26  382  MOV (Z_6),D3;
            383 ; multiply by  8
000E8 9FFE  384  SHL D3
000EA 9FFE  385  SHL D3
000EC 9FFE  386  SHL D3
000EE 9F3C  387  MOV D3,(Z_28);
000F0 2F28  388  MOVC (Z_8),D3;
000F2 4F25  389  ADDI (Z_4),D3;
000F4 4F3C  390  ADD (Z_28),D3;
000F6 9FFE  391  SHL D3; multiply by 2
000F8 4B20  392  ADD (Z_0),D3,A3; add local array base
000FA 601C! 393  MOV (A3+0),D3;
000FC 0100!
000FE 010Cr
00100 6760
00102 471Egb394  ADD 65535,D3; 
00104 E01Ch 395  BRNC L_16
            396 ;  {return:1}
00106 7F01  397  MOV 1,D3;  return value
00108 5800  398  MOV D0,PC; return
            399 L_16:
0010A 7002+ 400 L_15:
            401 ; if [bool [bool col + j] < 8]
0010C 6F28  402  MOV (Z_8),D3; compare using stack
0010E 4F24  403  ADD (Z_4),D3;
00110 9F3C  404  MOV D3,(Z_28);
00112 3F08  405  MOVC 8,D3;  compare: mov_compl
00114 4F3D b406  ADDI (Z_28),D3; compare from stack
00116 A00Ei 407  BRC L_17
            408 ; if [bool [int board : [int [int 8 * i] +
            409 ;     [int col + j]]] != 0]
00118 7014/ 410  MOV (Z_6),D3;
0011A 014Ai
0011C 010Ah
0011E FFFFg
00120 6F26
            411 ; multiply by  8
00122 9FFE  412  SHL D3
00124 9FFE  413  SHL D3
00126 9FFE  414  SHL D3
00128 9F3C  415  MOV D3,(Z_28);
0012A 6F28  416  MOV (Z_8),D3;
0012C 4F24  417  ADD (Z_4),D3;
0012E 4F3C  418  ADD (Z_28),D3;
00130 9FFE  419  SHL D3; multiply by 2
00132 4B20  420  ADD (Z_0),D3,A3; add local array base
00134 6760  421  MOV (A3+0),D3;
00136 601E!b422  ADD 65535,D3; 
00138 ????
0013A ????  423  BRNC L_18
0013C ????  424 ;  {return:1}
0013E 0140! 425  MOV 1,D3;  return value
00140 471Eg
00142 E01Ch
00144 7F01
00146 5800  426  MOV D0,PC; return
            427 L_18:
00148 7002+ 428 L_17:
            429 ;  [int i = [int i + 1]]
0014A 6F27  430  MOVI (Z_6),D3; load int,i,+,1
0014C 9F26  431  MOV D3,(Z_6); store at i
0014E 78AC  432  JMP L_12
            433 L_13:
            434 ;  {return:0}
00150 7F00  435  MOV 0,D3;  return value
00152 5800  436  MOV D0,PC; return
            437 ;--- function exit ---
            438 ; Exit was handled by return statement
            439
            440 ;*****************************************
            441 ; (4)  function: solve
            442 ;*****************************************
            443
            444 ;--- function entry ---
            445 solve:
00154 943E  446  MOV D0,(WP+30); save return address
00156 7D20  447  MOV 32,D1;  wp increment
00158 5120  448  ADD D1,WP;
0015A 700C/ 449  MOV D3,(WP+0) // parameter 1 board
0015C 0148h
0015E FFFFg
00160 9720
00162 9622  450  MOV D2,(WP+2) // parameter 2 row
            451 ;--- list of local vars ---
            452 ;  board _tt6  (WP+0)
            453 ;  row int  (WP+2)
            454 ;  i int  (WP+4)
            455 ;--- function body ---
            456 ; if [bool row == 8]
00164 2722  457  MOVC (WP+2),D3; compare: mov_complement
00166 5F09  458  ADD 9,D3; +1
00168 471Erb459  ADD 65535,D3; 
0016A A01Cs 460  BRC L_19
            461 ;  [void print_board [board]]
0016C 6720  462  MOV (WP+0),D3;  load arg 0
0016E 641At 463  MOV L_20,D0
00170 7826  464  CALL print_board;
            465 L_20:
            466 ;  {return:0}
00172 3D20  467  MOVC 32,D1;  wp decrement
00174 5121  468  ADDI D1,WP;
00176 603E  469  MOV (WP+30),PC; return
            470 L_19:
            471 ; for [[ i = 0] [bool i < 8] [int i = [int
            472 ;     i + 1]]]
            473 ;  [ i = 0]
00178 7008/ 474  MOV 0,D3;  load 0
0017A 0172t
0017C 0178s
0017E FFFFr
00180 7F00
00182 9724  475  MOV D3,(WP+4); store at i
00184 7006+ 476 L_21:
00186 3F08  477  MOVC 8,D3;  compare: mov_compl
00188 4725 b478  ADDI (WP+4),D3;
0018A A018g 479  BRC L_22
            480 ; if [bool [int conflict [board row i]] ==
            481 ;     0]
0018C 6324  482  MOV (WP+4),A3;  load arg 2
0018E 6622  483  MOV (WP+2),D2;  load arg 1
00190 6720  484  MOV (WP+0),D3;  load arg 0
00192 6416h 485  MOV L_24,D0
00194 789C  486  CALL conflict;
            487 L_24:
00196 700A/b488  ADD 65535,D3; 
00198 ????
0019A ????  489  BRC L_23
0019C 0196h 490 ;  [ [int board : [int [int 8 * row] + i]]
0019E 01E4g
001A0 471Er
001A2 A01Cs
            491 ;     = 1]
001A4 7F01  492  MOV 1,D3;  load 1
001A6 6622  493  MOV (WP+2),D2;
            494 ; multiply by  8
001A8 9EFE  495  SHL D2
001AA 9EFE  496  SHL D2
001AC 9EFE  497  SHL D2
001AE 4624  498  ADD (WP+4),D2;
001B0 9EFE  499  SHL D2; multiply by 2
001B2 4220  500  ADD (WP+0),D2,A2; add local array base
001B4 9740  501  MOV D3,(A2+0); store at int,board,:,int,int,8,*,row,+,i
            502 ;  [void solve [board [int row + 1]]]
001B6 6623  503  MOVI (WP+2),D2;  calc arg 1
001B8 601A! 504  MOV (WP+0),D3;  load arg 0
001BA 01C0!
001BC 01DAs
001BE FFFFr
001C0 6720
001C2 7406. 505  MOV L_25,D0
001C4 601Eg 506  CALL solve;
            507 L_25:
            508 ;  [ [int board : [int [int 8 * row] + i]]
            509 ;     = 0]
001C6 7F00  510  MOV 0,D3;  load 0
001C8 6622  511  MOV (WP+2),D2;
            512 ; multiply by  8
001CA 9EFE  513  SHL D2
001CC 9EFE  514  SHL D2
001CE 9EFE  515  SHL D2
001D0 4624  516  ADD (WP+4),D2;
001D2 9EFE  517  SHL D2; multiply by 2
001D4 4220  518  ADD (WP+0),D2,A2; add local array base
001D6 9740  519  MOV D3,(A2+0); store at int,board,:,int,int,8,*,row,+,i
001D8 7014+ 520 L_23:
            521 ;  [int i = [int i + 1]]
001DA 6725  522  MOVI (WP+4),D3; load int,i,+,1
001DC 7006/ 523  MOV D3,(WP+4); store at i
001DE 0154g
001E0 9724
001E2 601Er 524  JMP L_21
            525 L_22:
            526 ;--- function exit ---
001E4 3D20  527  MOVC 32,D1;  wp decrement
001E6 5121  528  ADDI D1,WP;
001E8 603E  529  MOV (WP+30),PC; return
            530
            531 ;*****************************************
            532 ; (5)  function: main
            533 ;*****************************************
            534
            535 ;--- function entry ---
            536 main:
001EA 943E  537  MOV D0,(WP+30); save return address
001EC 7D20  538  MOV 32,D1;  wp increment
001EE 5120  539  ADD D1,WP;
            540 ;--- list of local vars ---
            541 ;  i int  (WP+0)
            542 ;  j int  (WP+2)
            543 ;--- function body ---
            544 ; for [[ i = 0] [bool i < 8] [int i = [int
            545 ;     i + 1]]]
            546 ;  [ i = 0]
001F0 7F00  547  MOV 0,D3;  load 0
001F2 9720  548  MOV D3,(WP+0); store at i
001F4 700C+ 549 L_26:
001F6 3F08  550  MOVC 8,D3;  compare: mov_compl
001F8 700A/b551  ADDI (WP+0),D3;
001FA ????
001FC ????  552  BRC L_27
001FE 0186r 553 ; for [[ j = 0] [bool j < 8] [int j = [int
00200 4721
00202 A01Eg
            554 ;     j + 1]]]
            555 ;  [ j = 0]
00204 7F00  556  MOV 0,D3;  load 0
00206 9722  557  MOV D3,(WP+2); store at j
00208 700A+ 558 L_28:
0020A 3F08  559  MOVC 8,D3;  compare: mov_compl
0020C 4723 b560  ADDI (WP+2),D3;
0020E A012h 561  BRC L_29
            562 ;  [ [int board : [int [int 8 * i] + j]]
            563 ;     = 0]
00210 7F00  564  MOV 0,D3;  load 0
00212 6620  565  MOV (WP+0),D2;
            566 ; multiply by  8
00214 9EFE  567  SHL D2
00216 9EFE  568  SHL D2
00218 9EFE  569  SHL D2
0021A 7016/ 570  ADD (WP+2),D2;
0021C 022Eh
0021E 0234g
00220 4622
00222 9EFE  571  SHL D2; multiply by 2
00224 421Er 572  ADD board,D2,A2; add global array base
00226 9740  573  MOV D3,(A2+0); store at int,board,:,int,int,8,*,i,+,j
            574 ;  [int j = [int j + 1]]
00228 6723  575  MOVI (WP+2),D3; load int,j,+,1
0022A 9722  576  MOV D3,(WP+2); store at j
0022C 601Cs 577  JMP L_28
            578 L_29:
            579 ;  [int i = [int i + 1]]
0022E 6721  580  MOVI (WP+0),D3; load int,i,+,1
00230 9720  581  MOV D3,(WP+0); store at i
00232 600Ct 582  JMP L_26
            583 L_27:
            584 ;  [void solve [board 0]]
00234 7E00  585  MOV 0,D2;   load arg 1
00236 700C/ 586  MOV board,D3; load address
00238 ????
0023A 01F6t 587  MOV L_30,D0
0023C 020As
0023E 1000r
00240 671Eg
00242 7406.
00244 601Ch 588  CALL solve;
            589 L_30:
            590 ;--- function exit ---
00246 3D20  591  MOVC 32,D1;  wp decrement
00248 5121  592  ADDI D1,WP;
0024A 603E  593  MOV (WP+30),PC; return
            594   
0024C ????  595
0024E ????  595
00250 ????  595
00252 ????  595
00254 ????  595
00256 ????  595
00258 ????  595
0025A ????  595
0025C 0154h 595
0025E 1000g 595


Just a few notes about the assembler output:

The output is grouped in sections of 0x0020 bytes. 16-bit immediates are not directly after their instruction, but grouped together at the end of a section.
In order to make clear which immediates belong to which instruction, they have a lower-case character appended to the instruction word (on the listing only, not in binary output). So:

000E4 A01Er 380 BRC L_15 ; address 000E4, instruction A01E, line 380, conditional branch to label L_15. The 'r' after A01E indicates where the immediate is found.

000FE 010Cr ; address 000FE, contents 010C, has code 'r'. So the value of L_15 is 010C, and that is where the CPU will jump to if there was a carry.

At the end of a section, the assembler generates an instruction to jump over the 16-bit immediates to the next section.
A "/","+" or "!" means that this is a jump instruction that is inserted by the assembler, in most cases to jump to the next section of code. In the case of '!' it is a jump with 16-bit immediate, that is located at the '!' that follows it.
A '.' means that it is a absolute branch or jump that was replaced by a relative jump.


The compiler does several optimizations, but most of them could not be used in this program. But you can see, that it does NOT use the WP (workspace) for local variables when the function is a leaf function (calls no other functions). In that case, the locals are in Z-page at addr 0x0020 and upwards. This saves incrementing and decrementing the WP in a leaf function.


Next thing to do, is to run it on the real thing.


Sun Feb 09, 2020 1:36 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Quote:
As an example, I took the 8-queens problem (as posted in the article https://www.sigbus.info/how-i-wrote-a-s ... in-40-days ).

Hi Roelh, That's a great achievement.

I couldn't resist to input the same C code to my CPU74 compiler and run it on the simulator. This is the resulting compiled code:

CPU74
Code:
   .text
   .file   "queens.c"

# ---------------------------------------------
# print_board
# ---------------------------------------------
   .globl   print_board
print_board:
   add   SP, -6, SP
   st.w   r4, [SP, 4]
   st.w   r5, [SP, 2]
   st.w   r6, [SP, 0]
   mov   0, r1
   mov   81, r2
   mov   46, r3
   mov   10, r4
.LBB1_1:
   mov   0, r5
.LBB1_2:
   ld.w   [r0, r5], r6
   cmp.eq   r6, 0
   selcc   r3, r2, r6
   st.b   r6, [-1L]
   lea   r5, 2, r5
   cmp.eq   r5, 16L
   brncc   .LBB1_2
   st.b   r4, [-1L]
   lea   r0, 16, r0
   lea   r1, 1, r1
   cmp.eq   r1, 8
   brncc   .LBB1_1
   mov   10, r0
   st.b   r0, [-1L]
   ld.w   [SP, 0], r6
   ld.w   [SP, 2], r5
   ld.w   [SP, 4], r4
   add   SP, 6, SP
   ret

# ---------------------------------------------
# conflict
# ---------------------------------------------
   .globl   conflict
conflict:
   add   SP, -8, SP
   st.w   r4, [SP, 6]
   st.w   r5, [SP, 4]
   st.w   r6, [SP, 2]
   st.w   r7, [SP, 0]
   cmp.lt   r1, 1
   brcc   .LBB2_8
   sub   r2, r1, r3
   add   r3, r3, r4
   add   r2, r2, r3
   add   r0, r3, r3
   add   r0, r4, r4
   add   r2, r1, r5
   add   r5, r5, r6
   add   r0, r6, r6
   neg   r1, r1
.LBB2_2:
   ld.w   [r3, 0], r0
   cmp.eq   r0, 0
   mov   1, r0
   brncc   .LBB2_9
   add   r2, r1, r7
   cmp.lt   r7, 0
   brcc   .LBB2_5
   ld.w   [r4, 0], r7
   cmp.eq   r7, 0
   brncc   .LBB2_9
.LBB2_5:
   cmp.gt   r5, 7
   brcc   .LBB2_7
   ld.w   [r6, 0], r7
   cmp.eq   r7, 0
   brncc   .LBB2_9
.LBB2_7:
   lea   r3, 16, r3
   sub   r5, 1, r5
   lea   r4, 18, r4
   lea   r6, 14, r6
   add   r1, 1, r1
   brncc   .LBB2_2
.LBB2_8:
   mov   0, r0
.LBB2_9:
   ld.w   [SP, 0], r7
   ld.w   [SP, 2], r6
   ld.w   [SP, 4], r5
   ld.w   [SP, 6], r4
   add   SP, 8, SP
   ret

# ---------------------------------------------
# solve
# ---------------------------------------------
   .globl   solve
solve:
   add   SP, -12, SP
   st.w   r4, [SP, 10]
   st.w   r5, [SP, 8]
   st.w   r6, [SP, 6]
   st.w   r7, [SP, 4]
   mov   r1, r5
   mov   r0, r4
   cmp.eq   r5, 8
   brncc   .LBB3_6
   mov   0, r0
   mov   81, r1
   mov   46, r2
   mov   10, r3
.LBB3_2:
   mov   0, r5
.LBB3_3:
   ld.w   [r4, r5], r6
   cmp.eq   r6, 0
   selcc   r2, r1, r6
   st.b   r6, [-1L]
   lea   r5, 2, r5
   cmp.eq   r5, 16L
   brncc   .LBB3_3
   st.b   r3, [-1L]
   lea   r4, 16, r4
   lea   r0, 1, r0
   cmp.eq   r0, 8
   brncc   .LBB3_2
   mov   10, r0
   st.b   r0, [-1L]
   jmp   .LBB3_10
.LBB3_6:
   lea   r5, 1, r0
   st.w   r0, [SP, 0]
   add   r5, r5, r0
   add   r0, r0, r0
   add   r0, r0, r0
   add   r0, r0, r6
   mov   0, r7
   st.w   r5, [SP, 2]
.LBB3_7:
   mov   r4, r0
   mov   r5, r1
   mov   r7, r2
   call   @conflict
   cmp.eq   r0, 0
   brncc   .LBB3_9
   add   r4, r6, r5
   mov   1, r0
   st.w   r0, [r5, 0]
   mov   r4, r0
   ld.w   [SP, 0], r1
   call   @solve
   mov   0, r0
   st.w   r0, [r5, 0]
   ld.w   [SP, 2], r5
.LBB3_9:
   lea   r6, 2, r6
   lea   r7, 1, r7
   cmp.eq   r7, 8
   brncc   .LBB3_7
.LBB3_10:
   ld.w   [SP, 4], r7
   ld.w   [SP, 6], r6
   ld.w   [SP, 8], r5
   ld.w   [SP, 10], r4
   add   SP, 12, SP
   ret

# ---------------------------------------------
# main
# ---------------------------------------------
   .globl   main
main:
   add   SP, -2, SP
   st.w   r4, [SP, 0]
   mov   &board, r4
   mov   128L, r2
   mov   r4, r0
   mov   0, r1
   call   @memset
   mov   r4, r0
   mov   0, r1
   call   @solve
   mov   0, r0
   ld.w   [SP, 0], r4
   add   SP, 2, SP
   ret

# ---------------------------------------------
# Global Data
# ---------------------------------------------
   .comm   board,128,2


The above is compiled with the -Os compiler flag, which is a balance between speed and size. It can be gotten slightly shorter although somewhat slower with the -Oz flag. In particular the "solve" function gets shortened by about 20% with the -Oz flag. I do not even try the aggressiveness of -O2 or -O3 anymore because that tends to unroll everything (starting by loops) and to produce extremely long assembly outputs. For non-pipelined architectures these compiler optimisation options don't really make sense. So that's why I stick to just -Os or -Oz for the CPU74 backend.

The simulator produces the following output for the execution of the 8-queens code above:

Code:
/Users/joan/Documents-Local/Relay/CPU74/Simulator/DerivedData/Simulator/Build/Products/Release/c74-sim
Q.......
....Q...
.......Q
.....Q..
..Q.....
......Q.
.Q......
...Q....

Q.......
.....Q..
.......Q
..Q.....
......Q.
...Q....
.Q......
....Q...

Q.......
......Q.
...Q....
.....Q..
.......Q
.Q......
....Q...
..Q.....

Q.......
......Q.
....Q...
.......Q
.Q......
...Q....
.....Q..
..Q.....

.Q......
...Q....
.....Q..
.......Q
..Q.....
Q.......
......Q.
....Q...

.Q......
....Q...
......Q.
Q.......
..Q.....
.......Q
.....Q..
...Q....

.Q......
....Q...
......Q.
...Q....
Q.......
.......Q
.....Q..
..Q.....

.Q......
.....Q..
Q.......
......Q.
...Q....
.......Q
..Q.....
....Q...

.Q......
.....Q..
.......Q
..Q.....
Q.......
...Q....
......Q.
....Q...

.Q......
......Q.
..Q.....
.....Q..
.......Q
....Q...
Q.......
...Q....

.Q......
......Q.
....Q...
.......Q
Q.......
...Q....
.....Q..
..Q.....

.Q......
.......Q
.....Q..
Q.......
..Q.....
....Q...
......Q.
...Q....

..Q.....
Q.......
......Q.
....Q...
.......Q
.Q......
...Q....
.....Q..

..Q.....
....Q...
.Q......
.......Q
Q.......
......Q.
...Q....
.....Q..

..Q.....
....Q...
.Q......
.......Q
.....Q..
...Q....
......Q.
Q.......

..Q.....
....Q...
......Q.
Q.......
...Q....
.Q......
.......Q
.....Q..

..Q.....
....Q...
.......Q
...Q....
Q.......
......Q.
.Q......
.....Q..

..Q.....
.....Q..
.Q......
....Q...
.......Q
Q.......
......Q.
...Q....

..Q.....
.....Q..
.Q......
......Q.
Q.......
...Q....
.......Q
....Q...

..Q.....
.....Q..
.Q......
......Q.
....Q...
Q.......
.......Q
...Q....

..Q.....
.....Q..
...Q....
Q.......
.......Q
....Q...
......Q.
.Q......

..Q.....
.....Q..
...Q....
.Q......
.......Q
....Q...
......Q.
Q.......

..Q.....
.....Q..
.......Q
Q.......
...Q....
......Q.
....Q...
.Q......

..Q.....
.....Q..
.......Q
Q.......
....Q...
......Q.
.Q......
...Q....

..Q.....
.....Q..
.......Q
.Q......
...Q....
Q.......
......Q.
....Q...

..Q.....
......Q.
.Q......
.......Q
....Q...
Q.......
...Q....
.....Q..

..Q.....
......Q.
.Q......
.......Q
.....Q..
...Q....
Q.......
....Q...

..Q.....
.......Q
...Q....
......Q.
Q.......
.....Q..
.Q......
....Q...

...Q....
Q.......
....Q...
.......Q
.Q......
......Q.
..Q.....
.....Q..

...Q....
Q.......
....Q...
.......Q
.....Q..
..Q.....
......Q.
.Q......

...Q....
.Q......
....Q...
.......Q
.....Q..
Q.......
..Q.....
......Q.

...Q....
.Q......
......Q.
..Q.....
.....Q..
.......Q
Q.......
....Q...

...Q....
.Q......
......Q.
..Q.....
.....Q..
.......Q
....Q...
Q.......

...Q....
.Q......
......Q.
....Q...
Q.......
.......Q
.....Q..
..Q.....

...Q....
.Q......
.......Q
....Q...
......Q.
Q.......
..Q.....
.....Q..

...Q....
.Q......
.......Q
.....Q..
Q.......
..Q.....
....Q...
......Q.

...Q....
.....Q..
Q.......
....Q...
.Q......
.......Q
..Q.....
......Q.

...Q....
.....Q..
.......Q
.Q......
......Q.
Q.......
..Q.....
....Q...

...Q....
.....Q..
.......Q
..Q.....
Q.......
......Q.
....Q...
.Q......

...Q....
......Q.
Q.......
.......Q
....Q...
.Q......
.....Q..
..Q.....

...Q....
......Q.
..Q.....
.......Q
.Q......
....Q...
Q.......
.....Q..

...Q....
......Q.
....Q...
.Q......
.....Q..
Q.......
..Q.....
.......Q

...Q....
......Q.
....Q...
..Q.....
Q.......
.....Q..
.......Q
.Q......

...Q....
.......Q
Q.......
..Q.....
.....Q..
.Q......
......Q.
....Q...

...Q....
.......Q
Q.......
....Q...
......Q.
.Q......
.....Q..
..Q.....

...Q....
.......Q
....Q...
..Q.....
Q.......
......Q.
.Q......
.....Q..

....Q...
Q.......
...Q....
.....Q..
.......Q
.Q......
......Q.
..Q.....

....Q...
Q.......
.......Q
...Q....
.Q......
......Q.
..Q.....
.....Q..

....Q...
Q.......
.......Q
.....Q..
..Q.....
......Q.
.Q......
...Q....

....Q...
.Q......
...Q....
.....Q..
.......Q
..Q.....
Q.......
......Q.

....Q...
.Q......
...Q....
......Q.
..Q.....
.......Q
.....Q..
Q.......

....Q...
.Q......
.....Q..
Q.......
......Q.
...Q....
.......Q
..Q.....

....Q...
.Q......
.......Q
Q.......
...Q....
......Q.
..Q.....
.....Q..

....Q...
..Q.....
Q.......
.....Q..
.......Q
.Q......
...Q....
......Q.

....Q...
..Q.....
Q.......
......Q.
.Q......
.......Q
.....Q..
...Q....

....Q...
..Q.....
.......Q
...Q....
......Q.
Q.......
.....Q..
.Q......

....Q...
......Q.
Q.......
..Q.....
.......Q
.....Q..
...Q....
.Q......

....Q...
......Q.
Q.......
...Q....
.Q......
.......Q
.....Q..
..Q.....

....Q...
......Q.
.Q......
...Q....
.......Q
Q.......
..Q.....
.....Q..

....Q...
......Q.
.Q......
.....Q..
..Q.....
Q.......
...Q....
.......Q

....Q...
......Q.
.Q......
.....Q..
..Q.....
Q.......
.......Q
...Q....

....Q...
......Q.
...Q....
Q.......
..Q.....
.......Q
.....Q..
.Q......

....Q...
.......Q
...Q....
Q.......
..Q.....
.....Q..
.Q......
......Q.

....Q...
.......Q
...Q....
Q.......
......Q.
.Q......
.....Q..
..Q.....

.....Q..
Q.......
....Q...
.Q......
.......Q
..Q.....
......Q.
...Q....

.....Q..
.Q......
......Q.
Q.......
..Q.....
....Q...
.......Q
...Q....

.....Q..
.Q......
......Q.
Q.......
...Q....
.......Q
....Q...
..Q.....

.....Q..
..Q.....
Q.......
......Q.
....Q...
.......Q
.Q......
...Q....

.....Q..
..Q.....
Q.......
.......Q
...Q....
.Q......
......Q.
....Q...

.....Q..
..Q.....
Q.......
.......Q
....Q...
.Q......
...Q....
......Q.

.....Q..
..Q.....
....Q...
......Q.
Q.......
...Q....
.Q......
.......Q

.....Q..
..Q.....
....Q...
.......Q
Q.......
...Q....
.Q......
......Q.

.....Q..
..Q.....
......Q.
.Q......
...Q....
.......Q
Q.......
....Q...

.....Q..
..Q.....
......Q.
.Q......
.......Q
....Q...
Q.......
...Q....

.....Q..
..Q.....
......Q.
...Q....
Q.......
.......Q
.Q......
....Q...

.....Q..
...Q....
Q.......
....Q...
.......Q
.Q......
......Q.
..Q.....

.....Q..
...Q....
.Q......
.......Q
....Q...
......Q.
Q.......
..Q.....

.....Q..
...Q....
......Q.
Q.......
..Q.....
....Q...
.Q......
.......Q

.....Q..
...Q....
......Q.
Q.......
.......Q
.Q......
....Q...
..Q.....

.....Q..
.......Q
.Q......
...Q....
Q.......
......Q.
....Q...
..Q.....

......Q.
Q.......
..Q.....
.......Q
.....Q..
...Q....
.Q......
....Q...

......Q.
.Q......
...Q....
Q.......
.......Q
....Q...
..Q.....
.....Q..

......Q.
.Q......
.....Q..
..Q.....
Q.......
...Q....
.......Q
....Q...

......Q.
..Q.....
Q.......
.....Q..
.......Q
....Q...
.Q......
...Q....

......Q.
..Q.....
.......Q
.Q......
....Q...
Q.......
.....Q..
...Q....

......Q.
...Q....
.Q......
....Q...
.......Q
Q.......
..Q.....
.....Q..

......Q.
...Q....
.Q......
.......Q
.....Q..
Q.......
..Q.....
....Q...

......Q.
....Q...
..Q.....
Q.......
.....Q..
.......Q
.Q......
...Q....

.......Q
.Q......
...Q....
Q.......
......Q.
....Q...
..Q.....
.....Q..

.......Q
.Q......
....Q...
..Q.....
Q.......
......Q.
...Q....
.....Q..

.......Q
..Q.....
Q.......
.....Q..
.Q......
....Q...
......Q.
...Q....

.......Q
...Q....
Q.......
..Q.....
.....Q..
.Q......
......Q.
....Q...

Program ended with exit code: 0


I suppose these are all the possible solutions to the 8-queens problem. I have not setup a timer to determine how long the simulation exactly takes on my 12 years old computer, but it's about 2.5 seconds for the complete set of solutions given above. This is in spite that the simulator is not made to be fast, but to simulate all individual cpu control signals and execute microinstructions based on them. My 2.5 second for the entire set of solutions, is in huge contrast with your stated 20 minutes required only for the first solution. That's obviously explained by using a compiled language (Apple Swift in my case) instead of a web-based or interpreted one (python?, javascript?). Although I'm surprised that the difference is SO huge.

Code:
Next thing to do, is to run it on the real thing.
That's great too, and something that is still very far from possible in my case.

Joan

[EDIT: re-posted the code generated by the compiler after fixing a subtle optimisation bug]


Last edited by joanlluch on Mon Feb 10, 2020 1:16 pm, edited 1 time in total.



Sun Feb 09, 2020 3:36 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
(Might be interesting to know, in each case, how many instructions or clock cycles were needed - if it were cheap to find out.)


Sun Feb 09, 2020 6:55 pm
Profile
User avatar

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
BigEd wrote:
(Might be interesting to know, in each case, how many instructions or clock cycles were needed - if it were cheap to find out.)

That's quite interesting. I added some code to accumulate the number of instructions and cycles (the simulator is cycle accurate) as well as the computation of the elapsed time and these are the results

For compiler option -Oz

Code:
Executed instruction count: 1473350
Total cycle count: 1980095
Elapsed time: 2.70678 seconds


For compiler option -Os

Code:
Executed instruction count: 1321973
Total cycle count: 1762834
Elapsed time: 2.46086 seconds


(note that elapsed time may have some variations because the computer is of course sharing time with other running processes)

So the simulator appears to run at an equivalent of 1762834 / 2.46086 = 0.72 MHz.

I want to state again that the cpu74 simulator is not optimised or designed for speed, but to keep track of all control signals as they will be produced in the processor. I'm pretty sure that if the goal was fast cpu emulation alone, it could run more than 10x or 20x faster, approaching or surpassing the expected clock frequency of the real thing. (But of course, this is possible because the simulator runs on a 64 bit machine (intel) at 3 GHz)

[EDIT: I found a subtle compiler optimisation bug that in rare circumstances prevented an -Os optimisation to apply. In this case, it affected the 'confilct' function of the queens program. So I am editing the compiler assembly output I posted on the previous post, and updated the -Os figures in this post]


Sun Feb 09, 2020 9:39 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I figured I’d add the CC64’s compiler output to the list. The following is untested but looks relatively good. The target is a RISCV compatible. I just spotted a couple of errors in the code, but things should be good for a chuckle. Yes, this is optimized code, unoptimized is much larger.
Code:
   code
   align   16
;====================================================
; Basic Block 0
;====================================================
public code _printf:
queens_8:
            ret     
endpublic



   data
   align   8
   code
   align   16
;====================================================
; Basic Block 0
;====================================================
public code _print_board:
            sub         $sp,$sp,#32
            sto         $fp,[$sp]
            sto         $x0,8[$sp]
            sto         $ra,24[$sp]
            mov         $fp,$sp
            sub         $sp,$sp,#54
            sto         $s0,0[$sp]
            sto         $s1,8[$sp]
            ldo         $s0,-18[$fp]
;   for (i = 0; i < 8; i++) {
            mov         $s1,$x0
            slt         $t0,$s1,#8
            beqz        $t0,queens_26
queens_25:
;     for (j = 0; j < 8; j++) {
            mov         $s0,$x0
            slt         $t0,$s0,#8
            beqz        $t0,queens_29
queens_28:
;       if (board[i][j])
            mov         $t1,$s0
            ldi         $t2,#3
            sll         $t0,$t1,$t2
            mov         $t3,$s1
            ldi         $t4,#6
            sll         $t2,$t3,$t4
            mov         $t3,$t2
            lea         $t4,32[$fp]
            sto         $t0,-38[$fp]
            mov         $t0,$t4
            add         $t1,$t3,$t0
            ldo         $t0,-38[$fp]
            sto         $t4,-38[$fp]
            add         $t4,$t1,$t0
            beq         $t4,$x0,queens_31
;         printf('Q');
            sub         $sp,$sp,#8
            ldi         $t0,#81
            sto         $t0,0[$sp]
            call        _printf
            add         $sp,$sp,#8
            bra         queens_32
queens_31:
;         printf('.');
            sub         $sp,$sp,#8
            ldi         $t0,#46
            sto         $t0,0[$sp]
            call        _printf
            add         $sp,$sp,#8
queens_32:
            add         $s0,$s0,#1
            slt         $t0,$s0,#8
            bnez        $t0,queens_28
queens_29:
;     printf('\n');
            sub         $sp,$sp,#8
            ldi         $t0,#10
            sto         $t0,0[$sp]
            call        _printf
            add         $sp,$sp,#8
            add         $s1,$s1,#1
            slt         $t0,$s1,#8
            bnez        $t0,queens_25
queens_26:
;   printf('\n');
            sub         $sp,$sp,#8
            ldi         $t0,#10
            sto         $t0,0[$sp]
            call        _printf
            add         $sp,$sp,#8
            sub         $sp,$sp,#8
            sto         $t0,0[$sp]
            call        _printf
            add         $sp,$sp,#8
queens_21:
queens_24:
            ldo         $s0,0[$sp]
            ldo         $s1,8[$sp]
            mov         $sp,$fp
            ldo         $fp,[$sp]
            ldo         $ra,24[$sp]
            add         $sp,$sp,#32
            ret     
endpublic



   data
   align   8
   code
   align   16
;====================================================
; Basic Block 0
;====================================================
public code _conflict:
            sub         $sp,$sp,#32
            sto         $fp,[$sp]
            sto         $x0,8[$sp]
            mov         $fp,$sp
            sub         $sp,$sp,#78
            sto         $s0,0[$sp]
            sto         $s1,8[$sp]
            sto         $s2,16[$sp]
            sto         $s3,24[$sp]
            sto         $s4,32[$sp]
            ldo         $s0,-8[$fp]
            ldo         $s1,552[$fp]
            ldo         $s2,-18[$fp]
            ldo         $s3,562[$fp]
            lea         $t0,32[$fp]
            mov         $s4,$t0
;   for (i = 0; i < row; i++) {
            mov         $s0,$x0
            bge         $s0,$s1,queens_55
queens_54:
;     if (board[i][col])
            mov         $t1,$s3
            ldi         $t2,#3
            sll         $t0,$t1,$t2
            mov         $t3,$s0
            ldi         $t4,#6
            sll         $t2,$t3,$t4
            mov         $t3,$t2
            mov         $t4,$s4
            add         $t1,$t3,$t4
            add         $t3,$t1,$t0
            beq         $t3,$x0,queens_57
;       return 1;
            ldi         $v0,#1
queens_53:
            ldo         $s0,0[$sp]
            ldo         $s1,8[$sp]
            ldo         $s2,16[$sp]
            ldo         $s3,24[$sp]
            ldo         $s4,32[$sp]
            mov         $sp,$fp
            ldo         $fp,[$sp]
            add         $sp,$sp,#32
            ret     
queens_57:
;     j = row - i;
            mov         $t1,$s1
            mov         $t2,$s0
            sub         $t0,$t1,$t2
            mov         $s2,$t0
;     if (0 < col - j + 1)
            ldi         $t0,#0
            mov         $t3,$s3
            mov         $t4,$s2
            sub         $t2,$t3,$t4
            mov         $t3,$t2
            ldi         $t4,#1
            add         $t1,$t3,$t4
            bge         $t0,$t1,queens_59
;       if (board[i][col - j])
            mov         $t2,$s3
            mov         $t3,$s2
            sub         $t1,$t2,$t3
            mov         $t2,$t1
            ldi         $t3,#3
            sll         $t0,$t2,$t3
            mov         $t4,$s0
            sto         $t0,-38[$fp]
            ldi         $t0,#6
            sll         $t3,$t4,$t0
            ldo         $t0,-38[$fp]
            mov         $t4,$t3
            mov         $t0,$s4
            add         $t2,$t4,$t0
            add         $t4,$t2,$t0
            beq         $t4,$x0,queens_61
;         return 1;
            bra         queens_53
queens_61:
queens_59:
;     if (col + j < 8)
            mov         $t1,$s3
            mov         $t2,$s2
            add         $t0,$t1,$t2
            slt         $t1,$t0,#8
            beqz        $t1,queens_63
;       if (board[i][col + j])
            mov         $t2,$s3
            mov         $t3,$s2
            add         $t1,$t2,$t3
            mov         $t2,$t1
            ldi         $t3,#3
            sll         $t0,$t2,$t3
            mov         $t4,$s0
            sto         $t0,-38[$fp]
            ldi         $t0,#6
            sll         $t3,$t4,$t0
            ldo         $t0,-38[$fp]
            mov         $t4,$t3
            mov         $t0,$s4
            add         $t2,$t4,$t0
            add         $t4,$t2,$t0
            beq         $t4,$x0,queens_65
;         return 1;
            bra         queens_53
queens_65:
queens_63:
            add         $s0,$s0,#1
            blt         $s0,$s1,queens_54
queens_55:
;   return 0;
            mov         $v0,$x0
            bra         queens_53
endpublic



;====================================================
; Basic Block 0
;====================================================
public code _solve:
            sub         $sp,$sp,#32
            sto         $fp,[$sp]
            sto         $x0,8[$sp]
            sto         $ra,24[$sp]
            mov         $fp,$sp
            sub         $sp,$sp,#42
            sto         $s0,0[$sp]
            sto         $s1,8[$sp]
            sto         $s2,16[$sp]
            ldo         $s0,-8[$fp]
            lea         $s1,32[$fp]
            ldo         $s2,552[$fp]
;   if (row == 8) {
            xor         $t0,$s2,#8
            bnez        $t0,queens_82
;     print_board(board);
            sub         $sp,$sp,#8
            sto         $s1,0[$sp]
            call        _print_board
            add         $sp,$sp,#8
queens_78:
queens_81:
            ldo         $s0,0[$sp]
            ldo         $s1,8[$sp]
            ldo         $s2,16[$sp]
            mov         $sp,$fp
            ldo         $fp,[$sp]
            ldo         $ra,24[$sp]
            add         $sp,$sp,#32
            ret     
queens_82:
;   for ( i = 0; i < 8; i++) {
            mov         $s0,$x0
            slt         $t0,$s0,#8
            beqz        $t0,queens_85
queens_84:
;     if (conflict(board, row, i)==0) {
            sub         $sp,$sp,#24
            sto         $s1,0[$sp]
            sto         $s2,8[$sp]
            sto         $s0,16[$sp]
            call        _conflict
            add         $sp,$sp,#24
            mov         $t0,$ra
            bne         $t0,$x0,queens_87
;       board[row][i] = 1;
            mov         $t1,$s0
            ldi         $t2,#3
            sll         $t0,$t1,$t2
            mov         $t3,$s2
            ldi         $t4,#6
            sll         $t2,$t3,$t4
            mov         $t3,$t2
            mov         $t4,$s1
            add         $t1,$t3,$t4
            ldi         $t3,#1
;       solve(board, row + 1);
            sub         $sp,$sp,#16
            sto         $s1,0[$sp]
            mov         $t1,$s2
            ldi         $t2,#1
            add         $t0,$t1,$t2
            sto         $t0,8[$sp]
            call        _solve
            add         $sp,$sp,#16
;       board[row][i] = 0;
            mov         $t1,$s0
            ldi         $t2,#3
            sll         $t0,$t1,$t2
            mov         $t3,$s2
            ldi         $t4,#6
            sll         $t2,$t3,$t4
            mov         $t3,$t2
            mov         $t4,$s1
            add         $t1,$t3,$t4
            add         $t3,$t1,$t0
            sto         $x0,[$t3]
queens_87:
            add         $s0,$s0,#1
            slt         $t0,$s0,#8
            bnez        $t0,queens_84
queens_85:
            bra         queens_81
endpublic



   bss
   align   8
   align   8
   dw   $FFF0200000000040
public bss _board:
   fill.b   512,0x00

endpublic
   code
   align   16
   data
   align   8
   code
   align   16
;====================================================
; Basic Block 0
;====================================================
public code _main:
            sub         $sp,$sp,#32
            sto         $fp,[$sp]
            sto         $x0,8[$sp]
            sto         $ra,24[$sp]
            mov         $fp,$sp
            sub         $sp,$sp,#46
            sto         $s0,0[$sp]
            sto         $s1,8[$sp]
            ldo         $s0,-18[$fp]
;   for (i = 0; i < 8; i++)
            mov         $s1,$x0
            slt         $t0,$s1,#8
            beqz        $t0,queens_104
queens_103:
;       for (j = 0; j < 8; j++)
            mov         $s0,$x0
            slt         $t0,$s0,#8
            beqz        $t0,queens_107
queens_106:
;         board[i][j] = 0;
            mov         $t1,$s0
            ldi         $t2,#3
            sll         $t0,$t1,$t2
            mov         $t3,$s1
            ldi         $t4,#6
            sll         $t2,$t3,$t4
            mov         $t3,$t2
            lea         $t4,_board
            add         $t1,$t3,$t4
            add         $t3,$t1,$t0
            sto         $x0,[$t3]
            add         $s0,$s0,#1
            slt         $t0,$s0,#8
            bnez        $t0,queens_106
queens_107:
            add         $s1,$s1,#1
            slt         $t0,$s1,#8
            bnez        $t0,queens_103
queens_104:
;   solve(board, 0);
            sub         $sp,$sp,#16
            lea         $t0,_board
            sto         $t0,0[$sp]
            sto         $x0,8[$sp]
            call        _solve
            add         $sp,$sp,#16
queens_99:
queens_102:
            ldo         $s0,0[$sp]
            ldo         $s1,8[$sp]
            mov         $sp,$fp
            ldo         $fp,[$sp]
            ldo         $ra,24[$sp]
            add         $sp,$sp,#32
            ret     
endpublic



   rodata
   align   16
;   global   _main
;   global   _board
;   global   _solve
;   global   _conflict
;   global   _printf
;   global   _print_board

_________________
Robert Finch http://www.finitron.ca


Fri Feb 14, 2020 9:56 am
Profile WWW

Joined: Mon Oct 07, 2019 1:26 pm
Posts: 46
Hi all, my C compiler is now integrated into the Kobold K2 online Javascript assembler/simulator.

On a single browser page you can now compile, assemble and simulate the C code, and download a file to program into the Kobold.

See it live at http://www.enscope.nl/simas_kobold2/ !

The page loads a prime-number generator as default program. You only have to press the RUN button to start it. However, the simulator
is quite slow because it simulates almost at the gate level.


Wed Mar 11, 2020 10:04 pm
Profile
Display posts from previous:  Sort by  
Reply to topic   [ 45 posts ]  Go to page Previous  1, 2, 3  Next

Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software