


 ANY-1 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest Additions:
Also back in the instruction set are the PUSH and POP instructions, along with PUSH pair and POP pair. When pushing or popping only one or two registers, these instructions are faster than a stack adjustment plus a load-multiple or store-multiple instruction. They are also code dense.
The stack-based additions came about because a reasonable way to implement micro-code was found. The micro-code just uses regular instructions as the micro-ops.

_________________
Robert Finch http://www.finitron.ca


Tue Jul 13, 2021 8:38 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Improved the compiler's management of the stack. It no longer allocates more stack than is really needed. The excess was caused by the compiler accumulating the stack size over both passes. It now just figures out the maximum stack usage during the first pass.
Added the CMOVNZ and MUX instructions to the implementation. Found a way to reduce the number of instructions required to implement LINK by one. There are about 40 micro-code entries still available for assignment.

_________________
Robert Finch http://www.finitron.ca


Wed Jul 14, 2021 4:29 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Took the pointer difference instruction (PTRDIF) out of the architecture. The issue is that it needs three register source ports, and so requires a modifier. That means it is not any more code dense than simply performing a subtract followed by an arithmetic shift right. It is not any more code dense and does not execute any faster, so bye-bye. The instruction made sense on another architecture that supported three register read ports. In that case it was faster and shorter.
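As a rough illustration of the replacement sequence, here is a minimal C sketch of a pointer difference computed as a subtract followed by an arithmetic shift right. The function name ptr_diff and the power-of-two element size are assumptions made for the example; this is not the ANY1 encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Element count between two addresses: a subtract followed by an
   arithmetic shift right. shift is log2 of the element size, which is
   assumed here to be a power of two. */
static int64_t ptr_diff(int64_t a, int64_t b, unsigned shift)
{
    assert(shift < 63);
    int64_t bytes = a - b;              /* the subtract */

    /* Arithmetic shift right, written so the result does not depend on
       how the C compiler shifts negative values. */
    if (bytes >= 0)
        return bytes >> shift;
    return ~((~bytes) >> shift);
}
```

With 8-byte elements (shift of 3), addresses 0x1040 and 0x1000 are 8 elements apart.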

_________________
Robert Finch http://www.finitron.ca


Thu Jul 15, 2021 5:22 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
If POP and UNLINK are in the instruction set then RTS may as well be too. RTS is a special case of POP, just as UNLINK is. RTS in this case pops the return address off the stack and loads it into the IP.
RTS can also adjust the stack by up to 32752 bytes to pop parameters off the stack. RTS is code dense, replacing three separate instructions.
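The behaviour described for RTS can be sketched in C on a toy machine model. Everything here is an assumption for illustration: the ToyCpu structure, 8-byte stack slots, a stack that grows downward, and the guess that the three replaced instructions are a load, a stack adjustment and a jump.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

typedef struct {
    uint64_t ip;                  /* instruction pointer */
    uint64_t sp;                  /* stack pointer, a byte offset into stack[] */
    uint64_t stack[STACK_WORDS];  /* toy stack memory, 8-byte slots */
} ToyCpu;

/* Push one 8-byte value; the stack grows toward lower offsets. */
static void toy_push(ToyCpu *c, uint64_t v)
{
    assert(c->sp >= 8 && c->sp <= STACK_WORDS * 8);
    c->sp -= 8;
    c->stack[c->sp / 8] = v;
}

/* RTS modelled as three steps: load the return address, release its
   slot plus 'adjust' bytes of parameters, then jump. */
static void toy_rts(ToyCpu *c, uint64_t adjust)
{
    assert(adjust % 8 == 0 && adjust <= 32752);
    assert(c->sp + 8 + adjust <= STACK_WORDS * 8);
    uint64_t ra = c->stack[c->sp / 8];  /* 1: load return address */
    c->sp += 8 + adjust;                /* 2: pop it and the parameters */
    c->ip = ra;                         /* 3: transfer control */
}
```

Pushing two parameters and a return address, then calling toy_rts(&c, 16), leaves the stack pointer where it started and the IP at the return address.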

Added a new compiler option to disable loop invariant optimization. Loop invariant optimization attempts to move code whose result does not change between iterations out of the loop. This optimization currently does not work very well.
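For readers unfamiliar with the optimization, a small C sketch of what hoisting a loop invariant means. The two functions are hypothetical examples, not compiler output; they compute the same result, but the second evaluates the invariant expression only once.

```c
#include <assert.h>
#include <stddef.h>

/* The expression (a * b + 1) does not depend on i, yet it is written
   inside the loop and so is evaluated on every iteration. */
static long sum_scaled_naive(const long *v, size_t n, long a, long b)
{
    assert(v != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i] * (a * b + 1);
    return sum;
}

/* The same loop after the invariant has been moved outside by hand,
   which is what the optimization tries to do automatically. */
static long sum_scaled_hoisted(const long *v, size_t n, long a, long b)
{
    assert(v != NULL || n == 0);
    long k = a * b + 1;               /* computed once, before the loop */
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i] * k;
    return sum;
}
```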

_________________
Robert Finch http://www.finitron.ca


Fri Jul 16, 2021 5:50 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Made another major change to the design. Removed the IP referencing by x31 and changed x31 back into an ordinary register. This frees up a register and removes one input from the source multiplexers in the RTL code. Then added an ADD to IP instruction to compensate for those times when an IP relative address is desired. The compiler currently does not make use of IP relative addressing, so the register sat there unused. The compiler uses addressing relative to a read-only global pointer.

_________________
Robert Finch http://www.finitron.ca


Sun Jul 25, 2021 3:48 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Some static instruction count statistics for the boot ROM, which includes the standard library.
I have since removed the BLcc branches, as they are not used often enough. The threshold I have been using for cutting an instruction is the usage of the xor function, or about 0.05% (half of 0.1%). Prefixes make up about 11% of the instructions. It is tempting to just make instructions 10% wider, but that would likely not eliminate many prefixes.

Code:
Instruction Statistics
Loads:       6052 (20.403897%)
Stores:      4008 (13.512693%)
  Indexed:    246 (0.829372%)
Pushes:       932 (3.142173%)
Compares:      24 (0.080914%)
Branches:    3799 (12.808064%)
BEQZ/NEZ:     540 (1.820572%)
  BEQI:         0 (0.000000%)
  BNEI:         0 (0.000000%)
  BBc:         47 (0.158457%)
  BLcc:        13 (0.043829%)
Calls:       1538 (5.185260%)
Returns:     1090 (3.674859%)
Enter:        490 (1.652001%)
Leave:        468 (1.577829%)
Adds:        3052 (10.289606%)
Subs:         142 (0.478743%)
Ands:         521 (1.756515%)
Ors:          261 (0.879943%)
Xors:          19 (0.064057%)
Bits:           0 (0.000000%)
Tsts:           0 (0.000000%)
Lshifts:      269 (0.906915%)
shifts:       715 (2.410573%)
Luis:           0 (0.000000%)
Moves:       1824 (6.149489%)
CMoves:        46 (0.155086%)
Sets:         275 (0.927143%)
  Mops:         0 (0.000000%)
Ptrdif:         0 (0.000000%)
Bitfield:      14 (0.047200%)
Csr:          196 (0.660800%)
Floatops:     159 (0.536057%)
Prefixes:    3175 (10.704292%)
others:       826 (2.784802%)
Total:      29661

number of bytes: 116042.500000
number of instructions: 29661
number of compressed instructions: 8716
3.912292 bytes (31 bits) per instruction
Compression ratio: 13.060173%
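
The percentage column and the bytes-per-instruction figure in the listing can be reproduced with simple arithmetic. The C sketch below uses hypothetical helper names and assumes the bit count shown is the truncated whole number of bits; it is not the tool that produced the listing.

```c
#include <assert.h>

/* Share of the total, as a percentage (e.g. Loads: 6052 of 29661). */
static double stat_percent(unsigned long count, unsigned long total)
{
    assert(total > 0);
    return 100.0 * (double)count / (double)total;
}

/* Average instruction size in bytes. The byte total may be fractional
   (116042.5 in the listing), so it is taken as a double. */
static double stat_bytes_per_insn(double total_bytes, unsigned long insns)
{
    assert(insns > 0);
    return total_bytes / (double)insns;
}

/* Whole bits per instruction, truncated (assumed rounding rule). */
static int stat_whole_bits(double bytes_per_insn)
{
    return (int)(bytes_per_insn * 8.0);
}
```

For the figures above, 116042.5 bytes over 29661 instructions works out to about 3.9123 bytes, or 31 whole bits, per instruction.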

_________________
Robert Finch http://www.finitron.ca


Mon Jul 26, 2021 7:49 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added the BASE instruction which sets the upper nybble of an address to specify the base register in use. It also zeros out higher bits in the register. BASE is typically used after an operation (lea) to set the base register for a variable. It is only applicable to user mode programs where base registers are applied.
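A minimal C sketch of the effect described for BASE: the top nybble of the address is replaced with a base register number and everything above the address width is cleared. The function name base_tag_apply and the explicit addr_bits parameter are assumptions for the example, not the hardware definition.

```c
#include <assert.h>
#include <stdint.h>

/* Put a 4-bit base register number into the top nybble of an address
   that is addr_bits wide, and zero all bits above the address width. */
static uint64_t base_tag_apply(uint64_t addr, unsigned base, unsigned addr_bits)
{
    assert(base < 16);
    assert(addr_bits > 4 && addr_bits <= 64);
    unsigned tag_shift = addr_bits - 4;                   /* position of the nybble */
    uint64_t low_mask = (UINT64_C(1) << tag_shift) - 1;   /* bits below the nybble */
    return (addr & low_mask) | ((uint64_t)base << tag_shift);
}
```

For example, tagging offset 0xC000 with base register 0xA gives 0xA000C000 when 32-bit addressing is assumed.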

_________________
Robert Finch http://www.finitron.ca


Tue Jul 27, 2021 4:59 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Made the BASE instruction apply conditionally in non-user mode.
Modified the TLB to initialize addressing for the first 48MB of RAM and the last 16MB of I/O ROM. Also set up the segment registers to make things look linear for 32-bit addressing. Both of these changes allow the MMU hardware to be active for all modes of operation from reset. Having the MMU active all the time means some multiplexing can be removed from the memory address path.

_________________
Robert Finch http://www.finitron.ca


Wed Jul 28, 2021 3:55 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Started working on a linker. The file format in use is 64-bit ELF. It turns out to be handy to know the size of object code, so a new notation was added to the assembler. The notation indicates the end of a routine so that the number of bytes the routine occupies may be calculated. The notation is a double-dot or the keyword 'end', optionally followed by the name of the routine, matching the routine's label. For example:
Code:
PutCRLF:
   enter   #32
   ldi      $a0,#CR
   bal      $x1,Putch
   ldi      $a0,#LNFD
   bal      $x1,Putch
   leave   #32
..

From the summary info generated during assembly:
Code:
  PutCRLF                                    code  fffffffffffc10d7.1   27 18 6

PutCRLF occupies 27 bytes (rounded up) at address ff…ffc10d7.8. 18 bits are required to represent the address and the symbol is referenced 6 times.

_________________
Robert Finch http://www.finitron.ca


Thu Jul 29, 2021 3:23 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Ported TinyBasic over to ANY1. A lot of the work lately has been on software. The core still does not run in an FPGA. The latest modification was to the TLB; it now initializes itself so that the MMU may be used in any operating mode. This adds a startup delay of 2048 CPU clock cycles before memory can be accessed. Even with this delay the core loads a cache line from memory correctly, but it cannot seem to execute instructions. The TLB is clocked at double the CPU clock rate. I am looking at making the memory controller run on a separate clock from the CPU clock. For loads and stores it communicates with the CPU via FIFOs, so there is a reasonable possibility of using a different clock rate. The I$ is dual ported, with a read side and a write side, so it would be easy to run each side on a different clock. The goal would be to run the memory controller at double the CPU clock rate.

_________________
Robert Finch http://www.finitron.ca


Fri Jul 30, 2021 3:46 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Extended the assembler syntax. Functions, as opposed to generic labels, are now identified with ‘()’ following the name. The linker needs to know what kind of symbol it is dealing with. Previously all symbols in the assembler were of the same generic type.
Code:
   code
   align   16
_Delay2s():
ifdef TEST
   ldi     $a1,#10
else
   ldi     $a1,#3000000
endif
.0001:
   srl     $a2,$a1,#16
   stb     $a2,LEDS
   sub     $a1,$a1,#1
   bgeu    $a1,#1,.0001
   ret
..

Today’s work was mainly on the linker.

Some means of testing programs for the ANY1 is needed without necessarily using a real processor. So it is just about time to get an emulator running.

_________________
Robert Finch http://www.finitron.ca


Sat Jul 31, 2021 4:16 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1782
Three cheers for emulators!


Sat Jul 31, 2021 10:20 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Worked on both the linker and assembler.
Came up with another notation for the assembler to indicate the size of data. A data label is followed by the size in bytes in square brackets.
Code:
 public bss __noname_var0[96]:

   fill.b   96,0x00                   

endpublic

This is another aid to the linker. The size may be omitted, but then the linker will assume the size to be the difference in location between the current symbol and the next one in the same section. The alignment of the data may also be specified after the byte size, using a comma to separate the two.
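The fallback rule for an omitted size can be sketched in C as follows. The LinkSym structure, the use of 0 to mean "size omitted", and the use of the section end for the last symbol are all assumptions made for the example; the thread does not say how the actual linker represents these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint64_t addr;   /* location within the section */
    uint64_t size;   /* 0 means the size was omitted in the source */
} LinkSym;

/* Fill in omitted sizes. syms[] must hold the symbols of ONE section,
   sorted by ascending address. An omitted size becomes the distance to
   the next symbol; the last symbol runs to section_end (assumption). */
static void fill_implied_sizes(LinkSym *syms, size_t n, uint64_t section_end)
{
    for (size_t i = 0; i < n; i++) {
        if (syms[i].size != 0)
            continue;                              /* explicit size wins */
        uint64_t next = (i + 1 < n) ? syms[i + 1].addr : section_end;
        assert(next >= syms[i].addr);              /* input must be sorted */
        syms[i].size = next - syms[i].addr;
    }
}
```

Note that an implied size includes any alignment padding before the next symbol, which is one reason an explicit size such as [96] is more precise.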

_________________
Robert Finch http://www.finitron.ca


Sun Aug 01, 2021 4:19 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
robfinch wrote:
Functions as opposed to generic labels are now identified with ‘()’ following the name.
What about marking them with pseudo ops like "_name .proc" and ".endp _name"? From your example above, it seems to require a single "ret" instruction to determine the last component of the routine / function. I think that it might be more flexible for you to emit a "_name .proc" during the processing of the function entry, and then at the completion of the function to emit the matching ".endp _name" to identify that the compiler has completed processing of the function. This approach should allow the function to contain multiple "ret" statements, and would still allow your linker to provide function size information.
BigEd wrote:
Three cheers for emulators!
Whaaat!!! :twisted:

_________________
Michael A.


Sun Aug 01, 2021 1:14 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
What about marking them with pseudo ops like "_name .proc" and ".endp _name".
I thought about this, and the assembler handles something similar: '_name .begin' and '.end _name'. For convenience, though, the double-dot is used as a synonym for '.end'; it is shorter and easier to type. I was not so crazy as to rely on a ret instruction to indicate the end of a routine. "()" is shorter and easier to type than "begin" or "proc". Dots are optional in the pseudo-ops for the assembler. Exactly what to use to mark the beginning and end of a function seems to vary from one assembler to the next.

The emulator is not going to be cycle exact, but it has fetch / decode and execute stages so there is some similarity to the hardware. I am reusing an emulator already in existence.
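To illustrate what an emulator with separate fetch, decode and execute stages (but no cycle accuracy) looks like, here is a toy C sketch. The 16-bit instruction format, the three opcodes and all names are invented for the example; they have nothing to do with the ANY1 instruction set or the emulator being reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encoding: op[15:12] rd[11:10] rs[9:8] imm[7:0]. */
enum { TOY_OP_HALT = 0, TOY_OP_LDI = 1, TOY_OP_ADD = 2 };
#define TOY_INSN(op, rd, rs, imm) \
    ((uint16_t)(((op) << 12) | ((rd) << 10) | ((rs) << 8) | ((imm) & 0xFF)))

typedef struct { size_t pc; int32_t regs[4]; int halted; } ToyMachine;
typedef struct { unsigned op, rd, rs; int32_t imm; } ToyDecoded;

/* Fetch stage: read the instruction word at the program counter. */
static uint16_t toy_fetch(const ToyMachine *m, const uint16_t *prog)
{
    return prog[m->pc];
}

/* Decode stage: split the word into its fields. */
static ToyDecoded toy_decode(uint16_t w)
{
    ToyDecoded d = { w >> 12, (w >> 10) & 3u, (w >> 8) & 3u, w & 0xFF };
    return d;
}

/* Execute stage: apply the operation and advance the program counter. */
static void toy_execute(ToyMachine *m, ToyDecoded d)
{
    switch (d.op) {
    case TOY_OP_LDI: m->regs[d.rd] = d.imm;           break;
    case TOY_OP_ADD: m->regs[d.rd] += m->regs[d.rs];  break;
    default:         m->halted = 1;                   return;
    }
    m->pc++;
}

/* One instruction per loop pass; no attempt to count clock cycles. */
static void toy_run(ToyMachine *m, const uint16_t *prog, size_t len)
{
    assert(prog != NULL);
    while (!m->halted && m->pc < len)
        toy_execute(m, toy_decode(toy_fetch(m, prog)));
}
```

Keeping the three stages as separate functions mirrors the hardware structure loosely, which makes it easier to compare emulator behaviour against the RTL one stage at a time.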

Busy day. Implemented the sync, mtbase and mfbase instructions.

Toying with the idea of having the base register used in an address calculation be specified as the low order four bits of an address value instead of the high order bits. Address bits 0 and upwards would be specified in bits 4 and up.
At the moment the base register to use is specified in the high order bits, an idea borrowed from PowerPC segmentation. For ANY1, however, the position of the base register spec changes depending on the number of address bits used to form an address. Some examples:
Currently, sto $a0,$A000C000 stores using the stack base register, which is identified by the ‘A’ in the high order four bits of the address. This is for 32-bit addressing. For 24-bit addressing this would be sto $a0,$A0C000. The processor’s address range is specified in control register zero. With the base register in the low order four bits, these addresses would be specified as sto $a0,$C000A. The issue with switching things is that it may reduce the number of bits available to encode an address in load / store instructions. The data base register was carefully chosen as base register #0 so that it would automatically be selected by addressing that assumes the high order bits are all zeros. Without using extended constants, load and store instructions support only an 11-bit displacement. I am not sure I wish to reduce this to 7 bits.
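The two placements of the base register selector can be compared with a short C sketch. The BaseSplit structure and both function names are hypothetical; the sketch only decodes the example addresses given above and does not model the processor's address formation.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned base;     /* 4-bit base register number */
    uint64_t offset;   /* remaining address bits */
} BaseSplit;

/* Current scheme: selector in the top nybble of an addr_bits-wide
   address, so its position moves with the address width. */
static BaseSplit base_split_high(uint64_t addr, unsigned addr_bits)
{
    assert(addr_bits > 4 && addr_bits <= 64);
    unsigned shift = addr_bits - 4;
    BaseSplit s;
    s.base = (unsigned)((addr >> shift) & 0xF);
    s.offset = addr & ((UINT64_C(1) << shift) - 1);
    return s;
}

/* Proposed scheme: selector in bits 3..0, address bits start at bit 4,
   so the position is the same for every address width. */
static BaseSplit base_split_low(uint64_t addr)
{
    BaseSplit s;
    s.base = (unsigned)(addr & 0xF);
    s.offset = addr >> 4;
    return s;
}
```

All three example addresses ($A000C000 at 32 bits, $A0C000 at 24 bits, and $C000A in the low-order scheme) decode to base register 0xA with offset 0xC000. The cost of the low-order scheme is visible in the shift by four: an 11-bit displacement field would carry only 7 bits of actual displacement.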

At startup the boot software initializes base registers for access to various sections. For instance, base register $b12 is loaded with the address of the rodata section. The use of base registers is spec’d in the ABI. Instead of using a global pointer register, data is referenced relative to a base register. Base register $b15 is used for code and is automatically set along with the IP at reset.

I decided to reduce the base register shift amount to four bits from fourteen bits. This gives a much finer granularity for base registers. Having a finer granularity means less memory may be wasted. Rather than requiring sections to be aligned on 16kB boundaries they may now be aligned on 16B boundaries. Base registers control the access rights to memory and paging controls the visibility of memory.
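The effect of the shift amount on wasted memory can be put in numbers with a small C sketch. The function names are hypothetical, and the sketch assumes only that a section must start on a multiple of 2^shift bytes, which is what the alignment figures above imply.

```c
#include <assert.h>
#include <stdint.h>

/* Round a section size up to the base register granularity, 2^shift bytes. */
static uint64_t section_align_up(uint64_t size, unsigned shift)
{
    assert(shift < 63);
    uint64_t gran = UINT64_C(1) << shift;
    return (size + gran - 1) & ~(gran - 1);
}

/* Bytes lost to padding when the next section must start on a
   2^shift-byte boundary. */
static uint64_t section_padding(uint64_t size, unsigned shift)
{
    return section_align_up(size, shift) - size;
}
```

A 100-byte section wastes 16284 bytes with a shift of fourteen (16kB granularity) but only 12 bytes with a shift of four (16B granularity).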

In the future the base registers may be more than 64 bits wide, being split into base low and base high registers. This would allow an address space of up to 128 bits.

**********

The assembler's performance was being hurt by the use of the GetName() method from the NameTable class. Rather than invoking GetName(), the assembler was modified to go directly to the NameTable class’s data member nametext[]. Accessing the variable directly improved performance by about 25%.

It is a full minute faster to do this:
Code:
int icmp (const void *m1, const void *m2)
{
  SYM *n1; SYM *n2;
  n1 = (SYM *)m1;
  n2 = (SYM *)m2;
  if (n1->name==0) return 1;
  if (n2->name==0) return -1;
  return (strcmp(&nmTable.nametext[n1->name], &nmTable.nametext[n2->name]));
}

As opposed to:
Code:
int icmp (const void *m1, const void *m2)
{
  SYM *n1; SYM *n2;
  n1 = (SYM *)m1;
  n2 = (SYM *)m2;
  if (n1->name==0) return 1;
  if (n2->name==0) return -1;
  return (strcmp(nmTable.GetName(n1->name), nmTable.GetName(n2->name)));
}

Normally one would not want to access internal class variables directly; the preference is to use an accessor method. In this case, though, direct access improves performance substantially.

_________________
Robert Finch http://www.finitron.ca


Mon Aug 02, 2021 3:52 am