 ANY-1 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
What about marking them with pseudo ops like "_name .proc" and ".endp _name".
I thought about this, and the assembler handles something similar: '_name .begin' and '.end _name'. For convenience, though, the double-dot is accepted as a synonym for '.end'; it is shorter and easier to type. I was not so crazy as to rely on a ret instruction indicating the end of a routine. "()" is shorter and easier to type than "begin" or "proc". Dots are optional in the pseudo-ops for the assembler. Exactly what to use to mark the beginning and end of a function seems to vary from one assembler to the next.

The emulator is not going to be cycle exact, but it has fetch / decode and execute stages so there is some similarity to the hardware. I am reusing an emulator already in existence.

Busy day. Implemented the sync, mtbase and mfbase instructions.

Toying with the idea of having the base register used in an address calculation be specified as the low order four bits of an address value instead of the high order bits. Address bits 0 and upward would be specified in bits 4 and up.
At the moment the base register to use is specified as the high order bits, an idea borrowed from the PowerPC segmentation, but for ANY1 the position of the base register spec changes depending on the number of address bits used to form an address. Some examples:
Currently: sto $a0,$A000C000 will store using the stack base register, which is identified by the ‘A’ in the high order four bits of the address. This is for 32-bit addressing. For 24-bit addressing this would be: sto $a0,$A0C000. The processor’s address range is specified in control register zero. With the base register as the low order four bits these addresses would be specified as: sto $a0,$C000A. The issue with switching things is that it may reduce the number of bits available to encode an address in load / store instructions. The data base register was carefully chosen as base register #0 so that it would automatically be selected by addressing that assumes the high order bits are all zeros. Without using extended constants, load and store instructions support only an 11-bit displacement. I am not sure I wish to reduce this to 7 bits.
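
To make the two encodings concrete, here is a rough C sketch of how an address could be split into a base register number and an offset under each scheme. It assumes a 4-bit base register field; the names split_high, split_low and struct based_addr are invented for illustration and are not taken from the ANY1 sources.

```c
#include <stdint.h>

/* Hypothetical helper type: which base register an address selects,
   and the offset left once the selector field is stripped out. */
struct based_addr {
    unsigned base;    /* base register number, 0..15 */
    uint64_t offset;  /* remaining address bits */
};

/* Current scheme: base register spec in the top four bits of an
   addr_bits-wide address (24 or 32 in the examples above). */
struct based_addr split_high(uint64_t addr, unsigned addr_bits)
{
    struct based_addr r;
    unsigned shift = addr_bits - 4;
    r.base = (unsigned)((addr >> shift) & 0xF);
    r.offset = addr & ((UINT64_C(1) << shift) - 1);
    return r;
}

/* Proposed scheme: base register spec in the low four bits,
   with address bit 0 starting at bit 4. */
struct based_addr split_low(uint64_t addr)
{
    struct based_addr r;
    r.base = (unsigned)(addr & 0xF);
    r.offset = addr >> 4;
    return r;
}
```

With this model, $A000C000 (32-bit), $A0C000 (24-bit) and $C000A (low order form) all resolve to base register 10 with offset $C000.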

At startup the boot software initializes base registers for access to various sections. For instance, base register $b12 is loaded with the address of the rodata section. The use of base registers is spec’d in the ABI. Instead of using a global pointer register, data is referenced relative to a base register. Base register $b15 is used for code and is automatically set along with the IP at reset.

I decided to reduce the base register shift amount to four bits from fourteen bits. This gives a much finer granularity for base registers. Having a finer granularity means less memory may be wasted. Rather than requiring sections to be aligned on 16kB boundaries they may now be aligned on 16B boundaries. Base registers control the access rights to memory and paging controls the visibility of memory.
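
The granularity arithmetic can be checked with a few lines of C. This is only a sketch: it assumes an address is formed as (base << shift) + offset, which the post implies but does not spell out, and the function names are made up.

```c
#include <stdint.h>

/* Alignment implied by a base register shift amount, in bytes. */
uint64_t base_alignment(unsigned shift)
{
    return UINT64_C(1) << shift;
}

/* Section size rounded up to that alignment, to show the padding cost. */
uint64_t padded_size(uint64_t size, unsigned shift)
{
    uint64_t align = base_alignment(shift);
    return (size + align - 1) & ~(align - 1);
}

/* Assumption: an address is formed as (base << shift) + offset. */
uint64_t effective_address(uint64_t base, unsigned shift, uint64_t offset)
{
    return (base << shift) + offset;
}
```

A 100-byte section occupies 16384 bytes with a 14-bit shift but only 112 bytes with a 4-bit shift.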

In the future the base registers may be more than 64 bits wide, being split into base low and base high registers. This would allow an address space of up to 128 bits.

**********

The assembler's performance was being hurt by calls to the GetName() method of the NameTable class. Rather than invoke GetName(), the assembler was modified to go directly to the NameTable class's data variable nametext[]. Going directly at the variable improved performance by about 25%.

It is a full minute faster to do this:
Code:
int icmp (const void *m1, const void *m2)
{
  SYM *n1; SYM *n2;
  n1 = (SYM *)m1;
  n2 = (SYM *)m2;
  if (n1->name==0) return 1;
  if (n2->name==0) return -1;
  return (strcmp(&nmTable.nametext[n1->name], &nmTable.nametext[n2->name]));
}

As opposed to:
Code:
int icmp (const void *m1, const void *m2)
{
  SYM *n1; SYM *n2;
  n1 = (SYM *)m1;
  n2 = (SYM *)m2;
  if (n1->name==0) return 1;
  if (n2->name==0) return -1;
  return (strcmp(nmTable.GetName(n1->name), nmTable.GetName(n2->name)));
}

Normally one would not want to access internal class variables directly; the preference is to use an accessor method. In this case, though, it improves performance substantially.
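
For anyone who wants to try the comparator outside the assembler, here is a self-contained C sketch. The SYM struct and the nametext buffer are stand-ins, not the real NameTable, and the extra "both unnamed" case is my addition so that the comparator stays consistent for qsort().

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the assembler's symbol record: only the name index matters. */
typedef struct { int name; } SYM;

/* Stand-in for nametext[]: NUL-separated strings in one buffer.
   Index 0 means "no name"; "zeta" is at 1, "alpha" at 6, "mid" at 12. */
char nametext[] = "\0zeta\0alpha\0mid";

/* Direct-access comparator, as in the faster version above.
   Unnamed symbols sort after all named ones. */
int icmp(const void *m1, const void *m2)
{
    const SYM *n1 = (const SYM *)m1;
    const SYM *n2 = (const SYM *)m2;
    if (n1->name == 0 && n2->name == 0) return 0;  /* added case */
    if (n1->name == 0) return 1;
    if (n2->name == 0) return -1;
    return strcmp(&nametext[n1->name], &nametext[n2->name]);
}

void sort_syms(SYM *syms, size_t count)
{
    qsort(syms, count, sizeof(SYM), icmp);
}
```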

_________________
Robert Finch http://www.finitron.ca


Mon Aug 02, 2021 3:52 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
robfinch wrote:
I was not so crazy as to rely on a ret instruction indicating the end of a routine. "()" is shorter and easier to type than "begin" or "proc".
That works too. ;) However, the symbol that you selected for the end-of-function marker is easy to overlook. Granted, it's not really necessary for a human to determine the end of the function, and since the file is processed by machine, the ".." symbol is probably not error-prone for your linker.

_________________
Michael A.


Mon Aug 02, 2021 11:45 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Just pondering how to do inter-segment jumps and calls in ANY1. This is bound to be a rare operation so a minimum of transistors should be dedicated to it. Re-using existing instructions is ideal.
To reach the target address successfully both the segment and instruction pointer must be set at the same time. To be able to return from an inter-segment call both the segment and instruction pointer need to be saved. An intra-segment call instruction saves the IP in one of the link registers x1 to x3. It is not too hard to move the current segment to another register using the MFSEG instruction before performing a call. So, a record of the current CS:IP is easily generated for the return address using existing instructions. The issue is still loading both the CS and IP at the same time. It is almost possible to use an MTSEG instruction to load the CS followed by an ordinary jump. One way to perform a call may be to have the CS segment load with a delayed effect so that more instructions may be fetched from the current segment before it changes. Another way to do this may be to use micro-code since the micro-code instruction sequence cannot be upset by a changing CS. A register could be ordained to carry the segment portion of the address, then a regular jump instruction used.
An inter-segment jump would look like:
Code:
LDI   $x31,#SEGMENT_VALUE
JALS   $xa,TargetLabel[$xn]   ; This instruction micro-coded (JALS = jump and link segment)

An inter-segment call would be coded as:
Code:
MFSEG   $x3,$b15      ; $b15 = code segment, record in $x3
LDI   $x31,#SEGMENT_VALUE
JALS   $x1,TargetLabel[$x0]

The JALS micro code would do:
Code:
MTSEG   $b15,$x31      ; moving to the code segment is safe to do here since JALS is micro-coded
JALR   $Xa,TargetLabel[$Xn]   ; template for this instruction comes from the instruction stream

Note that a special version of the JALR instruction is required that does not take effect until the writeback stage when it can be guaranteed that the CS is already updated. The CS needs to be updated before instructions from the new target can be fetched.
Since an additional instruction is required the following may work:
Code:
JALS   $Xra,[$Xb:$Xip]

The instruction would load both the CS and IP from registers $Xb,$Xip specified in the instruction and store the return ip in $Xra. The return CS would need to be stored previously using a separate instruction.
Note that an inter-segment jump or call is going to be slow because the segment register needs to be updated which requires a pipeline flush to ensure it is valid before continuing. In the current machine it might take a dozen clock cycles to perform.
******
I made a short movie of the emulator in action, but I think it needs to be edited. What is a good inexpensive movie editor?

_________________
Robert Finch http://www.finitron.ca


Tue Aug 03, 2021 2:39 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Moving to the CS base register now sets a flag in the processor so that the next jump instruction (JAL or JALR) is deferred until the writeback stage. This is about the simplest solution as it is just a single bit flag. This gives time for the CS setting to take effect. It is interesting because the deferred jump is actually performed twice, once in the EX stage to the wrong address since the CS is being updated, and again to the correct address in the WB stage. It is also interesting because the means relies on the presence of instructions already in the pipeline after the CS altering instruction, but before the CS has changed. So, performing a far jump or call just uses existing instructions. There could also be invalid instructions in the pipeline due to the CS changing but these will be flushed out by the jump.
Code:
MTBASE $b15,$t0
JAL $ra,Target[$Xn]   ; This will be in the pipeline already before CS changes
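
A toy C model of the deferred jump may help picture it. The struct, the field names, and the simple "CS base plus target" address formation are all assumptions for illustration; the real RTL is certainly more involved.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the core state involved; names are illustrative only. */
struct core {
    uint64_t cs_base;     /* CS base currently visible to the EX stage */
    uint64_t cs_pending;  /* value written by MTBASE, not yet in effect */
    bool     defer_jump;  /* single-bit flag set by a move to the CS base */
};

/* MTBASE to the code base register: queue the value and raise the flag. */
void mtbase_cs(struct core *c, uint64_t new_base)
{
    c->cs_pending = new_base;
    c->defer_jump = true;
}

/* EX stage of JAL/JALR: the target is formed with whatever CS is visible
   now. While the flag is set this is the stale (wrong) address. */
uint64_t jump_ex(const struct core *c, uint64_t target)
{
    return c->cs_base + target;
}

/* WB stage: the pending CS takes effect, the jump is redone with the
   new base, and the flag clears. */
uint64_t jump_wb(struct core *c, uint64_t target)
{
    if (c->defer_jump) {
        c->cs_base = c->cs_pending;
        c->defer_jump = false;
    }
    return c->cs_base + target;
}
```

The model shows the jump being taken twice: first to the stale address in EX, then to the correct one in WB.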

Latest Additions:
Far versions of the ENTER and LEAVE instructions were created. The size of the return block was also adjusted to be 12 words or 96 bytes to allow room to store base registers, plus a couple of empty slots. The near versions of ENTER and LEAVE use the same size and format of return block as the far versions, but do not store or load base register values. It is a little bit of data bloat but stacks are usually small with a limited depth.
Added the ability to load and store base registers directly. This required very little additional hardware. However, indexed loads and stores to the base registers cannot be done. It would require too many hardware changes for its utility.
Added an interrupt disable instruction which has a limit on the number of instructions it applies to. In this case the max limit is seven instructions. It turned out to be easier to implement than I thought it might be. The instruction DI executes at the writeback stage and reverses out any external hardware interrupts that were flagged in following instructions in the queue. However, a non-maskable interrupt cannot be disabled this way.
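
In simulator terms the DI behaviour could be modelled roughly as below. This is a sketch only: the counter-based model, the clamping of larger counts to seven, and all names are assumptions.

```c
#include <stdbool.h>

#define DI_MAX 7  /* maximum instruction count mentioned above */

struct irq_state {
    unsigned di_count;  /* instructions left with interrupts held off */
};

/* DI #n: hold off maskable interrupts for the next n instructions.
   Clamping to DI_MAX is an assumption about out-of-range counts. */
void exec_di(struct irq_state *s, unsigned n)
{
    s->di_count = n > DI_MAX ? DI_MAX : n;
}

/* Called once for each instruction that retires after the DI. */
void retire_instruction(struct irq_state *s)
{
    if (s->di_count > 0)
        s->di_count--;
}

/* A non-maskable interrupt is always allowed; others wait for the count. */
bool irq_allowed(const struct irq_state *s, bool is_nmi)
{
    return is_nmi || s->di_count == 0;
}
```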

_________________
Robert Finch http://www.finitron.ca


Wed Aug 04, 2021 4:39 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Duh, found out when updating the tokenizer that it already recognized ".proc" and ".endproc" as keywords. They just were not processed by the process() routine. A minor code addition later and voila, they can be used to identify functions as well. ".proc" and ".endproc" are probably about the most common means of identifying functions in assembler, it is crazy not to support them. I think the GNU/LLVM assembler has a ".type" keyword that can be used to specify that a label is a function as well.

_________________
Robert Finch http://www.finitron.ca


Wed Aug 04, 2021 9:26 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Working on the loader aspect of things today. I want a means to call functions in libraries. The functions could be built-in ROM routines. I am thinking of having a simple symbol table consisting of entries containing the function name and address. Then at load time a program's external functions would be matched up with entries in the symbol table to get the function address. I went off on a tangent again exploring how to perform far calls. Then concluded it is not really necessary most of the time as code can be mapped into the address space of a program and near calls used.
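
The load-time matching could be as simple as the following C sketch. The entry layout, the linear search, and the function names are assumptions; nothing here comes from the actual loader.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One symbol table entry: function name and entry address. */
struct sym_entry {
    const char *name;
    uint64_t    addr;
};

/* Look up one name; returns 0 when the symbol is not in the table. */
uint64_t resolve_symbol(const struct sym_entry *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return tab[i].addr;
    return 0;
}

/* Resolve a program's list of external names into addrs[].
   Returns the number of names that could not be matched. */
size_t resolve_externals(const struct sym_entry *tab, size_t n,
                         const char *const *names, uint64_t *addrs,
                         size_t count)
{
    size_t missing = 0;
    for (size_t i = 0; i < count; i++) {
        addrs[i] = resolve_symbol(tab, n, names[i]);
        if (addrs[i] == 0)
            missing++;
    }
    return missing;
}
```

A linear search is enough for a small ROM table; a sorted table with binary search would be the obvious next step if the table grows.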

_________________
Robert Finch http://www.finitron.ca


Fri Aug 06, 2021 5:14 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Found and fixed a couple of bugs in the compiler.
Created byte, wyde, tetra and octa versions of some string library routines.

Added the .symtab directive to the assembler to dump a global symbol table of function entry points. Each entry dumped is 32 bits: 16 bits specifying the address and 16 bits specifying the name table index for the symbol name. Addresses have the five least significant bits as zero, so those bits are not stored. 16 bits is enough to represent a 21-bit address, which should be enough to cover what’s accessible in the ROM.
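
The packing can be sketched in C as follows. Which half of the 32-bit entry holds the address is not stated above, so the layout here (address in the upper half, name index in the lower half) is an assumption, as are the function names.

```c
#include <stdint.h>

/* Assumed layout: bits 31..16 = address >> 5, bits 15..0 = name index. */
uint32_t symtab_pack(uint32_t addr, uint16_t name_ndx)
{
    return (((addr >> 5) & 0xFFFFu) << 16) | name_ndx;
}

/* Recover the 21-bit address; the five low bits come back as zero. */
uint32_t symtab_addr(uint32_t entry)
{
    return (entry >> 16) << 5;
}

/* Recover the name table index. */
uint16_t symtab_name(uint32_t entry)
{
    return (uint16_t)(entry & 0xFFFFu);
}
```

The highest address that fits is $1FFFE0, which is a 21-bit address with the low five bits clear.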

_________________
Robert Finch http://www.finitron.ca


Sat Aug 07, 2021 4:28 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
What to do about read-only data, data, and code in a based architecture? For library routines that use read-only data a global pointer ($gp1) is initialized to point to the read-only data. This uses read-only base register $b12 as a reference point. There is just one issue with this. What if a subroutine wants to call a library routine and uses a read-only data pointer as an argument? The value in the argument register will have a reference to a read-only segment. Both the library routine and subroutine calling it cannot be using the same read-only base register - $b12. And what if one library routine calls another routine in a different library that has a different base address for read-only data?
One thing the system should try and avoid is constant loading and storing of base registers and global pointers.
My thought is to provide *a lot* of base registers to allow different libraries to use different base registers. The base register used by a particular library could be statically allocated. For instance, library #3 would use base register 035 (octal) as a reference to its global data. All base registers 0xx5 would refer to global data for the library xx. There would be 8 base registers reserved per library. So, libraries would use <libnum>n.
Most library routines would want to inherit the stack from the caller. So, the caller’s stack segment would be used. Library routines will be mapped into the caller’s address space, so that the caller’s code segment may be used.
A lot of base registers = 1024 which could fit into two block rams. That would be enough for 128 code libraries.
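
The numbering scheme is just library number times eight plus a slot, which is why it reads naturally in octal. A small C sketch, with made-up names:

```c
/* Constants taken from the description above. */
enum { BASE_REGS_PER_LIB = 8, NUM_BASE_REGS = 1024 };

/* Base register number for a given library and slot (0..7).
   Slot 5 is the global data slot in the scheme described above. */
unsigned lib_base_reg(unsigned libnum, unsigned slot)
{
    return libnum * BASE_REGS_PER_LIB + slot;
}

/* How many libraries the register file can serve. */
unsigned max_libraries(void)
{
    return NUM_BASE_REGS / BASE_REGS_PER_LIB;
}
```

Library #3, slot 5 gives 035 octal (29 decimal), and 1024 registers cover 128 libraries.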

_________________
Robert Finch http://www.finitron.ca


Sun Aug 08, 2021 5:06 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
How do you handle displays for Algol-style languages?


Sun Aug 08, 2021 10:29 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I am not sure I follow the question. Is there something specific? Displaying information can be done using a function written in CC64, similar to a C function to display values. Or something could be written in assembler code. There is the printf() function as part of the C standard library. There is also a debug printf()-like routine, dbg_printf(), which is much simpler than printf() but still useful.

Upped the number of base registers to 2048. That should be enough base registers to allow eight per app for 256 apps. Base registers are not hard-coded in the ISA; they are accessed indirectly via a number specified in a GPR. Most apps will need only three or four base registers.
The idea is that the apps have designated base register numbers to use. So, there is no need to save and restore them as part of the app context when switches occur. For example: TinyBasic is designated as app #3 so uses base registers 03x where 03x is an octal number.
Base register #007 is special – it is the code base register.
Initialization looks like:
Code:
   align   16
TinyBasic_init .proc
   ; set TinyBasic data segment
   ldi      $a1,#$0000000000400006   ; 4MB boundary
   ldi      $a0,#0030   ; 0030 octal!
   mtbase   [$a0],$a1
   ; set TinyBasic stack segment
   ldi      $a1,#$FFFFFFFFFF40C006
   ldi      $a0,#0033                  ; App#3 SS
   mtbase   [$a0],$a1
   ; set Read-only segment (last 64k)
   ldi      $a1,#$FFFFFFFFFFFF0004
   ldi      $a0,#0035
   mtbase   [$a0],$a1
   sync
   ret
.endp TinyBasic_init


Base registers are associated with global pointer registers and the stack / frame pointers. There is an instruction to set the association. But basically the base register in use is specified in the upper bits of an address.
Code:
   ldi      $gp,#0
   ldi      $a0,#0030
   base   $gp,$gp,$a0
   ldi      $gp1,#0
   ldi      $a0,#0035
   base   $gp1,$gp1,$a0

_________________
Robert Finch http://www.finitron.ca


Mon Aug 09, 2021 4:41 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Played with the TLB today, getting it to work in the simulator now called FSIM. The MMU for the core is always enabled, all addresses are translated in all operating modes. Simulation has run up to the point where LEDs are updated. This required mapping the LED I/O address into the address space. For a while I had the TLB mapping too much of a virtual address. It was using part of the virtual address reserved for segmentation when it should not have been. A virtual address before a segment base is applied includes a reference to the base register to use. Once the base register is applied, the base register spec is stripped out of the address before being sent to the TLB.
A lock is needed on some TLB entries so that the TLB miss handler may run without being knocked out of the TLB. The TLB is four-way associative so it should be necessary to lock only one of the ways for specific pages. That would still allow three other pages to be mapped to the same entry.
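
One way to picture the locking is a victim-selection routine that never picks a locked way. The C sketch below assumes a round-robin replacement policy and a per-way lock bit; neither is confirmed for the ANY1 TLB and the names are invented.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_WAYS 4

struct tlb_way {
    bool     valid;
    bool     locked;  /* set for pages the miss handler itself needs */
    uint64_t vpn;
    uint64_t ppn;
};

/* Pick the way to replace within one set. Prefers an invalid unlocked
   way, otherwise rotates over the unlocked ways starting at *rr.
   Returns -1 if every way in the set is locked. */
int tlb_pick_victim(const struct tlb_way set[TLB_WAYS], unsigned *rr)
{
    for (int w = 0; w < TLB_WAYS; w++)
        if (!set[w].valid && !set[w].locked)
            return w;
    for (unsigned i = 0; i < TLB_WAYS; i++) {
        unsigned w = (*rr + i) % TLB_WAYS;
        if (!set[w].locked) {
            *rr = (w + 1) % TLB_WAYS;
            return (int)w;
        }
    }
    return -1;
}
```

With way 0 locked, successive replacements cycle through ways 1, 2 and 3 only, so three other pages can still share the set.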
I am thinking of reserving the first 16kB of memory for the BIOS / TLB and a 16kB block of ROM addresses.

_________________
Robert Finch http://www.finitron.ca


Tue Aug 10, 2021 3:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Came up with code to perform far call and return. Far calls and returns should not be performed very often. This kind of code is needed to support device drivers which may have a different ROOT pointer than the rest of the OS. This code may eventually be wrapped up into macro instructions for far call and return. An issue is that it clobbers three temporary registers. I do not think this is much of an issue because temporaries are supposed to be caller saved when calling a function. Meaning that if they were needed by the calling function they should have been saved before the call.
The far call works by referencing a target address descriptor which is located in memory. The Ra register ($a0 in the example) of the far call instruction would be used to contain a pointer to the descriptor. Using a macro instruction a far call would look like:
Code:
jalf $ra,[$a0]
Here is the code that may make it into macro instructions:
Code:
; Perform a far call using a target address descriptor in memory.
;
; Parameters
;    $a0 = pointer to target address descriptor in memory
; Modifies:
;      $t1 = target ip
;      $t2,$t3   = base/bound values

_FAR_CALL .proc
   mfbase   $t2,$b7                     ; get current code segment base and bound and
   mfbnd      $t3,$b7
   push      $t2,$t3                     ; store on stack
   ldo         $t1,[$a0]                  ; get target ip from descriptor
   ldo         $t2,8[$a0]               ; get target code base and bound values
   ldo         $t3,16[$a0]
   di         #4                           ; disable interrupts until jump is complete
   mtbase   $b7,$t2                     ; move target code base and bound to current
   mtbnd      $b7,$t3
   jal         $ra,[$t1]                  ; call function
   jal         $x0,[$x2]                  ; return to far call's caller
.endp _FAR_CALL

; Perform far return. This code is jumped to.
;
; Parameters
;      $ra = return ip
; Modifies:
;      $t2,$t3 = base/bound values

_FAR_RET .proc
   pop         $t3,$t2                     ; get back stacked base/bound values
   di         #4                           ; disable interrupts until jump is complete
   mtbase   $b7,$t2                     ; set current base/bound
   mtbnd      $b7,$t3
   jal         $x0,[$ra]                  ; return
.endp _FAR_RET
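
Seen from C, the target address descriptor that _FAR_CALL reads at offsets 0, 8 and 16 would look something like the struct below. The struct and field names are invented; only the three offsets come from the assembly above.

```c
#include <stddef.h>
#include <stdint.h>

/* Target address descriptor as read by _FAR_CALL (names are hypothetical). */
struct far_target {
    uint64_t ip;     /* offset 0:  ldo $t1,[$a0]   target ip         */
    uint64_t base;   /* offset 8:  ldo $t2,8[$a0]  code base value   */
    uint64_t bound;  /* offset 16: ldo $t3,16[$a0] code bound value  */
};

/* Convenience constructor for building a descriptor in memory. */
struct far_target far_target_make(uint64_t ip, uint64_t base, uint64_t bound)
{
    struct far_target t;
    t.ip = ip;
    t.base = base;
    t.bound = bound;
    return t;
}
```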

_________________
Robert Finch http://www.finitron.ca


Wed Aug 11, 2021 7:01 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I decided to go with a more sophisticated segmentation model, using selectors and descriptors. ANY1 had base and bounds registers and they were being passed around as a pair in the RTL code. Then I had the thought why not just call the pair a descriptor and pass a single descriptor object around. It was not too much more of a step to use selectors.
A selector indicates which descriptor from a descriptor table to use. There are 1024 selector registers available. There is a maximum of 64k worth of descriptors to choose from. Descriptors are stored in a descriptor table in memory. Descriptors associated with selectors are cached in a descriptor cache. The selector register to use for a memory access is indicated in bits 54 to 63 of the address, except for code addresses, which always use the CS selector (selector #7). Selector #6 is reserved as the far code return selector.
At reset the first eight entries in the descriptor cache are initialized to a flat memory model to allow updating the global descriptor table in memory.
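
Pulling the selector register number out of an address is a shift and a mask. A C sketch under the bit positions given above (bits 54 to 63, ten bits, hence 1024 selector registers); the function and macro names are made up.

```c
#include <stdint.h>

#define SEL_SHIFT   54      /* selector field starts at bit 54 */
#define SEL_MASK    0x3FFu  /* ten bits: 1024 selector registers */
#define CS_SELECTOR 7u      /* code always uses selector #7 */

/* Selector register used for an access. Code fetches ignore the
   address field and always go through the CS selector. */
unsigned addr_selector(uint64_t vaddr, int is_code)
{
    if (is_code)
        return CS_SELECTOR;
    return (unsigned)((vaddr >> SEL_SHIFT) & SEL_MASK);
}

/* Remaining 54-bit offset once the selector field is stripped. */
uint64_t addr_offset(uint64_t vaddr)
{
    return vaddr & ((UINT64_C(1) << SEL_SHIFT) - 1);
}
```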

The far call and return procs were re-written to use selectors. Also added were far enter and leave and near enter and leave procs. These routines do a fair amount of work and include load and store operations. Since there may be exceptions, the corresponding macro instructions are not used. It is almost as fast to implement the procs as callable routines.

The assembler, compiler, and simulator all had to be updated.

_________________
Robert Finch http://www.finitron.ca


Sat Aug 14, 2021 3:05 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Been getting the FSIM simulator to work. Having lots of fun with the MMU. Finally got ‘AAAA’ to appear on screen, and part of the startup message. The startup message does not appear where it should on-screen. It ends up showing up about ¾ of the way down the screen. I am confident that the screen is mapped properly or there would be nothing appearing.
For debugging it is possible to dump the descriptor cache, and all the descriptors in use look correct. The descriptor cache is loaded from memory, so the descriptor table in memory must be getting updated correctly. It is also possible to dump the TLB entries, and there are a lot of them: 1024 entries for each of four ways. It looks like the TLB entries are also being set up correctly.
For a while the scratch RAM was not mapped correctly. Four consecutive virtual address pages were all pointing to the same physical page number. I forgot to increment the physical page number when setting up the TLB.
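
The scratch RAM bug boils down to a loop like this one, shown here in C rather than in the actual startup code. The struct and function are hypothetical; the point is only the increment on the physical page number.

```c
#include <stdint.h>

/* One virtual-to-physical page mapping, as it would be handed to the TLB. */
struct tlb_map {
    uint64_t vpn;  /* virtual page number */
    uint64_t ppn;  /* physical page number */
};

/* Map `count` consecutive virtual pages onto consecutive physical pages. */
void map_range(struct tlb_map *out, uint64_t vpn, uint64_t ppn, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        out[i].vpn = vpn + i;
        out[i].ppn = ppn + i;  /* the increment that was forgotten */
    }
}
```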

_________________
Robert Finch http://www.finitron.ca


Wed Aug 18, 2021 3:32 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Back to working on the ANY1 core. I got a lot of CS01 working and started toying with the idea of porting it to the Nexys Video board. Then it occurred to me that I already had a system to work on. Time to switch back to ANY1. I managed to identify a couple of issues in the ANY1 system and fix them. Currently the core executes about 30 instructions then hangs just before the LEDs should light up.

Hardware issues: The SYNC instruction was not clearing the unimplemented instruction flag causing an unimplemented instruction exception when executed.
The TLB instruction was only partially implemented; I must have taken a break and then forgotten to implement the rest of it. This caused the core to eventually hang waiting for a response for the instruction.
I went through all the components of the system and ensured that the data output was set to zero when the component was not selected. This is not a standard practice, but it allows wire-ORing all the data outputs of the cores together to feed the cpu. This saves some hardware and time over using multiplexors.
The frame buffer now outputs a signal indicating to the sprite controller when it is a good time to access memory for sprites. There are some display issues with sprites.
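
The wire-OR arrangement for the data outputs can be modelled in C as below. It is a behavioural sketch, not RTL, and the device struct and address decoding are assumptions: each component drives zero unless it is selected, so OR-ing every output yields the selected component's data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component with an address window and a data value. */
struct device {
    uint64_t base;
    uint64_t size;
    uint64_t data;
};

/* Output of one component: its data when selected, zero otherwise. */
uint64_t device_out(const struct device *d, uint64_t addr)
{
    return (addr >= d->base && addr - d->base < d->size) ? d->data : 0;
}

/* The bus value is simply the OR of every component's output. */
uint64_t bus_read(const struct device *devs, size_t n, uint64_t addr)
{
    uint64_t bus = 0;
    for (size_t i = 0; i < n; i++)
        bus |= device_out(&devs[i], addr);
    return bus;
}
```

This only works if address windows never overlap; two selected components would have their data OR-ed together.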

For software, memory for the GDT (global descriptor table) was not mapped into the address space. This caused a page fault exception when trying to set up segment descriptors, and the page fault could not be processed because the system was not set up yet.

_________________
Robert Finch http://www.finitron.ca


Sat Sep 04, 2021 6:49 am