AnyCPU - View topic

Page 10 of 11

[ 159 posts ]


ANY-1

Author	Message
robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Quote: What about marking them with pseudo ops like "_name .proc" and ".endp _name". I thought about this, and the assembler handles something similar: '_name .begin' and '.end _name'. For convenience, though, the double-dot is used as a synonym for '.end'; it is shorter and easier to type. I was not so crazy as to rely on a ret instruction indicating the end of a routine. "()" is shorter and easier to type than "begin" or "proc". Dots are optional in the pseudo-ops for the assembler. Exactly what to use to mark the beginning and end of a function seems to vary from one assembler to the next.
The emulator is not going to be cycle exact, but it has fetch / decode and execute stages so there is some similarity to the hardware. I am reusing an emulator already in existence.
Busy day. Implemented the sync, mtbase and mfbase instructions. Toying with the idea of having the base register used in an address calculation be specified as the low order four bits of an address value instead of the high order bits. Address bits 0 and on upwards would be specified in bits 4 and up. ATM the base register to use is specified as the high order bits, an idea borrowed from the PowerPC segmentation, but for ANY1 the position of the base register spec changes depending on the number of address bits used to form an address. Some examples: Currently: sto $a0,$A000C000 will store using the stack base register, which is identified with the ‘A’ in the high order four bits of the address. This is for 32-bit addressing. For 24-bit addressing this would be: sto $a0,$A0C000. The processor’s address range is specified in control register zero. With the base register as the low order four bits these addresses would be specified as: sto $a0,$C000A. The issue with switching things is that it may reduce the number of bits available to encode an address in load / store instructions.
The data base register was carefully chosen as base register #0 so that it would automatically be selected by addressing that assumes the high order bits are all zeros. Without using extended constants, load and store instructions support only an 11-bit displacement. I am not sure I wish to reduce this to 7 bits.
At startup the boot software initializes base registers for access to various sections. For instance, base register $b12 is loaded with the address of the rodata section. The use of base registers is spec’d in the ABI. Instead of using a global pointer register, data is referenced relative to a base register. Base register $b15 is used for code and is automatically set along with the IP at reset.
I decided to reduce the base register shift amount to four bits from fourteen bits. This gives a much finer granularity for base registers, which means less memory may be wasted. Rather than requiring sections to be aligned on 16kB boundaries, they may now be aligned on 16B boundaries. Base registers control the access rights to memory and paging controls the visibility of memory. In the future the base registers may be more than 64 bits wide, being split into base low and base high registers. This would allow an address space of up to 128 bits.
********
The performance of the assembler was reduced by the use of the GetName() method from the NameTable class. Rather than invoke GetName(), the assembler was modified to go directly to the NameTable class data variable nametext[]. Going directly at the variable improved performance by about 25%.
It is a full minute faster to do this:
Code:
    int icmp (const void *m1, const void *m2)
    {
        SYM *n1;
        SYM *n2;

        n1 = (SYM *)m1;
        n2 = (SYM *)m2;
        if (n1->name==0) return 1;
        if (n2->name==0) return -1;
        return (strcmp(&nmTable.nametext[n1->name], &nmTable.nametext[n2->name]));
    }
As opposed to:
Code:
    int icmp (const void *m1, const void *m2)
    {
        SYM *n1;
        SYM *n2;

        n1 = (SYM *)m1;
        n2 = (SYM *)m2;
        if (n1->name==0) return 1;
        if (n2->name==0) return -1;
        return (strcmp(nmTable.GetName(n1->name), nmTable.GetName(n2->name)));
    }
Normally one would not want to access internal class variables directly; the preference is to use an accessor method, but in this case it improves performance substantially.
_________________ Robert Finch http://www.finitron.ca
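The two placements of the base register number discussed in this post can be sketched in C. This is only an illustration of the bit layout described above: the type `split_addr` and the functions `split_high` and `split_low` are hypothetical names, not part of the ANY1 tools, and the sketch assumes a four-bit base register field in both schemes.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not from the ANY1 sources. */
typedef struct {
    unsigned base;    /* base register number, 0..15 */
    uint64_t offset;  /* remaining address bits */
} split_addr;

/* Current scheme: the base register number sits in the top four bits of
   an address that is addr_bits wide (for example 32 or 24). */
split_addr split_high(uint64_t addr, unsigned addr_bits)
{
    split_addr s;
    unsigned shift = addr_bits - 4;

    s.base = (unsigned)((addr >> shift) & 0xF);
    s.offset = addr & ((UINT64_C(1) << shift) - 1);
    return s;
}

/* Proposed scheme: the base register number sits in the low four bits and
   address bit 0 starts at bit 4, whatever the address width. */
split_addr split_low(uint64_t addr)
{
    split_addr s;

    s.base = (unsigned)(addr & 0xF);
    s.offset = addr >> 4;
    return s;
}
```

With these definitions, `split_high(0xA000C000, 32)`, `split_high(0xA0C000, 24)` and `split_low(0xC000A)` all select base register 10 ('A') with offset 0xC000, matching the three sto examples in the post.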
Mon Aug 02, 2021 3:52 am

MichaelM Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL	Re: ANY-1 robfinch wrote: I was not so crazy as to rely on a ret instruction indicating the end of a routine. "()" is shorter and easier to type than "begin" or "proc". That works too. However, the symbol that you selected for the end-of-function marker was easy to overlook. Granted, it's not really necessary for a human to determine the end of the function, and since the file is processed by machine, the ".." symbol is probably not error-prone for your linker. _________________ Michael A.
Mon Aug 02, 2021 11:45 pm

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Just pondering how to do inter-segment jumps and calls in ANY1. This is bound to be a rare operation so a minimum of transistors should be dedicated to it. Re-using existing instructions is ideal. To reach the target address successfully both the segment and instruction pointer must be set at the same time. To be able to return from an inter-segment call both the segment and instruction pointer need to be saved. An intra-segment call instruction saves the IP in one of the link registers x1 to x3. It is not too hard to move the current segment to another register using the MFSEG instruction before performing a call. So, a record of the current CS:IP is easily generated for the return address using existing instructions. The issue is still loading both the CS and IP at the same time. It is almost possible to use an MTSEG instruction to load the CS followed by an ordinary jump. One way to perform a call may be to have the CS segment load with a delayed effect so that more instructions may be fetched from the current segment before it changes. Another way to do this may be to use micro-code since the micro-code instruction sequence cannot be upset by a changing CS. A register could be ordained to carry the segment portion of the address, then a regular jump instruction used. 
An inter-segment jump would look like:
Code:
    LDI $x31,#SEGMENT_VALUE
    JALS $xa,TargetLabel[$xn] ; This instruction micro-coded (JALS = jump and link segment)
An inter-segment call would be coded as:
Code:
    MFSEG $x3,$b15 ; $b15 = code segment, record in $x3
    LDI $x31,#SEGMENT_VALUE
    JALS $x1,TargetLabel[$x0]
The JALS micro code would do:
Code:
    MTSEG $b15,$x31 ; moving to the code segment is safe to do here since JAL is micro-code
    JALR $Xa,TargetLabel[$Xn] ; template for this instruction comes from the instruction stream
Note that a special version of the JALR instruction is required that does not take effect until the writeback stage when it can be guaranteed that the CS is already updated. The CS needs to be updated before instructions from the new target can be fetched. Since an additional instruction is required the following may work:
Code:
    JALS $Xra,[$Xb:$Xip]
The instruction would load both the CS and IP from registers $Xb,$Xip specified in the instruction and store the return ip in $Xra. The return CS would need to be stored previously using a separate instruction. Note that an inter-segment jump or call is going to be slow because the segment register needs to be updated which requires a pipeline flush to ensure it is valid before continuing. In the current machine it might take a dozen clock cycles to perform.
******
I made a short movie of the emulator in action, but I think it needs to be edited. What is a good inexpensive movie editor?
_________________ Robert Finch http://www.finitron.ca
Tue Aug 03, 2021 2:39 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Moving to the CS base register now sets a flag in the processor so that the next jump instruction (JAL or JALR) is deferred until the writeback stage. This is about the simplest solution as it is just a single bit flag. This gives time for the CS setting to take effect. It is interesting because the deferred jump is actually performed twice, once in the EX stage to the wrong address since the CS is being updated, and again to the correct address in the WB stage. It is also interesting because the method relies on the presence of instructions already in the pipeline after the CS altering instruction, but before the CS has changed. So, performing a far jump or call just uses existing instructions. There could also be invalid instructions in the pipeline due to the CS changing but these will be flushed out by the jump.
Code:
    MTBASE $b15,$t0
    JAL $ra,Target[$Xn] ; This will be in the pipeline already before CS changes
Latest Additions:
Far versions of the ENTER and LEAVE instructions were created. The size of the return block was also adjusted to be 12 words or 96 bytes to allow room to store base registers, plus a couple of empty slots. The near versions of ENTER and LEAVE use the same size and format of return block as the far versions, but do not store or load base register values. It is a little bit of data bloat but stacks are usually small with a limited depth.
Added the ability to load and store base registers directly. This required very little additional hardware. However, indexed loads and stores to the base registers cannot be done. It would require too many hardware changes for its utility.
Added an interrupt disable instruction which has a limit on the number of instructions it applies to. In this case the max limit is seven instructions. It turned out to be easier to implement than I thought it might be.
The instruction DI executes at the writeback stage and reverses out any external hardware interrupts that were flagged in following instructions in the queue. However, a non-maskable interrupt cannot be disabled this way. _________________ Robert Finch http://www.finitron.ca
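The bounded interrupt-disable behaviour described in this post might be modelled in a simulator roughly as follows. This is a sketch under assumptions, not the ANY1 RTL: the names `irq_state`, `di` and `retire` are hypothetical, and the model simply counts retired instructions, whereas the hardware is described as reversing out interrupts already flagged in the queue.

```c
#include <stdbool.h>

#define DI_MAX 7  /* maximum instruction count stated in the post */

/* Hypothetical model state: instructions still covered by the last DI. */
typedef struct {
    unsigned di_count;
} irq_state;

/* Model of "di #n": mask external interrupts for the next n instructions,
   clamped to the seven-instruction limit. */
void di(irq_state *s, unsigned n)
{
    s->di_count = (n > DI_MAX) ? DI_MAX : n;
}

/* Called once per retired instruction. Returns true if an interrupt is
   taken at this point. A non-maskable interrupt is never blocked. */
bool retire(irq_state *s, bool irq_pending, bool nmi_pending)
{
    if (nmi_pending)
        return true;
    if (s->di_count > 0) {
        s->di_count--;      /* still inside the protected window */
        return false;
    }
    return irq_pending;
}
```

In this model, after `di(&s, 4)` a pending external interrupt is held off for four calls to `retire` and taken on the fifth.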
Wed Aug 04, 2021 4:39 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Duh, found out when updating the tokenizer that it already recognized ".proc" and ".endproc" as keywords. They just were not processed by the process() routine. A minor code addition later and voila, they can be used to identify functions as well. ".proc" and ".endproc" are probably about the most common means of identifying functions in assembler, it is crazy not to support them. I think the GNU/LLVM assembler has a ".type" keyword that can be used to specify that a label is a function as well. _________________ Robert Finch http://www.finitron.ca
Wed Aug 04, 2021 9:26 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Working on the loader aspect of things today. I want a means to call functions in libraries. The functions could be built-in ROM routines. I am thinking of having a simple symbol table consisting of entries containing the function name and address. Then at load time a program's external functions would be matched up with entries in the symbol table to get the function address. I went off on a tangent again exploring how to perform far calls, then concluded it is not really necessary most of the time as code can be mapped into the address space of a program and near calls used. _________________ Robert Finch http://www.finitron.ca
Fri Aug 06, 2021 5:14 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Found and fixed a couple of bugs in the compiler. Created byte, wyde, tetra and octa versions of some string library routines. Added the .symtab directive to the assembler to dump a global symbol table of function entry points. Each entry dumped is 32 bits: 16 bits specifying the address and 16 bits specifying the name table index for the symbol name. Addresses have the five least significant bits as zero, so those bits are not stored. 16 bits is enough to represent a 21-bit address, which should be enough to cover what’s accessible in the ROM. _________________ Robert Finch http://www.finitron.ca
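The 32-bit .symtab entry described above can be sketched in C. The post does not say which half of the entry holds the address, so placing it in the upper 16 bits is an assumption, and the function names `symtab_pack`, `symtab_addr` and `symtab_name` are hypothetical.

```c
#include <stdint.h>

/* Pack a function entry point and a name table index into one 32-bit
   entry. The address must have its five low bits clear and fit in
   21 bits; only bits 5..20 are stored.
   Assumption: address in the upper half, name index in the lower half. */
uint32_t symtab_pack(uint32_t addr, uint16_t name_index)
{
    uint32_t stored = (addr >> 5) & 0xFFFFu;

    return (stored << 16) | name_index;
}

/* Recover the 21-bit address by restoring the five zero bits. */
uint32_t symtab_addr(uint32_t entry)
{
    return (entry >> 16) << 5;
}

/* Recover the name table index. */
uint16_t symtab_name(uint32_t entry)
{
    return (uint16_t)(entry & 0xFFFFu);
}
```

The highest address that fits is 0x1FFFE0: shifting right by five gives 0xFFFF, which is why 16 stored bits cover a 21-bit address range.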
Sat Aug 07, 2021 4:28 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 What to do about read-only data, data, and code in a based architecture? For library routines that use read-only data a global pointer ($gp1) is initialized to point to the read-only data. This uses read-only base register $b12 as a reference point. There is just one issue with this. What if a subroutine wants to call a library routine and uses a read-only data pointer as an argument? The value in the argument register will have a reference to a read-only segment. Both the library routine and subroutine calling it cannot be using the same read-only base register - $b12. And what if one library routine calls another routine in a different library that has a different base address for read-only data? One thing the system should try and avoid is constant loading and storing of base registers and global pointers. My thought is to provide a lot of base registers to allow different libraries to use different base registers. The base register used by a particular library could be statically allocated. For instance, library #3 would use base register 035 (octal) as a reference to its global data. All base registers 0xx5 would refer to global data for the library xx. There would be 8 base registers reserved per library. So, libraries would use <libnum>n. Most library routines would want to inherit the stack from the caller. So, the caller’s stack segment would be used. Library routines will be mapped into the caller’s address space, so that the caller’s code segment may be used. A lot of base registers = 1024 which could fit into two block rams. That would be enough for 128 code libraries. _________________ Robert Finch http://www.finitron.ca
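The static numbering scheme in this post (eight base registers per library, with slot 5 holding global data) amounts to simple arithmetic, sketched here in C. The names `lib_base_reg` and `libs_supported` are hypothetical, and the slot assignments other than slot 5 are not specified in the post.

```c
/* Eight base registers are reserved per library, as stated in the post. */
enum { SLOTS_PER_LIB = 8 };

/* Base register number for a given library and slot. Written in octal,
   the result reads as <libnum><slot>, e.g. library 3, slot 5 is 035. */
unsigned lib_base_reg(unsigned lib, unsigned slot)
{
    return lib * SLOTS_PER_LIB + slot;
}

/* How many libraries a given number of base registers can serve. */
unsigned libs_supported(unsigned total_base_regs)
{
    return total_base_regs / SLOTS_PER_LIB;
}
```

For example, `lib_base_reg(3, 5)` is 035 octal (29 decimal), and `libs_supported(1024)` is 128, the figure quoted above.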
Sun Aug 08, 2021 5:06 am

oldben Joined: Mon Oct 07, 2019 2:41 am Posts: 814	Re: ANY-1 How do you handle displays for Algol-style languages?
Sun Aug 08, 2021 10:29 pm

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 I am not sure I follow the question; is there something specific? Displaying information can be done using a function written in CC64 similar to a C function to display values. Or something could be written in assembler code. There is the printf() function as part of the C standard library. There is a debug printf()-like routine, dbg_printf(), which is much simpler than printf() but still useful.
Upped the number of base registers to 2048. That should be enough base registers to allow eight per app for 256 apps. Base registers are not hard-coded in the ISA; they are accessed indirectly via a number specified in a GPR. Most apps will need only three or four base registers. The idea is that the apps have designated base register numbers to use. So, there is no need to save and restore them as part of the app context when switches occur. For example: TinyBasic is designated as app #3 so uses base registers 03x where 03x is an octal number. Base register #007 is special – it is the code base register. Initialization looks like:
Code:
        align 16
    TinyBasic_init .proc
        ; set TinyBasic data segment
        ldi $a1,#$0000000000400006 ; 4MB boundary
        ldi $a0,#0030 ; 0030 octal!
        mtbase [$a0],$a1
        ; set TinyBasic stack segment
        ldi $a1,#$FFFFFFFFFF40C006
        ldi $a0,#0033 ; App#3 SS
        mtbase [$a0],$a1
        ; set Read-only segment (last 64k)
        ldi $a1,#$FFFFFFFFFFFF0004
        ldi $a0,#0035
        mtbase [$a0],$a1
        sync
        ret
    .endp TinyBasic_init
Base registers are associated with global pointer registers and the stack / frame pointers. There is an instruction to set the association. But basically the base register in use is specified in the upper bits of an address.
Code:
        ldi $gp,#0
        ldi $a0,#0030
        base $gp,$gp,$a0
        ldi $gp1,#0
        ldi $a0,#0035
        base $gp1,$gp1,$a0
_________________ Robert Finch http://www.finitron.ca
Mon Aug 09, 2021 4:41 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Played with the TLB today, getting it to work in the simulator now called FSIM. The MMU for the core is always enabled; all addresses are translated in all operating modes. Simulation has run up to the point where LEDs are updated. This required mapping the LED I/O address into the address space. For a while I had the TLB mapping too much of a virtual address. It was using part of the virtual address reserved for segmentation when it should not have been. A virtual address before a segment base is applied includes a reference to the base register to use. Once the base register is applied, the base register spec is stripped out of the address before being sent to the TLB. A lock is needed on some TLB entries so that the TLB miss handler may run without being knocked out of the TLB. The TLB is four-way associative so it should be necessary to lock only one of the ways for specific pages. That would still allow three other pages to be mapped to the same entry. I am thinking of reserving the first 16kB of memory for the BIOS / TLB and a 16kB block of ROM addresses. _________________ Robert Finch http://www.finitron.ca
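The locking idea in this post can be sketched as a small C model of a four-way set-associative TLB. This is an illustration only, not the FSIM code: the structure and function names are hypothetical, the set count of 1024 per way is the figure reported later in the thread, the set index is assumed to be the low bits of the virtual page number, and the replacement policy (first unlocked way starting from a caller-supplied hint) is made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_WAYS 4
#define TLB_SETS 1024   /* entries per way, per a later post in the thread */

/* Hypothetical entry layout for illustration. */
typedef struct {
    bool     valid;
    bool     locked;    /* locked entries are never evicted */
    uint64_t vpn;       /* virtual page number */
    uint64_t ppn;       /* physical page number */
} tlb_entry;

typedef struct {
    tlb_entry way[TLB_WAYS][TLB_SETS];
} tlb;

/* Look a virtual page number up in all four ways of its set. */
bool tlb_lookup(const tlb *t, uint64_t vpn, uint64_t *ppn)
{
    unsigned set = (unsigned)(vpn % TLB_SETS);

    for (unsigned w = 0; w < TLB_WAYS; w++) {
        const tlb_entry *e = &t->way[w][set];
        if (e->valid && e->vpn == vpn) {
            *ppn = e->ppn;
            return true;
        }
    }
    return false;
}

/* Install a translation in the first unlocked way at or after 'hint'.
   Returns the way used, or -1 if every way in the set is locked. */
int tlb_install(tlb *t, uint64_t vpn, uint64_t ppn, bool lock, unsigned hint)
{
    unsigned set = (unsigned)(vpn % TLB_SETS);

    for (unsigned i = 0; i < TLB_WAYS; i++) {
        unsigned w = (hint + i) % TLB_WAYS;
        tlb_entry *e = &t->way[w][set];
        if (!e->locked) {
            e->valid = true;
            e->locked = lock;
            e->vpn = vpn;
            e->ppn = ppn;
            return (int)w;
        }
    }
    return -1;
}
```

Locking the miss handler's page in one way leaves the other three ways of that set free for ordinary pages, which is the property the post relies on.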
Tue Aug 10, 2021 3:03 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Came up with code to perform far call and return. Far calls and returns should not be performed very often. This kind of code is needed to support device drivers which may have a different ROOT pointer than the rest of the OS. This code may eventually be wrapped up into macro instructions for far call and return. An issue is that it clobbers three temporary registers. I do not think this is much of an issue because temporaries are supposed to be caller-saved when calling a function, meaning that if they were needed by the calling function they should have been saved before the call. The far call works by referencing a target address descriptor which is located in memory. The Ra register ($a0 in the example) of the far call instruction would be used to contain a pointer to the descriptor. Using a macro instruction a far call would look like:
Code:
        jalf $ra,[$a0]
Here is the code that may make it into macro instructions:
Code:
    ; Perform a far call using a target address descriptor in memory.
    ;
    ; Parameters
    ;   $a0 = pointer to target address descriptor in memory
    ; Modifies:
    ;   $t1 = target ip
    ;   $t2,$t3 = base/bound values
    _FAR_CALL .proc
        mfbase $t2,$b7 ; get current code segment base and bound and
        mfbnd $t3,$b7
        push $t2,$t3 ; store on stack
        ldo $t1,[$a0] ; get target ip from descriptor
        ldo $t2,8[$a0] ; get target code base and bound values
        ldo $t3,16[$a0]
        di #4 ; disable interrupts until jump is complete
        mtbase $b7,$t2 ; move target code base and bound to current
        mtbnd $b7,$t3
        jal $ra,[$t1] ; call function
        jal $x0,[$x2] ; return to far call's caller
    .endp _FAR_CALL

    ; Perform far return. This code is jumped to.
    ;
    ; Parameters
    ;   $ra = return ip
    ; Modifies:
    ;   $t2,$t3 = base/bound values
    _FAR_RET .proc
        pop $t3,$t2 ; get back stacked base/bound values
        di #4 ; disable interrupts until jump is complete
        mtbase $b7,$t2 ; set current base/bound
        mtbnd $b7,$t3
        jal $x0,[$ra] ; return
    .endp _FAR_RET
_________________ Robert Finch http://www.finitron.ca
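The target address descriptor read by _FAR_CALL (ip at offset 0, base at offset 8, bound at offset 16) can be pictured as a C structure, together with a very small model of what the call and return do to the code segment. This is a sketch under assumptions: the type and function names are hypothetical, the fields are assumed to be 64-bit octas as implied by the ldo offsets, and the model ignores pipeline and interrupt effects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptor layout implied by: ldo $t1,[$a0]; ldo $t2,8[$a0]; ldo $t3,16[$a0] */
typedef struct {
    uint64_t ip;     /* offset 0: target instruction pointer */
    uint64_t base;   /* offset 8: target code segment base */
    uint64_t bound;  /* offset 16: target code segment bound */
} far_target;

/* Hypothetical model of the state touched by far call / far return. */
typedef struct { uint64_t base, bound; } code_seg;

enum { FAR_STACK_DEPTH = 16 };

typedef struct {
    code_seg cs;                      /* current code base/bound ($b7) */
    code_seg saved[FAR_STACK_DEPTH];  /* stands in for the memory stack */
    int      sp;
    uint64_t ip;
} cpu_model;

/* Save the current code segment, then switch to the target. */
bool far_call(cpu_model *c, const far_target *t)
{
    if (c->sp >= FAR_STACK_DEPTH)
        return false;
    c->saved[c->sp++] = c->cs;
    c->cs.base = t->base;
    c->cs.bound = t->bound;
    c->ip = t->ip;
    return true;
}

/* Restore the saved code segment and continue at the return address. */
bool far_ret(cpu_model *c, uint64_t return_ip)
{
    if (c->sp <= 0)
        return false;
    c->cs = c->saved[--c->sp];
    c->ip = return_ip;
    return true;
}
```

A call followed by a return leaves the model with the original base and bound, which is the invariant the push and pop in the assembly routines are there to preserve.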
Wed Aug 11, 2021 7:01 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1
Selectors
I decided to go with a more sophisticated segmentation model, using selectors and descriptors. ANY1 had base and bounds registers and they were being passed around as a pair in the RTL code. Then I had the thought: why not just call the pair a descriptor and pass a single descriptor object around? It was not too much more of a step to use selectors. A selector indicates which descriptor from a descriptor table to use. There are 1024 selector registers available. There is a maximum of 64k worth of descriptors to choose from. Descriptors are stored in a descriptor table in memory. Descriptors associated with selectors are cached in a descriptor cache. The selector register to use for a memory access is indicated in bits 54 to 63 of the address, except for code addresses, which always use the CS selector (selector #7). Selector #6 is reserved as the far code return selector. At reset the first eight entries in the descriptor cache are initialized to a flat memory model to allow updating the global descriptor table in memory.
The far call and return procs were re-written to use selectors. Also added were far enter and leave and near enter and leave procs. These routines do a fair amount of work and include load and store operations. Since there may be exceptions, the corresponding macro instructions are not used. It is almost as fast to implement the procs as callable routines. The assembler, compiler, and simulator all had to be updated.
_________________ Robert Finch http://www.finitron.ca
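Extracting the selector register number from bits 54 to 63 of a data address can be sketched in C as below. The function names are hypothetical; the ten-bit field width follows from the bit range given in the post and matches the 1024 selector registers.

```c
#include <stdint.h>

#define SELECTOR_SHIFT 54   /* selector occupies address bits 54..63 */

/* Selector register number (0..1023) named by a data address. */
unsigned addr_selector(uint64_t addr)
{
    return (unsigned)(addr >> SELECTOR_SHIFT);
}

/* The remaining low 54 bits of the address. */
uint64_t addr_offset(uint64_t addr)
{
    return addr & ((UINT64_C(1) << SELECTOR_SHIFT) - 1);
}

/* Build an address from a selector number and an offset. */
uint64_t make_addr(unsigned selector, uint64_t offset)
{
    return ((uint64_t)(selector & 0x3FFu) << SELECTOR_SHIFT)
         | (offset & ((UINT64_C(1) << SELECTOR_SHIFT) - 1));
}
```

For instance, `make_addr(7, 0x1234)` yields an address whose selector field is 7 and whose offset is 0x1234; code fetches ignore the field and always use selector #7.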
Sat Aug 14, 2021 3:05 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Been getting the FSIM simulator to work. Having lots of fun with the MMU. Finally got ‘AAAA’ to appear on screen, and part of the startup message. The startup message does not appear where it should on-screen. It ends up showing up about ¾ of the way down the screen. I am confident that the screen is mapped properly or there would be nothing appearing. For debugging it is possible to dump the descriptor cache, and all the descriptors in use look correct. The descriptor cache is loaded from memory, so the descriptor table in memory must be getting updated correctly. It is also possible to dump the TLB entries, and there are a lot of them: 1024 entries for four different ways. It looks like the TLB entries are also being set up correctly. For a while the scratch RAM was not mapped correctly. Four consecutive virtual address pages were all pointing to the same physical page number. I forgot to increment the physical page number when setting up the TLB. _________________ Robert Finch http://www.finitron.ca
Wed Aug 18, 2021 3:32 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2392 Location: Canada	Re: ANY-1 Back to working on the ANY1 core. I got a lot of CS01 working and started toying with the idea of porting it to the Nexys Video board. Then it occurred to me that I already had a system to work on. Time to switch back to ANY1. I managed to identify a couple of issues in the ANY1 system and fix them. Currently the core executes about 30 instructions then hangs just before the LEDs should light up.
Hardware issues: The SYNC instruction was not clearing the unimplemented instruction flag, causing an unimplemented instruction exception when executed. The TLB instruction was only partially implemented; I must have taken a break then forgot to implement the rest of it. This caused the core to eventually hang waiting for a response for the instruction. I went through all the components of the system and ensured that the data output was set to zero when the component was not selected. This is not a standard practice, but it allows wire-oring all the data outputs of the cores together to feed the cpu. This saves some hardware and time over using multiplexors. The frame buffer now outputs a signal indicating to the sprite controller when it is a good time to access memory for sprites. There are some display issues with sprites.
For software, memory for the GDT (global descriptor table) was not mapped into the address space. This caused a page fault exception when trying to set up segment descriptors, and the page fault could not be processed as the system was not set up yet.
_________________ Robert Finch http://www.finitron.ca
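The wire-OR arrangement mentioned under hardware issues can be illustrated with a small C model: if every component drives zero when it is not selected, the bus value is just the OR of all outputs and no multiplexer is needed. The `device` type and the function names are hypothetical, and a real design would do this in the RTL rather than in software.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-mapped component for illustration. */
typedef struct {
    uint64_t base;   /* first address the component responds to */
    uint64_t size;   /* number of addresses it decodes */
    uint64_t value;  /* data it returns when selected */
} device;

/* A component outputs its data only when selected, otherwise zero. */
uint64_t device_read(const device *d, uint64_t addr)
{
    return (addr - d->base < d->size) ? d->value : 0;
}

/* Wire-OR of all component outputs feeds the CPU data input. */
uint64_t bus_read(const device *devs, size_t n, uint64_t addr)
{
    uint64_t bus = 0;

    for (size_t i = 0; i < n; i++)
        bus |= device_read(&devs[i], addr);
    return bus;
}
```

The scheme only works while address ranges do not overlap and every unselected component really outputs zero, which is why the post describes checking all of them.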
Sat Sep 04, 2021 6:49 am
