Author |
Message |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2405 Location: Canada
|
Started working out of a new folder, “Thor2022”. Got most things ported for Thor2022. Took an initial stab at getting the compiler and assembler to work with the new instruction formats. Had to modify the compiler to split constants larger than 64 bits into multiple operations, as the assembler cannot handle 128-bit constants yet. It should be possible to at least load a 128-bit constant piecemeal using shift and OR operations.
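A minimal sketch of the piecemeal load idea, in Python for illustration only. The 16-bit chunk size and the function names are assumptions; the actual Thor2022 immediate widths are not given here.

```python
MASK128 = (1 << 128) - 1

def split_constant(value, chunk_bits=16):
    """Split a 128-bit constant into chunks, most significant chunk first."""
    value &= MASK128
    count = 128 // chunk_bits
    chunk_mask = (1 << chunk_bits) - 1
    return [(value >> (chunk_bits * i)) & chunk_mask
            for i in reversed(range(count))]

def load_piecemeal(chunks, chunk_bits=16):
    """Rebuild the constant the way a shift / OR instruction sequence would."""
    reg = 0
    for chunk in chunks:
        # One "shift left" followed by one "OR immediate" per chunk.
        reg = ((reg << chunk_bits) | chunk) & MASK128
    return reg
```

With 16-bit chunks this takes eight shift/OR pairs; a wider immediate field would need fewer.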
_________________Robert Finch http://www.finitron.ca
|
Fri Mar 04, 2022 6:33 am |
|
 |
robfinch
|
Running Thor22 in SIM. Lots of small bugs worked out. Finally got to ‘AA’ LED display.
|
Sat Mar 05, 2022 7:12 am |
|
 |
robfinch
|
Trying to get a system built through to running it in FPGA hardware.
Ran into a nasty bug in the BIU. A combinational loop. But I cannot seem to identify what the cause of it is. On the schematic a bus signal is being fed back to itself when the signal is replicated through multiple LUTs. As far as I can tell there is no loop in the code. I have run into this before and it took me about a week to find the loop. If I recall correctly last time it had to do with a bad signal name.
Played with the MMU logic some more. It now supports both page tables and inverted page tables. The low-order bit of the page table base register selects between the two.
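A small sketch of decoding such a base register, in Python for illustration. Which bit value means which table type is an assumption here (1 is taken to mean inverted), as is the function name.

```python
def decode_ptbr(ptbr):
    """Split a page table base register into (table kind, base address).

    Assumption: low-order bit 1 selects the inverted table, 0 the
    hierarchical one. The post does not say which way round it is.
    """
    kind = "inverted" if ptbr & 1 else "hierarchical"
    base = ptbr & ~1  # clear the select bit to recover the base address
    return kind, base
```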
|
Mon Mar 07, 2022 11:13 am |
|
 |
robfinch
|
Breaking the BIU up into more and more modules to try and localize the combinational loop. It seems to move around all over the place involving different paths between registers.
The combinational loop is gone now and I am not sure why. I must have changed the right line of code. The biggest change I made was to the synthesis settings to turn off retiming.
Forgot to update the microcode for Thor2022 to correspond to the new instruction formats, which resulted in a crash when the ENTER instruction was executed.
|
Tue Mar 08, 2022 4:01 am |
|
 |
robfinch
|
Poured some more work into the compiler. In theory it now supports 128-bit integers. Got the parameters for the 128-bit divide backwards. This led to a zero result which did not match the 64-bit result. I built a 64-bit versus 128-bit integer compare for the whole expression tree to identify where things were amiss. In most cases the 64-bit and 128-bit integers should agree, unless of course the integer is larger than 64 bits.
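A rough sketch of that kind of cross-check, in Python for illustration. The tuple-based expression tree, the operator set, and all function names are hypothetical; the compiler's real tree is not shown here. The swapped-operand divide stands in for the bug described above.

```python
def wrap(value, bits):
    """Wrap to a signed two's-complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def trunc_div(a, b):
    """C-style division that truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

def swapped_div(a, b):
    """Divide with the parameters backwards, as in the bug."""
    return trunc_div(b, a)

def evaluate(node, bits, div, trace):
    """Evaluate a tree post-order, recording every node's result."""
    if isinstance(node, int):
        result = wrap(node, bits)
    else:
        op, lhs, rhs = node
        a = evaluate(lhs, bits, div, trace)
        b = evaluate(rhs, bits, div, trace)
        if op == "+":
            raw = a + b
        elif op == "*":
            raw = a * b
        elif op == "/":
            raw = div(a, b)
        else:
            raise ValueError("unknown operator: " + op)
        result = wrap(raw, bits)
    trace.append((node, result))
    return result

def first_mismatch(expr, div128):
    """Return the first node where 64-bit and 128-bit results differ."""
    t64, t128 = [], []
    evaluate(expr, 64, trunc_div, t64)
    evaluate(expr, 128, div128, t128)
    for (node, r64), (_, r128) in zip(t64, t128):
        if r64 != r128:
            return node, r64, r128
    return None
```

Because the traces are recorded post-order, the first mismatch is the innermost node that went wrong, which points at the divide rather than at whatever expression contains it.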
Modified the format of the PTE to allow for a 12-bit ASID. The ASID is tied to the process ID, and on my Windows machine there are more than 256 processes running. Also expanded the virtual address range recognized in the TLB to 48 bits, basically to use up as many bits of TLB memory as possible without incurring additional block RAMs. The PTE is 90 bits in size now. Five of them will fit into a 512-bit cache line.
The page table group, PTG, is 512 bits wide. The inverted page table uses open addressing with quadratic probing for collisions. I have it generating a page fault after 12 PTGs have been searched for translations with no match. 12 x 5 is 60 colliding translations, which is probably not very likely.
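A minimal sketch of a probing lookup with the 12-group limit, in Python for illustration. The probe step (triangular numbers, one common form of quadratic probing), the dictionary layout of a PTE, and the names are all assumptions; the hash function itself is left to the caller.

```python
PTE_BITS = 90
PTG_BITS = 512
PTES_PER_PTG = PTG_BITS // PTE_BITS  # five 90-bit PTEs fit in 512 bits
MAX_PTGS_SEARCHED = 12

class PageFault(Exception):
    """Raised when no translation is found within the probe limit."""

def probe_sequence(hash_value, num_groups, max_probes=MAX_PTGS_SEARCHED):
    """Group indices visited, using triangular-number offsets (assumed)."""
    return [(hash_value + i * (i + 1) // 2) % num_groups
            for i in range(max_probes)]

def lookup(table, asid, vpn, hash_value):
    """Search up to 12 PTGs for a matching (asid, vpn) translation."""
    for group_index in probe_sequence(hash_value, len(table)):
        for pte in table[group_index]:
            if pte is not None and pte["asid"] == asid and pte["vpn"] == vpn:
                return pte["ppn"]
    raise PageFault("no translation after %d PTGs" % MAX_PTGS_SEARCHED)
```

With a power-of-two number of groups, triangular offsets visit distinct groups, so the 12 probes cover 12 different PTGs, or 60 PTEs.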
I am wondering about keeping track of empty PTEs in PTGs during the search for a translation. Ideally the first empty PTE should be used to store a translation on a miss. So, there needs to be a history record of the search. This has not been built yet. The PTG and PTE entry number need to be recorded. The search could also be made to stop when it finds an empty PTG. Care must be taken because deleting entries could empty a PTG while there are still translations left in following PTGs.
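One possible shape for that search, sketched in Python. This is not the built design (none exists yet per the post). It assumes deleted entries are marked with a tombstone distinct from never-used entries, so that only a never-used PTG ends the search; the probe step and names are hypothetical.

```python
EMPTY = None        # slot never used
DELETED = object()  # tombstone: slot was used, then deleted

def probe_sequence(hash_value, num_groups, max_probes=12):
    """Group indices to visit (triangular offsets, an assumption)."""
    return [(hash_value + i * (i + 1) // 2) % num_groups
            for i in range(max_probes)]

def search(table, key, hash_value):
    """Return (found, first_free), each a (PTG, entry) pair or None.

    first_free remembers the first reusable slot seen along the way,
    so a miss handler knows where to store the new translation.
    """
    first_free = None
    for g in probe_sequence(hash_value, len(table)):
        group = table[g]
        for e, slot in enumerate(group):
            if slot is EMPTY or slot is DELETED:
                if first_free is None:
                    first_free = (g, e)
            elif slot["key"] == key:
                return (g, e), first_free
        # Stop early only at a PTG that was never used. A PTG that is
        # empty because of deletions must not end the search.
        if all(slot is EMPTY for slot in group):
            break
    return None, first_free
```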
Also wondering if the table could be packed once a PTG is cleared. If all the entries in the PTG are clear it might be worthwhile to take collision entries and move them from following groups to the empty group. This would reduce search times.
Modified the data cache to use odd/even cache lines to handle unaligned data. The size of the data cache had to be doubled to support this, so it is now a whopping 64kB. Two sets of tags, odd and even, were needed.
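A small sketch of why odd/even banks help with unaligned data, in Python for illustration. The 64-byte line size is an assumption taken from the 512-bit cache line mentioned earlier, and accesses are assumed to be no larger than one line.

```python
LINE_BYTES = 64  # assumed: one 512-bit cache line

def lines_touched(address, size):
    """Cache line numbers covered by an access of `size` bytes."""
    first = address // LINE_BYTES
    last = (address + size - 1) // LINE_BYTES
    return list(range(first, last + 1))

def bank_accesses(address, size):
    """Map each touched line to the odd or even bank.

    An access that straddles a line boundary touches two consecutive
    lines, which always fall in different banks, so both can be read
    in the same cycle.
    """
    banks = {}
    for line in lines_touched(address, size):
        banks["odd" if line & 1 else "even"] = line
    return banks
```

For example, an 8-byte access at address 60 touches line 0 in the even bank and line 1 in the odd bank.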
|
Wed Mar 09, 2022 5:47 am |
|
 |
robfinch
|
More work on the MMU / BIU. It now supports either software or hardware managed TLB in addition to supporting hash or hierarchical page tables. It is just a matter of flipping the correct bits.
|
Fri Mar 11, 2022 5:04 am |
|
 |
robfinch
|
Yet more work, this time on the TLB. The TLB has been updated to support least-recently-used replacement in addition to random or fixed replacement. Which of the three algorithms to use is specified by the TLBRW instruction. Spent more time designing features of the MMU. Created an entity called the access rights table, ART, to hold access rights for a page. The first time a translation is looked up the access rights are loaded from the ART. After that, future translations do not access the ART. The design is still in flux. I am considering making the access counter part of the TLB entry. The layouts of the PTE, ARTE, and TLBE were attached to the original post.
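A minimal sketch of choosing a victim way under the three replacement policies, in Python for illustration. The `last_used` timestamp field, the policy names, and the function signature are assumptions; how TLBRW actually encodes the choice is not described here.

```python
import random

def choose_victim(entries, policy, fixed_way=0, rng=random):
    """Pick which way of a TLB set to replace.

    entries: one dict per way, each with a 'last_used' timestamp.
    policy:  'fixed', 'random', or 'lru'.
    """
    if policy == "fixed":
        return fixed_way
    if policy == "random":
        return rng.randrange(len(entries))
    if policy == "lru":
        # The way with the oldest timestamp was used least recently.
        return min(range(len(entries)),
                   key=lambda way: entries[way]["last_used"])
    raise ValueError("unknown policy: " + policy)
```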
|
Sat Mar 12, 2022 4:22 am |
|
 |
robfinch
|
Added auto-aging to the TLB entries. Periodically the access counter is shifted right by a bit.
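A tiny sketch of the aging step, in Python for illustration. The 8-bit saturating counter and the function names are assumptions.

```python
def touch(counters, way, max_count=255):
    """Count an access to one entry, saturating at max_count (assumed 8-bit)."""
    counters[way] = min(counters[way] + 1, max_count)

def age_counters(counters):
    """Periodic aging: shift every access counter right by one bit."""
    return [count >> 1 for count in counters]
```

Each aging pass halves the counts, so recent accesses weigh more than old ones.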
|
Sun Mar 13, 2022 5:00 am |
|
 |
robfinch
|
Decided to shelve the hierarchical page tables for now. Not satisfied with the need for 128-bit PTEs. The hashed page table does not care about power-of-two address alignment and uses less memory. It is probably faster as well.
Increased the size of a PTG so that eight PTEs would fit. This means the PTG size no longer lines up evenly with a page of memory, but it turns out not to matter for the hash table. It is still an even multiple of 128 bits though, as this is the size of a memory access.
Made the reading of the PTG quit as soon as a matching PTE is found. The search takes place at the same time the PTG is being loaded. This can be done because the valid bit in the PTE is zero until it is loaded from memory. Also modified the PTG update to write only the last half of the PTE where the accessed bit is stored. So, there is only one memory access required.
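A behavioral sketch of searching while the group loads, in Python for illustration. It simplifies by assuming each memory beat delivers one whole PTE; the real mapping of PTEs onto 128-bit accesses is not given here, and the field names are hypothetical.

```python
def load_and_search(memory_group, vpn):
    """Load a PTG beat by beat, stopping as soon as a match appears.

    Returns (ppn or None, number of beats used). Entries not yet
    loaded have valid == False, so they can never match early.
    """
    buffer = [{"valid": False, "vpn": 0, "ppn": 0} for _ in memory_group]
    for beat, pte in enumerate(memory_group, start=1):
        buffer[beat - 1] = dict(pte)  # this beat's data arrives
        for entry in buffer:          # compare against all entries at once
            if entry["valid"] and entry["vpn"] == vpn:
                return entry["ppn"], beat
    return None, len(memory_group)
```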
|
Mon Mar 14, 2022 5:41 am |
|
 |
robfinch
|
Lost several hours of work. Somehow the file I was working on got reverted to an older version. I’m guessing I messed up, copying the file in the wrong direction when updating version control.
Made the region table updateable. Added an instruction to update the region table. There needs to be a separate access rights table, ART, for each region. The address of this table is stored in the region table. I am now calling the ART the PMT, which stands for page management table.
Added the ability to bypass levels in the hierarchical page table.
|
Tue Mar 15, 2022 3:29 am |
|
 |
robfinch
|
I have updated the MMU to absorb 10 bits of the address at a time. PTEs will have to be in clusters of four pages. This is not much different than just increasing the size of a page, except that the smaller pages get to be kept. This is only for hierarchical tables.
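A sketch of splitting a virtual address into 10-bit table indices, in Python for illustration. The 48-bit address width comes from an earlier post; the 16-bit page offset (a 64kB page) and the function name are assumptions made only for this example.

```python
def split_address(vaddr, va_bits=48, page_offset_bits=16, index_bits=10):
    """Return ([top-level index, ..., leaf index], page offset)."""
    offset = vaddr & ((1 << page_offset_bits) - 1)
    vpn = (vaddr & ((1 << va_bits) - 1)) >> page_offset_bits
    indices = []
    remaining = va_bits - page_offset_bits
    while remaining > 0:
        # Each table level absorbs the next 10 bits of the page number.
        indices.append(vpn & ((1 << index_bits) - 1))
        vpn >>= index_bits
        remaining -= index_bits
    return list(reversed(indices)), offset
```

With these numbers the page number is 32 bits, so three levels take 10 bits each and the top level is left with only 2.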
Added garbage collection cards to the TLB entries. The card bits are set in a similar fashion to the dirty or modified bit. The TLB entries can then be scanned by the garbage collector to see where pointer stores occurred.
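A rough sketch of card marking on a TLB entry, in Python for illustration. The number of cards per page, the page size, and the class and function names are all assumptions; the post does not give them.

```python
class TlbEntry:
    """A TLB entry carrying a bitmap of garbage collection cards."""
    def __init__(self, vpn, cards_per_page=32):
        self.vpn = vpn
        self.cards_per_page = cards_per_page  # assumed value
        self.cards = 0                        # one bit per card

def record_pointer_store(entry, page_offset, page_bytes=4096):
    """Set the card bit covering the stored-to offset within the page."""
    card = page_offset * entry.cards_per_page // page_bytes
    entry.cards |= 1 << card

def marked_cards(entry):
    """Cards a collector would need to scan for this page."""
    return [i for i in range(entry.cards_per_page)
            if (entry.cards >> i) & 1]
```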
|
Wed Mar 16, 2022 3:09 am |
|
 |
robfinch
|
Now using a 64kB page size after doing some research and finding that a 4kB page is probably too small. Also came up with a way to use 1kB sections out of the 64kB. So, the 64kB page can be divided up.
Added MMU caching of PDE lookups. The cache is really small, eight entries, but fully associative.
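A minimal model of a small fully associative cache, in Python for illustration. The round-robin replacement and the class and method names are assumptions; the post only gives the size and associativity.

```python
class PdeCache:
    """Eight-entry fully associative cache of PDE lookups."""
    def __init__(self, size=8):
        self.entries = [None] * size  # each entry is (tag, pde) or None
        self.next_victim = 0          # round-robin pointer (assumed policy)

    def lookup(self, tag):
        """Compare the tag against every entry, as hardware does in parallel."""
        for entry in self.entries:
            if entry is not None and entry[0] == tag:
                return entry[1]
        return None

    def insert(self, tag, pde):
        """Update a matching entry, else replace the next victim."""
        for i, entry in enumerate(self.entries):
            if entry is not None and entry[0] == tag:
                self.entries[i] = (tag, pde)
                return
        self.entries[self.next_victim] = (tag, pde)
        self.next_victim = (self.next_victim + 1) % len(self.entries)
```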
|
Thu Mar 17, 2022 3:56 am |
|
 |
robfinch
|
Mostly more work on the BIU. Added MMU caching of PTGs in addition to PTEs. Realized I have got the hierarchical page table placed in the physical address space, and I am thinking maybe it should be in the virtual space, but that could lead to nested TLB misses. For the hash table I am assuming the entire table will be present in physical memory all the time. It is also located at a physical address. Got to the stage of LED output in simulation once again.
|
Fri Mar 18, 2022 4:28 am |
|
 |
robfinch
|
Latest Fixes: The assembler was not encoding register fields for any R1 type instruction. This was found from the PTGHASH instruction which evaluated to zero all the time. Other R1 type instructions had not been used yet.
The MOV instruction was not encoded correctly; it was using the Thor2021 opcode. This caused the value zero to be moved into registers.
Several instructions got missed in the instruction length decode. This led to various crashes in sim.
Latest Mods: Modified the PUSH and POP instructions to push or pop up to four registers. Previously they could handle only three, but with extra bits available in the opcode due to smaller register spec fields, a fourth register could be added. Was also able to free up two now-redundant opcodes.
The hash page table is now implemented entirely in block RAM. This takes up about 1/3 of the available block RAMs. It uses a 256-bit wide port on the CPU side for loading and storing and a 2048-bit wide port on the memory side to allow an entire page group to be read in a single cycle. The hash table now has the same performance as a cache. It is just as fast at address lookup as the TLB, so the TLB has been removed.
Got to the LED lighting up again in sim, this time using hash page table lookups for virtual addressing.
|
Sun Mar 20, 2022 3:15 am |
|
 |
robfinch
|
Decided to get rid of one bit of the register type bits. The register type field was a two-bit field that specified if the register spec was for a register, vector register or a constant. The constant capability is redundant most of the time as there are other instruction formats where an immediate value can be specified. The type field has been reduced to specifying either a vector or a scalar register.
13-bit constants were not encoded properly by the assembler; two bits were trimmed off and the constant was encoded as 11 bits. This led to loops not working correctly. The size of the constant field for Thor2022 increased by two bits and the assembler was only partially updated to account for this.
Up to the clear-screen point again. Flashy LEDs worked, and the three-second delay worked, all running in a virtual address space using a hash table. Text output is close to working.
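A small sketch of how trimming a field corrupts a constant, in Python for illustration. It assumes the two upper bits were the ones lost and that the field is sign-extended on decode; neither detail is stated in the post, and the function names are hypothetical.

```python
def encode_field(value, bits):
    """Keep only the low `bits` bits, as an assembler field would."""
    return value & ((1 << bits) - 1)

def decode_field(field, bits):
    """Sign-extend a `bits`-wide field back to an integer."""
    sign = 1 << (bits - 1)
    return (field ^ sign) - sign
```

A loop count of 3000 survives a 13-bit field but comes back as 952 from an 11-bit one, which is the kind of silent change that breaks loops.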
|
Mon Mar 21, 2022 3:03 am |
|