 Thor Core / FT64 

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1782
This sounds like you've wrestled it into a healthier state - hurrah!


Mon Mar 21, 2022 4:18 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
It is slowly making progress.

Latest fixes: the LEA instruction's opcode was freed up and LEA was made an alternate mnemonic for ADD, but the assembler was not updated. This led to the wrong value being loaded into the global pointer register.

Latest mods: there was a case where the BIU locked up waiting for an ack, but there was no active memory cycle. The ack must have been missed, the cycle must have aborted, or there is a hardware error of some sort. The hardware is complicated, so maybe I missed a corner case where the bus was not supposed to be active. The BIU was modified to continue if either an ack is present or the bus is no longer active.

A memory load / store queue was written. It can accept input from two sources and bypass stores to loads. It was integrated into the BIU, replacing a fifo. Only one source is used for this design. The queue depth was set quite shallow since the current design has a synchronous memory interface. There is only ever one outstanding memory request.
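
To illustrate what bypassing stores to loads means here, below is a minimal software sketch of a shallow queue. It is not the actual hardware: the class and field names (`MemOp`, `LoadStoreQueue`, `bypass`) and the default depth are assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemOp:
    is_store: bool
    addr: int
    data: Optional[int] = None  # store data; None for loads


class LoadStoreQueue:
    """Toy model of a shallow load/store queue with store-to-load bypass."""

    def __init__(self, depth: int = 4):  # depth is an assumed value
        self.depth = depth
        self.entries = deque()

    def push(self, op: MemOp) -> bool:
        """Queue an operation; returns False when the queue is full."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append(op)
        return True

    def bypass(self, addr: int) -> Optional[int]:
        """Return data from the youngest queued store to addr, if any."""
        for op in reversed(self.entries):
            if op.is_store and op.addr == addr:
                return op.data
        return None

    def pop(self) -> Optional[MemOp]:
        """Issue the oldest operation to memory (one request at a time)."""
        return self.entries.popleft() if self.entries else None
```

A load first calls `bypass()` with its address; only if that returns `None` does it need to wait for memory. Searching from the youngest entry backwards makes sure the load sees the most recent store to that address.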

_________________
Robert Finch http://www.finitron.ca


Wed Mar 23, 2022 5:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Squeezed the PTE back down to 128 bits by removing most of the access rights information. This allows twice as many PTEs to be used in mapping. The PTE only maps an address now. There are 32768 PTEs, four times the number of physical pages. The other information usually associated with a PTE comes from a second table, the PMT, which contains only 16384 entries. The PMT contains the protection key, privilege level, access count, and rwx access rights. The rwx access rights are also stored in the PTE so they may be set up differently for different users. The PMT is accessed after the physical address is known. So, translations require two block RAM accesses, which amounts to two clock cycles, as the block RAM is clocked at double the CPU clock rate.
The setup allows for up to 1 GB of physical space, split between 512 MB of DRAM and secondary storage.
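
The numbers above can be cross-checked with a little arithmetic. The sketch below assumes one PMT entry per page across the whole 1 GB space, and that "physical pages" refers to the pages of DRAM; the page size is inferred from those assumptions and is not stated in the post.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PHYSICAL_SPACE = 1 * GIB   # total mapped space, from the post
DRAM_SIZE = 512 * MIB      # DRAM portion, from the post
PMT_ENTRIES = 16384        # from the post
PTE_COUNT = 32768          # from the post
PTE_BITS = 128             # from the post

# Assumption: one PMT entry per page of the 1 GB space.
page_size = PHYSICAL_SPACE // PMT_ENTRIES
dram_pages = DRAM_SIZE // page_size
ptes_per_dram_page = PTE_COUNT // dram_pages
pte_table_bytes = PTE_COUNT * PTE_BITS // 8

print(page_size // KIB, dram_pages, ptes_per_dram_page, pte_table_bytes // KIB)
# prints: 64 8192 4 512
```

Under these assumptions the pages are 64 KiB, DRAM holds 8192 of them, and 32768 PTEs is four times that count, which matches the ratio quoted above.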

_________________
Robert Finch http://www.finitron.ca


Thu Mar 24, 2022 3:46 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The compiler ran out of temporary registers and output incorrect code. Also ran into a case where constants were not being reduced to a single constant. Started working on a new compiler based on VBCC.

_________________
Robert Finch http://www.finitron.ca


Sat Mar 26, 2022 3:12 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Worked on the compiler code generator. Set things up so that prolog and epilog statements completely override the generation of prolog and epilog code. The prolog and epilog statements allow things like interrupt routines to be written. They give greater control over code without needing the ‘interrupt’ keyword applied to a function.

_________________
Robert Finch http://www.finitron.ca


Sun Mar 27, 2022 3:15 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Modified the operation of the JBS instruction. The JBS and JBC instructions now share the same opcode. This frees up an opcode. This was possible because JBS / JBC do not need to regard the compare method, so that field was reused for additional opcode bits.

Modified branches, repurposing the Tb field as an additional branch displacement bit. This gives branches a whopping 21-bit displacement for a range of ±2MB.

In the Thor test system, there is a table secondary to the page translation table that stores info that may be used by hardware. The table is accessed once the physical address is known, so it adds a clock cycle to memory access. It could be loaded into the TLB when a page is loaded, but due to hardware limitations (No TLB) that is not done on the test system. For lack of a better name I am calling this the page management table, PMT. It contains:
SC - share count
M - page modified bit
AC - access count, number of times the page was accessed since the count was last cleared
ACL - access control list reference
KY - key required to access page
PL - privilege level required to access page
PCI - sub-page compression indicators
AL - compression algorithm
EN - encrypted page indicator
N - conforming code page indicator
V - entry valid bit
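
As a rough software picture of the table, the fields above could be modelled as a record indexed by physical page number. This is only a sketch: the field widths are not given, so plain integers and booleans are used, and the `touch_page` helper with its key-equality check is a hypothetical illustration rather than the hardware's actual rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PMTEntry:
    share_count: int = 0          # SC
    modified: bool = False        # M
    access_count: int = 0         # AC
    acl_ref: int = 0              # ACL
    key: int = 0                  # KY
    privilege_level: int = 0      # PL
    subpage_compression: int = 0  # PCI
    compression_alg: int = 0      # AL
    encrypted: bool = False       # EN
    conforming: bool = False      # N
    valid: bool = False           # V


def touch_page(pmt: List[PMTEntry], page_number: int, key: int, is_write: bool) -> bool:
    """Consult the PMT once the physical page number is known.

    Returns True if the access is allowed. Matching the key by equality
    is an assumption made for this example.
    """
    entry = pmt[page_number]
    if not entry.valid or entry.key != key:
        return False
    entry.access_count += 1
    if is_write:
        entry.modified = True
    return True
```

Because the lookup is indexed by physical page number, it can only start after translation has finished, which is where the extra clock cycle comes from.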

_________________
Robert Finch http://www.finitron.ca


Mon Mar 28, 2022 3:29 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Test output almost works now. Displaying the tetra value 0x87654321 shows 8<space>654321, which is a bit maddening. It could be a glitch in the update of display memory. Previously the constant displayed as something like: 87:;=?89. What I did to improve the output was zero out a whole bunch of pipeline registers on a flow change, instead of simply clearing the valid flag bits. There must have been something in the pipeline causing erroneous operation. Anyway, that part seems to be fixed now. I was stuck on that bug for months.

After a couple of compiler fixes the MapPage() routine seems to work. MapPage() looks for an empty entry in the hash table in which to place a new map entry. Another routine does an absolute placement of an entry, overwriting any existing entry in the same spot.
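
The difference between the two routines can be sketched as follows. This is not the real MapPage() code: the flat table layout, the modulo stand-in for the hash, the linear scan for a free slot, and the names `map_page` and `place_page` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Slot = Optional[Tuple[int, int]]  # (virtual page, physical page) or None if empty


def map_page(table: List[Slot], vpn: int, ppn: int) -> int:
    """Place (vpn, ppn) in the first empty slot at or after the hashed index.

    Returns the slot used, or -1 if the table is full.
    """
    size = len(table)
    start = vpn % size  # stand-in hash (assumption)
    for step in range(size):
        slot = (start + step) % size
        if table[slot] is None:
            table[slot] = (vpn, ppn)
            return slot
    return -1


def place_page(table: List[Slot], slot: int, vpn: int, ppn: int) -> None:
    """Absolute placement: overwrite whatever is already in the slot."""
    table[slot] = (vpn, ppn)
```

`map_page` never disturbs an existing mapping, while `place_page` is useful when the caller has already decided which entry to replace.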

_________________
Robert Finch http://www.finitron.ca


Tue Mar 29, 2022 4:31 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Updating the TLB to read the PMT on a load and to write the PMT when a modified TLB entry is written. Also changed the way the TLB is accessed. Previously it was updated using a hexi-byte pair store instruction, and the TLB appeared as a series of memory locations. However, the hexi-byte pair did not contain enough information. Now the TLB is updated indirectly. There is an array of eight buckets, of which only five are used, to hold the TLB info for an update. After all buckets have been set appropriately, the TLB is updated atomically, triggered by a write to bucket seven. The TLB buckets appear as an MMIO device.
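
The staged update can be pictured with a small model. The sketch below only captures the idea of staging values and committing them on a write to bucket seven; the class name, the use of the trigger value as the TLB entry index, and the snapshot of buckets 0 to 6 as the entry payload are assumptions, since the bucket layout is not described here.

```python
class TLBUpdatePort:
    """Toy model of staging buckets that commit a TLB entry atomically."""

    NUM_BUCKETS = 8
    TRIGGER_BUCKET = 7  # a write here commits the staged entry (from the post)

    def __init__(self):
        self.buckets = [0] * self.NUM_BUCKETS
        self.tlb = {}  # entry index -> tuple of committed bucket values

    def write(self, bucket: int, value: int) -> None:
        """MMIO-style write to one bucket."""
        if not 0 <= bucket < self.NUM_BUCKETS:
            raise IndexError("no such bucket")
        self.buckets[bucket] = value
        if bucket == self.TRIGGER_BUCKET:
            # Assumption: the value written to the trigger bucket selects
            # which TLB entry receives the staged data.
            self.tlb[value] = tuple(self.buckets[:self.TRIGGER_BUCKET])
```

Nothing reaches the TLB until the trigger write, so a partially filled set of buckets is never visible to translation.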

_________________
Robert Finch http://www.finitron.ca


Thu Mar 31, 2022 3:28 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added shorter branch and branch-to-subroutine instructions. The compiler was modified to use the shorter instruction format for functions declared as static. Static functions are local to the translation unit being processed and hence can very likely be reached with shorter subroutine calls. The short call version supports a 19-bit displacement, or ±256 kB.
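
The reach of a displacement field follows directly from its width. The helpers below are a generic sketch, assuming a signed two's-complement displacement counted in bytes; the function names are made up for the example and the scaling used by the actual encoding is not confirmed here.

```python
from typing import Tuple


def displacement_range(bits: int) -> Tuple[int, int]:
    """Return (min, max) byte offsets of a signed two's-complement field."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed displacement."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


print(displacement_range(19))  # prints: (-262144, 262143)
```

A 19-bit field therefore covers -262144 to +262143 bytes, which is the ±256 kB quoted above. Under the same byte-granular assumption a 21-bit field reaches about ±1 MB; a ±2 MB reach would imply the displacement is scaled by two.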

Added shorter forms for shift immediate instructions, but the shift amount can go only up to 63. To shift more bits the amount must be loaded into a register and the register form used.

_________________
Robert Finch http://www.finitron.ca


Fri Apr 01, 2022 3:08 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Here is what I am looking at for opcode space. One of the groups of barred-off instruction opcodes will likely be used to implement 16-bit compressed instructions.
Attachment: RootOpcodes.png (Thor2022 root opcodes)

_________________
Robert Finch http://www.finitron.ca


Fri Apr 01, 2022 8:41 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added short 16-bit forms of the register indirect load and store instructions for stack accesses. These instructions are used at function entry and exit to save and restore registers. Just these two instructions saved about 7 to 8% of code space.

With the shorter branch to subroutine the link register was not being set. This led to an infinite loop forming as a return to a previous address occurred.

_________________
Robert Finch http://www.finitron.ca


Sat Apr 02, 2022 2:56 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added the load hexi-byte quad instruction, which loads a 512-bit value into a group of four registers. Among other things, this instruction allows a page table group, PTG, to be loaded using a single burst memory access or, if it is found in the cache, in one cache access time.

Added a hash table accelerator instruction. The PTENDX instruction locates a PTE in a quad of registers with a matching virtual address. It works like the other indexing instructions and returns the index of the PTE or -1 if not found in the registers.

Code:
# Incoming:
# a0 = virtual address
# a1 = hash table base address
# a2 = 0
   PTGHASH t1,a0   # get hash of virtual address
   SLL t1,t1,6      # turn hash into group index, 8*8
   LDI t6,0      # t6 = miss count
.again:
   MULF t7,t6,t6,a1   # square miss count, fast multiply and add base table address
   LDHQ   t2,[t1+t7]   # get the group into t2 to t5
   PTENDX t1,a0,t2   # search for PTE
   BGE t1,r0,.found   # exit loop if found
   ADD t6,t6,1      # increment miss count
   BBC t6,4,.again   # if fewer than 16 tries, repeat
   < page fault code >
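
For readers who do not know the instruction set, the loop above can be modelled in software roughly as follows. This is a sketch under assumptions, not a translation of the hardware behaviour: the table is a list of groups, the probe offset is counted in whole groups and wrapped to the table size, the hash function is supplied by the caller, and the names `pte_index` and `find_pte` are hypothetical. The group size of four is inferred from a 512-bit group holding 128-bit PTEs.

```python
from typing import Callable, List, Optional, Tuple

GROUP_SIZE = 4   # 512-bit group / 128-bit PTE (inferred from the thread)
MAX_TRIES = 16   # the loop gives up once the miss count reaches 16

Group = List[Optional[int]]  # each slot holds a virtual page number or None


def pte_index(group: Group, vpn: int) -> int:
    """Mimic PTENDX: index of the matching PTE in the group, or -1."""
    for i, entry in enumerate(group):
        if entry == vpn:
            return i
    return -1


def find_pte(table: List[Group], vpn: int,
             hash_fn: Callable[[int], int]) -> Optional[Tuple[int, int]]:
    """Return (group, slot) of the PTE for vpn, or None for a page fault."""
    base = hash_fn(vpn) % len(table)
    for miss in range(MAX_TRIES):
        # Quadratic probing: offset by the square of the miss count.
        group_number = (base + miss * miss) % len(table)
        slot = pte_index(table[group_number], vpn)
        if slot >= 0:
            return group_number, slot
    return None
```

Note that the model keeps the hashed group index in its own variable (`base`) for the whole search, so a failed probe never disturbs it.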

_________________
Robert Finch http://www.finitron.ca


Sun Apr 03, 2022 3:25 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest Fixes: the 2R form of the subtract instruction was not implemented, but was used by the assembler for the NEG instruction. This led to sprites not bouncing at the edge of the display screen as the dx, dy movement components were not negated.

Removed the PTENDX instruction. For some reason it causes the core to hang during startup. I suspect a bad route in the FPGA as I cannot see any reason why adding this relatively simple module would affect anything.

Spent some time playing around with the DSD design. Had a look at the micro-ops for the superscalar 6502, trying to see if something similar could be done for a 68000 processor. Also coded an XMODEM receiver. The hope is to use it to load software rather than rebuilding the system all the time.

_________________
Robert Finch http://www.finitron.ca


Mon Apr 04, 2022 4:01 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Spent most of the day converting the in-order core to an out-of-order core. First renamed the core, suffixing it with an ‘oo’ to indicate the out-of-order version.

_________________
Robert Finch http://www.finitron.ca


Tue Apr 05, 2022 4:19 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Finally got things running well enough in simulation to try synthesizing the design. I was sure it would blow the LUT budget given the amount of additional bypassing logic. But it turns out it cost only about 5,000 additional LUTs. Much better than I expected.

It is interesting because the reorder buffer is essentially randomly ordered. Ordering is determined by supplying a sequence number with each instruction. Instruction fetch places an instruction in the buffer wherever it can find an empty entry. Decode then searches the buffer for fetched instructions and decodes them. Decoding can occur in any order. Once instructions are decoded and their arguments are valid, they may be executed. Instructions may execute in any order, except memory instructions, for which strict ordering is applied. Once instructions are executed, the buffer is searched for executed instructions that are next in sequence after the previously retired instruction. Their results are then copied to the register file and the instructions are marked retired. This is how order is maintained.

Essentially each stage searches for entries in the buffer in the appropriate state.
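
A small software model may make the scheme easier to follow. It is only a sketch of the idea of an unordered buffer with sequence numbers: the class names, the state strings, and the use of a random free slot at fetch are assumptions, and operand readiness, memory ordering, and prefixes are left out.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    seq: int                # sequence number assigned at fetch
    text: str               # the instruction, as a string for this model
    state: str = "fetched"  # fetched -> decoded -> executed


class ReorderBuffer:
    def __init__(self, size: int = 8):
        self.slots: List[Optional[Entry]] = [None] * size
        self.next_seq = 0      # sequence number for the next fetch
        self.retire_seq = 0    # sequence number that must retire next
        self.retired: List[str] = []

    def fetch(self, text: str) -> bool:
        """Place an instruction in any empty slot; False if the buffer is full."""
        empty = [i for i, e in enumerate(self.slots) if e is None]
        if not empty:
            return False
        self.slots[random.choice(empty)] = Entry(self.next_seq, text)
        self.next_seq += 1
        return True

    def advance(self, from_state: str, to_state: str) -> None:
        """A stage: move every entry found in from_state to to_state."""
        for e in self.slots:
            if e is not None and e.state == from_state:
                e.state = to_state

    def retire(self) -> None:
        """Retire executed entries strictly in sequence-number order."""
        progress = True
        while progress:
            progress = False
            for i, e in enumerate(self.slots):
                if e is not None and e.state == "executed" and e.seq == self.retire_seq:
                    self.retired.append(e.text)
                    self.slots[i] = None
                    self.retire_seq += 1
                    progress = True
```

Whatever slots the instructions land in, `retire()` releases them in fetch order, because only the entry whose sequence number equals `retire_seq` is allowed to leave.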

One complication is instruction prefixes. Since instructions and prefixes may be placed into the buffer in any position, for a prefix to work the buffer must be searched for prefixes when the instruction is executed. Prefixes can be retired only when the corresponding instruction is retired.

One issue is that simulation becomes doggedly slow after about 300 us. That is not far enough into the boot code to see LEDs activated in sim.

_________________
Robert Finch http://www.finitron.ca


Wed Apr 06, 2022 4:02 am