AnyCPU - View topic - Thor Core / FT64

Page 25 of 52

[ 775 posts ]


Thor Core / FT64

Author	Message
robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64 Having espoused all the benefits of the base and bounds system, I decided to shelve it for now. The issue I ran into was: what if data needs to be passed to another thread? As it is the system is based around the concept of a thread and it entirely encapsulates the data. It was not possible to pass just the data to another thread without passing a bunch of additional information that is not really relevant. Effectively, a descriptor contained too much information. Greater minds than myself have dealt with this issue already, so I figure it’s best to follow an existing system. So, I’ve gravitated back towards a classic segmented system except that the segments include both upper and lower bounds. There are eight segment registers because four was deemed as maybe not enough. FS and GS do get used to store thread local and global data in x86 systems. I had a friend tell me I was thinking too small with my designs. Added the PTRDIF instruction which is an ordinary subtract operation followed by a right-shift. The idea is to determine an index value from the difference between two pointers. I sketched out what would be required in order to perform a far call, a call to an address in a different code segment. Using a segment load exception, I estimate this would require about 300 clock cycles to perform. Not fast at all, but it should be possible to do. The SoC is busted back to a not-working state. I’m sure it’s something minor. _________________ Robert Finch http://www.finitron.ca
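The PTRDIF operation described above (a subtract followed by a right-shift) can be modelled in a few lines. This is only a sketch: the post does not say whether the shift is arithmetic or logical, so the arithmetic shift, the 64-bit width, and the function name `ptrdif` are assumptions made for illustration.

```python
def ptrdif(a: int, b: int, shift: int, width: int = 64) -> int:
    """Model of PTRDIF: (a - b) >> shift, where shift = log2(element size).

    Assumption: the difference is treated as a signed value and the shift
    is arithmetic, so a pointer below the base yields a negative index.
    """
    mask = (1 << width) - 1
    diff = (a - b) & mask              # wrap to the register width
    if diff >> (width - 1):            # reinterpret as two's complement
        diff -= 1 << width
    return diff >> shift               # Python's >> on ints is arithmetic


# Index of an 8-byte element located 0x40 bytes past the array base.
index = ptrdif(0x1040, 0x1000, 3)
```

With 8-byte elements the shift is 3, so a 0x40-byte difference gives index 8.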
Tue Dec 11, 2018 4:06 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64 I've been inspired to start working on version eight of FT64 after reading this thesis, which details an x86-compatible core designed for an FPGA: http://www.stuffedcow.net/files/henry-thesis-phd.pdf I figure if they can hit 200MHz in a fast FPGA for something like an x86, it should be possible to hit 75-100MHz in a slow one for something more contemporary. FT64v7 suffers from the simplicity of its design. The scheduling logic is atrocious and the number and types of registers leave something to be desired. I've realized that CAMs (content-addressable memories) are effectively used all over the place in FT64v7, and they don't map well to FPGA fabric. FT64v8 will have split register files associated with functional units. It will also have variable-length instructions. FT64v8 won't be a micro-op based design, but hopefully some of the ideas from the thesis can be used. _________________ Robert Finch http://www.finitron.ca
Fri Dec 14, 2018 6:16 am

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1884	Re: Thor Core / FT64 Wow, that thesis is a really good find! Needs a thread of its own.
Fri Dec 14, 2018 7:57 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64 Quote: Wow, that thesis is a really good find! Needs a thread of its own. Posted by EricP on comp.arch along with a number of other goodies under the topic "Tomasulo Algorithm and reorder buffer for parallel vector engines" _________________ Robert Finch http://www.finitron.ca
Fri Dec 14, 2018 2:15 pm

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1884	Re: Thor Core / FT64 Thanks Rob! That's this post which is the first response (second post) in this discussion. I'm glad comp.arch is still healthy. I used to read it at work, and at its best I think it fits my interests nicely.
Sat Dec 15, 2018 8:47 am

MichaelM Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL	Re: Thor Core / FT64 Agree with Ed. That discussion, at least the part starting at the link and which I had time to read, cleared up quite a few cobwebs / misunderstandings of mine regarding the scoreboard vs Tomasulo/ROB and the implementation of precise exceptions in OOO machines. It's a subject I've been intending on studying for some time, and the first responder in that thread did an excellent job in describing the differences. Should make it easier to follow through on the study of the subject when time permits. _________________ Michael A.
Sat Dec 15, 2018 6:49 pm

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64
Pseudo-coded a microcode routine to load a segment register. It takes about 40-50 instructions. I was going to try micro-coding several routines, but then I realized there wouldn’t be any performance advantage, and it’s more complex hardware-wise to support microcode in the processing core. Instead the routines can be coded in a small ROM.
There’s a bit of a chicken-and-egg paradox when it comes to loading segment registers. Ideally, loading the segment registers is done as an operating system routine. But there’s no way to get to the operating system without loading a segment register. The current solution is to designate a portion of the address space as common to all segments, so that the segment in use doesn’t matter.
An issue with segmentation is that the segment registers are really 256-bit entities by the time base, bounds and access rights are included. A move from segment register to segment register requires a 256-bit wide bus in the processor. But it’s undesirable to have all the result busses in the core 256 bits wide just to support moving segments around. In the pseudo-code, 64-bit accesses are used to manipulate parts of the segment, but this could be changed to 256-bit accesses. I’m wondering if it’d be worthwhile to allow loading of 256 bits at a time directly into a segment register. The load result bus would have to support 256 bits then. But I’ve defined an opcode for 256-bit loads anyway (lo, for load octa-word) to support SIMD-like instructions down the road.
Code:
; MOV ES,D31
macro mov2es
    bit   cc7,d31,#15           ; test local / global selector flag
    jsr   LoadYs
    ; Is zero?
    ; We check only the acr word
    cmp   cc7,d30,d0
    beq   cc7,.zeroSeg@
    ; Is segment present?
    bpl   cc7,segNotPresent
    ; Now check segment type
    shr   d29,d30,#48+11
    and   d29,d29,#3
    cmp   cc7,d29,#2            ; check that YS is a data descriptor
    bne   cc7,segtypeFault
    ; now check privileges
    mov   d29,cs.acr
    shr   d29,d29,#48
    shr   d30,d30,#48
    and   d29,d29,#$FF
    and   d30,d30,#$FF
    cmp   cc7,d29,d30
    bgt   cc7,priv_fault        ; DPL must be >= CPL
.zeroSeg@
    mov   es.base,ys.base
    mov   es.lower,ys.lower
    mov   es.upper,ys.upper
    mov   es.acr,ys.acr
    rts
endm

LoadYs:
    and   d31,d31,#$7fff
    bne   cc7,.0001
    ; load from global descriptor table
    ; into temporary segment register ys
    ld    d30,gdt:[d31*32]
    mov   ys.base,d30
    ld    d30,gdt:8[d31*32]
    mov   ys.lower,d30
    ld    d30,gdt:16[d31*32]
    mov   ys.upper,d30
    ld    d30,gdt:24[d31*32]
    mov   ys.acr,d30
    rts
    ; load from local descriptor table
.0001:
    ld    d30,ldt:[d31*32]
    mov   ys.base,d30
    ld    d30,ldt:8[d31*32]
    mov   ys.lower,d30
    ld    d30,ldt:16[d31*32]
    mov   ys.upper,d30
    ld    d30,ldt:24[d31*32]
    mov   ys.acr,d30
    rts
_________________ Robert Finch http://www.finitron.ca
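The checks in the mov2es pseudo-code above can be restated as a small Python model, which may make the order of the tests easier to follow. The bit positions are read off the pseudo-code (present flag in the sign bit of the acr word, a 2-bit type field at bit 59, an 8-bit privilege field at bit 48, 32-byte descriptors); the class and function names are hypothetical and this is a sketch, not the core's actual logic.

```python
from dataclasses import dataclass

DESCRIPTOR_BYTES = 32          # 256 bits: base, lower, upper, acr (4 x 64-bit)
TYPE_DATA = 2                  # type value the pseudo-code accepts for ES


@dataclass(frozen=True)
class Descriptor:              # hypothetical container for one table entry
    base: int
    lower: int
    upper: int
    acr: int


class SegmentFault(Exception):
    pass


def descriptor_offset(selector: int) -> int:
    """Byte offset into the GDT/LDT: low 15 selector bits times 32."""
    return (selector & 0x7FFF) * DESCRIPTOR_BYTES


def uses_local_table(selector: int) -> bool:
    """Bit 15 of the selector picks the local table over the global one."""
    return bool(selector & 0x8000)


def check_data_segment_load(desc: Descriptor, cs_acr: int) -> Descriptor:
    """Apply the mov2es checks; return the descriptor if the load is allowed."""
    if desc.acr == 0:                       # zero acr word: load without checks
        return desc
    if not (desc.acr >> 63) & 1:            # bpl: sign bit clear = not present
        raise SegmentFault("segment not present")
    if ((desc.acr >> 59) & 3) != TYPE_DATA:
        raise SegmentFault("segment type fault")
    cpl = (cs_acr >> 48) & 0xFF
    dpl = (desc.acr >> 48) & 0xFF
    if cpl > dpl:                           # DPL must be >= CPL
        raise SegmentFault("privilege fault")
    return desc
```

A caller would pick the table with `uses_local_table`, read four 64-bit words at `descriptor_offset(selector)`, and copy them into ES only if `check_data_segment_load` returns without raising.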
Mon Dec 17, 2018 6:47 am

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1884	Re: Thor Core / FT64 For a bit of calibration, it seems the penalty for a context switch in a modern CPU and OS is at least tens of thousands of cycles and possibly hundreds of millions. Yikes. https://stackoverflow.com/questions/218 ... ext-switch (Probably the interesting case is not the cold-cache case with large working sets, but ping-ponging between two smallish processes.)
Mon Dec 17, 2018 7:08 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64 Context switch time is quite a large number of cycles. I expect at least into the tens of thousands for FT64 by the time memory management switches are taken into account. However thread switch time within the same context may be reasonably fast; hopefully only hundreds of cycles. I found out that for data cache loads byte lanes were being selected based on the instruction that caused a load. The core should have been selecting all the byte lanes for a cache load. I think the system worked only because the memory system ignored the byte lane selection signals for read accesses. In any case this has now been switched to read all byte lanes which is also slightly less logic. Found a bug in the instruction decoding where some component variables required for the decode were being delayed by a clock cycle and they shouldn’t have been. Running simulation with this bug ran into the same kind of lockup that occurs in the FPGA. So hopefully having this fixed will fix up the FPGA version. I found this bug by re-writing part of the core to merge several state variables into a single var called 'state'. Put some more work into version eight of the core. Rather than a compressed instruction set, version eight will simply use variable length instructions where the instruction length can be determined by the first byte of the instruction. This requires only a 3-bit wide 256 entry lookup table to determine the length. _________________ Robert Finch http://www.finitron.ca
Wed Dec 19, 2018 8:36 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64 Back in business. The core boots in the FPGA, at least to the monitor. Improved the branch target buffer. The BTB now queues all flow control transfer info in a manner similar to the branch predictor. It uses a FIFO capable of changing the rate of flow control operations down to match a single write port in the buffer. All flow control operations, including branches and returns, are now predicted by the BTB. The goal is to use the BTB as the first predictor for flow control transfers. Currently it is used in the fetch stage one clock after the current instruction is presented from the icache. This will be moved up to be concurrent with the icache fetch. The branch predictor can’t be moved up because the branch instruction has to be decoded before it is known that a branch is present. After sketching out how to get segmentation to work with hardware in v8, I decided to go back and add it to v7. Segmentation allows up to a 101-bit address range, although a lot fewer bits are actually implemented. Jumping to far code segments is handled differently than on x86. An alternate segment register must be loaded with the target code segment first, then specified as a segment override to the jump or call instruction. This splits the operation up into multiple instructions; otherwise a single instruction would be too large. _________________ Robert Finch http://www.finitron.ca
Fri Dec 21, 2018 10:38 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64
Limited far calls to using one of four segment registers (ZS, ES, HS, or CS), since it doesn’t make sense to load a data segment (DS, SS) into the code segment, and that way only two bits are required to specify the segment register to use in the far call. Performing a far call is a two-step process of loading the target segment, then using an override prefix with a call instruction.
Code:
    mov2seg hs,#$001234         ; load the hs with the target segment
    call    hs:some_function
    <…> other
    call    hs:another_func
I looked at having this performed with a single instruction, but that would make the instruction too large and an oddball size compared to the rest of the instruction set. For a call operation the selector for the current code segment is stored in the upper 24 bits of the link register, so that a second register does not have to be managed for calls and returns. This does limit code to a 40-bit address space for a single module.
Merry Christmas!
_________________ Robert Finch http://www.finitron.ca
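The link register layout described above (24-bit code segment selector in the upper bits, leaving 40 bits for the return address) amounts to a simple pack and unpack. A minimal sketch, with hypothetical function names:

```python
SELECTOR_BITS = 24
OFFSET_BITS = 40                      # 24 + 40 = 64-bit link register
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def pack_link(selector: int, return_offset: int) -> int:
    """Build the link register value saved by a far call."""
    if not 0 <= selector < (1 << SELECTOR_BITS):
        raise ValueError("selector does not fit in 24 bits")
    if not 0 <= return_offset <= OFFSET_MASK:
        raise ValueError("return address exceeds the 40-bit module limit")
    return (selector << OFFSET_BITS) | return_offset


def unpack_link(link: int) -> tuple:
    """Recover (selector, return_offset) on return."""
    return link >> OFFSET_BITS, link & OFFSET_MASK


link = pack_link(0x001234, 0x12_3456_789A)
```

The `ValueError` on an oversized offset is the software-visible form of the 40-bit limit on a single module's code.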
Tue Dec 25, 2018 5:07 am

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1884	Re: Thor Core / FT64 Indeed, Merry Christmas and season's greetings!
Tue Dec 25, 2018 11:52 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64 Started working in earnest on FT64v8, going with a much simpler implementation. v8 is a straightforward scalar core with a non-overlapped pipeline. One goal is reducing the number of lines of code that must be managed. The purpose of the core is to serve as an I/O or control processor. _________________ Robert Finch http://www.finitron.ca
Sun Dec 30, 2018 6:10 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64 Deleted a bunch of old stuff off the disk drive. I had files for PCBs from college circa 1986. Still working on the v7/v8 core. _________________ Robert Finch http://www.finitron.ca
Tue Jan 01, 2019 11:30 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2505 Location: Canada	Re: Thor Core / FT64 Back to the old load/store quandary tonight. About a half dozen load double-word (64-bit) instructions were spec’d out for v8. The problem is that if there are an equal number of load instructions for all the different potential load sizes, with signed and unsigned versions, that’s a lot of instructions. Then the loads must be mirrored with store instructions on top of that. The base address mode in use is s-i-b (scaled-index-base), which works out to a 48-bit instruction. Other instructions are subsets of that mode in order to increase code density. There are a couple of possibilities to reduce the number of instructions. For instance, a size prefix could be used with a basic instruction. This decreases code density but would allow rarely used operations to be performed. Another option, requiring additional bits in the instruction, is specifying a base register to use. Design choices to make. _________________ Robert Finch http://www.finitron.ca
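To put rough numbers on the opcode count concern above, here is a back-of-the-envelope tally. All the figures are assumptions for illustration only: four access sizes (8, 16, 32 and 64 bits), six instruction variants per size (matching the "half dozen" double-word loads), signed and unsigned forms for every size narrower than the register, and a size prefix that lets one basic set of opcodes cover all sizes.

```python
# Assumed figures, not taken from the FT64v8 design itself.
SIZES_BITS = (8, 16, 32, 64)
VARIANTS_PER_SIZE = 6          # "about a half dozen" forms per load size
REGISTER_BITS = 64


def full_matrix_count() -> tuple:
    """Opcodes needed when every size/signedness gets its own instruction."""
    loads = 0
    for size in SIZES_BITS:
        # A full-width load needs no sign/zero extension, so one form only.
        signedness_forms = 1 if size == REGISTER_BITS else 2
        loads += VARIANTS_PER_SIZE * signedness_forms
    stores = VARIANTS_PER_SIZE * len(SIZES_BITS)   # stores have no signedness
    return loads, stores


def prefixed_count() -> tuple:
    """Opcodes needed when a size prefix modifies one basic set."""
    return VARIANTS_PER_SIZE, VARIANTS_PER_SIZE


loads, stores = full_matrix_count()
```

Under these assumptions the full matrix needs 42 load and 24 store opcodes, against 6 of each with a prefix, at the cost of a longer encoding for every access that is not the default size.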
Wed Jan 02, 2019 5:23 am
