 Thor Core / FT64 
Author Message
Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
robfinch wrote:
Added precision specifications for types.
Actually, that's something I always missed in C. Since it is a relatively low-level language, it would make sense for size specifications to be native, with the size of ints derived from them depending on the architecture, rather than the other way around.

robfinch wrote:
The compiler was modified to support the ptrdif instruction ... the LEA instruction sets the upper bits of the target register to indicate a pointer is present.
The keyword ‘nullptr’ was added.
If I recall correctly, the 64-bit version of the Cocoa API/framework also interprets pointers as carrying some sort of information in their upper bits. I think in that case the upper bits contain an index into the class type table, which is used for optimisation when dealing with class inheritance and virtual functions, but I may well be wrong on this particular aspect. Are you using these upper bits in any different way?


Mon May 13, 2019 8:39 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The graph-coloring register allocator is very close to working. I ran it and compared the results to the non-coloring allocator. Other than different register numbers, the code from the two looks identical. Either I’ve coded something incorrectly, or it makes very little difference on this machine architecture.

I had anticipated that it would make a difference for some of the larger functions like printf().
This has made me re-think my desire for a 64-register machine. If it only makes a difference once in 10,000 lines of code, is the extra hardware really worth it?
To generate code run through the grapher the compile option is -rv.
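
For readers unfamiliar with the technique, here is a minimal sketch of greedy graph-coloring register allocation. It is not the compiler's actual allocator: the interference graph, the variable names, and the function `color_registers` are all made up for illustration.

```python
def color_registers(interference, num_regs):
    """Greedy coloring of an interference graph.

    interference maps each virtual register to the set of virtual
    registers that are live at the same time. Returns a tuple
    (assignment, spilled).
    """
    assignment = {}
    spilled = []
    # Visit the most constrained nodes (highest degree) first.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for v in order:
        taken = {assignment[n] for n in interference[v] if n in assignment}
        free = [r for r in range(num_regs) if r not in taken]
        if free:
            assignment[v] = free[0]   # lowest free physical register
        else:
            spilled.append(v)         # no register left, keep it in memory
    return assignment, spilled


# Hypothetical interference graph for four virtual registers.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
regs, spills = color_registers(graph, num_regs=3)
```

One possible reading of the result reported above: when a machine has more registers than a function has simultaneously live values, almost any allocator finds a conflict-free assignment, so the output differs only in register numbers.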

_________________
Robert Finch http://www.finitron.ca


Tue May 14, 2019 4:47 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Are you using these upper bits in any different way?
Tentatively, the upper 20 bits of a pointer match a floating-point NaN value, so that when the garbage collector scans memory it can distinguish floats from pointers. Different NaN values are used for pointers and floats. The NaN value $FFF01xxxxxxxxxxx means it's a pointer; the low-order 44 bits are the actual memory address. A second NaN value, $FFF02xxxxxxxxxxx, commands the scanner to skip over bytes that would cause trouble, such as string tables. There's a handful of NaN values that could be used to help with things like garbage collection. Bit 51 of the NaN must be zero for pointer usage and one for a real NaN. Of course, the floating-point unit is made to go along with this scheme.
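
The tagging scheme described above can be sketched in a few lines. This is only an illustration of the bit layout given in the post; the helper names (`make_pointer`, `classify`, `float_bits`) are hypothetical and do not come from the FT64 sources.

```python
import struct

TAG_SHIFT = 44                        # low-order 44 bits hold the address
ADDR_MASK = (1 << TAG_SHIFT) - 1
PTR_TAG = 0xFFF01                     # upper 20 bits of a tagged pointer
SKIP_TAG = 0xFFF02                    # upper 20 bits of a "skip" marker


def make_pointer(addr):
    """Build a 64-bit tagged pointer from a 44-bit address."""
    if addr & ~ADDR_MASK:
        raise ValueError("address does not fit in 44 bits")
    return (PTR_TAG << TAG_SHIFT) | addr


def classify(word):
    """Decide how a memory scanner would treat a 64-bit word."""
    tag = word >> TAG_SHIFT
    if tag == PTR_TAG:
        return "pointer"
    if tag == SKIP_TAG:
        return "skip"
    return "float"                    # ordinary number or a real NaN


def float_bits(x):
    """Return the raw IEEE 754 double bit pattern of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


p = make_pointer(0x123456)
```

Both tag values leave bit 51 clear, so they cannot collide with a NaN that has bit 51 set, such as `0xFFF8000000000000`.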

_________________
Robert Finch http://www.finitron.ca


Tue May 14, 2019 5:04 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Compiler: in several places, a set of bits in a 64-bit integer was used to record which registers are stacked. That was fine before graph coloring and a virtual register set of 1024 registers, but 64 bits can no longer cover the whole set. The 64-bit integer was changed to a CSet object, a bit set that can handle arbitrarily sized bit arrays.
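
As a rough idea of what such a set looks like, here is a small growable bit set. It is a hypothetical stand-in, not the compiler's CSet class; the class name `BitSet` and its methods are assumptions made for the example.

```python
class BitSet:
    """Growable bit set backed by a bytearray."""

    def __init__(self, size=0):
        self._bits = bytearray((size + 7) // 8)

    def add(self, n):
        byte = n >> 3
        if byte >= len(self._bits):
            # Grow with zero bytes so that bit n fits.
            self._bits.extend(bytes(byte + 1 - len(self._bits)))
        self._bits[byte] |= 1 << (n & 7)

    def remove(self, n):
        byte = n >> 3
        if byte < len(self._bits):
            self._bits[byte] &= ~(1 << (n & 7)) & 0xFF

    def __contains__(self, n):
        byte = n >> 3
        return byte < len(self._bits) and bool(self._bits[byte] & (1 << (n & 7)))

    def members(self):
        return [i for i in range(len(self._bits) * 8) if i in self]


# Record stacked registers, including virtual register numbers above 63.
stacked = BitSet(64)
for reg in (3, 63, 700, 1023):
    stacked.add(reg)
```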

The compiler was restoring the floating-point registers in the wrong order. They should have been restored before the general-purpose registers. As a result, values were being loaded into the wrong registers.

In the GrClearScreen() routine (and several others) a register variable got removed that shouldn’t have been. This led to a store to an invalid address. The variable was used but its initialization was removed.

The reduction-or instruction was scrapped. It turned out not to be used once logical branches were available. The same operation can be performed with two not instructions in succession. Interestingly, there are still spots in the compiler that generate the equivalent operation, but in tens of thousands of lines of code it hasn’t been needed.
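
A quick check of the equivalence, under the assumption that the not instruction is a logical not (producing 1 for a zero input and 0 otherwise) rather than a bitwise complement. The function names are made up for the example.

```python
def reduction_or(x):
    """1 if any bit of x is set, else 0."""
    return 1 if x != 0 else 0


def logical_not(x):
    """Assumed semantics of the not instruction: 1 if x is zero, else 0."""
    return 1 if x == 0 else 0


samples = [0, 1, 2, 0xFF00, 1 << 63]
results = [(reduction_or(x), logical_not(logical_not(x))) for x in samples]
```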

The instruction set has been altered yet again. It’s moving back towards having separate major opcodes for all the load / store instructions. The issue is that unaligned addresses can’t be formed with the current encoding. Instructions were encoded that way because there weren’t enough opcodes to represent all the instructions. Well, I found a couple of extra opcodes, so they are being put to use for possible unaligned loads / stores. While unaligned loads / stores aren't currently supported, it should still be possible to generate such an address. It can then be handled by an alignment fault handler.

_________________
Robert Finch http://www.finitron.ca


Wed May 15, 2019 3:58 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
When rearranging instructions I toasted the float branches: I accidentally used that opcode for another purpose. So, more rearranging of instructions was required. Opcodes for the logical branches were moved around, and the logical branch opcode is now used for float branches.

Work has started on the next version of the processor. It’ll be basically the same as version 7 except that a bit is being stolen from the immediate field to allow more opcodes at the root level. This will allow twice as many opcodes for loads and stores, which will allow for unaligned loads and stores. There shouldn’t be much impact on code as the immediate field is still large enough to accommodate most small constants. Doing this change increased the code size by about 2% due to constants that couldn’t be encoded with one less bit.
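
To illustrate why losing one immediate bit costs some code size, here is a small sketch that finds constants which fit the old field but not the new one. The field widths (16 and 15 bits) and the list of constants are hypothetical; the post does not give the actual sizes.

```python
def fits_signed(value, bits):
    """True if value can be encoded as a two's complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


OLD_BITS = 16   # hypothetical immediate width before the change
NEW_BITS = 15   # hypothetical width after giving one bit to the opcode

constants = [0, 1, -1, 100, 4096, 20000, -20000, 40000]

# Constants that used to fit in one instruction but now need extra encoding.
need_extra = [c for c in constants
              if fits_signed(c, OLD_BITS) and not fits_signed(c, NEW_BITS)]
```

Only constants in the band between the two ranges are affected, which is consistent with a small overall increase in code size.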

_________________
Robert Finch http://www.finitron.ca


Thu May 16, 2019 4:04 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Working on the I-cache today as it seems there is an issue that occurs during instruction fetch. The PC inexplicably goes to a faraway place.
I made a copy of the instruction cache code and modified it to use a 4x clock for transfers between L1 and L2 caches. The 4x clock transfers using a 64-bit bus instead of a 306-bit bus. The idea was to try and reduce the number of signals. I then shelved the 4x clock i-cache as it didn’t make much difference to the build.

Having rebuilt everything for version 8 to make use of the extra opcode bit, things aren’t working perfectly yet. The system runs as far as placing three ‘A’s on screen, which is almost at reset. I'm not planning any more work on version 7.

_________________
Robert Finch http://www.finitron.ca


Fri May 17, 2019 3:30 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
There was a one cycle overlap in time between when an instruction cache miss is processed and when the write buffer writes to memory. The extra cycle was introduced when the icache module was separated out from the mainline code. I’m not sure this caused the problem, but I put a fix in to hold off the store until the instruction cache load is complete. At the last system build, the system gets hung on an invalid data address.

Latest compiler fix:
Bitfields were always being treated as signed values. This was an issue with declaration processing.
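
The difference between a signed and an unsigned bitfield shows up when the field is extracted. The sketch below is a generic illustration, not the compiler's generated code; `extract_bitfield` and its parameters are names invented for the example.

```python
def extract_bitfield(word, offset, width, signed):
    """Extract `width` bits of `word` starting at bit `offset`."""
    field = (word >> offset) & ((1 << width) - 1)
    if signed and field & (1 << (width - 1)):
        field -= 1 << width          # sign-extend: top bit of the field is set
    return field


word = 0xE0                          # bits 7..5 are all ones
as_unsigned = extract_bitfield(word, offset=5, width=3, signed=False)
as_signed = extract_bitfield(word, offset=5, width=3, signed=True)
```

Treating every bitfield as signed would turn the unsigned value 7 in this field into -1.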

The assign-add operator on a bitfield generates an extra copy of dereferencing code. It dereferences the bitfield twice into two different registers, then correctly performs the assign-add. This is a performance issue. It’s not simple to fix. There’s a flag that gets passed around indicating a bitfield assignment. I believe this flag needs to have the correct value at precisely the right point in the code.

Been working on the memory management side of things. Allocating memory in the system always returns a pointer to a higher and higher virtual address, even if memory is freed. Freeing memory puts the physical memory back into the available pool but doesn’t affect the virtual allocation address. The idea is that in a system with a 44-bit virtual address space, it probably doesn’t matter. The system won’t run out of virtual space before it’s shut down. Memory allocations are also spaced out by a virtual page. The unused virtual page is a placeholder for the end of the previous memory allocation. So, if ten pages are allocated, the next virtual address to be used starts with the twelfth page, not the eleventh.
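
The allocation policy described above can be sketched as a bump allocator over virtual page numbers. This is a simplified model with invented names (`PageAllocator`, `alloc`, `free`), not the actual system code, and it counts physical pages instead of tracking them individually.

```python
class PageAllocator:
    """Virtual pages only ever move upward; physical pages are pooled."""

    def __init__(self, physical_pages):
        self.free_physical = physical_pages
        self.next_virtual_page = 0

    def alloc(self, num_pages):
        if num_pages > self.free_physical:
            raise MemoryError("out of physical pages")
        self.free_physical -= num_pages
        first = self.next_virtual_page
        # Skip one extra, unused virtual page after each allocation.
        self.next_virtual_page = first + num_pages + 1
        return first

    def free(self, num_pages):
        # Physical pages return to the pool; virtual pages are never reused.
        self.free_physical += num_pages


mem = PageAllocator(physical_pages=100)
a = mem.alloc(10)      # virtual pages 0..9, page 10 stays unused
b = mem.alloc(10)      # starts at page 11, the twelfth page
mem.free(10)
c = mem.alloc(5)       # still moves upward after the free
```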
I’m wondering how to implement shared memory within an inverted page table system. With an inverted page table there is only a single page table, rather than a page table for each application. The issue then is that there can be only a single virtual page mapped to a physical one. The only thing I can think of is to reserve a range of addresses in the virtual memory space for shared memory. That way each program using the shared memory gets the same virtual address for the memory. The next issue is how much of the virtual address space to reserve for shared memory.
I decided on using the top nybble (bits 40 to 43) of an address to represent a shared memory page. All shared memory pages are allocated in the virtual address space beginning at $FFF01F0000000000. Shared memory must use a unique memory protection key that identifies the shared memory. Requesting a shared memory page searches for a virtual address pointing to the pages with the memory protection key matching the one for the shared memory. Allocating / deallocating shared memory is quite slow as the allocator must search the entire page table (65536 entries) to find the shared memory pages.
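
A minimal model of the lookup, assuming the inverted page table is a flat array with one entry per physical page. The entry layout (`PTE` with `vaddr` and `key`), the function names, and the page offset used in the example are assumptions, not taken from the actual system.

```python
from collections import namedtuple

PTE = namedtuple("PTE", "vaddr key")   # hypothetical page table entry

TABLE_ENTRIES = 65536
SHARED_BASE = 0xFFF01F0000000000       # start of the shared region


def is_shared(vaddr):
    """True if bits 40 to 43 of the address are all ones."""
    return ((vaddr >> 40) & 0xF) == 0xF


def find_shared_pages(table, key):
    """Scan the whole table for shared pages with the given protection key."""
    return [index for index, entry in enumerate(table)
            if entry is not None and entry.key == key and is_shared(entry.vaddr)]


table = [None] * TABLE_ENTRIES
table[5] = PTE(SHARED_BASE, 42)
table[7] = PTE(0xFFF0100000004000, 42)        # same key, but not in the shared region
table[9] = PTE(SHARED_BASE + 0x10000, 42)     # hypothetical page offset
table[11] = PTE(SHARED_BASE + 0x20000, 99)    # shared, but a different key

pages = find_shared_pages(table, key=42)
```

Every request walks all 65536 entries, which is why allocation and deallocation of shared memory are slow in this scheme.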

_________________
Robert Finch http://www.finitron.ca


Sat May 18, 2019 2:41 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Compiler fixes: initializing the register stack for switch expressions. The register stack was not being initialized, which could result in the compiler running out of temporary registers and having to swap them to memory. Switch expressions also were not releasing registers when finished with them (minor, no impact). I found these issues accidentally while testing table-based switches.

Finally ran into an instruction that requires five operands – the bitfield insert instruction. The compiler only supports instructions with a maximum of four operands. Rather than re-write significant portions of the compiler for just one instruction, it’ll have to generate a bitfield insert as two separate instructions. (ToDo). Currently the compiler uses about four instructions to do a bitfield insert.
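
For reference, a bitfield insert can be written as four primitive steps: build a mask, clear the field, position the new value, and merge. The sketch below is a generic illustration with invented names and an assumed operand order; it does not claim to match the instruction's encoding or the compiler's current sequence.

```python
def bitfield_insert(dest, value, offset, width):
    """Return dest with `width` bits at `offset` replaced by `value`."""
    mask = ((1 << width) - 1) << offset      # 1. mask covering the field
    cleared = dest & ~mask                   # 2. clear the field in dest
    positioned = (value << offset) & mask    # 3. shift value into place
    return cleared | positioned              # 4. merge


r1 = bitfield_insert(0xFFFF, 0x0, offset=4, width=8)
r2 = bitfield_insert(0x0000, 0x1FF, offset=4, width=8)   # value is truncated to 8 bits
```

A single-instruction form needs the result register, the original word, the value, the offset, and the width, which is one way to arrive at five operands.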

Added to version eight are branches that use a register to hold the target address. This allows computed branch targets. The compiler uses them only for table-based switches. Currently these branches are always treated as branch misses, so they are slower than regular branches.

_________________
Robert Finch http://www.finitron.ca


Sun May 19, 2019 3:43 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I really want at least double-extended or even triple precision floating-point. I’m scratching my head over how to implement load and store operations for such a beast. Double-extended is 80 bits and triples are 96-bit. So, they will require two cycles to load or store across a 64-bit bus. They could also be aligned at any 16-bit address.
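
The cycle count can be checked with a little arithmetic. The sketch assumes memory is accessed as naturally aligned 64-bit words, which the post does not state, and `bus_words_touched` is a name made up for the example.

```python
BUS_BYTES = 8   # 64-bit bus, assumed to transfer aligned 8-byte words


def bus_words_touched(addr, size_bytes):
    """Number of aligned bus words covered by an access."""
    first = addr // BUS_BYTES
    last = (addr + size_bytes - 1) // BUS_BYTES
    return last - first + 1


# Every 16-bit aligned starting offset within a bus word.
offsets = [0, 2, 4, 6]
extended = [bus_words_touched(o, 10) for o in offsets]   # 80-bit values
triple = [bus_words_touched(o, 12) for o in offsets]     # 96-bit values
```

Under that assumption an 80-bit value always fits in two bus words, while a 96-bit value starting at offset 6 would span three.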
Another issue with the formats is that there is just a single register file. Although it looks as if there are separate floating-point and integer registers, it’s not implemented that way. So, the entire register file would have to be widened, and the internal busses would have to be widened as well. It could make the core 40% larger.
For some reason it took the system about 20 hours to build. It had been building in about half an hour. The only thing that changed on my system was that Windows got updated.

For version eight the Ra and Rt fields have been switched around. I defined some macros in the assembler to place the fields in the instruction.

_________________
Robert Finch http://www.finitron.ca


Mon May 20, 2019 5:56 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
A new thread for the rtfItanium core is here. The rtfItanium is a three-way superscalar core coded in Verilog or System Verilog. It's a 64-register, 80-bit data path machine. Project started in earnest about the 18th May, 2019.

_________________
Robert Finch http://www.finitron.ca


Last edited by BigEd on Wed May 22, 2019 8:18 am, edited 1 time in total.

helping re-thread the new project



Wed May 22, 2019 3:24 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
robfinch wrote:
A new thread for the rtfItanium core is here. The rtfItanium is a three-way superscalar core coded in Verilog or System Verilog. It's a 64-register, 80-bit data path machine. Project started in earnest about the 18th May, 2019.

Hi Rob,
Does this mean that you are now going to work on both projects, maybe just sharing pieces of code or components between them?


Wed May 22, 2019 5:24 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Does this mean that you are now going to work in both projects,

I’ll have to see how well the rtfItanium goes. If I don’t scrap it within a few days, I’ll probably be switching projects. I’m always starting new projects but most of them end up not being worked on significantly. I’ve gotten kind of stuck on the FT64. Been working on it about two years. I periodically go back and revisit older projects, to try and get a sense of direction. I started about three projects the past couple of weeks and found myself revisiting the DSD9 core. DSD9 is an 80-bit core with 40-bit instructions. It is an overlapped pipelined design (non-superscalar).
Quote:
maybe just sharing pieces of code or components between them?

Typically, I reuse the code (and documentation) from older projects and improve on it as I learn new tricks. But I have a tendency not to go backwards.

A bare-bones version of the core is looking to be around 75,000 LUTs, much smaller than I expected. I managed to trim about 15,000 LUTs off the design by not implementing an instruction fetch buffer; instead, the output of the cache is used directly.
Spent considerable time getting the code to a usable point. Not ready for simulation yet and I have to come up with an assembler. The core will have most of the features of the FT64 but vectors are not supported initially.

_________________
Robert Finch http://www.finitron.ca


Fri May 24, 2019 6:28 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
(I think it would be great, Rob, if you could post an index thread outlining the various cores you've designed - even if not finished. I did have the impression there have been a few restarts within this thread. I have the tools to re-thread it into separate threads, if we can identify the starting point of each new adventure. But even if not, we can link to the relevant posts within this thread.)

(Edit: having said that, I don't want to make work for you or to reduce the enjoyment of making progress and posting about it. In any case it's great to read your telling of the story as it unfolds.)


Fri May 24, 2019 7:18 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
I think it would be great, Rob, if you could post an index thread outlining the various cores you've designed - even if not finished.

Yikes! This is embarrassing.

Some of the cores are downloads of the work of other people, that I've studied or modified, but it's mostly stuff I've come up with. This is only the root directories, there's usually a whole bunch more directories underneath.
It's an inventor's toolbox approach.

18? years of cores.
Attachment:
File comment: Directories Of Cores
tdir.txt [26.18 KiB]
Downloaded 240 times

_________________
Robert Finch http://www.finitron.ca


Sat May 25, 2019 2:55 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Double yikes!!


Sat May 25, 2019 7:45 am