


 [ 507 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6 ... 34  Next
 Thor Core / FT64 
Author Message

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
My computer is acting up. It locked up big time, but came back after I put it to sleep and woke it up again. It's about four years old and gets used for hours every day.
So. I made some backups of recent work.
Thor is able to display a menu now with interrupts running in the background, but the menu options don't work. I'm trying to get the debugger to run from a menu option. I found a number of software bugs and a hardware problem with the system. The system wasn't wired up to read from the text controller memory and I relied on reading the screen to get characters.

_________________
Robert Finch http://www.finitron.ca


Wed Dec 30, 2015 11:15 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
Thor seems to work well enough that I've given it a bigger piece of meat to chew on: the FMT kernel, which is written in C64.

I've been fixing up the C64 language compiler, which was just about three years out of date compared to the Thor processing core. It sure is a lot easier to work in C rather than assembly. I also ported the FMT kernel from FISA64 over to the Thor test system. It'll be a minor miracle if it works. I had to give up the triple-redundant ROM in the process, as there aren't enough BRAMs to support it with the additional compiled code. The code is about 96kB now.

I used the construct &videobuf[nn] in the code when videobuf was really a two-dimensional array, videobuf[51][4096], and the compiler failed on it. The compiler has to assume that you really want &videobuf[nn][0], where the second array index is an implied zero. The compiler has been modified so that this works now.
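To illustrate why the two forms name the same address, here is a rough sketch of row-major offset arithmetic in Python. The element size is not given in the post, so it is a parameter here, and the function names are made up for the illustration; this is not how the C64 compiler computes addresses internally.

```python
ROWS, COLS = 51, 4096  # dimensions of videobuf[51][4096] from the post


def element_offset(row, col, elem_size):
    """Byte offset of videobuf[row][col] in row-major order."""
    return (row * COLS + col) * elem_size


def row_offset(row, elem_size):
    """Byte offset of &videobuf[row], i.e. the start of that row."""
    return row * COLS * elem_size
```

For any row nn, row_offset(nn, size) equals element_offset(nn, 0, size), which is the implied-zero second index described above.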

Right now the system fails during startup while trying to lock a semaphore; I can tell this from the status LEDs. There's a problem somewhere with the reservation load/store, so it doesn't get too far. But at least it's running compiled C code now.

_________________
Robert Finch http://www.finitron.ca


Mon Jan 04, 2016 10:26 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
I've been contemplating modifying the I-Cache so that it includes error detection and correction logic. I put ECC logic on the output of the system ROM since it could no longer be TMR. At least there is single-bit error correction. But now I'm thinking it would be better if the ECC logic were at the output of the I-Cache, so that any errors that creep into the I-Cache could be corrected. The overhead for SECDED isn't that great.
I had some fun calculating ECC syndromes. I copied some syndrome generator code from an article I found on the web. It does some fancy bit fiddling for performance and apparently doesn't work. I ended up writing my own syndrome generator, which is slow but seems to work.
The I-Cache would be partitioned into 32 bit wide chunks with 7 additional check bits for each chunk. This is to match what is stored in the system ROM.
See the diagram.
Attachment:
ECCCache.gif
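For anyone curious what a slow but straightforward SECDED scheme for 32 data bits plus 7 check bits can look like, here is a minimal sketch in Python using a textbook extended Hamming code. The bit layout (check bits at power-of-two positions, overall parity in bit 0) is an assumption chosen for clarity; it is not the layout used in the system ROM or by the vendor ECC core, and the function names are made up.

```python
DATA_BITS = 32
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32]          # 6 Hamming check bits
CODE_BITS = DATA_BITS + len(PARITY_POSITIONS)    # positions 1..38
DATA_POSITIONS = [p for p in range(1, CODE_BITS + 1)
                  if p not in PARITY_POSITIONS]


def syndrome(word):
    """XOR of the positions of all set bits in positions 1..38."""
    s = 0
    for pos in range(1, CODE_BITS + 1):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    """Return a 39-bit codeword; bit 0 holds the overall parity (7th check bit)."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    # Setting the check bits named by the syndrome drives the syndrome to zero.
    s = syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:
            word |= 1 << p
    if bin(word).count("1") & 1:   # make the total number of ones even
        word |= 1
    return word


def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    s = syndrome(word)
    overall_odd = bin(word).count("1") & 1
    status = "ok"
    if overall_odd:
        if s == 0:
            word ^= 1              # the overall parity bit itself flipped
        elif s <= CODE_BITS:
            word ^= 1 << s         # single-bit error at position s
        else:
            return None, "uncorrectable"
        status = "corrected"
    elif s != 0:
        return None, "uncorrectable"   # even parity, nonzero syndrome: double error
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status
```

Flipping any single bit of encode(x) decodes back to x with status 'corrected', and flipping two bits is reported as 'uncorrectable'.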



_________________
Robert Finch http://www.finitron.ca


Thu Jan 07, 2016 11:14 am WWW

Joined: Tue Jan 15, 2013 5:43 am
Posts: 184
Hi Rob. Two questions regarding the error detection and correction. Timing-wise, is there overlap between the detection and correction? If a correction is necessary, does that trigger a wait state?

Also, I'm confused by the references to the FMT kernel and the C64 language compiler. (Commodore 64? Really?) Can you supply some elaboration or some links, please?

_________________
http://LaughtonElectronics.com


Thu Jan 07, 2016 1:08 pm WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
Quote:
Hi Rob. Two questions regarding the error detection and correction. Timing-wise, is there overlap between the detection and correction? If a correction is necessary, does that trigger a wait state?

In this case detection and correction are being done at the same time without any FFs in between the input and output. It's combinational logic applied to the input that generates the output directly, so I guess there is an overlap. The ECC is very fast, into the hundreds of MHz (I'm using an ECC core supplied by the FPGA vendor). Since the core is only running at <50 MHz due to other constraints, the ECC wasn't pipelined or registered. There are no wait states for the ECC.
I did put together my own ECC core in about half an hour, but decided it wasn't worth the effort to debug when there was one available from the FPGA vendor. The vendor is Xilinx, and I'm using the free part of the Vivado toolset.

Quote:
Also, I'm confused by the references to the FMT kernel and the C64 language compiler. (Commodore 64? Really?) Can you supply some elaboration or some links, please?

C64 really has nothing to do with the Commodore 64. It's just a name I picked for the 64 bit 'C' compiler so I wouldn't confuse it with another compiler I was working on. Documentation can be found here:
http://github.com/robfinch/Cores/blob/master/software/C64/doc

FMTK for FISA64 was mentioned in another post: http://anycpu.org/forum/viewtopic.php?f=9&t=125 (FMTK stands for Finitron Multi-tasking Kernel). It's currently being adapted to the Thor system.

_________________
Robert Finch http://www.finitron.ca


Thu Jan 07, 2016 10:40 pm WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
Most of the last few days has been spent fiddling around to get the compiled code to work. Error correction has been added to the system in several places.
The most recent change to the core was to support the check (CHK) instruction, which causes an exception if the check bounds are exceeded.
The check instruction was being used in FMTK, hence the reason for adding it to the CPU. However, use of the CHK instruction has subsequently been removed from FMTK.
A compiler fix for arrays toasted handling of string constants. So both array handling and strings had to be fixed.
A problem with the 'catch' statement was found and fixed.

If there are fewer than four instructions in a row to be skipped as part of a conditional program flow, then the compiler outputs predicated instructions. I had to search quite a bit to find a good example; the messaging code for FMTK has one. In several places the compiler generates a sequence of predicated code rather than a branch.
Code:
FMTKmsg_147:
            ldi     r3,#0
            zxc     r3,r3
            sc      r3,2118[r16]
            ldi     r3,#-1
            zxc     r3,r3
            sc      r3,2116[r16]
            ldi     r3,#-1
            zxc     r3,r3
            sc      r3,2114[r16]
            tst     p1,r15
   p1.ne   lw      r3,2120[r16]
   p1.ne   sw      r3,[r15]
FMTKmsg_149:
            tst     p1,r14
   p1.ne   lw      r3,2128[r16]
   p1.ne   sw      r3,[r14]
FMTKmsg_151:
            tst     p1,r13
   p1.ne   lw      r3,2136[r16]
   p1.ne   sw      r3,[r13]
FMTKmsg_153:
            ldi     r1,#0
            br      FMTKmsg_126
FMTKmsg_135:
            tst     p0,r15
   p0.ne   lw      r3,-16[bp]
   p0.ne   lw      r4,8[r3]
   p0.ne   sw      r4,[r15]
FMTKmsg_155:
            tst     p0,r14
   p0.ne   lw      r3,-16[bp]
   p0.ne   lw      r4,16[r3]
   p0.ne   sw      r4,[r14]
FMTKmsg_157:
            tst     p0,r13
   p0.ne   lw      r3,-16[bp]
   p0.ne   lw      r4,24[r3]
   p0.ne   sw      r4,[r13]
FMTKmsg_159:
            addui   sp,sp,#-16
            ldi     r3,#-1
            sw      r3,8[sp]
            ldi     r3,#sys_sema_
            sw      r3,0[sp]
            jsr     c1,[c9]

The compiler doesn't actually output predicated code that often, because it only does so when the number of instructions to be skipped is small (<4).
This limit has to do with the limited size of the instruction queue. Four instructions is one-half of the queue size. If the queue were larger it would make sense for longer sequences of instructions to be predicated.
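The rule above can be sketched as a small decision function. This is only an illustration in Python, not the C64 compiler's actual code: the function name is made up, the queue size of eight is inferred from "four instructions is one-half of the queue size", and scaling the threshold with a larger queue is an assumption.

```python
QUEUE_SIZE = 8  # inferred: four instructions is one-half of the queue


def use_predication(skipped_count, queue_size=QUEUE_SIZE):
    """Return True if a conditional skip should be emitted as predicated
    instructions rather than as a branch around them.

    Assumption: the threshold is half the instruction queue, so a larger
    queue would allow longer predicated sequences.
    """
    threshold = queue_size // 2
    return 0 < skipped_count < threshold
```

With the default queue size, skipping three instructions gives predicated code, while skipping four or more falls back to a branch.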

_________________
Robert Finch http://www.finitron.ca


Sun Jan 10, 2016 5:46 am WWW

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 205
Location: Huntsville, AL
Rob:

I read your posts on the Thor core with interest. Apparently I lack the background to understand what you mean by predicated instructions. Would you discuss the concept here, or post a link to a reference on the subject of predicated instructions?

_________________
Michael A.


Sun Jan 10, 2016 2:21 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1632
Perhaps I can jump in? It's a technique to avoid short branches by making instructions themselves conditional - they either act or they don't, according to the relevant condition. For hopping over one or two instructions, it's like changing them to NOPs, and can be better than branching. See
https://en.wikipedia.org/wiki/Branch_predication


Sun Jan 10, 2016 2:28 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 205
Location: Huntsville, AL
Thanks. That clears up the subject, but it certainly complicates the implementation of the pipeline.

I guess the adage "there is no free lunch" applies especially when trying to run a slalom track with a drag racer.

_________________
Michael A.


Sun Jan 10, 2016 3:43 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
Quote:
Thanks. That clears up the subject, but it certainly complicates the implementation of the pipeline.

Quote:
Perhaps I can jump in? It's a technique to avoid short branches by making instructions themselves conditional - they either act or they don't, according to the relevant condition. For hopping over one or two instructions, it's like changing them to NOPs, and can be better than branching

Yes, a good way to think of it is as if the instruction where the predicate failed gets turned into a NOP. However, it's more complex than that. For instance, long-running operations like divide or multiply have to be aborted because they start processing at the same time the predicate is evaluated. Single-cycle operations are actually performed, but the target register isn't updated.
Also, there's a bit of a trick to predication, which is using the target register as an extra hidden argument to the instruction. It's as if the op works like this:
Code:
if (predicate okay)
    T = A op B
else
    T = T_orig

The instruction can't just be turned into a NOP because subsequent instructions may take their register values from that pipeline queue entry. There are potentially queued instructions waiting for the value of a target register from a queue entry which may have a failed predicate, so the most recent value of the target register needs to be supplied somehow. I set up the instruction to supply the most recent value of the target register as an extra operand. It therefore uses the same forwarding logic as the other operands. This decreases the performance of the core slightly compared with not needing the target register as an operand.
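As a rough software model of that idea (not the actual Thor logic), the sketch below treats the old target value as a third operand, so the queue entry always produces a result that later instructions can pick up. The function and register names are hypothetical.

```python
import operator


def issue(regs, predicate_ok, op, target, src_a, src_b):
    """Model one predicated instruction.

    The previous value of the target register is read as a hidden extra
    operand. If the predicate fails, that old value is passed through as
    the result, so dependants waiting on this entry still get a value.
    """
    a = regs[src_a]
    b = regs[src_b]
    t_orig = regs[target]          # hidden third operand
    result = op(a, b) if predicate_ok else t_orig
    regs[target] = result          # always "written", possibly unchanged
    return result


regs = {"r1": 5, "r2": 7, "r3": 100}
failed = issue(regs, False, operator.add, "r3", "r1", "r2")   # r3 keeps its old value
passed = issue(regs, True, operator.add, "r3", "r1", "r2")    # r3 = r1 + r2
```

Either way the entry yields a value for r3, which is why the same forwarding path can serve both cases.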

Predication is more effective when the instruction queue is large. For Thor the instruction queue is really too short (right now) to make much use of predication. Looking at the assembler code, it's almost exclusively the branch instructions that are predicated. Predication works better when an entire basic block is queued and can be predicated.

_________________
Robert Finch http://www.finitron.ca


Fri Jan 15, 2016 8:34 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
The most recent change to the core was stack pointer register aliasing.
r27 now refers to one of r27, r28, r29, or r30, depending on the operating mode of the core.
For instance, a reference to r27 in debug mode will actually re-route to r30. This allows the same code to be reused in different operating modes.
r28 is the interrupt stack pointer
r29 is the exception processing stack pointer
r30 is the debug mode stack pointer
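A minimal sketch of that aliasing in Python, assuming the mode is a small number added to 27. The mode names and the numbering (0 for normal operation, 1 interrupt, 2 exception, 3 debug) are assumptions picked to match the r28/r29/r30 assignments above; the function name is made up.

```python
SP_REG = 27

# Assumed mode numbering, chosen so that SP_REG + mode matches the post.
MODES = {"normal": 0, "interrupt": 1, "exception": 2, "debug": 3}


def alias_register(reg, mode):
    """Return the physical register number for an architectural reference."""
    if reg == SP_REG:
        return SP_REG + MODES[mode]
    return reg  # all other registers are unaffected by the mode
```

So alias_register(27, "debug") gives 30, while any other register number passes through unchanged.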

_________________
Robert Finch http://www.finitron.ca


Fri Jan 15, 2016 8:42 am WWW

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 205
Location: Huntsville, AL
Rob:

Thanks for taking the time to lay out the details of the predication logic you are using in the core.

Did you give any thought to creating multiple stack pointer registers hidden behind each other and selected by the processor's current mode? This is the approach that I've taken with my core to supply kernel and user mode stack pointers; it follows the example provided by the PDP11 processors. I do have to provide a way to access each user mode stack pointer from the kernel mode, and to prevent access to the kernel mode stack pointer while operating in the user mode.

I may be missing something, but your register renaming/mapping approach may not prevent the stack pointers for the other modes from being manipulated. Is protection of the stack pointers from unauthorized/unintended access a feature that you are including in the core?

_________________
Michael A.


Fri Jan 15, 2016 11:39 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
Quote:
Did you give any thought to creating multiple stack pointer registers hidden behind each other and selected by the processor's current mode? This is the approach that I've taken with my core to supply kernel and user mode stack pointers; it follows the example provided by the PDP11 processors

Yes, that's how I'd like to set it up. But it's tricky because each register needs its own unique id so forwarding works correctly. The registers are simply mapped to a larger register code. The mode has to be incorporated into the register id somehow, so I just did an add. Right now I use a seven-bit register id and all the registers are mapped to a unique id. I've run out of room in the FPGA, so I may not be able to implement things the way I'd like. Currently I add the mode number to register r27 to get the stack pointer for the mode.
There is also the problem of the stack segment which should also be changed depending on the mode.

Quote:
I may be missing something, but your register renaming/mapping approach may not prevent the stack pointers for the other modes from being manipulated. Is protection of the stack pointers from unauthorized/unintended access a feature that you are including in the core?

Not sure what the docs say (they may be a little dated). Registers 28 to 31 were only available to kernel mode; an attempt to access one of them in non-kernel mode generates a privilege exception. But there is currently a problem, as the user stack pointer isn't available to the kernel mode program yet.

There are also multiple return address registers depending on the processor mode. These have unique code address register numbers.

I have mapped the registers to a seven bit code as follows:
00-63 = r0 to r63 of the general register file
64-79 = predicate registers p0 to p15 (16)
80-95 = code address registers (c0 to c15)
96-111 = segment registers (8 base + 8 limit)
112-127 = special registers (loop counter, tick, etc.)

This really needs an eight (or nine) bit code; 128 ids isn't quite enough. If register renaming of the target registers were done, it might need a nine-bit code.
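The mapping table can be written down as a small lookup. This is just a sketch in Python of the ranges listed above; the class names and function names are made up, and how the 8 base and 8 limit segment registers are ordered inside 96-111 is not specified, so the sketch treats them as one block of 16.

```python
# (base id, number of registers) for each class, from the table above.
REGISTER_CLASSES = {
    "general":   (0, 64),    # r0 to r63
    "predicate": (64, 16),   # p0 to p15
    "code_addr": (80, 16),   # c0 to c15
    "segment":   (96, 16),   # 8 base + 8 limit (internal order assumed)
    "special":   (112, 16),  # loop counter, tick, etc.
}


def register_id(reg_class, index):
    """Map a (class, index) pair to its unique seven-bit register id."""
    base, count = REGISTER_CLASSES[reg_class]
    if not 0 <= index < count:
        raise ValueError(f"{reg_class} index {index} out of range")
    return base + index


def id_bits(total_ids):
    """Number of bits needed to give total_ids registers unique ids."""
    return max(1, (total_ids - 1).bit_length())
```

The five classes add up to exactly 128 ids, which is why seven bits is already full and any extra registers push the code to eight bits.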

_________________
Robert Finch http://www.finitron.ca


Sat Jan 16, 2016 9:09 am WWW

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 205
Location: Huntsville, AL
Rob:

Thanks for the explanation. With respect to register IDs, you're operating at a higher level of sophistication than I am currently using: a 6502-based core doesn't require much sophistication in regard to register IDs. :)

_________________
Michael A.


Sat Jan 16, 2016 12:27 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 1484
Location: Canada
Sometimes I think Thor's too complex for me. There's currently a bug I can't seem to find. I put debug status LED displays in the software, and the location of the problem changes depending on the presence of the status display. Thor executes a break instruction which fills the screen with zeros (a dump of the address). I think it's maybe a bad DRAM interface causing a return to address zero. I guess I need to hammer test the interface.

It's occurred to me that the register ids may be reused for different stack pointers, as the pipeline is flushed during a mode change, so there shouldn't be a need to worry about multiple stack pointers in the instruction queue needing different ids.

I've been trying to figure out how renaming target registers works exactly to the point that I can write the rename logic. I've written some of the logic for it but it isn't complete.

I found the BOOM processor project on the web (https://github.com/ucb-bar/riscv-boom) and have started to study it. It's an out-of-order RISC-V variant. RISC-V has a number of features that make an OOO machine more manageable to implement.

_________________
Robert Finch http://www.finitron.ca


Mon Jan 18, 2016 12:48 am WWW