


 Thor Core / FT64 

Joined: Tue Jan 15, 2013 5:43 am
Posts: 189
robfinch wrote:
The list includes population and zero/one counting, divide, BCD math, bit-field operations and few others.
And variable-length instructions. Yet, you describe Thor as RISC-like?? I'm getting tempted to tease you about that, Rob! :D Awesome project, though.

_________________
http://LaughtonElectronics.com


Fri Dec 04, 2015 2:54 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
And variable-length instructions. Yet, you describe Thor as RISC-like?? I'm getting tempted to tease you about that, Rob! :D Awesome project, though.

Like most modern CPUs, Thor is somewhere in between CISC and RISC. The core supports several string operations that are definitely classified as CISC operations, and with 190+ pages of instructions one would hardly call it a RISC machine. But there are no really complex instructions like double memory indirect operations or auto-increment / auto-decrement memory ops, and memory is accessed almost exclusively with load / store ops.

Last night's debugging session was a nightmare. I could not find an explanation for why a particular bit was clear all the time. After much head-scratching I called it a night.
This morning when I tried the same thing without making any changes, it worked. I'm coming to the conclusion that there's a bad memory bit in my host machine.

The register file has been switched back to the old paradigm of providing only four register read ports even though the two instructions to be queued might need six.
This reduces the size of the core significantly (2,000-4,000 LUTs) at the cost of not being able to enqueue two instructions at once on rare occasions. Most instructions use only a single read port, so it was wasteful to support six read ports for only two instructions. I had re-coded the register file temporarily to get rid of some of the complexity for debugging purposes. The complexity of managing fewer read ports than the maximum required is worth it, especially when there are even more instructions to enqueue.
I'm considering supporting only five read ports for the enqueue of three instructions, when I get around to it.
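
As a rough illustration of the port-limit trade-off described above, here is a minimal Python sketch. It is not Thor's actual logic: it assumes instructions enqueue strictly in program order and that an instruction can enqueue only if its register reads fit in the read ports still free that cycle. The function name `enqueue_count` and the per-instruction port counts are made up for the example.

```python
def enqueue_count(reads_needed, ports=4, width=2):
    """Return how many of the next `width` instructions can enqueue this cycle.

    reads_needed: read ports each waiting instruction needs, in program order.
    ports: read ports the register file provides (4 in the post above).
    width: how many instructions the front end tries to enqueue per cycle.
    """
    used = 0
    count = 0
    for need in reads_needed[:width]:
        if used + need > ports:
            break  # in-order enqueue: a later instruction cannot jump ahead
        used += need
        count += 1
    return count
```

With four ports, two instructions that each read one register both enqueue, while two three-operand instructions back to back enqueue one at a time. The same function can be tried with `ports=5, width=3` for the three-way case.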

And the instruction queue / re-order buffer really needs to be longer. The head and tail pointers collide all the time, causing the machine to stall. This happens when the core is waiting for a memory op, which takes several cycles.
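
The head/tail collision can be pictured with a toy circular queue in Python. This is only a sketch under simplifying assumptions that are not from the core itself: one enqueue attempt per cycle, and the head entry commits only once every `mem_latency` cycles, as if every entry were a memory op. The names `InstructionQueue` and `stalled_cycles` are invented for the example.

```python
class InstructionQueue:
    """Circular queue with head/tail pointers, as in a re-order buffer."""

    def __init__(self, size):
        self.size = size
        self.head = 0   # oldest entry, next to commit
        self.tail = 0   # next free slot
        self.count = 0  # tells full from empty when head == tail

    def enqueue(self):
        if self.count == self.size:
            return False  # tail has caught up with head: the front end stalls
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return True

    def commit(self):
        if self.count == 0:
            return False
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return True


def stalled_cycles(size, mem_latency, cycles=64):
    """Count cycles in which the enqueue attempt failed because the queue was full."""
    q = InstructionQueue(size)
    stalls = 0
    for cycle in range(cycles):
        if cycle % mem_latency == mem_latency - 1:
            q.commit()  # the slow head op finally completes
        if not q.enqueue():
            stalls += 1
    return stalls
```

In this model a longer queue postpones the first stall and absorbs bursts of slow ops, but it cannot hide a sustained mismatch between enqueue and commit rates.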

_________________
Robert Finch http://www.finitron.ca


Sun Dec 06, 2015 5:55 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I can't seem to get the FPGA system to work past clearing the screen. It clears the screen then hangs; a start-up message is expected. The last address displayed on the LEDs is 0x00001C, which looks suspiciously like an instruction cache access to address 0, the result of a bad code pointer.
So I coded a jump instruction to a LED display routine to be placed at address zero. But no luck, it doesn't seem to execute. In simulation the core runs until it gets into the keyboard processing code. The LED output seems to work so I should be able to find out what line of code is causing the core to croak, by using indicator LEDs. But it'll take a while.
I have coded for a three-way superscalar now, but it's too large to fit in the FPGA yet. I'll have to be satisfied with two-way and 32 bits for now.

_________________
Robert Finch http://www.finitron.ca


Tue Dec 08, 2015 7:53 am

Joined: Tue Jan 15, 2013 5:43 am
Posts: 189
robfinch wrote:
I'm coming to the conclusion that there's a bad memory bit in my host machine.

robfinch wrote:
I can't seem to get the FPGA system to work past clearing the screen.

Might these be related? Maybe it's far-fetched -- I'm having trouble getting my head around it. You're comparing the simulation with the actual hardware, and they don't agree...

_________________
http://LaughtonElectronics.com


Thu Dec 10, 2015 2:47 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Might these be related? Maybe it's far-fetched -- I'm having trouble getting my head around it. You're comparing the simulation with the actual hardware, and they don't agree...

They could be. But I think the toolset uses checksums to ensure that it's done a proper build. I've found a number of bugs in the code because the hardware and simulator agreed, but they don't always.

I managed to "fix" most bit errors in my host machine by using triple redundant memories. It seems funny that it'd fix the host, but the simulator also has to simulate the triple redundant memory, so it works then.
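
The post doesn't say how the triple redundant memories are built. The usual approach is to keep three copies and take a bitwise majority vote on every read, and the Python sketch below assumes exactly that. The class and function names are invented for illustration.

```python
def vote(a, b, c):
    """Bitwise majority: each result bit is the value at least two copies agree on."""
    return (a & b) | (a & c) | (b & c)


class TripleRedundantMemory:
    """Three identical copies of a memory, read back through a majority voter."""

    def __init__(self, size):
        self.copies = [[0] * size for _ in range(3)]

    def write(self, addr, value):
        for copy in self.copies:
            copy[addr] = value  # every write goes to all three copies

    def read(self, addr):
        a, b, c = (copy[addr] for copy in self.copies)
        return vote(a, b, c)  # a flipped bit in any single copy is outvoted
```

A single bad bit in one copy is masked on read. Two copies failing in the same bit position would still produce a wrong value.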
I found a major flaw in the code. When an instruction has a predicate that tells it not to execute, the following instructions that have any dependencies on it have to be stomped on. I had neglected to do this. It's treated like a branch, only sometimes there are not as many entries to stomp on. There is a way to handle the problem without stomping on instructions, but the amount of logic required is too much, so the instructions are just stomped on. Now I wonder why bother with predicates. I also scrapped the push instructions (for now); they were giving me grief debugging.

Fixes have increased the size of the code over the starting point by over 20% (78,000 LCs). It's now getting too big again for the FPGA, assuming there's more than just the core in the FPGA.

_________________
Robert Finch http://www.finitron.ca


Fri Dec 11, 2015 6:11 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Thor runs for a few more cycles now after a couple of fixes. He clears the screen and displays some characters, just not the right ones. The cursor position gets updated now. The register-to-register instructions weren't updating the target register. There are so few instructions of that type in use that things almost worked. I switched the predicate failures from acting like branches to just invalidating the predicated instruction. It took a little more hardware to do this, but if predicates are used it's maybe better that they don't work like branches. If the instruction predicate doesn't pass, then the current value of the target register is passed to the output rather than an updated one. To get the current value of the target register, it has to be passed like an argument to the instruction. Read: more hardware. Thor still can't get a key from the keyboard.
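
The pass-through behaviour for a failed predicate can be sketched in Python as below. This is a simplified model, not the core's logic: registers are a plain dictionary, and the name `execute_predicated` and its parameters are made up for the example.

```python
def execute_predicated(regs, pred_true, target, op, a, b):
    """Model one predicated register-to-register instruction.

    The old value of the target register is read as an extra operand, so a
    failed predicate still produces a result: the unchanged old value.
    """
    old_target = regs[target]  # the extra operand read ("more hardware")
    result = op(regs[a], regs[b]) if pred_true else old_target
    regs[target] = result      # write-back happens either way
    return result
```

Because the instruction always writes a valid value to its target, dependent instructions can keep using the normal result path and nothing after it needs to be stomped on.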

_________________
Robert Finch http://www.finitron.ca


Sat Dec 12, 2015 5:23 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
A good portion of time was spent yesterday working on Thor's interrupt system. I managed to get an interrupted string operation to work in simulation. In the attached .PDF document, the red circles show where the string operation started, was interrupted, and then restarted after the interrupt.
I managed to get some nice stack operations done that enqueued two micro-ops per instruction in order to implement the stack operation. Then I rebuilt the system and found out it was too big to fit into the FPGA. Next thing to do was to backtrack and remove stack ops again.
Attachment:
InterruptedStringOp.pdf [265.52 KiB]
Downloaded 485 times

_________________
Robert Finch http://www.finitron.ca


Sun Dec 13, 2015 11:15 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Thor has grown again. Thor now does segment bounds checking and has debug registers. Because of limitations in the segment bounds check, the last 4kB of memory is not accessible to the core. The reset address changed as a result; it had to be moved out of the last 4kB of memory. Hopefully it'll soon be possible to make use of the debug registers to help resolve software problems. The core still doesn't get much past clearing the screen. I've yet to find the problem with the character output. I may put together some serial port routines and add a serial port for debugging.

_________________
Robert Finch http://www.finitron.ca


Mon Dec 14, 2015 12:11 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Some success. Thor's displaying a start-up message now. I had to re-write the wonderful circular logic that depended on a race condition settling into a tree structure. The original code was more elegantly done but just would not work with the toolset / FPGA. The changed code is somewhat larger as it uses a more brute-force approach but it seems to work. The system can now also process a single keystroke. Getting the second keystroke causes it to hang, so there are still problems. I also added a serial port to the system but haven't got it to work properly yet. Further little bits at a time.

_________________
Robert Finch http://www.finitron.ca


Sat Dec 19, 2015 12:02 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Still working on Thor. One step forwards, two steps backwards. I figured out the keystroke problem, it was just a software key-up/key-down indicator flag that needed to be reset. So the test system can now get multiple keystrokes. But I busted the start-up message somewhere along the way. For some reason the system insists on displaying an incrementing set of characters rather than the character it should be. I think it's maybe in the following piece of code where the cursor position increments. It could be executing the cursor increment before the store (out of order) then using the wrong value of r1 for the store. Or it may be storing r3 to memory while using it as an index. Anyways some more debugging to do.
Code:
      bsr     VBAsciiToScreen
      lhu     r2,NormAttr
      andi    r2,r2,#-1024
      or      r1,r1,r2
      lcu     r3,VideoPos
      lhu     r2,Vidptr
      sh      r1,hs:[r2+r3*4]     ; <= r3 stored by mistake ?
      lcu     r1,CursorX
      addui   r1,r1,#1            ; <= or cursor increment out of order problem
      lcu     r2,Textcols
      cmp     p0,r1,r2

I still haven't managed to get any sort of interrupts working. They seem to work in simulation but not in real life.

_________________
Robert Finch http://www.finitron.ca


Mon Dec 21, 2015 8:52 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
After about a dozen software bug fixes and two or three hardware bug fixes, Thor is able to display a disassembly dump on the debug screen. The I/O routines of the debugger are simpler than the regular ones and so easier to work with. I'm hoping to be able to get single stepping working. As it's set up at the moment, single stepping is going to work by queuing a debug interrupt after each instruction queued. The system call instruction seems to work, so interrupts, which are almost the same, must be close to working.
It's just about time to think about writing a software emulator for the system. It'll most likely be a modification of the one for FISA64. I'm finally taking a break for Christmas.

_________________
Robert Finch http://www.finitron.ca


Thu Dec 24, 2015 8:56 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1782
Progress! Have a good break.


Thu Dec 24, 2015 9:57 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Output from Thor! The photo shows the alphabet test and debug window output. Still not getting the proper start-up message. It should overwrite the alphabet test.
Attachment:
Thor1Cmp.JPG [ 21.29 KiB ]


Memory indirect jumps have been added to Thor. A memory indirect jump can typically replace four or five other instructions that would otherwise be required to load an address from a table in memory and jump to it.
Memory indirect jumps are used to implement dispatch tables. The following code shows a sample usage:

Old BIOS code:
Code:
      ldi     r10,#VideoBIOS_FuncTable
      lcu     r10,cs:[r10+r6*2]
      ori     r10,r10,#VideoBIOSCall & 0xFFFFFFFFFFFF0000    ; recover high order bits
      mtspr   c2,r10
      jsr     [c2]

Also eliminated from the BIOS code were two more instructions required to stack register r10. The stack footprint of the routine was also reduced. Hence the single jci instruction replaced seven other instructions.

Newer BIOS code:
Code:
      jci      c1,cs:VideoBIOS_FuncTable[r6]
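
For readers who don't follow Thor assembly, the address calculation in the older five-instruction sequence can be modelled in Python as below. This reflects my reading of the listing (a table of 16-bit entries, zero-extended on load, with the high-order bits taken from the address of VideoBIOSCall). Whether jci recovers the high-order bits the same way is not stated in the post, so the sketch covers only the old sequence. The function and parameter names are invented.

```python
HIGH_MASK = 0xFFFFFFFFFFFF0000  # same mask as in the old BIOS listing


def dispatch_target(table, index, code_base):
    """Compute the jump target the way the old instruction sequence does.

    table: list of 16-bit entries (low halves of the handler addresses).
    index: function number (r6 in the listing).
    code_base: an address in the same 64 KiB region as the handlers
               (VideoBIOSCall in the listing).
    """
    low16 = table[index] & 0xFFFF           # lcu: 16-bit load, assumed zero-extended
    return (code_base & HIGH_MASK) | low16  # ori: recover the high-order bits


# Hypothetical table and base address, for illustration only.
target = dispatch_target([0x1234, 0xABCD], 1, 0x00FF8000)
```

Storing only 16 bits per entry keeps the table small, at the cost of requiring all handlers to sit in one 64 KiB region.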

_________________
Robert Finch http://www.finitron.ca


Sat Dec 26, 2015 9:50 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
After a couple more hardware fixes the core is starting to work much better. The system now displays the start-up message again. Two problems were fixed: predicate registers weren't being properly bypassed at enqueue, and fetcbuf0_mem incorrectly classified the instruction as a memory op. I can't make too many more hardware fixes because there are fewer than 1,000 LUTs left available in the device. Still trying to get hardware interrupts working.

_________________
Robert Finch http://www.finitron.ca


Mon Dec 28, 2015 1:32 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The IRQ problem has finally been discovered. I was looking for a complicated problem and it was really simple.
The IRQ line was effectively tied permanently active (it was connected to an external button). I found the statement by accident when I went to re-arrange code.
So IRQs were happening continuously, causing the system to effectively hang when IRQs were enabled. I may put in an IRQ enable countdown delay so successive IRQs can't hog 100% of the processing time. Any decent processor would have this. Many processors allow at least a single instruction to be executed before interrupts are re-enabled. This allows processing to creep along even if an IRQ line is stuck active.
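
The effect of such a countdown can be shown with a toy Python model. It is only a sketch: the handler is charged a single cycle, and `reenable_delay` is an invented parameter giving the number of foreground instructions allowed before the next IRQ is accepted.

```python
def foreground_progress(cycles, irq_stuck, reenable_delay):
    """Count foreground instructions executed while an IRQ line may be stuck active.

    reenable_delay: foreground instructions allowed after each interrupt
    before another IRQ is accepted (0 models no delay at all).
    """
    executed = 0
    countdown = 0
    for _ in range(cycles):
        if irq_stuck and countdown == 0:
            countdown = reenable_delay  # take the IRQ; handler modelled as one cycle
        else:
            executed += 1               # a foreground instruction runs
            if countdown > 0:
                countdown -= 1
    return executed
```

With no delay and a stuck line, the foreground code never runs, which matches the hang described above. Even a delay of one instruction lets it creep along.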

The RTS instruction with the stack pointer update has been eliminated, and the single-byte RTS instruction as well, because of the question of which stack pointer register to update. Sure, the register to update could be specified in the instruction, but then it turns the instruction into a four-byter. The instruction was also using an extra dedicated adder and muxing resources in the ALU.
In performance terms it's probably just as fast to use an ADD instruction. It's also easy enough to specify the stack pointer update using an ADD.
Eliminating the RTS and a couple more fixes reduced the size of the core this time by almost 3,000 LUTs.

The weather wasn't that good yesterday so I was stuck inside.

_________________
Robert Finch http://www.finitron.ca


Tue Dec 29, 2015 9:45 am