 ANY-1 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Moved the execute stage back out of the mainline. Two steps forward, one step back. It turns out that having it in the mainline made the core much larger, so that it no longer fit in the FPGA. I am not sure how it affected synthesis, but the core was 50,000 LUTs larger. So, a slightly slower design fits.

Toying with the idea of allowing loads to use nybble-aligned addresses.

With a little bit of work, the core can be configured for other sizes, such as an 80-bit core. 80 bits is enough room for three 13.13 fixed-point values in a register (3 × 26 = 78 bits), or three FP24 values (3 × 24 = 72 bits).
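As a rough illustration of how three 13.13 values could share an 80-bit register, here is a minimal C sketch. The `reg80_t` type, the lane offsets (bits 0, 26 and 52) and the function names are all assumptions made for the example; the post does not say how the lanes would actually be laid out. Each lane is treated as a raw 26-bit pattern.

```c
#include <stdint.h>

/* Hypothetical model of an 80-bit register: 64 low bits plus 16 high bits. */
typedef struct { uint64_t lo; uint16_t hi; } reg80_t;

#define LANE_BITS 26u                        /* 13 integer + 13 fraction bits */
#define LANE_MASK ((1ull << LANE_BITS) - 1)

/* Pack three 26-bit lanes at bit offsets 0, 26 and 52 (an assumed layout). */
reg80_t pack3(uint32_t a, uint32_t b, uint32_t c)
{
    reg80_t r;
    uint64_t c26 = c & LANE_MASK;
    r.lo = (a & LANE_MASK) | ((b & LANE_MASK) << LANE_BITS) | (c26 << 52);
    r.hi = (uint16_t)(c26 >> 12);            /* upper 14 bits of lane 2 */
    return r;
}

/* Extract lane i (0, 1 or 2) as a raw 26-bit pattern. */
uint32_t lane(reg80_t r, unsigned i)
{
    if (i == 0)
        return (uint32_t)(r.lo & LANE_MASK);
    if (i == 1)
        return (uint32_t)((r.lo >> LANE_BITS) & LANE_MASK);
    /* Lane 2 straddles the 64-bit boundary: 12 bits in lo, 14 bits in hi. */
    return (uint32_t)(((r.lo >> 52) | ((uint64_t)r.hi << 12)) & LANE_MASK);
}
```

The three lanes use 78 of the 80 bits, which leaves the top two bits of `hi` unused in this layout.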

_________________
Robert Finch http://www.finitron.ca


Fri Jun 25, 2021 9:00 am

Squeezed another branch displacement bit out of the instruction by noting that it does not make much sense to allow vector registers in the compare and branch instructions. Hence the bit used to distinguish vector and scalar registers could be re-purposed as a branch displacement bit. That gives 16 bits for displacement for a range of ±144kB.
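One reading that reproduces the ±144kB figure is that the 16-bit signed displacement counts whole nine-nybble (36-bit) instructions; the nine-nybble IP increment is mentioned in the Jul 07 post, but the unit of the displacement is an assumption here, not something the post states. Under that assumption the arithmetic can be sketched in C as:

```c
#include <stdint.h>

/* Assumption: the displacement counts whole 36-bit (nine-nybble) instructions. */
enum { NYBBLES_PER_INSN = 9 };

/* Reach, in bytes, of a signed displacement field that is `bits` wide. */
uint64_t branch_reach_bytes(unsigned bits)
{
    uint64_t insns = 1ull << (bits - 1);     /* magnitude of the most negative value */
    return insns * NYBBLES_PER_INSN / 2;     /* two nybbles per byte */
}
```

With 16 bits this gives 32768 × 9 / 2 = 147456 bytes, which is 144 × 1024; with the previous 15 bits the reach would have been half that.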

Worked on the graphics accelerator today. Changed the bus master port from 64 to 128 bits and added some additional color depths.

Sat Jun 26, 2021 3:16 am

Latest Results: The core can be seen in the ILA sequencing through instructions, but it is not outputting the LED I/O address. It seems to be treating instructions as if they were NOPs. It does seem to be fetching the correct instructions, which indicates that the I$ is likely working.
Attachment: NoLED.png (No LED I/O Access)


Latest Fixes: The NMI input to the core was left open. Not sure if this was an issue. There have been issues in the past with open signals defaulting to active after synthesis.

Latest Mods: Decoupled queue and decode. Decode now takes place sometime after queue. This allows an exec instruction to be implemented.

Sun Jun 27, 2021 5:39 am

Latest Mods:
Removed the stack-based call and return instructions. There was a bug in the call instruction that was taking some effort to identify. Code will now be slightly larger but will execute just as fast. Removed the exec instruction. The extra code that was piling up to support it made it not worth keeping.

Milestone: Got the execution run time up over 100 µs in simulation of the boot ROM today.

Mon Jun 28, 2021 4:06 am

Latest Mods:
Modified the branch instruction modifier to specify only two bits for the target link register. This keeps the link register choice consistent with the JAL and BAL instructions. It also allows a bit to be re-purposed as an additional displacement bit. The modifier now supplies 19 additional bits, so the total number of branch displacement bits with a modifier is 35 (16 + 19).
The arg B field of the queue was added to the bypass matrix to allow the fourth register of an instruction to be bypassed properly.

Latest Additions:
Added a watchdog timer to the instruction queue. If the queue’s execute pointer does not change for 512 cycles then an exception is generated.
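The watchdog behaviour described here can be modelled in a few lines of C. This is only a behavioural sketch: the `watchdog_t` type, its field names and the choice to restart the count after firing are assumptions, and the real logic lives in the core's HDL.

```c
#include <stdbool.h>
#include <stdint.h>

#define WATCHDOG_LIMIT 512u   /* cycles without progress, as stated in the post */

typedef struct {
    uint32_t last_exec_ptr;   /* execute pointer seen on the previous cycle */
    uint32_t idle_cycles;     /* consecutive cycles with no change */
} watchdog_t;

/* Call once per clock; returns true on the cycle an exception should be raised. */
bool watchdog_tick(watchdog_t *w, uint32_t exec_ptr)
{
    if (exec_ptr != w->last_exec_ptr) {
        w->last_exec_ptr = exec_ptr;     /* progress was made: restart the count */
        w->idle_cycles = 0;
        return false;
    }
    if (++w->idle_cycles < WATCHDOG_LIMIT)
        return false;
    w->idle_cycles = 0;                  /* assumed: the count restarts after firing */
    return true;
}
```

In this model a stuck queue produces one exception every 512 cycles, and any change of the execute pointer clears the count.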

Latest Bug Fixes:
The multiply fast (MULF) instruction was flagged as a multi-cycle operation, but it is single cycle. Multiply fast immediate was performing an add instead of a multiply.
In the assembler, the branch displacement fields were populated incorrectly, causing branches of more than 64 nybbles to go to the wrong address.

Tue Jun 29, 2021 3:23 am

Did a lot of experimentation with the EXEC and MYST instructions. Then finally wrapped them up in #SUPPORT directives and disabled them. EXEC executes any instruction contained in a register. It is not a very performance-oriented instruction as it may stall the processor while waiting to determine the instruction register. The EXEC instruction added about 3% to the size of the core. MYST is similar to EXEC except that the registers are encoded in the MYST instruction instead of coming from the register containing the instruction. This makes it somewhat faster than EXEC. Still not a good performer. Using JIT code would probably beat the use of EXEC or MYST.

Thu Jul 01, 2021 3:40 am

Latest additions:
LDM and STM, load and store multiple registers. These instructions are useful at function entry and exit and for context switches. The front end of the core had to be modified. They look like ordinary load or store instructions except that a register list modifier is used with them. The modifier allows specifying x1 to x30 for the LDM / STM. Adding LDM / STM did not affect the size of the core very much. Surprisingly, the core was about 500 LUTs smaller with the instructions added. When loading or storing three or more registers, it is probably more efficient to use an LDM / STM.
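To make the register-list idea concrete, here is a small C model of what a store-multiple and load-multiple could do. The bit assignment (bit n of the list selects xn), the ascending store order and the function names are assumptions for illustration; the post only says that x1 to x30 can be listed.

```c
#include <stdint.h>

/* Assumed encoding: bit n of the list selects register xn, for x1..x30. */
#define REGLIST_MASK 0x7FFFFFFEu

/* Store the listed registers to consecutive slots; returns the slot count. */
unsigned stm(const uint64_t regs[32], uint32_t list, uint64_t *mem)
{
    unsigned n = 0;
    list &= REGLIST_MASK;                /* x0 and x31 cannot be listed */
    for (unsigned r = 1; r <= 30; r++)
        if (list & (1u << r))
            mem[n++] = regs[r];
    return n;
}

/* Reload the listed registers from consecutive slots, in the same order. */
unsigned ldm(uint64_t regs[32], uint32_t list, const uint64_t *mem)
{
    unsigned n = 0;
    list &= REGLIST_MASK;
    for (unsigned r = 1; r <= 30; r++)
        if (list & (1u << r))
            regs[r] = mem[n++];
    return n;
}
```

A function prologue would then be a single `stm` with the callee-saved registers in the list, and the epilogue a matching `ldm` with the same list.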

Fri Jul 02, 2021 3:40 am

Cannot get the core working in an FPGA. It seems to work beautifully in simulation. In the FPGA the proper instructions can be seen entering the pipeline thanks to ILA, but the core does not seem to execute the instructions. The STM instruction is one of the first instructions executed, present simply for testing in simulation.
There is a burst of activity as the STM instruction is processed in the pipeline, but no writes to memory occur. If no writes make it to memory, then no commits come back and the core hangs. The first hack to try is widening the write pulse, which is currently only a single clock, to two clock cycles. It might be getting missed due to timing issues, but I doubt it.
I think the watchdog timeout is unsticking the core as there seems to be about 512 cycles between bursts of activity.

Mon Jul 05, 2021 6:38 am

A picture is worth 1000 words.
What happens after about 1000 ticks (500+ clock cycles) looks suspiciously like a watchdog event.
Attachment: Burst512.png (Activity burst in core)

Mon Jul 05, 2021 6:50 am

Sketched out a 20-bit compressed instruction format, then investigated what it would take to implement. Re-arranged some of the store instructions to make room for a 20-bit opcode space. One issue is that, with compressed instructions, the IP increment depends on the instruction length. The length decode is a 128-to-one, four-wide multiplexer, and it would be cascaded into the IP increment logic. I suspect that may be slow.
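The trade-off can be sketched in C under the assumption that a full-size instruction is nine nybbles (36 bits, the figure given in the Jul 07 post) and a compressed one is five nybbles (20 bits). The function names and the `compressed` flag are hypothetical. The first function puts the length decode in the path of the IP adder; the second always adds nine and corrects afterwards, which keeps the decode out of the increment at the cost of an extra step.

```c
#include <stdbool.h>
#include <stdint.h>

enum { FULL_NYBBLES = 9, COMPRESSED_NYBBLES = 5 };  /* 36-bit and 20-bit */

/* Length-dependent increment: the length decode feeds the IP adder. */
uint64_t next_ip_by_length(uint64_t ip, bool compressed)
{
    return ip + (compressed ? COMPRESSED_NYBBLES : FULL_NYBBLES);
}

/* Fixed increment, then a corrective step back for compressed instructions. */
uint64_t next_ip_fixed_then_back(uint64_t ip, bool compressed)
{
    uint64_t next = ip + FULL_NYBBLES;
    return compressed ? next - (FULL_NYBBLES - COMPRESSED_NYBBLES) : next;
}
```

Both functions land on the same address; they differ only in where the length information is needed.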

Debugging:
Doubling the width of the write pulse caused two back-to-back write cycles in simulation. This was expected to happen. But still no output from FPGA.

Put a vector in the execute stage to indicate which ‘if’s are being taken when running on the FPGA. The vector is dumped to the ILA so that progress is visible. So far it is indicating ‘4’, which is an unimplemented instruction or an exception of some sort.
Found that exception cause code 68h is present. This is not a cause code used in the system; it does, however, match the opcode of the instruction. So, it looks like an opcode is making it into the cause field. Something is amiss. I am creating a different build of the system now to see whether it is build related or an issue with the FPGA.

Tue Jul 06, 2021 3:28 am

Latest Additions:
Coded up a storm for 20-bit compressed instructions. Moved several pieces of the execute logic out to tasks that could be shared with 20-bit instructions. All 20-bit instructions except branches branch backwards by four nybbles. This gets to the next instruction, since the IP has already been incremented by nine. A backwards branch is about the least expensive means, and also the lowest performing.
Added a full complement of shift / rotate instructions. Originally only left and right shift were supported. Now arithmetic shift right and left and right rotate are included as well.
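For reference, the semantics of the three added operations can be written portably in C. This sketch assumes 64-bit operands and a shift count taken modulo 64; the function names are made up for the example and the post does not say how the core treats out-of-range counts.

```c
#include <stdint.h>

/* Rotate left; the count is reduced modulo 64 so a count of 0 is safe. */
uint64_t rol64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Rotate right, with the same treatment of the count. */
uint64_t ror64(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Arithmetic shift right built from a logical shift plus sign fill. */
uint64_t asr64(uint64_t x, unsigned n)
{
    n &= 63;
    uint64_t fill = (x >> 63) ? ~(~0ull >> n) : 0;  /* top n bits set if negative */
    return (x >> n) | fill;
}
```

The arithmetic shift avoids right-shifting a negative signed value, whose result is implementation-defined in C.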

Wed Jul 07, 2021 5:49 am

Worked on getting the compiler up to speed. It had last been modified for RTF64 which uses condition registers. So, there were a lot of modifications required to compare and branch instructions throughout the compiler.

The compiler can accept an option passed to a switch statement to generate “naked” switches. A naked switch omits the range testing code. This is meant only for code that is known to work to boost performance. It must be guaranteed that values will not fall outside of the proper range, otherwise a naked switch would likely cause a crash.

Normal Switch:
Code:
  ;           switch(x) {
  ldo      $t0,64[$fp]
  sge      $t1,$t0,#1      ; x varies between 1 and 12
  sle      $t2,$t0,#12
  and      $t1,$t1,$t2
  beq      $t1,TestSwitch_89
  sub      $t0,$t0,#1
  sll      $t0,$t0,#4
  ldo      $t0,TestSwitch_116[$t0]
  jmp      $t0

Naked Switch:
Code:
  ;           switch(x; naked) {
  ldo      $t0,64[$fp]
  sub      $t0,$t0,#1
  sll      $t0,$t0,#4
  ldo      $t0,TestSwitch_144[$t0]
  jmp      $t0
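The difference between the two listings can also be shown at the C level with an explicit jump table. This is a sketch, not what the compiler does internally: the case bodies, the table and the three-entry range are hypothetical stand-ins for the 1 to 12 range in the listings.

```c
typedef int (*case_fn)(void);

/* Hypothetical case bodies, for illustration only. */
static int case_a(void) { return 10; }
static int case_b(void) { return 20; }
static int case_c(void) { return 30; }
static int case_default(void) { return -1; }

/* Jump table for case values 1..3. */
static const case_fn table[] = { case_a, case_b, case_c };
enum { LO = 1, HI = 3 };

/* Normal switch: range test first, then index the table. */
int dispatch_checked(int x)
{
    if (x < LO || x > HI)
        return case_default();
    return table[x - LO]();
}

/* Naked switch: no range test; x outside LO..HI is undefined behaviour. */
int dispatch_naked(int x)
{
    return table[x - LO]();
}
```

The naked version saves the two compares, the AND and the branch, but an out-of-range `x` reads past the table and jumps to whatever it finds there.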

Code for switch generation was modified to use a binary search for the case value if there are more than two case values. Otherwise, a linear search is used. If case values are densely packed then a table lookup is used.
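The comparison order that a binary search over case values implies can be sketched as a runtime lookup in C. This is written as a function for readability; the compiler presumably emits equivalent compare-and-branch code instead, and the function name and the sorted-array representation are assumptions.

```c
#include <stddef.h>

/* Return the index of `value` in the sorted `cases` array, or -1 for default. */
int find_case_binary(const long *cases, size_t n, long value)
{
    size_t lo = 0, hi = n;               /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cases[mid] == value)
            return (int)mid;
        if (cases[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;                           /* no case matched: take the default */
}
```

For sparse case values this needs about log2(n) comparisons per dispatch rather than up to n for a linear search, which is why it only pays off once there are more than a couple of cases.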

Thu Jul 08, 2021 4:16 am

Worked on the compiler some more. Changed code generation of branches for ANY1 from RTF64.

Fri Jul 09, 2021 4:45 am

There have been numerous fixes to the compiler.
The latest addition was support for designators in array initializations. Some of the code for that is pretty scary and a little bit incomplete.
It is now possible to code designators as in:
Code:
void TestArray(int aa)
{
   int z[20] = {[5...13]=5,[0]=0,1,2,3,4,[14...19]=6};
}
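For anyone checking the result by hand, the following C function fills an array with the contents that initializer should produce. It assumes CC64 follows the C99 rule that positional initializers continue from the element after the last designator. The `[5...13]` range form is not standard C (GCC offers a similar extension spelled `[5 ... 13]`), so the sketch uses plain loops, and the name `expected_z` is made up.

```c
/* Expected contents of z after {[5...13]=5,[0]=0,1,2,3,4,[14...19]=6}. */
void expected_z(int z[20])
{
    for (int i = 0; i <= 4; i++)
        z[i] = i;                        /* [0]=0, then 1,2,3,4 follow on */
    for (int i = 5; i <= 13; i++)
        z[i] = 5;                        /* [5...13]=5 */
    for (int i = 14; i <= 19; i++)
        z[i] = 6;                        /* [14...19]=6 */
}
```

Note that the `[0]=0` designator moves the current position back to the start, so the undesignated 1, 2, 3, 4 land in elements 1 to 4 and do not overwrite the range that was set first.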


Structures were being entered into the global symbol table instead of the tag table. This led to structure definitions not being found sometimes. The interesting thing is that the compiler’s search facility is so powerful that it would find the structure definitions most of the time anyway.

Trying to get the compiler to compile the following expression held me up for a while:
Code:
void (*_Atfuns[32])(void) = {0};

I think the goal is to initialize the first element of the array of pointers to zero. The CC64 compiler initializes the first element as specified then fills the remaining storage with zeros.
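The CC64 behaviour matches standard C here: when a brace-enclosed list has fewer initializers than the array has elements, the remaining elements are initialized as if they had static storage duration, which for pointers means null. A short check, using a local array and a made-up function name:

```c
#include <stddef.h>

/* Count the null entries of an array declared like _Atfuns, but with automatic storage. */
size_t count_null_entries(void)
{
    void (*funs[32])(void) = {0};        /* same initializer as in the post */
    size_t n = 0;
    for (size_t i = 0; i < 32; i++)
        if (funs[i] == NULL)
            n++;
    return n;                            /* all 32 on a conforming compiler */
}
```

So `= {0}` is the usual idiom for clearing the whole array, not just its first element.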

Sat Jul 10, 2021 7:47 am

Latest Additions
Back in the instruction set are the LINK and UNLINK instructions used for subroutine linkage. They are not particularly fast but they are code dense. They expand out into a few of the more usual instructions.
Added an SLL optimization. If the target of the SLL operation is an index register and scaling can be used, then the SLL instruction is removed and scaling is used instead.

Code fix pending:
The compiler’s forcefit() routine, which coerces types to the larger of the two input types, needs to be fixed up. I am not sure why, but the forcefit() routine connects the source node to the destination node. This would not work, as the source and destination need to be kept separate. I am not sure what I was thinking at the time it was originally coded. The reason the compiler works is that forcefit() was always followed by code to link the source and destination operands, which made the operands separate again.
So, there is extra dead code in the forcefit() routine that might be confusing to someone looking at the inner workings of the compiler.

Mon Jul 12, 2021 5:34 am