Reply to topic  [ 121 posts ]  Go to page Previous  1 ... 3, 4, 5, 6, 7, 8, 9  Next
 RTF64 processor 
Author Message

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
That loop bubble bug is a nice bug to find and fix!


Tue Nov 17, 2020 9:48 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest fixes: hardware interrupt detection was moved to the last cycle of the instruction fetch to allow interrupts to be detected during pipeline loop mode. Rs0 needed to be selected as a source for the BEQI/BBS/BBC instructions, otherwise the value zero was used during the test causing most BEQI branches to fail. The MOV instruction was being decoded at the wrong location.

Reducing the number of clock cycles for register fetch and execute affected the clock cycle time and the build time due to additional signal congestion. The build time increased to about 3 hours and timing was almost but not quite met. So, the core has been switched back to using three cycles for register fetch and execute. The core could be entirely pipelined to achieve single clock cycle per instruction execution, excepting memory operations, but it would require about 20 pipeline stages and all the feedback paths would kill the timing due to signal congestion. Reducing the number of feedback paths and simplifying the execute logic reduces the size of the core.

Exception logic has been changed to defer processing of exceptions until the writeback stage. Previously exceptions were taken immediately in the stage in which they occurred. Taking them immediately increases performance, but also increases the size of the core, reducing the potential clock frequency.

The CHK instruction was modified so that it does not raise an exception if the flags record bit is set. Instead, flags are recorded so that a true/false branch may be used. If flags are not being recorded, then an exception is generated if the check fails.
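
A rough behavioural sketch of the CHK change described above, in Python. The bounds test (lower <= value < upper), the function name and the way flags are represented are assumptions for illustration only; the post does not give the exact CHK semantics.

```python
class CheckException(Exception):
    """Stands in for the hardware exception raised by a failed check."""


def chk(value, lower, upper, record_flags):
    """Model of CHK: return a pass/fail flag when recording, else maybe raise.

    Assumed bounds test: lower <= value < upper (not confirmed by the source).
    """
    passed = lower <= value < upper
    if record_flags:
        # Flags are recorded so a following true/false branch can test them.
        return passed
    if not passed:
        # Not recording flags: a failed check generates an exception.
        raise CheckException(f"{value} outside [{lower}, {upper})")
    return None
```

With the record bit set, the caller branches on the returned flag; without it, a failed check traps.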

_________________
Robert Finch http://www.finitron.ca


Thu Nov 19, 2020 5:07 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added the ORGFX controller to the SoC and the router could not handle the additional logic at the requested frequency. After trying to route for about 12 hours it was stopped. The frequency was lowered from 57 MHz to 40 MHz to see if that helps. The original plan was to run the video on a separate FPGA board from the cpu. But everything has just been thrown together to run in the NexysVideo until most of the bugs are ironed out.

The cpu core is too large. I want the thing to fit into an xc7A100t (or smaller), which is less expensive than the 200t. The cpu core with dram controller is about 80,000 LUTs. It needs to be 60,000 LUTs or less. It may have to be implemented as a 32-bit core to get it to fit.

Loop mode is no longer entered when there are two or fewer instructions in the loop. The reason is that this reduces the number of feedback paths in the pipeline. I could not think of a good example where there would be only two instructions, including the branch, in a loop. The only thing that comes to mind is timing loops (a subtract and a branch), which are better done using one of the timing CSRs as the cpu clock may vary or be stopped. A NOP can always be inserted to make the loop three or more instructions. I have been toying with the idea of reducing the number of feedback paths further by limiting loop mode to an odd number of instructions, for instance 3, 5 or 7 instructions in the loop. The issue is that every feedback path costs a multiplexer, plus the routing of signals from one end of the pipeline to the other.

_________________
Robert Finch http://www.finitron.ca


Fri Nov 20, 2020 4:00 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest Fixes: the cursor position was not being reset before use. This caused the screen to scroll up for every character displayed as the Y position was invalid. The immediate value for set instructions was not decoded properly, causing them to fail. In a serious goof-up the read port for Rd was not implemented properly. This issue was partially masked because of result forwarding. It was reading using the writeback stage register id and it should have been the register fetch stage register id. The only way to get it to work was to add another block ram to create an additional read port for Rd. This did not matter in the non-overlapped pipeline version; it could read and write because there was only a single register id to deal with.

Thinking about having the branch instructions explicitly enable pipeline mode. There is enough room in the instruction encoding to reserve a bit for this, reducing the branch displacement by a bit. The branch displacement was 14 bits so reducing it to 13 should not impact anything. The issue is removing logic that detects when pipeline mode is possible and leaving it up to the compiler to decide. Had a case where pipeline mode is entered, and it should not be. It was a backwards branch to an instruction in the branch shadow of a forward branch. The instruction in the branch shadow was in the pipeline due to read-ahead although it was not a valid instruction to execute. The loop logic did not detect this situation. There are probably other complicated cases where automatic loop mode would fail. So back to oldben's idea of dedicated instruction bits.
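
To put numbers on the cost of giving up one displacement bit, here is a small Python sketch. It assumes the displacement is a two's complement signed field; the units (bytes or something coarser) are not stated in the post, so the ranges are in displacement units only.

```python
def signed_range(bits):
    """Return (min, max) of a two's complement field of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


print(signed_range(14))  # (-8192, 8191)
print(signed_range(13))  # (-4096, 4095)
```

Reserving the bit halves the reach in each direction, which still covers the short backward branches that loop mode targets.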

The core clock frequency has been temporarily reduced to 40 MHz to reduce build times for debugging. It takes about four hours to hit the 57 MHz target, but only 1.5 hours to hit 40 MHz.

_________________
Robert Finch http://www.finitron.ca


Sat Nov 21, 2020 3:48 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest Milestone: text string dumped to screen. Finally. Debugging made a whole lot easier now. Of course, by the time text strings are displayable most of the core is probably debugged. Excuse the crazy photo angle, there is not much room for a camera without rearranging the workspace. The first display column is supposed to be blanked out, the text controller needs a little work.
Attachment: TextOut2.jpg (File comment: RTF64 First Text Output)

Latest Fixes: the results flags were always being set by memory loads when they should be conditional depending on the setting of the record bit in the instruction. Worse yet the loads set the results flags a cycle too late resulting in incorrect bypass behaviour. To fix this a test of the record bit was added and an additional state was added to set the flags if needed. The next issue was the need for a pipeline stall when flags are set by load instructions. The ADD5 instruction was adding the result register instead of Rs1.

_________________
Robert Finch http://www.finitron.ca


Sun Nov 22, 2020 3:57 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Another great milestone! And a monitor is a very tangible first application


Sun Nov 22, 2020 8:09 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 585
For my FPGA debugging, I debug 'HALT' as the second step after a valid character display.
The minor opcodes tend to shift around, with my trial and error designs. If I still have bugs
with programs, I check if I am downloading from the correct directory and that tends to fix it.
Good work, and blessings from the white mice.


Sun Nov 22, 2020 9:23 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
For my FPGA debugging, I debug 'HALT' as the second step after a valid character display.
That's a great instruction to have working. Good to have for simulators too. I just put the processor into an infinite branch loop to get things halted, although there is a stop instruction (almost the same as a halt) I should debug.

Latest Fixes:
<software> In CC64 switch statements were not compiling correctly due to changes to make switch 100% C compatible. The switch statement is now not 100% C compatible. Unlike C it does not allow arbitrary statements outside of cases. However, I have never used switch with arbitrary statements present. IMHO it is poor coding practice to do so. So, it is simply not supported in CC64. This means there will be the rare C program that requires a source update to run with CC64.

The assembler AS64 was not building the CSR instruction correctly, leading to invalid instruction encodings. The assembler was also not fully encoding ADDUI, ORUI, ANDUI, and AUIIP instructions if bit #13 of the constant was set. RTE was being assembled as a four-byte instruction when it is only three bytes. The assembler encoded the store of a return address register using an illegal instruction format. The opcode for the SEQ # instruction was incorrect (typo), leading to an illegal instruction error. The default return offset of the RTL instruction was wrong, causing a return past the correct return address.

The column that was not blanked properly by the text controller is now blanked properly. This was a simple adjustment to a fudge register in the controller.

<hardware> Length decoding of the RTE instruction was missing resulting in a default of four bytes.

Latest Mod:
Vectoring to the exception processing routine in the ifetch stage has been removed. The desire was to have low latency interrupts by vectoring immediately in the ifetch stage; however, this led to a complicated design. It may now take 40 to 50 clock cycles (about 1 us) to recognize an interrupt. All exception vectoring is now done in the writeback stage. The separate vector for non-maskable interrupts has been removed. Instead the exception handler will process the NMI cause code. Exception vectors are now spaced eight bytes apart rather than thirty-two. The reset address has been moved to $FF…FC0200 to allow more room for a vector table.
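
The interrupt latency figure can be checked with a line of arithmetic. This Python sketch assumes the core is running at the temporary 40 MHz debug clock mentioned earlier in the thread; the function name is made up for illustration.

```python
def latency_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock in MHz."""
    return cycles / clock_mhz


print(latency_us(40, 40.0))  # 1.0
print(latency_us(50, 40.0))  # 1.25
```

At the 57 MHz target the same 40 to 50 cycles would come in under a microsecond.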


The text controller has been modified to display text in ZRGB6/7/7/7 colors. Previously, it had been displaying in ZRGB16/5/6/5 with unused bits in the memory cells.

_________________
Robert Finch http://www.finitron.ca


Tue Nov 24, 2020 4:16 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest Fixes:
<software> Moves between special registers and general-purpose registers were not encoded with the correct instruction by the assembler. As a result the code index register was not loaded properly, causing a jump to address zero. The compiler used the wrong size for the return block, causing subroutine arguments to be shifted in position. This led to a display of blank characters.

<hardware> The RTL instruction was reading the immediate stack adjustment field from the wrong spot. This caused the stack adjustment to be off causing following stack operations to work incorrectly. Forgot to add the length of the instruction to the instruction address of the JAL instruction when determining the return address. This caused an infinite loop as the RTL returned to the JAL instead of the next instruction. In loop mode the instruction pointer (ip) continued to increment and be placed on the address bus while the pipeline was looping back. This eventually led to inaccessible memory being accessed causing a hang. The fix was to keep the ip constant while in loop mode.

Latest Mod:
Load and store multiple registers instructions were added as an option. LDM/STM can be used with integer, float or posit registers. The instructions are not quite as fast as using individual instructions to perform the load and store, but the instructions are a lot more code dense. They use a bit mask to specify the registers to load or store and insert individual load / store instructions into the pipe. If there is a zero bit in the mask, then a NOP instruction is inserted into the pipe. So, they are best used when there are a lot of registers to be saved and restored (context switch time).
Added a nightmare instruction: ‘ATNI’, the add-to-next-instruction instruction, which modifies the following instruction by adding to it. It could be used, for instance, with the BYTNDX instruction to search for different characters based on the contents of a register. Very similar to the ATNI instruction is the EXEC instruction, which executes an instruction contained in a register.
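
A minimal Python sketch of the LDM expansion described above: one pipeline slot per mask bit, with a NOP for every zero bit. The register count, the mnemonic syntax, the base register name and the rule that the memory offset only advances for set bits are all assumptions made for illustration, not the RTF64 definition.

```python
def expand_ldm(mask, nregs=32, base_reg="sp", word_bytes=8):
    """Expand a load-multiple register mask into one slot per mask bit."""
    ops = []
    offset = 0
    for r in range(nregs):
        if (mask >> r) & 1:
            # Set bit: insert an individual load for register r.
            ops.append(f"LD r{r},{offset}[{base_reg}]")
            offset += word_bytes  # assumed: memory is packed, no gaps
        else:
            # Zero bit: the slot is still consumed, filled with a NOP.
            ops.append("NOP")
    return ops


# Four-register example: r0 and r2 selected.
for op in expand_ldm(0b0101, nregs=4):
    print(op)
```

The slot count equals the mask width no matter how many bits are set, which is why the instruction pays off mainly when most registers are being saved or restored.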

Potential Future Mods:
Thinking about adding an ‘add and mask’ instruction. It would be used, for instance, to implement ring buffer indexes. This is like a bit-field increment operation. Maybe a bit-field add would be better.
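
For reference, the two candidate operations look like this in Python. Both function names are hypothetical, and the ring buffer size is assumed to be a power of two so that a simple AND can do the wrap.

```python
def add_and_mask(index, increment, mask):
    """Add, then AND with a mask: wraps a power-of-two ring buffer index."""
    return (index + increment) & mask


def bitfield_add(word, increment, offset, width):
    """Add to a bit-field inside word, wrapping within the field only."""
    low_mask = (1 << width) - 1
    field_mask = low_mask << offset
    field = (word & field_mask) >> offset
    field = (field + increment) & low_mask
    # Bits outside the field are left untouched.
    return (word & ~field_mask) | (field << offset)


print(add_and_mask(7, 1, 7))             # 0
print(hex(bitfield_add(0xF7, 1, 0, 3)))  # 0xf0
```

The bit-field form is the more general of the two, since add-and-mask is the same operation with the field at offset zero and the upper bits discarded.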

_________________
Robert Finch http://www.finitron.ca


Wed Nov 25, 2020 3:38 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Fixes:
<software> The JAL instruction was not assembled correctly if a link register was not specified. It was using ra1 when it should have been using ra0.

Mods:
<hardware> The text controller was modified to support fonts up to 32x32. To do: a programmable-size cursor to go with the different font sizes. Currently the cursor is only 8x8. I am pondering whether to just have a ram containing the cursor image or to try and calculate a cursor image on the fly according to the font size. Most of the default cursors are pretty simple: vertical bar, horizontal bar, box, and asterisk. The bar and box cursors could be calculated.
<software> The glyph editor was updated to allow the use of fonts larger than 8x8. It can now handle fonts up to 32x32. It is a bit painstaking to use. There is no cut and paste; it is all mouse clicks to draw characters, but it works. A 12x18 font was created. The editor can input or output memory coefficient files (.coe files). It can also output Verilog memory declaration constants.
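
As a sketch of the "calculate the cursor on the fly" option, the bar and box shapes reduce to a comparison per pixel. The Python below is only a model; the bar thickness, the placement of the bars (left edge, bottom rows) and the function name are assumptions.

```python
def make_cursor(kind, width, height, thickness=2):
    """Return a cursor bitmap as a list of rows of 0/1 pixels."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            if kind == "vbar":
                on = x < thickness            # left-hand columns
            elif kind == "hbar":
                on = y >= height - thickness  # bottom rows
            elif kind == "box":
                on = True                     # whole character cell
            else:
                raise ValueError(f"unsupported cursor kind: {kind}")
            row.append(1 if on else 0)
        rows.append(row)
    return rows


# Underline cursor sized for the 12x18 font.
for row in make_cursor("hbar", 12, 18):
    print("".join("#" if p else "." for p in row))
```

In hardware each shape is a comparator on the pixel counters, so no cursor ram would be needed for these three; the asterisk would still want a stored image.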

Playground:
Playing with the text controller fonts today. Modified the text controller to accept bitmap fonts up to 32x32. Previously the font width was limited to no more than 8 pixels. I got started on modifying the controller as a result of changing things to prepare for audio output via HDMI. To use another core I found on the web, a standard TV format was required. The current format was 800x600 VGA, but apparently that is not a standard format. I tried changing the format around to 1280x720p and the monitor reported it as an unsupported format. Strangely the monitor accepts 800x600 which is not a standard format.

_________________
Robert Finch http://www.finitron.ca


Thu Nov 26, 2020 4:33 am

Joined: Wed Nov 20, 2019 12:56 pm
Posts: 92
Nice work!

robfinch wrote:
Strangely the monitor accepts 800x600 which is not a standard format.


Remember there's a significant overlap between HDMI and DVI - because of that, a selection of common computer-oriented modes is generally supported, even if they're not mandatory "TV" standards. The EDID block specifies which ones.


Thu Nov 26, 2020 8:50 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The monitor in use is an LCD TV which is 1366x768 resolution IIRC. It has been fairly flexible in what has been fed to it, so I was kind of expecting that it would support 1280x720p. It could be that the frequency used to generate the 1280x720 was off by too much, an 80MHz clock was used rather than 85.86MHz. This leads to a 56Hz refresh instead of 60Hz. For 800x600 the clock of 40MHz was right on.
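
The 56Hz figure follows directly from the clock ratio. A quick Python check, assuming the total pixels per frame (including blanking) stay the same so that refresh rate scales linearly with the pixel clock:

```python
def actual_refresh_hz(nominal_hz, nominal_clock_mhz, actual_clock_mhz):
    """Refresh rate obtained when the pixel clock is off-nominal."""
    return nominal_hz * actual_clock_mhz / nominal_clock_mhz


print(round(actual_refresh_hz(60, 85.86, 80.0), 1))  # 55.9
```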

I was thinking of trying to make use of this core for its audio capability:
hdl-util/hdmi: Send video/audio over HDMI on an FPGA (github.com)
but it does too much. It generates the video timing, and the video timing is already generated in the SoC. I would like the core to be set up so all that has to be done is to feed it audio data. So, it would be a fair amount of work to modify. Anyway, I have been studying it. The SoC uses a modified rgb to dvi core from Digilent. The core was modified by adding an additional pipeline stage and using a table lookup rather than a sideways add to get the number of bits in the data. This gave a slightly higher fmax for the core.
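
The two ways of counting set bits mentioned above can be compared in a few lines of Python. This is a software model only; the 8-bit data width is an assumption, and the actual change was made in the HDL of the Digilent core.

```python
def sideways_add(value, width=8):
    """Count set bits by summing them one at a time (an adder chain)."""
    count = 0
    for bit in range(width):
        count += (value >> bit) & 1
    return count


# Precomputed table: one entry per possible 8-bit value.
POPCOUNT_TABLE = [sideways_add(v) for v in range(256)]


def popcount_lookup(byte):
    """Count set bits with a single table read instead of an adder chain."""
    return POPCOUNT_TABLE[byte & 0xFF]


print(popcount_lookup(0xA5))  # 4
```

The table trades a small ROM for a shorter combinational path, which is consistent with the slightly higher fmax reported.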

_________________
Robert Finch http://www.finitron.ca


Thu Nov 26, 2020 1:30 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Since modifying the text controller it no longer displays correctly; the characters displayed are a jumbled mess. I cannot perceive what the issue is. The horizontal shifting looks correct, but it looks like parts of different characters are appearing in the character cell. This should not be possible. Figured this one out. The glyph editor writes the bytes out in order from least significant to most significant. These were showing up as 64-bit values in the reverse order to what the controller needs. Issues were compounded because data for two different characters might show up in the same 64-bit word. After much head scratching, the simplest solution was to write only bytes and make the bitmap memory byte wide rather than 64 bits wide. The controller then very quickly reads as many bytes as it needs for the scanline. It can read the bitmap using multiple cycles because there are multiple clock cycles between each character fetch. However, if the font is three pixels wide or less it may not work. Seems to work great for the 7x10 font, now testing the 12x18 font.
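
A small Python model of the packing problem described above. It assumes one byte per eight pixels of glyph width and glyphs stored back to back with no padding; the function names are made up for illustration.

```python
def bytes_per_row(font_width):
    """Bytes needed for one scanline of a glyph, eight pixels per byte."""
    return (font_width + 7) // 8


def pack_words_le(data, word_bytes=8):
    """Pack a byte stream into words, first byte least significant."""
    return [int.from_bytes(data[i:i + word_bytes], "little")
            for i in range(0, len(data), word_bytes)]


def glyphs_in_word(word_index, glyph_bytes, word_bytes=8):
    """List the glyph numbers whose data lands in a given memory word."""
    first = (word_index * word_bytes) // glyph_bytes
    last = (word_index * word_bytes + word_bytes - 1) // glyph_bytes
    return list(range(first, last + 1))


# 7x10 font: 1 byte per row, 10 bytes per glyph.
glyph_bytes = bytes_per_row(7) * 10
print(glyphs_in_word(0, glyph_bytes))  # [0]
print(glyphs_in_word(1, glyph_bytes))  # [0, 1]
print(hex(pack_words_le(bytes(range(8)))[0]))  # 0x706050403020100
```

Word 1 holds the tail of glyph 0 and the start of glyph 1, which is how parts of two characters could end up in one cell. With a byte-wide memory the controller just fetches bytes_per_row bytes per scanline and the word boundary stops mattering.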

_________________
Robert Finch http://www.finitron.ca


Fri Nov 27, 2020 4:00 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Screenshot of font in action 12x18 looks a little better.
Attachment: FPGAStartup2.jpg (File comment: Screenshot of startup showing 12x18 font)

_________________
Robert Finch http://www.finitron.ca


Fri Nov 27, 2020 5:18 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Latest Fixes:
<hardware> Supporting a greater font selection added two character times into the pipeline; delaying the character color by an additional two pipeline stages got missed resulting in the colors not being in sync with the characters displayed.
<software> The LEA instruction was being assembled as a multiply. The opcode value was wrong. A break statement was missing after the TST instruction processing in the assembler. This caused output of the next instruction, which was an UNLINK. A CALL instruction was compiled for a call to a leaf routine, which should be called via a JAL instruction. The solution was to make everything consistently CALL and RET.

Latest Mods:
<software> The compiler was modified to parse binary numbers specified like 0b1110101, similar to specifying hexadecimal. It also now accepts and skips over the underscore in a number. With long strings of binary digits, it aids visualization of fields in the strings if they can be separated by underscores. The compiler was modified to always use RET instead of using RTL for leaf routines. The difference in the return convention was causing issues where function calls assumed non-leaf routines. So, now the compiler does not use JAL or RTL, it always uses CALL and RET.
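
The number syntax described above can be modelled in a few lines of Python. This is not the CC64 source, just a sketch of the behaviour; the function name is hypothetical.

```python
def parse_number(text):
    """Parse a literal, skipping underscores; supports 0b, 0x and decimal."""
    digits = text.replace("_", "")
    prefix = digits[:2].lower()
    if prefix == "0b":
        return int(digits[2:], 2)
    if prefix == "0x":
        return int(digits[2:], 16)
    return int(digits, 10)


print(parse_number("0b1110101"))   # 117
print(parse_number("0b111_0101"))  # 117
```

Underscores carry no value, so fields in a long bit string can be grouped however the encoding being described suggests.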

_________________
Robert Finch http://www.finitron.ca


Sat Nov 28, 2020 5:30 am