


Reply to topic  [ 775 posts ]  Go to page Previous  1 ... 25, 26, 27, 28, 29, 30, 31 ... 52  Next
 Thor Core / FT64 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I’ve been reviewing the powerful AVIC core and found a bug by inspection in the flood-fill operation. The AVIC core has a frame buffer, blitter, text blitter, copper, sprite graphics and sync generator, in addition to drawing and filling lines, horizontal lines, triangles, rectangles and curves. Flood-filling is also supported, and the core can transform points by translating, rotating, or scaling them. The AVIC core also handles audio channels. It was intended to be an all-in-one video/audio core, so that a single component could be plugged into the system to provide A/V services.

Tonight I looked at reusing the pieces of the AVIC core that are relevant to graphics acceleration, so I need a name for the new graphics accelerator core. The system already has a frame buffer, sync generator and sprite graphics, so those AVIC features can be omitted. One challenge is to see how fast the draw routines could run in a processor-style solution rather than in custom hardware.

One issue with the system is that the graphics operations share main memory. Care must be taken so that the graphics accelerator doesn’t hog memory, or the CPU won’t be able to do anything.

_________________
Robert Finch http://www.finitron.ca


Sun Mar 31, 2019 2:58 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Is this AVIC core something of your own? Or open source? I can't find anything about it.


Sun Mar 31, 2019 9:32 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The AVIC core is an open-source core I developed myself after studying the Amiga docs and the minimig core. It isn't Amiga compatible, though.
It's under the video cores section of the Cores repository on GitHub.
http://github.com/robfinch/Cores/blob/master/video/trunk/docs/AVIC.docx
http://github.com/robfinch/Cores/blob/master/video/trunk/rtl/AVIC128.v
I couldn't find the most recent docs, but the doc directory has docs (AVIC.docx) for another version of the core that's mostly the same. I will try to get the docs updated.
The main difference between the AVIC and AVIC128 cores is that AVIC128 uses a 128-bit main memory bus, while AVIC uses block RAM.
AVIC128 was designed as a 64-bit peripheral core; AVIC was designed as a 16-bit peripheral core. Some of the registers may be different, so it's safest to look at the Verilog code for register usage.

Sun Mar 31, 2019 3:43 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Thanks!


Sun Mar 31, 2019 4:36 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Worked on the graphics accelerator. Modified it to support 32bpp in addition to 16bpp color depth. I've been rethinking the idea of using an accelerator rather than a GPU; it's simpler in some ways.

Also wrote a planar frame buffer controller today, primarily just for the challenge.

When rendering the display, a planar controller doesn’t require any more memory bandwidth than a non-planar controller. A fixed number of bits has to be accessed for each scan line, and whether they’re accessed in a planar manner or not makes no difference.
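To make the bandwidth argument concrete, here is a small Python sketch (an illustrative model, not the controller's RTL), assuming an 800-pixel scan line at 8bpp. The chunky and planar layouts hold exactly the same number of bits; only the arrangement differs.

```python
# Illustrative model only: one 800-pixel scan line at 8 bits per pixel.
WIDTH, DEPTH = 800, 8

def chunky_to_planar(pixels, depth):
    """Split chunky pixels into `depth` bit planes; plane k holds bit k of every pixel."""
    return [[(p >> k) & 1 for p in pixels] for k in range(depth)]

def planar_to_chunky(planes):
    """Recombine bit planes into chunky pixel values."""
    width = len(planes[0])
    return [sum(planes[k][x] << k for k in range(len(planes))) for x in range(width)]

line = [x % 256 for x in range(WIDTH)]              # arbitrary test pattern
planes = chunky_to_planar(line, DEPTH)

chunky_bits = WIDTH * DEPTH                         # 800 pixels x 8 bits each
planar_bits = sum(len(plane) for plane in planes)   # 8 planes x 800 bits each
print(chunky_bits, planar_bits)                     # both 6400
assert planar_to_chunky(planes) == line             # nothing lost in either layout
```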

A planar controller might degrade slightly more gracefully. If there isn’t enough bus bandwidth, then only the least significant bits of the color get corrupted (assuming the least significant planes are fetched last), so the image may simply appear at a reduced color depth.

However, what happens if you want to draw a line in the color purple? With planar graphics, every plane must be updated with a separate write operation. At 8bpp that’s eight writes per pixel, which would make updates eight times slower than with non-planar graphics. At greater color depths the situation is even worse.
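A rough Python model of the write cost, assuming a hypothetical layout in which each plane packs 8 adjacent pixels per byte and the chunky buffer uses one byte per pixel. `PlanarBuffer`, `ChunkyBuffer` and the `PURPLE` palette index are illustrative names, not part of any real driver.

```python
class PlanarBuffer:
    """Hypothetical planar frame buffer: `depth` bit planes, each packing
    8 horizontally adjacent pixels into one byte (bit 7 = leftmost pixel)."""
    def __init__(self, width, height, depth):
        self.width, self.depth = width, depth
        self.planes = [bytearray(width * height // 8) for _ in range(depth)]
        self.writes = 0

    def set_pixel(self, x, y, color):
        byte, bit = divmod(y * self.width + x, 8)
        mask = 0x80 >> bit
        for k in range(self.depth):        # one read-modify-write per plane
            if (color >> k) & 1:
                self.planes[k][byte] |= mask
            else:
                self.planes[k][byte] &= ~mask & 0xFF
            self.writes += 1

    def get_pixel(self, x, y):
        byte, bit = divmod(y * self.width + x, 8)
        mask = 0x80 >> bit
        return sum(1 << k for k in range(self.depth) if self.planes[k][byte] & mask)

class ChunkyBuffer:
    """Hypothetical chunky (packed) frame buffer: one byte per pixel at 8bpp."""
    def __init__(self, width, height):
        self.width = width
        self.pixels = bytearray(width * height)
        self.writes = 0

    def set_pixel(self, x, y, color):
        self.pixels[y * self.width + x] = color   # a single write per pixel
        self.writes += 1

PURPLE = 0x5A   # arbitrary palette index standing in for "purple"
planar, chunky = PlanarBuffer(400, 300, 8), ChunkyBuffer(400, 300)
for x in range(100):                   # a 100-pixel horizontal line
    planar.set_pixel(x, 10, PURPLE)
    chunky.set_pixel(x, 10, PURPLE)
print(planar.writes, chunky.writes)    # 800 vs 100
```

Because each plane byte is shared by eight pixels, each plane update is really a read-modify-write unless the memory supports bit-level write masking. The real cost can therefore be higher than the write count alone suggests.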

A planar controller could work much faster if each plane had its own memory, since all the memories could then be accessed in parallel. This could be done with special dual-ported memories, which are possible in an FPGA. A dual-port RAM can be made to look like single-bit-wide planar memory on one port while supporting non-planar access on the other port. The two ports could be given two distinct address ranges, so that the memory can be accessed as either planar or non-planar.
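The dual-port idea can be modeled in Python as a set of one-bit-wide plane memories with two access paths. The method names are hypothetical stand-ins for the two ports and their two address ranges. In an FPGA the chunky port would reach all plane RAMs in the same cycle; the loop here only simulates that.

```python
class DualViewPlaneRam:
    """Model of `depth` one-bit-wide plane memories sharing one pixel address.
    Planar port: access a single plane (one bit) at a pixel address.
    Chunky port: access all planes at once, i.e. a whole pixel per operation."""
    def __init__(self, npixels, depth):
        self.depth = depth
        self.planes = [[0] * npixels for _ in range(depth)]

    # Chunky port: pixel address <-> full color value.
    def chunky_write(self, addr, color):
        for k in range(self.depth):      # in hardware: all planes written in parallel
            self.planes[k][addr] = (color >> k) & 1

    def chunky_read(self, addr):
        return sum(self.planes[k][addr] << k for k in range(self.depth))

    # Planar port: (plane, pixel address) <-> single bit.
    def planar_write(self, plane, addr, bit):
        self.planes[plane][addr] = bit & 1

    def planar_read(self, plane, addr):
        return self.planes[plane][addr]

ram = DualViewPlaneRam(npixels=1024, depth=8)
ram.chunky_write(5, 0xA5)       # write a whole pixel through the chunky port
ram.planar_write(1, 5, 1)       # then flip one plane's bit through the planar port
print(hex(ram.chunky_read(5)))  # 0xa7
```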

Wed Apr 03, 2019 2:50 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Getting back to the bug that’s hanging the system: for some reason the system is in a reset loop. It does some screen updates, then resets and begins the process over again. One hypothesis is that an unknown exception is occurring. Before exceptions are set up, the exception vector defaults to the reset address, so if an exception of some kind were flagged, it would cause the processor to reset. There shouldn’t be any exceptional conditions happening in the start-up code at this point, however. I reviewed all occurrences of the reset vector and the vector table address register and didn’t find anything unusual. The system was working at one point.

To prevent the processor from hanging indefinitely, there is a check at its commit stage: if the re-order buffer’s head of queue hasn’t advanced within 1000 clock cycles, an exception is flagged. If an exception is occurring, this is one likely source.
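Here is a minimal sketch of such a commit-stage watchdog, modeled in Python one clock at a time. The 1000-cycle limit comes from the post. The class and method names are hypothetical, and the real check lives in the core's Verilog.

```python
class CommitWatchdog:
    """Flags a timeout when the ROB head hasn't advanced for `limit` clocks."""
    def __init__(self, limit=1000):
        self.limit = limit
        self.count = 0
        self.last_head = None

    def clock(self, rob_head):
        """Call once per clock with the current head index.
        Returns True on the clock where the timeout exception should be raised."""
        if rob_head != self.last_head:     # head advanced: restart the count
            self.last_head = rob_head
            self.count = 0
            return False
        self.count += 1                    # head stuck for another clock
        if self.count >= self.limit:
            self.count = 0
            return True
        return False

wd = CommitWatchdog(limit=1000)
stalled = [wd.clock(0) for _ in range(1001)]   # head never moves
print(stalled.index(True))                     # 1000
```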

The integrated logic analyzer (ILA) shows a reset operation occurring at instruction $F…FC0502. This is a simple add instruction.

Thu Apr 04, 2019 3:15 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
How about arranging that an unhandled exception causes a halt? Maybe you could have a halt vector, and have that as the initial target.


Thu Apr 04, 2019 8:12 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
How about arranging that an unhandled exception causes a halt? Maybe you could have a halt vector, and have that as the initial target.

I suppose I could add halt capability to the CPU, though there is just one vector to handle all exceptions. Double bus errors would be one place the CPU should halt. FT64 follows the RISC-V paradigm for handling exceptions: there is a single vector for processing all exceptions, and the cause of the exception is stored in a cause register. The ISR looks at the cause register to determine what to do. It could use a jump table indexed by the cause, and a halt could be one of the cause codes.
The current break handler just uses a compare tree consisting of a bunch of xor’s and beq’s to choose what to process. One might think this would be slow. However, using a vector table instead would mean loading the part of the table containing the vector into the cache and then performing a memory load. It could take 20 cycles just to load the vector, and in that time the branch target could be resolved by the compare-based approach.
Code:
__BrkHandlerOL01:
__BrkHandlerOL02:
__BrkHandlerOL03:
__BrkHandler:
      sync
      and      r0,r0,#0            ; load r0 with 0
      csrrd   r22,#6,r0            ; Get cause code
      and      r22,r22,#$FF
      xor      r1,r22,#TS_IRQ
      beq      r1,r0,.ts
      xor      r1,r22,#GC_IRQ
      beq      r1,r0,.lvl6
      xor      r1,r22,#KBD_IRQ
      beq      r1,r0,.kbd
      xor      r1,r22,#FLT_CS
      beq      r1,r0,.ldcsFlt
      xor      r1,r22,#FLT_RET
      beq      r1,r0,.retFlt
      xor      r1,r22,#240            ; OS system call
      beq      r1,r0,.callOS
      beq      r22,#FLT_CMT,.cmt
      beq      r22,#FMTK_SYSCALL,.lvl6
      beq      r22,#FMTK_SCHEDULE,.ts2
      beq      r22,#FLT_SSM,ssm_irq
      rti         
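For comparison, here is a Python sketch of the two dispatch styles. The cause-code values are made up because the post doesn't give the real FT64 constants; the labels mirror the branch targets above. The chain costs one compare per candidate until a match. The table costs a single indexed load, which is the step that can miss in the cache.

```python
# Hypothetical cause-code values; the real FT64 constants aren't shown in the post.
TS_IRQ, GC_IRQ, KBD_IRQ, OS_CALL = 0x20, 0x26, 0x23, 240
HANDLERS = [(TS_IRQ, ".ts"), (GC_IRQ, ".lvl6"), (KBD_IRQ, ".kbd"), (OS_CALL, ".callOS")]

def dispatch_chain(cause):
    """Compare chain like the xor/beq sequence: returns (label, compares made)."""
    cause &= 0xFF
    for count, (code, label) in enumerate(HANDLERS, start=1):
        if cause == code:
            return label, count
    return "rti", len(HANDLERS)      # no match: just return from the interrupt

# Jump-table alternative: one 256-entry table indexed by the cause code.
# On the CPU this is a memory load, which may miss in the data cache.
JUMP_TABLE = ["rti"] * 256
for code, label in HANDLERS:
    JUMP_TABLE[code] = label

def dispatch_table(cause):
    return JUMP_TABLE[cause & 0xFF]

print(dispatch_chain(KBD_IRQ), dispatch_table(KBD_IRQ))
```

With only a handful of causes, the chain resolves in a few compares, which is the trade-off described above. A table would likely pay off only as the number of causes grows or if the table stays cache-resident.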

Removed some of the stomp logic associated with aborting a memory cycle early. The stomp logic shouldn’t be required for stores, since stores only issue if there’s nothing that could cause a stomp (a change in program flow). Loads are now allowed to complete; the result won’t be used anyway.

Fri Apr 05, 2019 5:10 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Dumped the exception cause code in the ILA to determine what is happening. It turns out to be an address alignment fault. Since there’s not yet a handler for alignment faults, the service routine returns to the offending instruction, which then faults again. The processor is effectively caught in an infinite alignment-fault loop. I recently added alignment fault checking to the core.
The question now is why an alignment fault occurs in the first place. AFAICT it shouldn’t. The instruction it faults on is a register-indirect store word with no displacement. The register value is only ever incremented by eight, so it should remain word aligned (it starts at a word-aligned address).
I’ve been staring at the alignment fault logic and can’t see anything obviously wrong, so I’ve disabled it for now.
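Here is a small Python sketch of the alignment check and the store loop described above, assuming an 8-byte word and a power-of-two access size. A pointer that starts word-aligned and steps by eight never trips the check. A fault in that situation would therefore point to either the check itself or the address it is fed. `is_misaligned` and `run_store_loop` are hypothetical helpers, not FT64 code.

```python
def is_misaligned(addr, size):
    """True if `addr` is not a multiple of the access size (a power of two)."""
    return (addr & (size - 1)) != 0

def run_store_loop(start, count, step=8, size=8):
    """Model of the faulting loop: store a word at ptr, then ptr += step.
    Returns the addresses that would raise an alignment fault."""
    faults = []
    ptr = start
    for _ in range(count):
        if is_misaligned(ptr, size):
            faults.append(ptr)
        ptr += step
    return faults

print(run_store_loop(0x1000, 100))        # [] : aligned start, step 8, never faults
print([hex(a) for a in run_store_loop(0x1004, 3)])
```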

I changed how exceptions are passed around in memory channels. An exception is now always recorded directly in the re-order buffer entry for the instruction. Previously the exception was sometimes being propagated through the memory system before updating the re-order buffer entry. The changes eliminated several intermediate signals and hopefully simplify things.

Traced things until the processor’s virtual address goes haywire during an i-cache load: out of the blue, the virtual address shows up as $F…F4000 when it should be $F…F04E0. When I went to get a screenshot of the ILA output, the workstation hung and I had to reboot.

Sat Apr 06, 2019 2:57 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Fixed an issue which may have caused memory issue flags to toggle. This may have caused load / store instructions to be skipped over.
The instructions fetched don’t correspond to the program counter address. If I didn’t know better, I’d say there is something amiss in the cache control. Anyway, the system is stuck in an infinite break loop. The brk instruction is fetched at the address of the break routine, which causes it to call the break routine again. As assembled, there isn’t really a brk instruction at that address; instead, that’s what is being read by faulty cache reads. I reviewed the cache code and couldn’t see anything obvious.
I tried to find where in the ROM the sequence of bytes being loaded comes from, but didn’t have much luck. Since the cache is four-way set associative, I was thinking it might be reading from the wrong set.

Mon Apr 08, 2019 3:54 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
False matches were occurring in the cache after a reset: whatever was in the cache was being treated as valid even after a reset. A tag valid bit, cleared on reset, was added. Since the cache tag memories default to zero at power-on, this was not a problem unless the core was reset after it had been running.
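Here is a Python sketch of tag matching in a four-way set-associative cache with per-way valid bits. It assumes 64 sets of 32-byte lines and round-robin replacement; all of these parameters are hypothetical. Clearing the valid bits on reset is enough to stop stale tags from matching, even though the tag RAM contents survive.

```python
class SetAssocCache:
    """Tag side of a 4-way set-associative cache (no data storage modeled)."""
    def __init__(self, sets=64, ways=4, line_bytes=32):
        self.sets, self.ways, self.line_bytes = sets, ways, line_bytes
        self.tags = [[0] * ways for _ in range(sets)]
        self.valid = [[False] * ways for _ in range(sets)]
        self.next_way = [0] * sets            # simple round-robin replacement

    def _split(self, addr):
        line = addr // self.line_bytes
        return line % self.sets, line // self.sets   # (set index, tag)

    def lookup(self, addr):
        """Return the hitting way, or None on a miss."""
        s, tag = self._split(addr)
        for w in range(self.ways):
            if self.valid[s][w] and self.tags[s][w] == tag:   # valid bit gates the match
                return w
        return None

    def fill(self, addr):
        s, tag = self._split(addr)
        w = self.next_way[s]
        self.tags[s][w] = tag
        self.valid[s][w] = True
        self.next_way[s] = (w + 1) % self.ways
        return w

    def reset(self):
        """Clear every valid bit; the tag contents are left as they were."""
        for s in range(self.sets):
            self.valid[s] = [False] * self.ways

c = SetAssocCache()
c.fill(0x1234)
print(c.lookup(0x1234))   # 0 : hit in way 0
c.reset()
print(c.lookup(0x1234))   # None : stale tag no longer matches
```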

Circuit selects (cs* signals) were being held active as a result of decoding I/O bridge outputs. This should not have mattered, because the circuit selects are further qualified by the cyc and stb signals, which were correct; indeed, the system seems to work that way. However, it revealed that more address bits than necessary were being decoded to detect the cs* signals. The I/O bridge already filters on the top 12 address bits ($FFD), so there is no need to include those in the device select decoding. About 300 comparator bits were saved by excluding the top twelve address bits from the decode.
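Here is a Python sketch of why the top 12 bits can be dropped. It uses a made-up device map (the bases and masks are hypothetical) and, for simplicity, assumes 32-bit addresses with the I/O region at $FFDx_xxxx. Each chip select is a mask-and-compare. Once the bridge has matched the top 12 bits, decoding only the low 20 bits selects the same device and needs 12 fewer comparator bits per select.

```python
IO_TOP = 0xFFD   # top 12 address bits of the I/O region (from the post)

# Hypothetical device map: name -> (base address, decode mask)
DEVICES = {
    "text": (0xFFD00000, 0xFFFF0000),
    "kbd":  (0xFFDC0000, 0xFFFFFFF0),
    "gfx":  (0xFFDE0000, 0xFFFFF000),
}

def bridge_passes(addr):
    """The I/O bridge only forwards cycles whose top 12 bits are $FFD."""
    return (addr >> 20) == IO_TOP

def cs_full(addr, base, mask):
    """Chip select comparing every masked address bit."""
    return (addr & mask) == (base & mask)

def cs_low(addr, base, mask):
    """Chip select ignoring the top 12 bits, which the bridge already checked."""
    low = mask & 0x000FFFFF
    return (addr & low) == (base & low)

def comparator_bits(mask):
    """Number of address bits a mask-and-compare decoder has to compare."""
    return bin(mask).count("1")

for name, (base, mask) in DEVICES.items():
    print(name, comparator_bits(mask), "->", comparator_bits(mask & 0x000FFFFF))
```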

Tue Apr 09, 2019 4:32 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
That's an interesting kind of suboptimality, decoding more address bits than needed. Not a bug, and not detectable in behaviour, but spending resources. Just possibly costing some speed: but most things are not on the critical path, so you wouldn't know. I wonder how much wastage there is in an average SoC, where large teams of people are working to a deadline.


Tue Apr 09, 2019 9:20 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The data cache load state transitions back to itself for burst data loads, where data is loaded every clock cycle. For non-burst data cache loads, it transitions to a second acknowledge state to wait for data. However, the state machine was always transitioning back to the load state, even for non-burst loads, which caused bad data to be loaded.
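Here are the transitions described above, sketched as a Python next-state function. The state and signal names are hypothetical. Only the structure follows the post: burst loads loop in the load state, and non-burst loads wait in a separate acknowledge state.

```python
from enum import Enum, auto

class DState(Enum):
    IDLE = auto()
    LOAD = auto()       # request / capture a word
    LOAD_ACK = auto()   # non-burst only: wait for ack before the next word

def next_state(state, start, burst, ack, last):
    """Next state of a (hypothetical) data-cache line-load FSM.
    start: a cache miss requests a line load
    burst: memory returns one word per clock with ack held high
    ack:   memory acknowledge for the current word
    last:  the current word is the last one of the line
    """
    if state == DState.IDLE:
        return DState.LOAD if start else DState.IDLE
    if state == DState.LOAD:
        if burst:
            # Burst: a word arrives every clock, so stay in LOAD until the last one.
            return DState.IDLE if (ack and last) else DState.LOAD
        # Non-burst: go and wait for the acknowledge
        # (the bug was looping back to LOAD here).
        return DState.LOAD_ACK
    if state == DState.LOAD_ACK:
        if not ack:
            return DState.LOAD_ACK
        return DState.IDLE if last else DState.LOAD
    raise ValueError(state)
```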

Even though the core is being built without sophisticated branch prediction, it still uses the sign bit of the displacement to predict the branch direction.
The branch-miss PC was sometimes being set incorrectly because it took the branch outcome into account when it shouldn’t have. Which PC is the miss PC depends only on which path the branch was predicted to take. The branch outcome only needs to be considered to determine whether there is a miss. I must have tweaked the logic a couple of weeks ago, because this was working at one point.
Bad branch miss code:
Code:
 fcu_misspc = !(fcu_takb ^ fcu_pt) ? fcu_nextpc : {fcu_pc[AMSB:32],fcu_pc[31:0] + fcu_brdisp[31:0]};

Fixed branch miss code:
Code:
 
fcu_misspc = fcu_pt ? fcu_nextpc : {fcu_pc[AMSB:32],fcu_pc[31:0] + fcu_brdisp[31:0]};
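Here is a Python model of the fixed logic. It assumes a 64-bit PC and the low-32-bit target addition shown in the Verilog. It also includes the old expression so the difference is visible. For a backward branch that is predicted taken but falls through, the old code redirected to the branch target instead of the next PC.

```python
MASK32 = 0xFFFF_FFFF

def predict_taken(disp):
    """Static prediction: backward branches (negative displacement) predicted taken."""
    return disp < 0

def branch_target(pc, disp):
    """Upper PC bits kept, displacement added to the low 32 bits only,
    mirroring {fcu_pc[AMSB:32], fcu_pc[31:0] + fcu_brdisp[31:0]}."""
    return (pc & ~MASK32) | ((pc + disp) & MASK32)

def resolve(pc, nextpc, disp, taken):
    """Fixed logic: returns (miss, misspc)."""
    pt = predict_taken(disp)
    miss = taken != pt                      # the outcome only decides *whether* it missed
    misspc = nextpc if pt else branch_target(pc, disp)   # prediction alone picks the PC
    return miss, misspc

def old_misspc(pc, nextpc, disp, taken):
    """Old expression: !(takb ^ pt) ? nextpc : target."""
    pt = predict_taken(disp)
    return nextpc if not (taken ^ pt) else branch_target(pc, disp)

# Backward branch predicted taken, actually not taken:
print(resolve(0x1000, 0x1004, -0x20, False))      # (True, 4100), i.e. nextpc 0x1004
print(hex(old_misspc(0x1000, 0x1004, -0x20, False)))  # 0xfe0, the wrong path
```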

The routine to test the text display controller ram is now working. The screen turns green indicating that rom was successfully copied to ram.

The RET instruction was flagged as a memory instruction. A memory instruction doesn’t need to wait for its second register to be valid before proceeding. However, RET needs that register to be valid because it contains the return address. As a result, RET instructions would issue too soon and try to return to an invalid address. This only mattered when the link register value wasn’t ready yet, so sometimes RET would work.
I had flagged RET as a memory operation back when developing a segmented version of the core.

Wed Apr 10, 2019 3:15 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Got the monitor running again! It’s not perfect, but at least text prints to the screen now.

Control codes at the end of a line of assembler were causing the assembler to sometimes abort processing. This resulted in routines missing from the final assembled file.

Modified the ORSoC graphics driver to work with the FT64v7SoC. The first test is to set the bitmap screen to green by drawing a rectangle the size of the screen. The version of ORSoC graphics in use is modified from the original to support 64-bit bus accesses. A dedicated bitmap font text blitter has also been added.

And...

The first test looked like it hung, but it came back after busy-waiting on the graphics controller for about five to ten minutes. It looks like 100% of the memory bandwidth is being used up, so the graphics controller doesn't get any time.

Thu Apr 11, 2019 3:39 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Been trying to get the bitmap graphics going. The bitmap graphics are now rock solid at 400x300x16bpp resolution, though I had to make a number of changes to get there. I altered the bitmap controller so that it ping-pongs between two scan-line FIFOs: while it’s collecting data in one FIFO, the other FIFO is being used to display data. Using two FIFOs allows the fetches for a given displayed line to be spread across multiple scan lines. With a single FIFO, fetching and displaying are limited to a single scan line, so all the fetches for the display must occur within one scan-line time. This means a much higher peak memory bandwidth is needed to display lower-resolution graphics with only one FIFO. With the “budget” version of the controller, I thought I could get away with a single FIFO “racing” the beam.
At 400x300 resolution, each displayed “line” takes two display scan lines, and each pixel takes two display clock cycles. It’s really an 800x600 mode with the resolution divided down.
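Here is a Python sketch of the ping-pong scheme at this resolution. For each scan line it shows which source line is on screen, which FIFO that line comes from, and which line is being fetched into the other FIFO. It assumes 16bpp (800 bytes per 400-pixel line) and that source line 0 is prefetched during vertical blank; the names are hypothetical. With one FIFO the whole line must arrive within one scan-line time. With two, the fetch can be spread over two scan lines, halving the required rate.

```python
SRC_W, SRC_H, BPP = 400, 300, 16
LINE_BYTES = SRC_W * BPP // 8          # 800 bytes per frame-buffer line

def source_pixel(scan_x, scan_y):
    """800x600 timing showing a 400x300 buffer: each source pixel lasts two
    dot clocks and each source line two scan lines."""
    return scan_x // 2, scan_y // 2

def fetch_rate(fetch_window_lines):
    """Bytes that must be fetched per scan-line time to have a line ready."""
    return LINE_BYTES / fetch_window_lines

def schedule(scan_lines=2 * SRC_H):
    """Per scan line: (scan line, source line shown, FIFO shown from,
    source line being fetched, FIFO being filled)."""
    rows = []
    for y in range(scan_lines):
        shown = y // 2
        fifo = shown % 2                                   # alternate FIFOs per source line
        nxt = shown + 1 if shown + 1 < SRC_H else None     # nothing left after the last line
        rows.append((y, shown, fifo, nxt, 1 - fifo))
    return rows

print(fetch_rate(1), fetch_rate(2))    # single FIFO vs ping-pong: 800.0 400.0
print(schedule()[:4])
```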

I also changed the multi-port memory controller to wait in a NACK state before transitioning to the IDLE state, and to detect a falling edge on the ack signal as a nack. The issue here had to do with crossing clock domains; I’m guessing that the controller would sometimes switch back to the IDLE state too soon.
The multi-port memory controller was also waiting until the end-of-burst signal was active before latching data and incrementing the address. This was due to a misunderstanding of the purpose of the app_rd_mem_end signal.
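One common way to detect a falling edge on an ack arriving from another clock domain is a two-flop synchronizer followed by an edge detector. The post doesn't say exactly how the controller does it, so treat this Python cycle model as a sketch built on that assumption. The class name is hypothetical.

```python
class AckSync:
    """Two-flop synchronizer plus falling-edge detect, modeled one clock
    at a time in the receiving (controller) clock domain."""
    def __init__(self):
        self.ff1 = self.ff2 = self.prev = 0

    def clock(self, ack_in):
        """Apply one clock edge; returns (ack, nack) as seen after the edge."""
        self.prev = self.ff2      # previous synchronized value, for edge detection
        self.ff2 = self.ff1       # second synchronizer stage
        self.ff1 = ack_in         # first stage samples the asynchronous input
        ack = bool(self.ff2)
        nack = bool(self.prev) and not self.ff2   # 1 -> 0 transition
        return ack, nack

sync = AckSync()
outs = [sync.clock(a) for a in [0, 1, 1, 1, 0, 0, 0, 0]]
print(outs)   # ack shows up two clocks late; nack pulses once when ack falls
```

A controller built this way would leave its NACK state for IDLE only after seeing the `nack` pulse, rather than as soon as the raw ack drops.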

The ORSoC graphics accelerator available at opencores.org has been put to use for this system. A demo routine is being written to show it working. It’s a pretty simple demo: it draws 10,000 points, lines, triangles, then rectangles to the screen. The accelerator isn’t being used “out of the box”; a modified version is in use instead. The modifications included merging all the bus master ports into a single read/write port.
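The post doesn't say how the merged port is shared among the accelerator's former master ports. One plausible policy is round-robin arbitration, which also keeps any one requester from monopolizing the port. The following is a hypothetical Python sketch of that policy, not the modified core's logic.

```python
class RoundRobinArbiter:
    """Grants one requesting master per transaction, rotating priority so
    that no single requester can starve the others."""
    def __init__(self, n):
        self.n = n
        self.last = n - 1          # so master 0 has priority on the first grant

    def grant(self, requests):
        """requests: one bool per master. Returns the granted index, or None."""
        for i in range(1, self.n + 1):
            idx = (self.last + i) % self.n     # start searching after the last winner
            if requests[idx]:
                self.last = idx
                return idx
        return None

arb = RoundRobinArbiter(3)
print([arb.grant([True, True, True]) for _ in range(6)])   # [0, 1, 2, 0, 1, 2]
```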

Fri Apr 12, 2019 3:33 am