


 Thor Core / FT64 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
The poll-for-interrupt (PFI) instruction should ideally have a 16-bit compressed format, as it’s anticipated it would be used a lot in some code. However, it is really just another form of software interrupt, this time a conditional one, so it is desirable to reuse the BRK instruction logic by giving BRK a conditional execution option. The catch is that BRK processing takes place in the ifetch stage, before instructions are decompressed, and a significant portion of the ifetch stage would have to be rewritten to support a 16-bit compressed version. So only a 32-bit version of the instruction is supported. Specifying a cause code of 255 in the BRK indicates a PFI instruction.
As a trial I added this mux in the fetch stage, which forces the PFI instruction to either the appropriate interrupt or a NOP.
Code:
      insn0 <= insn0a;   // Move instruction forward
      if (insn0a[15:0]==16'hFF00) begin   // BRK #255 (PFI)
         if (~|irq_i)   // If no interrupt
            insn0 <= {8'h00,`NOP_INSN};
         else
            insn0[20:0] <= {irq_i,1'b0,vec_i,2'b00,`BRK};
      end

In program code, PFI may be coded as either PFI or BRK #255.
Code:
BRK #255
RET
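
For anyone who wants to poke at the mux behaviour outside the simulator, here is a rough C model of it. This is a sketch only: the field positions are inferred from the concatenation in the Verilog above, assuming a 6-bit opcode, a 9-bit vector and a 3-bit interrupt level, and the names pfi_mux and nop_word are made up for illustration. The real widths and the NOP encoding live in the FT64 sources.

```c
#include <stdint.h>

/* Field layout inferred from {irq_i,1'b0,vec_i,2'b00,`BRK} filling bits 20:0.
   The widths below are assumptions, not taken from the FT64 sources. */
#define PFI_LOW16   0xFF00u     /* low 16 bits of BRK #255, as tested in the post */
#define BRK_OPCODE  0x00u       /* assumed 6-bit opcode, zero given the 0xFF00 pattern */
#define VEC_SHIFT   8           /* assumed 9-bit vector field in bits 16:8 */
#define IRQ_SHIFT   18          /* assumed 3-bit irq level in bits 20:18 */
#define LOW21_MASK  0x1FFFFFull /* bits 20:0 */

/* Software model of the fetch-stage mux.
   nop_word is a hypothetical stand-in for {8'h00,`NOP_INSN}. */
uint64_t pfi_mux(uint64_t insn, unsigned irq, unsigned vec, uint64_t nop_word)
{
    irq &= 0x7u;
    if ((insn & 0xFFFFu) != PFI_LOW16)
        return insn;                      /* not a PFI: move instruction forward */
    if (irq == 0)
        return nop_word;                  /* no interrupt pending: PFI becomes a NOP */
    uint64_t low21 = ((uint64_t)irq << IRQ_SHIFT)
                   | ((uint64_t)(vec & 0x1FFu) << VEC_SHIFT)
                   | BRK_OPCODE;
    return (insn & ~LOW21_MASK) | low21;  /* only bits 20:0 are replaced */
}
```

As in the Verilog, anything that is not a PFI passes through untouched, and the upper bits of a PFI are kept when it is turned into a real BRK.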

_________________
Robert Finch http://www.finitron.ca


Thu Mar 07, 2019 7:55 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
I’ve done something to the system that causes it to no longer get to the main menu. It runs the two-second delay loop, then the text screen disappears from sight and the system hangs. Recently I’ve been trying to get the system to run with variables located in the text controller’s memory. It’s as if it’s overwriting the controller regs instead of the display ram.
So, I’ve tried disabling all access to the text controller registers and doing so doesn’t seem to have any effect. Somehow the text screen is being disabled and it's looking like a hardware problem of some sort.

_________________
Robert Finch http://www.finitron.ca


Sat Mar 16, 2019 3:53 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
The following is executed during boot-up. It gets all the way into _DBGClearScreen, because the screen is being cleared and LED status $26 is displayed on the LEDs, but it doesn’t reach LED status $27. It appears to be using the correct debug display attribute, which is stored in the text controller memory. It appears the memsetW() function is failing to finish properly.
Code:
 start:
   ; This seems stupid but maybe necessary. Writes to r0 always cause it to
   ; be loaded with the value zero regardless of the value written. Readback
   ; should then always be a zero. The only case it might not be is at power
   ; on. At power on the reg should be zero, but let's not assume that and
   ; write a zero to it.
      and      r0,r0,#0      ; cannot use LDI which does an or operation
      ldi      $sp,#SCRATCHPAD+$B7F8   ; set stack pointer

      call   _Delay2s
;      call   _CopyPreinitData
      call   _SetTrapVector
ifdef SUPPORT_DCI
      call   _InitCompressedInsns
endif
start4:
      ldi      r1,#$FFFF000F0000
      sw      r1,_DBGAttr
      call   _DBGClearScreen
start5:
      bra      start5


Code:
 void DBGClearScreen()
{
   int *p;
   int vc;

   __asm {
      ldi   r1,#$26
      sb   r1,LEDS
   }     
   p = DBGScreen;
   //vc = AsciiToScreen(' ') | DBGAttr;
   vc = ' ' | DBGAttr;
   memsetW(p, vc, DBGROWS*DBGCOLS); //2604);
   __asm {
      ldi   r1,#$27
      sb   r1,LEDS
   }     
}
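
For comparison while debugging, this is roughly what the fill is expected to do, written as portable C. It is a sketch under assumptions: memset_words and make_cell are hypothetical names, not the FT64 library routines, and a screen cell is assumed to be one 64-bit word as the `sw` of the attribute in the startup code suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for memsetW(): store `count` copies of a 64-bit
   cell value starting at `dst`, returning `dst` the way memset() does. */
int64_t *memset_words(int64_t *dst, int64_t value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
    return dst;
}

/* Build a screen cell the way DBGClearScreen() does: character | attribute. */
int64_t make_cell(char ch, int64_t attr)
{
    return (int64_t)(unsigned char)ch | attr;
}
```

With the attribute loaded at start4 ($FFFF000F0000), a blank cell comes out as $FFFF000F0020. Running the same fill against ordinary ram is one way to separate a library problem from a bus problem.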

_________________
Robert Finch http://www.finitron.ca


Sat Mar 16, 2019 4:15 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
Readback of values across the video bridge and two input multiplexors appears not to work reliably. Right now, a bridge is a generic component. All three bridges in the system are instances of a common module. Using a generic module allows different peripheral cores to be moved around between bridges to optimize the system. The bridges and multiplexors with registered outputs are necessary to keep the system’s fmax high. It would be simpler to just use about a 20-to-one multiplexor on the cpu data bus, but that would entail a number of gate delays and also result in a lot of congested routing.
Attached is a block diagram of the v7 test system. Looks like I’m going to have to write some software for testing out the access to the text controller via the system bus network.
Text controller ram has a three-cycle read latency, plus an additional cycle for output multiplexing within the controller, which makes four cycles. The two levels of registered multiplexors and the bridge component add three more, for a total of about seven cycles to read the controller’s memory.
The controller has about two cycles of latency for write operations, but the latency is hidden by the bridge component which acknowledges a write within a single cycle provided the bridge isn’t busy. Writes seem to be working.
The registered input multiplexors feature a bus hold so that the bus retains the previous value during dead cycles between peripheral address switches. This means that hold times for the data read-back should easily be met. An ack signal is sent back synchronous to valid data being placed on the bus. Since data is latched at the start of the next cycle there should be plenty of setup time.
Read transactions are synchronous bus transactions. The cpu core spits out a read request then sits and waits for an ack back. The text controller memory is in the un-cached address space.
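
The latency budget above can be written down as a few lines of C, mostly as a sanity check on the arithmetic. The function names are made up for illustration, and the block-read estimate assumes reads never overlap, which matches the "request, then wait for ack" behaviour described but is still only an approximation.

```c
/* Cycle counts quoted in the post. The split of the last three cycles between
   the two multiplexor levels and the bridge is not given, so they stay lumped. */
enum {
    TC_RAM_READ_CYCLES   = 3,  /* text controller ram read latency */
    TC_OUTPUT_MUX_CYCLES = 1,  /* output multiplexing inside the controller */
    MUX_BRIDGE_CYCLES    = 3   /* two registered mux levels plus the bridge */
};

/* Approximate cycles for one un-cached read of text controller memory. */
int tc_read_latency_cycles(void)
{
    return TC_RAM_READ_CYCLES + TC_OUTPUT_MUX_CYCLES + MUX_BRIDGE_CYCLES;
}

/* Rough cost of reading a block, assuming each read waits for its own ack. */
long block_read_cycles(long words)
{
    return words * tc_read_latency_cycles();
}
```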
Attachment: V7TestSystem.png (FT64v7 Test system diagram)

_________________
Robert Finch http://www.finitron.ca


Sun Mar 17, 2019 4:44 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
And today’s bug was that the system bitmap controller (frame generator) was sending back an acknowledge for all writes in the system, instead of filtering to just its own ack. I’ve hit this kind of error before. It has to do with using an ack generator component rather than discrete logic. When the component was substituted in place of discrete logic, the write line input was not being qualified with a circuit select, resulting in unneeded acks back to the system.
The component is handy to use because it’s parameterized. One has only to specify proper parameters for the number of cycles to wait during a read or write operation before generating a WISHBONE compatible ack. It saves having to re-write the same code in module after module.
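
A cycle-level model in C makes the failure mode easy to see. This is a hedged sketch, not the actual ack generator: ack_gen_t and ack_gen_tick are hypothetical names, and the exact cycle on which the real component raises ack may differ. The point is only that the write strobe has to be gated by the circuit select before it can start the wait counter.

```c
#include <stdbool.h>

/* State of a hypothetical ack generator: cycles the selected request has waited. */
typedef struct { int count; } ack_gen_t;

/* Advance one clock. Returns true on cycles where ack is driven.
   cs is the circuit select, we the (system-wide) write strobe. */
bool ack_gen_tick(ack_gen_t *g, bool cs, bool we, int read_wait, int write_wait)
{
    if (!cs) {              /* not our cycle: never ack, whatever we is doing */
        g->count = 0;
        return false;
    }
    int wait = we ? write_wait : read_wait;
    if (g->count >= wait)
        return true;        /* waited long enough: ack while still selected */
    g->count++;
    return false;
}
```

Feeding the raw write strobe in without the `cs` check reproduces the bug described: every write anywhere in the system earns an ack from this component.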
Having fixed this bug and a couple of other minor things, presto, the system doesn’t even get to displaying the first $AA on the leds. Obviously I’ve upset something else somewhere.

_________________
Robert Finch http://www.finitron.ca


Thu Mar 21, 2019 8:47 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
Found and fixed a write-enable line travelling through a bridge. The led $AA startup indicator works again. (One step forwards again).
For the text controller ram test, a routine was written to copy from the boot rom to the text controller’s memory. The copy was then read back and compared to the boot-rom contents. This did not work as expected; it blacked out the screen, then somehow shut it off entirely, resulting in ‘no signal’ at the monitor.
Code:
_TestTCRam:
      ldi         $r6,#SCRATCHPAD
      ldi         $r1,#2048
      ldi         $r7,#0;ROMBASE
.0001:
      lw         $r2,[$r7]
      add         $r7,$r7,#8
      sw         $r2,[$r6]
      add         $r6,$r6,#8
      sub         $r1,$r1,#1
      bge         $r1,$r0,.0001

      ldi         $r6,#SCRATCHPAD
      ldi         $r7,#0;ROMBASE
      ldi         $r8,#TEXTSCR
      ldi         $r1,#2048
.0004:
      lw         $r2,[$r6]
      add         $r6,$r6,#8
      lw         $r3,[$r7]
      add         $r7,$r7,#8
      beq         $r2,$r3,.0002
      ldi         $r4,#$FFFFF8000020   ; Red background, white text
      bra         .0003
.0002:
      ldi         $r4,#$FFFF07C00020   ; Green background, white text
.0003:
      sw         $r4,[$r8]
      add         $r8,$r8,#8
      sub         $r1,$r1,#1
      bge         $r1,$r0,.0004
      ret
      

The next test copies from ram memory rather than the boot-rom. Boot-rom access is a bit tricky because it supports both pipelined access for cache fills and non-pipelined access.
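
The same copy-then-compare idea can be expressed in C, which may be handy for running the test against different source and destination regions. This is a sketch, not a translation of the routine above: copy_and_verify is a hypothetical name, the pointers are whatever regions the caller chooses, and the two cell values are the ones used in the assembly (red or green background with a space character).

```c
#include <stddef.h>
#include <stdint.h>

#define CELL_FAIL 0xFFFFF8000020ULL  /* red background, white text, space */
#define CELL_PASS 0xFFFF07C00020ULL  /* green background, white text, space */

/* Copy `words` 64-bit words from src to dst, then read dst back and compare.
   One screen cell is written per word: green on match, red on mismatch.
   Returns the number of mismatching words. */
size_t copy_and_verify(const volatile uint64_t *src, volatile uint64_t *dst,
                       volatile uint64_t *screen, size_t words)
{
    size_t mismatches = 0;

    for (size_t i = 0; i < words; i++)
        dst[i] = src[i];

    for (size_t i = 0; i < words; i++) {
        int ok = (dst[i] == src[i]);
        screen[i] = ok ? CELL_PASS : CELL_FAIL;
        mismatches += (size_t)!ok;
    }
    return mismatches;
}
```

Returning a mismatch count as well as painting the screen means the result can still be shown on the LEDs if the display itself goes away.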

_________________
Robert Finch http://www.finitron.ca


Fri Mar 22, 2019 5:37 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
The current issue with the screen blanking and hanging has me stumped for the moment, so I started playing around with the text controller. Although the size of a character in the controller is set by a register setting, the controller really only supported hx8 characters because that’s all the char rom could handle. Well, it’s been upgraded now to support hxv characters, where h varies from 1 to 9 and v varies from 1 to 16. The maximum number of different characters that can be displayed depends on the v resolution, as there are only 4096 nine-bit bytes to contain the bitmaps. The current font is 8x8, but the plan is to use an 8x10 or possibly an 8x11 font. The font information is stored in a memory coefficients file (.coe) or a Verilog file (.v), which can be generated by a font edit app.
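
To put numbers on how the character count depends on v, here is a small C sketch. It assumes each nine-bit byte of the char rom holds one scanline of one character (up to nine pixels wide), which is an interpretation of the description rather than something confirmed from the controller source. max_glyphs is a made-up name.

```c
enum { CHAR_ROM_ROWS = 4096 };  /* nine-bit bytes available for bitmaps */

/* Number of distinct characters that fit for a given character height v,
   assuming one nine-bit byte per scanline. Returns 0 for unsupported heights. */
int max_glyphs(int v)
{
    if (v < 1 || v > 16)
        return 0;
    return CHAR_ROM_ROWS / v;
}
```

Under that assumption an 8x8 font leaves room for 512 characters, 8x10 for 409, 8x11 for 372, and a 16-line font for 256.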

_________________
Robert Finch http://www.finitron.ca


Sat Mar 23, 2019 3:11 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
Fixed a couple more minor issues in the system, and the text controller display no longer disappears. The software still isn’t producing proper output. Got rid of the vertical counter component and just used a bunch of inline counters in always blocks. Switching the display to 8x10 character format resulted in characters not being displayed properly. Something in the scanline calculation was amiss. Rather than debug it, I just re-wrote sections of the controller, as I also wanted to add smooth scrolling capability. A multiplier in the address calculation was eliminated.

_________________
Robert Finch http://www.finitron.ca


Mon Mar 25, 2019 2:55 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
Been tweaking the text mode controller. It can now handle the 8x10 character format and in theory any format from 1x1 to 9x16. For some reason TBD the overall size of the system increased by about 3,000 LUTs. Some of the text controller registers did not support byte-wide access; they now do. Using 256 programmable characters that are a group of two rows of four pixels in shape, the controller can support low-res graphics up to 384x150 (a 96x75 text mode).
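
The low-res graphics trick works because a 4x2 cell has eight pixels, so 256 character codes cover every possible on/off pattern. A sketch of plotting a pixel in C follows. It assumes the programmable character set is loaded so that character code N shows the pattern whose bits equal N, with bit 0 at the top-left of the cell; that bit ordering and the name plot are illustrative choices, not taken from the controller.

```c
#include <stdint.h>

enum { TEXT_COLS = 96, TEXT_ROWS = 75, CELL_W = 4, CELL_H = 2 };

/* Set or clear pixel (x, y) on the 384x150 low-res screen by editing the
   8-bit character code of the text cell that contains it.
   cells[] holds one code per text cell, row-major.
   Returns the cell index touched, or -1 if (x, y) is off screen. */
int plot(uint8_t *cells, int x, int y, int on)
{
    if (x < 0 || x >= TEXT_COLS * CELL_W || y < 0 || y >= TEXT_ROWS * CELL_H)
        return -1;
    int cell = (y / CELL_H) * TEXT_COLS + (x / CELL_W);
    uint8_t bit = (uint8_t)(1u << ((y % CELL_H) * CELL_W + (x % CELL_W)));
    if (on)
        cells[cell] |= bit;
    else
        cells[cell] &= (uint8_t)~bit;
    return cell;
}
```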
The bitmap controller does not appear to be fully functional. It looks like it doesn’t have enough bandwidth for the display. This is perplexing as it’s only set to 400x300 resolution.
The cpu still does the mysterious display disappear and hang. I thought this was fixed but it's back.

_________________
Robert Finch http://www.finitron.ca


Tue Mar 26, 2019 5:14 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
Redefined the windowTop, windowLeft co-ordinates of the text controller to be negative offsets rather than positive. This saves two hardware negators while requiring the values loaded into the registers to be modified slightly.
I believe I’ve traced the reason the entire screen was blanked to faulty logic that sets the screen column number. The iblank signal was active all the time because of a bad column number.
The border area color selection has been reduced to a single color from two alternating colors. Two colors allowed a fancy checkerboard pattern for the border. The border is now a single solid color.
I’m wondering where to place a scan-line or raster line interrupt component. Should it be placed a) with the bitmap controller, b) with the text controller, c) with the sprite controller, or d) as its own component? The raster-line interrupt generates an interrupt signal when the raster-line reaches a particular value. I’ve used this in the past on the C64 to increase the apparent number of sprites, and to create a split graphics / text screen.
As I’ve got the raster interrupt set up at the moment, it works similarly to the raster interrupt on the C64 and other machines. There is a single compare register which triggers an interrupt when the raster line number reaches that value. To disable the raster interrupt, the compare value is set to a value that can’t be reached ($FFF).
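
The compare-register scheme is small enough to model in a few lines of C. This is a sketch only: raster_irq_t and raster_tick are hypothetical names, the 12-bit line counter width is inferred from the $FFF disable value, and whether the real interrupt is latched until software clears it is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define RASTER_DISABLED 0xFFFu  /* compare value no real scanline reaches */

typedef struct {
    uint16_t compare;  /* 12-bit raster compare register */
    bool     irq;      /* assumed latched until software clears it */
} raster_irq_t;

/* Call once per scanline with the current raster line number. */
void raster_tick(raster_irq_t *r, uint16_t line)
{
    if ((line & 0xFFFu) == (r->compare & 0xFFFu))
        r->irq = true;
}
```

Because disabling is done through the compare value itself, no separate enable bit is needed; the cost is that line $FFF can never be used as a trigger.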

_________________
Robert Finch http://www.finitron.ca


Wed Mar 27, 2019 3:59 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1226
I've been thinking about raster line interrupts. My model is the Amiga's Copper, although I don't have actual hands-on experience of using it. I'd be inclined to separate out
- framebuffer / timing control
- blitter
- copper
- sprite engine
and of course if you have character/tiled graphics modes, that's another thing again. Most things are optional - even the framebuffer!


Wed Mar 27, 2019 8:47 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
Using an Amiga style copper with wait-for-scan-position instruction rather than a raster interrupt compare register is a great idea. Can you give us more details?

I’ve studied the Amiga some and even come up with similar circuits running in an FPGA (copper, blitter, etc). However, I’m convinced that the copper and blitter don’t have a place in a modern system, a GPU makes a lot more sense. So, I’ve shelved work on a copper / blitter in favor of a GPU. The issue is that memory is so slow that a GPU can do all the operations of the copper or blitter in the same amount of time. There are also the transistor budgets to support GPUs now which maybe weren’t available back when. However, in a retro style system the copper and blitter certainly have a place. I dual ported the input to the bus bridges in the system so that a second processor (copper/GPU) could have access to the I/O register set.
There are some features of the copper that I like: its simplicity (just four instructions) and its small footprint. I’ve no doubt it was a powerful and cost-effective solution for its time. For somewhat more cost in resources, a general purpose cpu that is customized to include some copper capabilities seems to make more sense to me. Additions to the GPU include a wait-for-frame-position instruction and fixed-point arithmetic operations.

The FT64 system on chip is broken down into separate components almost as you’ve outlined; these can be seen on the SoC diagram a few posts prior.
- optional background image controller
- frame buffer / timing control ( bitmap controller )
- text mode controller
- sprite engine
- audio controller

Also important is a multi-port memory controller (mpmc). The mpmc has a number of ports to main memory customized for the individual components in the system. Rather than using a clock phase arrangement like the Amiga and other retro systems, DMA ports with read caches are used. DMA controllers are distributed in their corresponding components rather than using a centralized DMA controller because the DMA requirements are different based on the component.

For instance, for the sprite controller (port #5) there is a read cache for each individual sprite, so the sprite number is passed from the sprite controller to the memory controller, effectively becoming part of the address. The sprite read cache caches 256 bits using a single double read operation to the ddr ram, which lowers the number of read ops to the ddr ram. Each sprite has 32 four-color pixels (64 bits) in a scanline. Since the read operation to the ddr ram reads 256 bits, a total of four sprite scanlines are cached. The sprite controller requests only 64 bits of data at one time.

For the frame buffer (port #0) 512 bits at a time are read and cached by the mpmc to lower the number of read ops. The frame buffer requests only 64 bits at one time, so eight requests are cached by the mpmc. The number of pixels read by the frame buffer depends on the color resolution. The frame buffer itself contains a line fifo to allow data streaming through the DMA channel.
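
The cache sizing in the last two paragraphs is simple arithmetic, written out here in C as a check. The constants come from the description above; the function names are made up, and pixels_per_request simply divides the 64-bit request by an assumed bits-per-pixel value, since the actual color depths are not listed.

```c
enum {
    SPRITE_PIXELS_PER_LINE = 32,
    SPRITE_BITS_PER_PIXEL  = 2,    /* four colors */
    SPRITE_FETCH_BITS      = 256,  /* one ddr read for the sprite port */
    FB_FETCH_BITS          = 512,  /* one ddr read for the frame buffer port */
    REQUEST_BITS           = 64    /* what a component asks for at one time */
};

/* Bits in one sprite scanline. */
int sprite_line_bits(void) { return SPRITE_PIXELS_PER_LINE * SPRITE_BITS_PER_PIXEL; }

/* Sprite scanlines covered by one cached ddr read. */
int sprite_lines_per_fetch(void) { return SPRITE_FETCH_BITS / sprite_line_bits(); }

/* Frame buffer requests served from one cached ddr read. */
int fb_requests_per_fetch(void) { return FB_FETCH_BITS / REQUEST_BITS; }

/* Pixels delivered by one 64-bit request at a given color depth (assumed). */
int pixels_per_request(int bits_per_pixel) { return REQUEST_BITS / bits_per_pixel; }
```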

The text controller does not use main memory; instead it uses dedicated dual port shared ram to get better ram performance. Some of the text controller memory is being used to store the system variables and stack.

_________________
Robert Finch http://www.finitron.ca


Thu Mar 28, 2019 2:49 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
Added a 128-bit memory bus option in addition to the 64-bit memory bus for the bitmap controller. This allows more pixels to be read from the cache in a single read operation. It should help meet bandwidth requirements for the display.

_________________
Robert Finch http://www.finitron.ca


Fri Mar 29, 2019 5:27 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1226
robfinch wrote:
Using an Amiga style copper with wait-for-scan-position instruction rather than a raster interrupt compare register is a great idea. Can you give us more details?


There's a thread here on the copper. It's very simple, and deterministic, and what it does is write values to memory when the beam reaches chosen coordinates.
Quote:
3 commands Move, Wait and Skip

This is quite attractive to me: interrupts can be slow, costly, and variable in latency. As the copper can store to any register, it can update a frame buffer, update a control register related to video, or a sprite control word, or trigger an interrupt.

Quote:
I’m convinced that the copper and blitter don’t have a place in a modern system, a GPU makes a lot more sense.

Well, if you only had one thing, maybe a GPU would be the thing. Of course, it's large and expensive, so you need to be able to afford it too. I know next to nothing about it, but the Mode 7 texture scaling engine of the SNES might be the minimal viable offering.
Quote:
Mode 7 is a graphics mode on the Super NES video game console that allows a background layer to be rotated and scaled on a scanline-by-scanline basis to create many different effects


Of course, people differ in their preferences: I'm seeking simplicity rather than functionality.


Fri Mar 29, 2019 1:30 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 918
Location: Canada
Quote:
This is quite attractive to me: interrupts can be slow, costly, and variable in latency. As the copper can store to any register, it can update a frame buffer, update a control register related to video, or a sprite control word, or trigger an interrupt.
I found the copper attractive too. It can only write values though. I got to wishing that it could also load from an I/O register, so that it could load a number from a PRNG to be stored in another I/O register, to allow sprites to randomly change direction for instance. So I got to thinking it would be cool if it had a set of registers, then realized that would be making it into a more general purpose cpu. It dawned on me that that's part of the reasoning behind a GPU.

Copper can be viewed as a stripped-down cpu. The MOVE command is just a store of a constant to an I/O register, which an ordinary cpu typically does with two instructions: load immediate to register, then store register to I/O register. The WAIT command could easily be implemented on a regular cpu with a handful of instructions: load the I/O register that contains the scan position with a regular load operation, mask the value with an and operation, compare it to a constant scan position, and finally branch based on the compare. It is possible to use a more general purpose cpu, like OPC8 for instance, in place of Copper. Being able to define macros for MOVE and WAIT in assembler would be helpful.
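
The load / mask / compare / branch sequence for WAIT, and the store for MOVE, look like this in C. This is a sketch under assumptions: copper_move, copper_wait and the fake beam counter are hypothetical names, the scan position is read through a callback so the code can run on a PC, and "greater than or equal" is used for the compare, which is how the Amiga copper is usually described but is not stated in the post.

```c
#include <stdint.h>

/* Reads the current scan position; on hardware this would be an I/O register load. */
typedef uint32_t (*read_pos_fn)(void *ctx);

/* MOVE: store a constant to an I/O register. */
static inline void copper_move(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* WAIT: poll until the masked scan position reaches the masked target.
   Returns the number of polls before the condition held, or -1 on timeout. */
long copper_wait(read_pos_fn read_pos, void *ctx,
                 uint32_t mask, uint32_t target, long max_polls)
{
    for (long polls = 0; polls < max_polls; polls++) {
        if ((read_pos(ctx) & mask) >= (target & mask))
            return polls;
    }
    return -1;
}

/* Stand-in beam counter for trying the code without hardware:
   the position advances by one on every read. */
typedef struct { uint32_t pos; } fake_beam_t;

uint32_t fake_beam_read(void *ctx)
{
    fake_beam_t *b = (fake_beam_t *)ctx;
    return b->pos++;
}
```

A copper list then becomes an ordinary sequence of copper_wait and copper_move calls, which is essentially what assembler macros for MOVE and WAIT would expand to.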

Added a 32-bit bus master interface option to the frame buffer (bitmap controller) as it had a choice of 128 or 64 bits. The 32-bit interface may limit display resolutions to lower values as not as many pixels are being fetched in a single cycle.

Quote:
the Mode 7 texture scaling engine of the SNES might be the minimal viable offering.

This is sort of where I was going with the GPU based around FT64. It's matrix transformations on points. There is a transform instruction in the GPU's ISA. It does a lot of MAC operations at once. Combine that with multiple GPUs operating at the same time and a lot of work can be done fast.
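
As an illustration of "matrix transformations on points" with fixed-point arithmetic, here is a 2D affine transform in C using 16.16 fixed point. It is a generic sketch, not the FT64 GPU's transform instruction: the names, the 16.16 format and the 2x2-plus-translation matrix shape are all assumptions. Each point costs four multiply-accumulates, which is the kind of work a transform instruction would do in one go.

```c
#include <stdint.h>

typedef int32_t fix16;                 /* 16.16 fixed point (assumed format) */
#define FIX_ONE ((fix16)1 << 16)

/* 16.16 multiply. Relies on arithmetic right shift of negative values,
   which is implementation-defined in C but universal on common compilers. */
static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);
}

/* x' = a*x + b*y + tx,  y' = c*x + d*y + ty */
typedef struct { fix16 a, b, c, d, tx, ty; } affine_t;

void affine_apply(const affine_t *m, fix16 x, fix16 y, fix16 *ox, fix16 *oy)
{
    *ox = fix_mul(m->a, x) + fix_mul(m->b, y) + m->tx;
    *oy = fix_mul(m->c, x) + fix_mul(m->d, y) + m->ty;
}
```

A Mode 7 style effect applies a transform like this to screen coordinates to find the source texel, changing the matrix from scanline to scanline.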

_________________
Robert Finch http://www.finitron.ca


Sat Mar 30, 2019 5:18 am