


 [ 304 posts ]  Go to page Previous  1 ... 17, 18, 19, 20, 21
 Qupls (Q+) 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2505
Location: Canada
Managed to get the core small enough that a DRAM controller fits alongside the rest of the system. A video frame buffer may fit as well, so I am trying that out at the moment.

The multi-port memory controller is too large to fit though, so some simpler control logic was added to mux between the CPU and a video frame buffer.

The 100 MHz timing target for the system was missed by 200 ps. The design may or may not work at 100 MHz depending on the chip, but I was not in a gambling mood, so the clock was reduced to 89.29 MHz. In theory it should work at 96 MHz.
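As a rough check on those numbers: a 200 ps miss at 100 MHz means the critical path is about 10.2 ns, which works out to roughly 98 MHz, so 96 MHz leaves a little margin and 89.29 MHz (an 11.2 ns period) leaves about 1 ns. This sketch assumes the miss is reported as a worst negative slack (WNS) and ignores the fact that re-running place-and-route at a different clock constraint changes the routing, so treat it as an estimate only.

```python
def fmax_mhz(target_mhz: float, wns_ps: float) -> float:
    """Estimate the achievable clock from a target frequency and the worst
    negative slack (WNS). A negative WNS means the critical path is longer
    than the target period by that many picoseconds."""
    period_ns = 1000.0 / target_mhz            # 100 MHz -> 10.0 ns
    critical_ns = period_ns - wns_ps / 1000.0  # WNS of -200 ps -> 10.2 ns
    return 1000.0 / critical_ns


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    print(f"{fmax_mhz(100.0, -200.0):.2f} MHz")   # 98.04 MHz
    print(f"{period_ns(89.29):.2f} ns")           # 11.20 ns
```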

_________________
Robert Finch http://www.finitron.ca


Thu Apr 09, 2026 3:30 am

The design occupies over 90% of the FPGA. Timing was missed by a few hundred picoseconds in the scratchpad RAM. The miss is due to routing delay, which accounts for over 98% of the path delay; the logic itself needs only 200 ps. I chalk this up to how full the FPGA is.

Shelving the multi-port memory controller. Having thought about it some more, it is not as great a component as I thought. The problem is that it duplicates a lot of logic: the controller contains FIFOs for high-speed buffering of the data, but the devices connected to it also have their own data-buffering FIFOs. I wonder whether double-FIFOing is really necessary. I have started working on a newer, better version.

For the Qupls SoC I created a simple DRAM bridge instead, which is used with a MUX at the inputs. It is much smaller than the memory controller and probably about 90% as effective.
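A two-input MUX in front of a single DRAM bridge port might be modelled like this. The request fields and port names are hypothetical, and giving the video frame buffer fixed priority is an assumption (display refresh cannot be stalled, while the CPU can wait), not necessarily how the Qupls SoC arbitrates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    source: str          # hypothetical port name: "cpu" or "video"
    addr: int
    write: bool = False


def select(cpu_req: Optional[Request],
           video_req: Optional[Request]) -> Optional[Request]:
    """Fixed-priority 2:1 MUX feeding one DRAM bridge port.

    Assumption: the video frame-buffer fetch wins when both ports request,
    because the display cannot be stalled; the CPU simply waits."""
    if video_req is not None:
        return video_req
    return cpu_req


if __name__ == "__main__":
    granted = select(Request("cpu", 0x1000), Request("video", 0x8000))
    print(granted.source)                                # video
    print(select(Request("cpu", 0x1000), None).source)   # cpu
```

With fixed priority a continuously busy video port would starve the CPU; round-robin arbitration is the usual alternative when that matters.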

I tried building the system out to a bitstream that can be loaded into the FPGA. Nothing worked, of course: just a blank screen with a few scrambled characters. There are about 15k LUTs left, which are being held in reserve to fix future mistakes.



Sat Apr 11, 2026 2:49 am

Added a POLY instruction (FPOLY) to the ISA, which evaluates polynomials using Horner's method. It is similar to the POLY instruction on the historic VAX minicomputer. My version evaluates only polynomials of degree seven or less, since a seventh-degree polynomial is the largest one evaluated in the library, but an additional instruction can continue the evaluation, so higher-degree polynomials are handled by issuing it multiple times.
Code:
FPOLY r10, r20, r30, #7      ; evaluate terms 0 to 7
FPOLY r10, r20, r38, #7|64   ; evaluate terms 8 to 14
FPOLY r10, r20, r45, #7|64   ; evaluate terms 15 to 21
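One reading of the chained form, sketched in Python: each FPOLY folds a chunk of coefficients into a running accumulator with Horner's method (acc = acc * x + c), and the |64 continuation flag carries the previous accumulator in instead of seeding it from the first coefficient. Under that reading the first call consumes 8 coefficients and each continuation 7, which matches the register bases r30, r38 and r45 above. The operand roles (r10 as accumulator, r20 as x), the highest-power-first coefficient order and the function names are assumptions for illustration.

```python
from typing import Optional, Sequence


def fpoly(x: float, coeffs: Sequence[float], acc: Optional[float] = None) -> float:
    """Model of one FPOLY step (hypothetical semantics).

    coeffs are ordered from the highest power down. Without continuation
    (acc is None) the first coefficient seeds the accumulator; with
    continuation (the #n|64 form) the incoming accumulator is kept and every
    coefficient in this chunk is folded in with acc = acc * x + c."""
    it = iter(coeffs)
    if acc is None:
        acc = next(it)
    for c in it:
        acc = acc * x + c
    return acc


def horner(x: float, coeffs: Sequence[float]) -> float:
    """Reference: single-pass Horner evaluation of the whole polynomial."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


if __name__ == "__main__":
    coeffs = [float(i % 5 - 2) for i in range(22)]  # 22 coefficients, degree 21
    x = 0.5
    acc = fpoly(x, coeffs[0:8])           # like FPOLY r10, r20, r30, #7
    acc = fpoly(x, coeffs[8:15], acc)     # like FPOLY r10, r20, r38, #7|64
    acc = fpoly(x, coeffs[15:22], acc)    # like FPOLY r10, r20, r45, #7|64
    print(acc == horner(x, coeffs))       # True
```

Chaining the three chunks performs exactly the same sequence of multiply-adds as one degree-21 Horner pass, so the results agree.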

The complication is that the micro-op translator supplies the micro-ops for evaluating the polynomial. The number of micro-ops is kept to a minimum, as there is a maximum of 16 micro-ops per ISA instruction.
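A hypothetical expansion shows why degree seven fits comfortably: with one move to seed the accumulator and one fused multiply-add per Horner step, a fresh degree-7 FPOLY becomes 8 micro-ops and a continuation 7, well under the 16-micro-op cap. The micro-op names and tuple layout are invented for this sketch; the real translator may split the work differently. Every FMA depends on the previous one, so latency grows roughly as degree × FMA latency regardless of machine width.

```python
MAX_UOPS = 16  # cap on micro-ops per ISA instruction, from the post


def expand_fpoly(rd: int, rx: int, rc_base: int, degree: int,
                 cont: bool = False) -> list:
    """Hypothetical micro-op expansion of FPOLY rd, rx, rc_base, #degree[|64].

    Fresh evaluation: FMOV seeds rd with the leading coefficient, then one FMA
    per Horner step (degree + 1 coefficient registers in total).
    Continuation: rd already holds the running result, so only `degree` FMAs
    are emitted, one per coefficient register."""
    uops = []
    reg = rc_base
    if not cont:
        uops.append(("FMOV", rd, reg))           # rd <- c[reg]
        reg += 1
    for _ in range(degree):
        uops.append(("FMA", rd, rd, rx, reg))    # rd <- rd * x + c[reg]
        reg += 1
    if len(uops) > MAX_UOPS:
        raise ValueError(f"{len(uops)} micro-ops exceeds the cap of {MAX_UOPS}")
    return uops


if __name__ == "__main__":
    for uop in expand_fpoly(10, 20, 30, 7):
        print(uop)                                      # FMOV, then FMAs on r31..r37
    print(len(expand_fpoly(10, 20, 38, 7, cont=True)))  # 7
```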



Wed Apr 15, 2026 5:53 am

Got the system to build out to a bitstream that could be loaded into the FPGA again. It missed timing by 400 ps and took about 6 hours to build. The display still shows only random characters, with no hi-res graphics. I was expecting random bits of data to appear in the bitmapped graphics; the absence of any graphics data suggests the DRAM interface is not working.

Started work on version 5 in the meantime, a major update to Q+. The instruction was shrunk from 48 bits to 40 bits by removing the Rs3 and Op3 fields. FMA and other instructions requiring a third register will now need a prefix that supplies it.
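How a decoder might fold such a prefix into the following instruction, as a minimal sketch: instructions are shown as already-split field tuples, and the "PFX" opcode name, the NEEDS_RS3 set and the Insn record are hypothetical, since the post does not give the v5 encoding.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

NEEDS_RS3 = {"FMA"}  # hypothetical set of ops that require a third register


@dataclass
class Insn:
    op: str
    rd: int
    rs1: int
    rs2: int
    rs3: Optional[int] = None


def attach_prefixes(stream: Iterable[Tuple]) -> List[Insn]:
    """Fold ("PFX", rs3) prefixes into the instruction that follows them.

    A prefix applies only to the next instruction; an op in NEEDS_RS3 without
    a preceding prefix is reported as an error."""
    out: List[Insn] = []
    pending: Optional[int] = None
    for item in stream:
        if item[0] == "PFX":
            pending = item[1]
            continue
        op, rd, rs1, rs2 = item
        if op in NEEDS_RS3 and pending is None:
            raise ValueError(f"{op} needs a prefix supplying its third register")
        out.append(Insn(op, rd, rs1, rs2, pending))
        pending = None
    return out


if __name__ == "__main__":
    prog = [("PFX", 7), ("FMA", 70, 71, 72), ("FADD", 73, 74, 75)]
    for insn in attach_prefixes(prog):
        print(insn)
```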

There are still 128 registers, but floating-point instructions now have access to only the last 64 of them. Integer instructions have access to all registers, so they can be used to operate on float values.
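One plausible encoding, sketched under the assumption that float instructions use a 6-bit register field offset into the upper half of the file (the post does not give the encoding): this saves one bit per float register specifier, and every float register is still reachable through a 7-bit integer field.

```python
NUM_REGS = 128
FP_FIELD_BITS = 6                          # 64 float-addressable registers
FP_BASE = NUM_REGS - (1 << FP_FIELD_BITS)  # 64: float fields start at r64


def fp_to_arch(field: int) -> int:
    """Map a float-instruction register field onto the last 64 registers."""
    if not 0 <= field < (1 << FP_FIELD_BITS):
        raise ValueError("float register field out of range")
    return FP_BASE + field


def int_to_arch(field: int) -> int:
    """Integer instructions use a 7-bit field and can name any register."""
    if not 0 <= field < NUM_REGS:
        raise ValueError("integer register field out of range")
    return field


if __name__ == "__main__":
    print(fp_to_arch(0), fp_to_arch(63))   # 64 127
```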

Branches are handled completely differently, using condition code registers à la PowerPC.
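For reference, PowerPC splits its 32-bit condition register into eight 4-bit fields (CR0 to CR7), each with LT, GT, EQ and SO bits; a compare writes one field and a conditional branch tests a single bit. The toy model below follows that pattern. How Qupls v5 lays out its condition registers is not stated, so the field count and the bit order (chosen here for convenience) are assumptions.

```python
# Bit positions inside one 4-bit condition field, using PowerPC's names.
LT, GT, EQ, SO = 0, 1, 2, 3


class CondRegs:
    """Toy model of PowerPC-style condition registers: a compare writes one
    field and a later conditional branch tests a single bit of any field."""

    def __init__(self, nfields: int = 8):      # PowerPC has CR0..CR7
        self.fields = [0] * nfields

    def cmp(self, crf: int, a: int, b: int) -> None:
        if a < b:
            bits = 1 << LT
        elif a > b:
            bits = 1 << GT
        else:
            bits = 1 << EQ
        self.fields[crf] = bits                 # SO (summary overflow) not modelled

    def branch_taken(self, crf: int, bit: int, if_set: bool = True) -> bool:
        return bool((self.fields[crf] >> bit) & 1) == if_set


if __name__ == "__main__":
    cr = CondRegs()
    cr.cmp(0, 3, 10)                # CR0 <- LT
    cr.cmp(1, 5, 5)                 # CR1 <- EQ; CR0 is left intact
    print(cr.branch_taken(0, LT))   # True
    print(cr.branch_taken(1, EQ))   # True
```

Keeping results in separate fields lets several independent compare results be live at once and lets a compare be scheduled well ahead of the branch that consumes it.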

The most recent additions are two floating-point exception status registers. For floating-point operations, r0 is defined to hold +0.0 and r1 to hold +1.0.

FADDI and FMULI (float add immediate and float multiply immediate) were added to improve code density for those operations.

There are two floating-point exception status registers primarily so that a status update can be discarded by directing it to the second register.
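A minimal model of that arrangement, assuming each register holds the five IEEE 754 exception flags as sticky bits and that an instruction field selects which register receives its flags; the class names and flag values are hypothetical.

```python
from enum import IntFlag


class FpExc(IntFlag):
    """The five IEEE 754 exception flags."""
    INVALID = 1
    DIVZERO = 2
    OVERFLOW = 4
    UNDERFLOW = 8
    INEXACT = 16


class FpStatus:
    """Two sticky exception status registers. An instruction selects which
    one receives its flags; sending them to register 1 leaves register 0
    (the one software inspects) undisturbed. Hypothetical model."""

    def __init__(self):
        self.regs = [FpExc(0), FpExc(0)]

    def record(self, sel: int, flags: FpExc) -> None:
        self.regs[sel] |= flags          # sticky: flags accumulate until cleared

    def clear(self, sel: int) -> None:
        self.regs[sel] = FpExc(0)


if __name__ == "__main__":
    s = FpStatus()
    s.record(1, FpExc.INEXACT)           # e.g. an expected rounding, discarded
    s.record(0, FpExc.OVERFLOW)
    print(FpExc.INEXACT in s.regs[0])    # False
    print(FpExc.OVERFLOW in s.regs[0])   # True
```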

I am still undecided on how to manage the status and condition registers: whether to rename them like the GPRs, or to use the re-order buffer for storage and renaming. There are only a handful of these registers, so it is tempting to allocate their storage in the re-order buffer.
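The re-order-buffer option resembles the scheme used in Intel's P6: an in-flight result lives in its ROB entry and is copied to the architectural register at commit. A minimal sketch under that assumption, with in-order commit and no flush on misprediction modelled; class and method names are hypothetical. The appeal for a handful of registers is that no separate physical register file or free list is needed, at the cost of reading operands out of the ROB.

```python
from typing import Dict, List, Optional, Tuple


class CrRename:
    """Hypothetical renaming of a few condition/status registers through the
    re-order buffer: the result of an in-flight writer lives in its ROB entry
    and is copied to the architectural register when that entry commits."""

    def __init__(self, ncr: int):
        self.committed: List[int] = [0] * ncr            # architectural values
        self.latest: List[Optional[int]] = [None] * ncr  # youngest in-flight writer
        self.rob: Dict[int, List] = {}                   # ROB index -> [cr, value]

    def dispatch_writer(self, rob_idx: int, cr: int) -> None:
        self.rob[rob_idx] = [cr, None]
        self.latest[cr] = rob_idx

    def read(self, cr: int) -> Tuple[str, int]:
        """Return ("ready", value) or ("wait", rob_idx) for a consumer."""
        idx = self.latest[cr]
        if idx is None:
            return ("ready", self.committed[cr])
        value = self.rob[idx][1]
        return ("ready", value) if value is not None else ("wait", idx)

    def complete(self, rob_idx: int, value: int) -> None:
        self.rob[rob_idx][1] = value

    def commit(self, rob_idx: int) -> None:
        cr, value = self.rob.pop(rob_idx)
        self.committed[cr] = value
        if self.latest[cr] == rob_idx:   # no younger writer still in flight
            self.latest[cr] = None
```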



Sat Apr 18, 2026 3:50 am