robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2423 Location: Canada
|
Worked some on the reservation stations and bypassing networks. The bypassing networks are not shown on the pipeline diagrams as that would clutter up the diagram.
Reservation stations queue up to four arguments for up to three instructions. The argument values are set from the register file or from the bypassing networks. There are at least four bypassing inputs (parameterized). The current design has eight inputs.
Four of the bypassing inputs come from the input to the register file. This trims a clock cycle off of register access time. The other four inputs come from the outputs of frequently used functional units. For instance, the output of the first simple arithmetic unit (SAU) is bypassed back to its input so that back-to-back instructions can be made single cycle. It also feeds the input to other functional units.
The reservation stations are set up to be generic in nature. The same component is used to support different functional units. While the stations support up to four instruction arguments, not all types of instructions (functional units) need that many arguments. The hardware for unneeded arguments will get trimmed by the synthesizer.
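The operand capture described above can be pictured with a small behavioral model. This is only a sketch in Python, not the actual hardware description: the names `Operand`, `StationEntry`, and `capture`, and the idea of matching on a register tag, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Operand:
    tag: int                      # hypothetical: register tag the operand waits on
    value: Optional[int] = None   # filled in once a value has been captured

@dataclass
class StationEntry:
    operands: List[Operand]       # up to four arguments per queued instruction

    def capture(self, bypass_inputs: List[Tuple[int, int]]) -> None:
        """Latch any missing operand whose tag appears on a bypass input."""
        for op in self.operands:
            if op.value is not None:
                continue
            for tag, value in bypass_inputs:
                if tag == op.tag:
                    op.value = value
                    break

    def ready(self) -> bool:
        """An instruction can issue once every argument has a value."""
        return all(op.value is not None for op in self.operands)

def demo():
    # One argument already read from the register file, one still pending.
    entry = StationEntry([Operand(tag=5), Operand(tag=9, value=7)])
    entry.capture([(3, 100)])            # cycle 1: no matching tag on the bypass
    before = entry.ready()
    entry.capture([(3, 100), (5, 42)])   # cycle 2: a functional unit produces tag 5
    return before, entry.ready(), [op.value for op in entry.operands]
```

In this model an unused argument is simply left out of the `operands` list, which loosely mirrors the synthesizer trimming the hardware for arguments a functional unit never needs.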
*****
Used up eight opcodes for SIMD support. Also, it was decided to move the precision field out of the branch format and into the opcode. This caused eight more opcodes to be used, but it gives two more bits for the branch displacement.
To support lower precision non-SIMD operations the upper bits of the destination register are set to zero.
There are about 24 opcodes left open.
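The zeroing of the upper destination bits for lower precision operations can be sketched as follows. This is a minimal Python model, assuming a 64-bit register and an add as the example operation; the function names are hypothetical.

```python
REG_BITS = 64  # assumed register width for this sketch

def write_result(result: int, precision_bits: int) -> int:
    """Value written to the destination register: the low precision_bits
    of the result, with every upper bit cleared to zero."""
    if not 0 < precision_bits <= REG_BITS:
        raise ValueError("precision must be between 1 and REG_BITS")
    return result & ((1 << precision_bits) - 1)

def add_low_precision(a: int, b: int, precision_bits: int) -> int:
    """Example operation: an add carried out at a reduced precision."""
    return write_result(a + b, precision_bits)
```

For example, an 8-bit add of 0xFF and 1 wraps to zero, and nothing above bit 7 is left in the destination.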
_________________Robert Finch http://www.finitron.ca
|
| Fri Nov 14, 2025 3:10 am |
|
 |
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2423 Location: Canada
|
Some work on extended precision arithmetic. Added an ADC instruction that adds three source operands and produces low order and high order (carry bit) in two destination registers. A 256-bit add can then be done with just four instructions.
Format: adc Rd1, Rs1, Rs2, Rs3, Rd2
Code:
adc a3, a1, a2, 0, cy0
adc b3, b1, b2, cy0, cy1
adc c3, c1, c2, cy1, cy2
adc d3, d1, d2, cy2, cy3
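To make the carry chain concrete, here is a small Python model of the four-instruction sequence. It is a sketch that assumes 64-bit registers and little-endian limb order (a is the least significant limb, d the most significant); the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def adc(rs1: int, rs2: int, rs3: int):
    """Model of adc Rd1, Rs1, Rs2, Rs3, Rd2.
    Returns (Rd1, Rd2): the low 64 bits of the sum and the high-order part."""
    total = rs1 + rs2 + rs3
    return total & MASK64, total >> 64

def to_limbs(x: int):
    """Split a 256-bit integer into four 64-bit limbs, least significant first."""
    return [(x >> (64 * i)) & MASK64 for i in range(4)]

def from_limbs(limbs):
    """Rebuild an integer from limbs, least significant first."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def add256(x: int, y: int):
    """256-bit add built from four chained adc operations."""
    carry = 0                       # the first adc takes 0 as its third source
    out = []
    for a, b in zip(to_limbs(x), to_limbs(y)):
        low, carry = adc(a, b, carry)
        out.append(low)
    return from_limbs(out), carry   # carry plays the role of cy3
```

Each loop iteration corresponds to one `adc` in the listing, with the carry output of one instruction fed in as the third source of the next.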
Shift instructions were also added that save the upper or lower bits of the shift result in a second destination register.
Added some more conditional move instructions: conditional move if even (CMOVEVN), move if less than zero, and move if less than or equal to zero.
Decided to get rid of the ADDnUI instructions. I cannot see them being used that often, and the same functionality is available using a regular ADD_ASL instruction by substituting an immediate for Rs2. The code is a little less dense, but it is probably worth it to simplify the instruction set.
Here is a table of the root opcodes:
Attachment: Qupls2026_opcodes.jpg
_________________Robert Finch http://www.finitron.ca
|
| Sat Nov 15, 2025 4:14 am |
|
 |
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2423 Location: Canada
|
Changed the PUSH and POP instructions from being implemented with micro-code to being implemented using the micro-op translator. PUSH and POP are now translated into one to five micro-ops depending on how many registers are used. There is less overhead and better performance of the operations when translated to micro-ops.
Changed the base data-path width to 128 bits. I am going to try it and see if it will fit.
There are now 128 logical registers available in Qupls. It turns out that the BRAM setup is 512 registers deep no matter whether there are 32, 64, or 128 registers. So, may as well make them available.
_________________Robert Finch http://www.finitron.ca
|
| Tue Nov 18, 2025 3:33 am |
|
 |
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2423 Location: Canada
|
Changed the base data-path width back to 64-bits. There is no longer any micro-code or HW state machine. Some of the operations done with micro-code could be done using the micro-op translator. Broke the ENTER and EXIT instructions into two separate instructions each so that they would fit into the micro-op translator. ENTER and EXIT no longer push and pop registers from the stack. That is done by a second (or third) instruction now.
Code:
ENTER 64 ; allocate 64 bytes for non-safe stack usage
PUSHSS s0, s1, s2, s3 ; push regs onto safe stack
…
POPSS s3, s2, s1, s0 ; pop regs from safe stack
EXIT 64 ; deallocate and return
Decided to drop the sign control bits from the instruction set. In many cases having sign control bits did not make sense. For instance, when a base register is used during an address calculation, the base register value would probably never be negated, so a sign control bit is wasted for that case. Another case is branch instructions. Because there is branching on relative conditions, if a change in the sign of an operand is needed it can often be done by swapping operands.
Sign control is now sometimes controlled by the opcode, as is typical in many machines. Rather than having an ADD with sign control there is now both an ADD and a SUB.
It was desired to support 128 registers, and removing the sign control bits makes this possible. The ISA uses the 128 registers as SIMD registers by grouping registers into groups of four. That makes 32 x 256-bit SIMD registers available.
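The grouping of scalar registers into SIMD registers can be sketched as an index mapping. This Python sketch assumes vector register n covers scalar registers 4n through 4n+3, which agrees with the note later in the thread that v2 and r8 to r11 are aliased; the function names are hypothetical.

```python
NUM_SCALAR_REGS = 128
GROUP_SIZE = 4                                  # scalar registers per SIMD register
NUM_VECTOR_REGS = NUM_SCALAR_REGS // GROUP_SIZE # 32

def vector_to_scalars(v: int):
    """Scalar register numbers aliased by vector register v."""
    if not 0 <= v < NUM_VECTOR_REGS:
        raise ValueError("vector register out of range")
    base = v * GROUP_SIZE
    return list(range(base, base + GROUP_SIZE))

def scalar_to_vector(r: int):
    """(vector register, position within the group) for scalar register r."""
    if not 0 <= r < NUM_SCALAR_REGS:
        raise ValueError("scalar register out of range")
    return r // GROUP_SIZE, r % GROUP_SIZE
```

With 64-bit scalar registers, four of them give the 256 bits of one SIMD register, and 128 / 4 gives the 32 SIMD registers mentioned above.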
_________________Robert Finch http://www.finitron.ca
|
| Wed Nov 19, 2025 12:57 am |
|
 |
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2423 Location: Canada
|
Worked on supporting vectors (SIMD) with Qupls. Like many other designs, Qupls uses a scalar register to contain a mask for vector operations. Many instructions directly support masked operations. To mask specific elements of the vector the appropriate bit mask must be generated. This can be done using one of the SET instructions. The SET instruction will set or clear bits required to reference particular elements of the vector.
Rather than a vector length register, Qupls uses a global mask (vgm) register. This register needs to be set up to contain a bit-mask corresponding to the elements that should be active. To set this register in a manner analogous to setting a vector length register, a special 256-bit constant can be loaded into a vector register, then a SET instruction used. Like the following:
Code:
OR r8,r0,$0x0706050403020100,0
OR r9,r0,$0x0F0E0D0C0B0A0908,0
OR r10,r0,$0x1716151413121110,0
OR r9,r0,$0x1F1E1D1C1B1A1918,0
SLT.BP vgm,v2,#12,$-1
Which sets up the mask register for 12 elements that are a byte wide. The mask can also be set much more easily with an immediate constant:
OR vgm, $0xFFF, $0, $0 ; mask for 12 elements, any element size
The vector element size (VELSZ) register contains a code indicating the size of a vector element. Elements may be 8, 16, 32, or 64 bits wide for integers, or 16, 32, 64, or 128 bits wide for floats (128-bit floats not being supported currently). The VELSZ allows size agnostic vector instructions to be used. VADD v1, v2, v3 will add two vectors according to the vector element size. This makes it possible to write a vector routine without a specific element size specified.
Worked on updating the SAU (simple arithmetic unit) to support vector operations.
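A size agnostic VADD under a global mask might behave roughly like the Python model below. Several things here are assumptions rather than facts from the design: the element width is passed directly in bits instead of as a VELSZ code, inactive elements are assumed to keep the old destination value, and the function names are hypothetical.

```python
VECTOR_BITS = 256

def split(vec: int, elsz: int):
    """Break a 256-bit value into elements of elsz bits, element 0 lowest."""
    mask = (1 << elsz) - 1
    return [(vec >> (i * elsz)) & mask for i in range(VECTOR_BITS // elsz)]

def join(elems, elsz: int) -> int:
    """Pack elements back into one value, truncating each to elsz bits."""
    mask = (1 << elsz) - 1
    out = 0
    for i, e in enumerate(elems):
        out |= (e & mask) << (i * elsz)
    return out

def vadd(va: int, vb: int, vd_old: int, elsz: int, vgm: int) -> int:
    """Element-wise add; bit i of vgm enables element i.
    Inactive elements keep the old destination value (an assumption)."""
    out = []
    for i, (x, y, keep) in enumerate(zip(split(va, elsz), split(vb, elsz),
                                         split(vd_old, elsz))):
        out.append(x + y if (vgm >> i) & 1 else keep)
    return join(out, elsz)
```

The same `vadd` call works for any supported integer width; only the `elsz` argument changes, which is the property the VELSZ register provides to a vector routine.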
_________________Robert Finch http://www.finitron.ca
|
| Thu Nov 20, 2025 1:28 am |
|
 |
|
gfoot
Joined: Sat Oct 04, 2025 10:54 am Posts: 25
|
robfinch wrote:
Like the following:
Code:
OR r8,r0,$0x0706050403020100,0
OR r9,r0,$0x0F0E0D0C0B0A0908,0
OR r10,r0,$0x1716151413121110,0
OR r9,r0,$0x1F1E1D1C1B1A1918,0
SLT.BP vgm,v2,#12,$-1
Which sets up the mask register for 12 elements that are a byte wide.

I was trying to understand this - is the fourth "OR" call meant to be loading r11 though, rather than r9?
|
| Thu Nov 20, 2025 10:44 am |
|
 |
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2423 Location: Canada
|
Quote:
I was trying to understand this - is the fourth "OR" call meant to be loading r11 though, rather than r9?

Sharp eyes. Definitely. Should be:
Code:
OR r8,r0,$0x0706050403020100,0
OR r9,r0,$0x0F0E0D0C0B0A0908,0
OR r10,r0,$0x1716151413121110,0
OR r11,r0,$0x1F1E1D1C1B1A1918,0
SLT.BP vgm,v2,#12,$-1
v2 and r8 to r11 are aliased.
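For readers following along, the corrected sequence can be modeled in Python. This sketch assumes that SLT.BP compares each byte of v2 with the immediate and sets the matching bit of vgm when the byte is smaller; the trailing $-1 operand is not modeled, and the function names are hypothetical.

```python
def index_vector(num_bytes: int = 32) -> int:
    """256-bit constant whose byte i holds the value i
    (what the four OR instructions build in r8 to r11, i.e. v2)."""
    return int.from_bytes(bytes(range(num_bytes)), "little")

def slt_bytes(vec: int, limit: int, num_bytes: int = 32) -> int:
    """Assumed SLT.BP behavior: mask bit i is set when byte i < limit."""
    mask = 0
    for i in range(num_bytes):
        if ((vec >> (8 * i)) & 0xFF) < limit:
            mask |= 1 << i
    return mask

v2 = index_vector()
vgm = slt_bytes(v2, 12)   # bytes 0..11 are below 12, so the 12 low bits are set
```

Because byte i of the constant simply holds i, comparing against a length n enables exactly the first n byte elements, which is what makes this analogous to writing a vector length register.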
_________________Robert Finch http://www.finitron.ca
|
| Thu Nov 20, 2025 4:43 pm |
|