 nvio 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Made the condition registers eight bits wide to accommodate an exception flag, and potentially a couple of other flags such as odd and parity. Most integer memory load and ALU instructions set condition register results. (Odd, negative and zero can be set). If an exception occurs during the execution of an instruction, then the exception status flag will be set. This can then be tested by a branch instruction.

Note that vector instructions don’t set a condition result register. It’s difficult to see what setting a condition register would mean for a vector (conditions for which vector element?). Instead, the field is used to specify the vector mask register to use. Branch unit instructions also don’t set a condition result.

Made the return instruction return to one of two different link addresses based on the exception flag in a condition register. If the exception flag is set, the return is to Lk2; otherwise the return is to Lk1. So, there are really two potential return addresses for a call instruction. One return address is the instruction after the call; the other is the exception handler for the code block containing the call. The call instruction implicitly sets Lk1, while Lk2 must be set manually at the start of the code block (try block).
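
The two-way return can be sketched in a few lines of Python. This is only a behavioural model, not the hardware: the bit position of the exception flag (EXC_FLAG) and the addresses are made up for illustration.

```python
# Hypothetical bit position of the exception flag in an 8-bit condition register.
EXC_FLAG = 0x80

def ret_target(cr: int, lk1: int, lk2: int) -> int:
    """Model of RET: return to Lk2 (handler) if the exception flag is set,
    otherwise to Lk1 (the instruction after the call)."""
    return lk2 if cr & EXC_FLAG else lk1

# Lk1 is set implicitly by the call; Lk2 is set by software at the top of the try block.
lk1 = 0x1004   # made-up address of the instruction after the call
lk2 = 0x2000   # made-up address of the catch handler

print(hex(ret_target(0x00, lk1, lk2)))      # normal return
print(hex(ret_target(EXC_FLAG, lk1, lk2)))  # exceptional return
```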

_________________
Robert Finch http://www.finitron.ca


Tue Nov 05, 2019 3:12 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Added more link (code address) registers to the design for a total of eight. The idea is that if the call depth is known, a separate link register can be used for each call depth without requiring the link register to be saved and restored. For instance, a hex number print routine calls a word print, which calls a byte print, which in turn calls a nybble print. Each routine is at a different depth. By using a different link register for each routine, the link registers don’t need to be saved. A code address register is also used to hold the target address for computed gotos. The current catch handler address is also stored in a code address register.
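
A rough Python model of the print-routine example, assuming each call depth is given its own link register. The call() helper, the depth numbers and the string "return addresses" are hypothetical; the point is only that no return address is ever overwritten or spilled.

```python
# Eight link (code address) registers, as in the post.
link = [None] * 8
out = []

def call(depth, return_to, routine, arg):
    """Model of CALL: write the link register chosen for this depth, then jump."""
    link[depth] = return_to
    routine(arg)

def print_nybble(n):
    out.append("0123456789ABCDEF"[n & 0xF])

def print_byte(b):
    # Depth 3 is used for calls made from print_byte.
    call(3, "print_byte+1", print_nybble, b >> 4)
    call(3, "print_byte+2", print_nybble, b)

def print_word(w):
    # Depth 2 is used for calls made from print_word.
    call(2, "print_word+1", print_byte, (w >> 8) & 0xFF)
    call(2, "print_word+2", print_byte, w & 0xFF)

call(1, "main+1", print_word, 0xBEEF)
print("".join(out))
print(link[1])  # the outermost return address was never clobbered
```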

Also modified the RET instruction to determine the link register to use indirectly from a condition code register (Typically Cr0). This lets software select an alternate return address in cases of exception handlers.

A pair of compare instructions (immediate and register) were removed as being redundant with a subtract instruction.

Following are instruction formats and root opcode map. This is a snapshot of work in progress.
Attachment: IFormats1.png (NVIO IFormats page 1)
Attachment: IFormats2.png (NVIO IFormats page 2)
Attachment: Opcodes.png (NVIO root opcodes)

_________________
Robert Finch http://www.finitron.ca


Wed Nov 06, 2019 4:12 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Switched the register file from using a 64-entry unified integer / float file to using separate 32-entry register files for integers and floats. From reading up on the partitioning of register files, it seems that having a unified register file was considered a design issue with the 88000. Using separate register files should make it possible to support more register file updates per clock cycle. If integer and float files are separate, then they can both be written in the same clock cycle using only a single write port for each.

Function attributes. Given that the RET instruction may use one of several different link registers, the register to use could be an attribute of the function. The compiler needs to know which register should be linked by a CALL instruction so that it can generate code referencing the correct register. This needs to be specified in the function prototype. Some C/C++ compilers (GCC, for example) have a way of defining function attributes with the __attribute__() keyword.
One thought is to have a keyword like __inline or __interrupt, but for specifying the linkage register: __linkage1, __linkage2 or __linkage3, for instance. If a __linkage1 routine only calls __linkage2 or __linkage3 routines, then it may be treated like a leaf routine, since nothing it calls overwrites its own link register.
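
As a sketch of how a compiler might pick the linkage number automatically, the following Python assigns each function a link register from its position in the call graph. The call graph, the function names and the assign_linkage() helper are all hypothetical, and recursion is assumed to be absent. A function always gets a different register from everything it calls, directly or indirectly, so no link register needs saving.

```python
def assign_linkage(calls):
    """Map each function to a link register number.

    calls maps a function name to the list of functions it calls.
    Callers get lower numbers than any of their (transitive) callees.
    """
    heights = {}

    def height(fn):
        # 0 for a leaf, otherwise 1 + the tallest callee.
        if fn not in heights:
            heights[fn] = 1 + max((height(c) for c in calls[fn]), default=-1)
        return heights[fn]

    top = max(height(f) for f in calls)
    return {f: 1 + top - heights[f] for f in calls}

# Hypothetical call graph for the hex print example.
calls = {
    "print_hex": ["print_word"],
    "print_word": ["print_byte"],
    "print_byte": ["print_nybble"],
    "print_nybble": [],
}
linkage = assign_linkage(calls)
print(linkage)
```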

_________________
Robert Finch http://www.finitron.ca


Fri Nov 08, 2019 4:13 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1299
The multiple link registers seem like interesting new territory - do you know if this has ever been seen before? (Edit: maybe the 1802, in some sense? In some ways it's too simple to qualify!)


Fri Nov 08, 2019 7:58 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 24
Go back to the 50's, use only open subroutines (macros).
Closed subroutines are a pain with all that self-modifying code.
I think more effort is needed on effective parameter passing for
subroutines than on just speeding up the JMPS. Different link registers
might spill different numbers of parameter registers to and from the stack.


Fri Nov 08, 2019 6:43 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Link registers could also be called code address registers. So, it’s like a Harvard architecture with separate data and code addresses.
Quote:
- do you know if this has ever been seen before?
I’m sure it’s been thought of or done before. Some architectures have a jump-and-link instruction that allows any register to be used as the link register. Although typically only a single register is assigned the task, there’s no stopping the usage of multiple registers. I have not seen a compiler / code that makes use of multiple link registers before. Perhaps the overhead of storing /restoring a single link register isn’t great enough to justify the additional complexity required in a compiler.
I can envision that a sophisticated compiler performing lifetime analysis of vars might just make use of multiple link registers.
If functions / methods are defined as private, I think the compiler should be able to figure out where it can use multiple link registers. It can find out which methods are leaves relative to other methods. It’s probably easier to have the linkage register specified by the programmer, however.

The interrupted instruction pointer, and the instruction pointer are both part of the code address register set. This allows getting at the ip without having to perform a jump operation to perform relative address calculations. A program can use the interrupted instruction pointer to return from a routine using a regular RET instruction, if the machine's state has already been updated appropriately.

_________________
Robert Finch http://www.finitron.ca


Sat Nov 09, 2019 10:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Contemplating adding a whole new dimension to the basic design. Separate queues could be used at the head of each functional unit, rather than having one massive queue feeding all functional units of the processing core. Currently, things work okay because there is only a single unified register file, so the outputs of the register file feed the queue directly. However, with the use of multiple register files, the output of each register file would have to be multiplexed into the instruction queue. By using separate queues instead, the amount of multiplexing required would be reduced. The queue for integer operations doesn’t need to have register values from float registers for instance.
An alternative would be to have slots in a single queue entry to hold argument values for each kind of functional unit. For instance, there would be three registers reserved for integer operations and three more registers reserved for floating point operations as a single queue entry. This would result in a lot of empty register slots, but the design would remain simple.

Some wonderment at the utility of eight condition code registers. In a superscalar processor the registers get renamed anyway, so compare and branch sequences are effectively independent of each other even if the same condition register is used. There should be no effect on performance. The AMD / Intel processors get by just fine with a single condition code register.

_________________
Robert Finch http://www.finitron.ca


Sun Nov 10, 2019 3:58 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1299
It feels to me that a single queue would normally allow you only to issue one instruction per clock. Is that the right picture?


Sun Nov 10, 2019 8:23 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Quote:
It feels to me that a single queue would normally allow you only to issue one instruction per clock. Is that the right picture?

No. For the single monster queue, multiple entries (up to three in this case) are added to the queue every clock cycle. And as many instructions as are ready to issue are issued every clock, up to the size of the queue, provided there are functional units available.

Separate queues mean managing a set of queue pointers for each one. But the queues can be smaller.

*******
There are a lot of unused opcode bits for some instructions. One possibility is to have the assembler fill these bits with random data. The idea is to alter the noise characteristic of the processor / program.

Added support for a constant prefix instruction. The constant prefix allows using constants up to 53 bits. One drawback is that the prefix and instruction must be queued in consecutive clock cycles; they cannot be queued during the same clock cycle. The current immediate mode instruction contains a bit indicating if there’s a prefix. If the bit is set, then the queue is searched for the previously queued prefix instruction, from which bits 21 to 52 of the constant are formed. This may be extended in the future to allow larger constant formation.
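
The constant formation can be illustrated with a small Python sketch. It assumes the instruction itself carries the low 21 bits and that a single prefix supplies the bits from bit 21 upward, giving the 53-bit limit mentioned above (so a 32-bit prefix payload). Those field widths are inferred, not taken from the instruction formats, and sign extension is ignored.

```python
IMM_BITS = 21      # assumed: low bits held in the immediate-mode instruction
PREFIX_BITS = 32   # assumed: payload of one prefix, 53 - 21

def split_constant(value):
    """Split an unsigned constant into (prefix_payload, immediate_field)."""
    if not 0 <= value < (1 << (IMM_BITS + PREFIX_BITS)):
        raise ValueError("constant does not fit in 53 bits")
    return value >> IMM_BITS, value & ((1 << IMM_BITS) - 1)

def form_constant(prefix_payload, imm, has_prefix):
    """Rebuild the constant: if the instruction's prefix bit is set, the upper
    bits come from the previously queued prefix, otherwise only imm is used."""
    if has_prefix:
        return (prefix_payload << IMM_BITS) | imm
    return imm

value = 0x12_3456_789A_BCDE          # a constant needing more than 21 bits
payload, imm = split_constant(value)
print(hex(payload), hex(imm), form_constant(payload, imm, True) == value)
```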

Scrapped a good chunk of the architecture tonight in the interest of keeping things simple. I got to thinking about how "simple" some of the eight bit micros were. Gone are the condition and count registers. Branches are now absolute address mode jumps.

I tried to find an 88000 instruction set summary.

_________________
Robert Finch http://www.finitron.ca


Mon Nov 11, 2019 4:50 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 24
8 bit cpu's look impressive since they have 1 data type (the byte) and simple indexing and immediate
addressing modes. Most of the time an 8 bit cpu is fetching instruction parameters and the odd
data byte, so instruction decoding is often very regular for the first few words of microcode.
Other than a push or trap, the first few words of microcode are
A: pc -> mar,
B: pc=pc+1 read
C: <real decoding>
Classic machines like the PDP 8 simplified
A: efa -> mar, r/w
B: pc+ -> mar read
RISC machines
A: pc -> mar pc+, reg a b, read
Most other machines tend to have less regular addressing mode
decoding so things tend to slow down the cpu internally.
It all comes down, I think, to the K.I.S.S. idea.
Good luck with simplifying the big cpu.


Mon Nov 11, 2019 7:49 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Moved the CHK instruction to be executed as an R3 integer instruction rather than a branch instruction. It was the only instruction under the branch group requiring three register reads. The branch unit now needs only two register reads, simplifying the argument setup for that unit.

Moved back to a 64-entry unified register file. Although history has shown that separate integer and float register files are better for performance, the hardware is a bit less complex with a unified file. Also, this design is likely going into an FPGA or other device with standard cells and is not likely to be implemented with custom logic. The FPGA can provide a 64-entry file at the same speed as a 32-entry file since it’s a single LUT regardless of whether 64 or 32 entries are selected. Read performance should not be affected.

Used up three unused bits in the instruction bundle to indicate breaks between instructions. These bits are used to serialize the queuing of instructions, primarily for the large constant prefixes. It can take four prefix instructions to specify a 128-bit constant. One clock cycle per prefix instruction is used, so it would take five clocks to queue an instruction with a 128-bit constant. It might seem like it affects performance a lot, but it probably doesn’t as this is the rare case.
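
The clock count above can be checked with a short calculation. The sketch assumes a 21-bit immediate in the instruction and 32 bits of payload per prefix; both widths are assumptions that happen to reproduce the figures in the post (four prefixes and five clocks for a 128-bit constant).

```python
def prefixes_needed(bits, imm_bits=21, prefix_bits=32):
    """Number of prefix instructions needed for a constant of the given width."""
    extra = max(0, bits - imm_bits)
    return -(-extra // prefix_bits)   # ceiling division

def clocks_to_queue(bits):
    """One clock per prefix plus one for the instruction itself."""
    return prefixes_needed(bits) + 1

for width in (21, 53, 64, 128):
    print(width, prefixes_needed(width), clocks_to_queue(width))
```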
*****
Studying vector mask registers tonight with the idea of eliminating the special purpose mask registers from the design. The mask registers would require dependency detection logic in the core just like other registers. This uses a fair bit of logic and might increase the size of the register tags. Eliminating the mask registers means using either an integer register or a vector register as a mask.

An issue with using an integer scalar register as a mask register is that the number of elements in a vector register may be quite large. Suppose there were 1024 elements in the vector register, then the integer register would have to be 1024 bits wide. An integer register wouldn’t adapt well to changes in the size of vector registers. This is one reason there is a dedicated mask register in the Cray architecture.

An issue with using a vector register as a mask is setting all the bits in the elements of the vector register in an efficient fashion. And manipulating the mask in a high-speed fashion. It’s undesirable to use hardware loops to access each vector element in order to manipulate the mask.

The author is leaning towards a design that uses an integer scalar register to contain the mask. Although there are issues with this approach, it may be workable with this design, assuming there won’t be more than 128 elements in a vector register. The current design is for 64 elements. Using an integer register allows the existing dependency checking logic to be used. It also allows the full range of integer instructions to be used to manipulate a mask.
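
To illustrate the integer-register approach, here is a minimal Python model of a masked vector add where the mask is an ordinary integer, one bit per element. The merge behaviour (masked-off elements keep the old destination value) and the masked_add() name are assumptions for the sketch, not a description of the actual instruction.

```python
VLEN = 64  # elements per vector register in the current design

def masked_add(dst, a, b, mask):
    """Element i becomes a[i] + b[i] where mask bit i is 1, else keeps dst[i]."""
    return [x + y if (mask >> i) & 1 else d
            for i, (d, x, y) in enumerate(zip(dst, a, b))]

a = list(range(VLEN))
b = [100] * VLEN
dst = [0] * VLEN

# The mask is built and combined with plain integer operations.
even = sum(1 << i for i in range(0, VLEN, 2))   # every even-numbered element
low_half = (1 << 32) - 1                        # elements 0..31
mask = even & low_half

result = masked_add(dst, a, b, mask)
print(result[:6], result[30:34])
```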

_________________
Robert Finch http://www.finitron.ca


Tue Nov 12, 2019 5:01 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1299
I think I would be asking myself: what kinds of codes use a vector mask? How often is it set up or modified, versus how many times is it used as-is? Do we alternate between two masks, or is there any other high-level observation about mask use? I think the answers might illuminate where the trade-offs are to be had.


Tue Nov 12, 2019 10:18 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Quote:
I think I would be asking myself: what kinds of codes use a vector mask? How often is it set up or modified, versus how many times is it used as-is? Do we alternate between two masks, or is there any other high-level observation about mask use? I think the answers might illuminate where the trade-offs are to be had.
I think maybe vector code isn’t popular enough, making it difficult to say what uses mask registers. There seems to be variation in the number of mask registers and how they are set up by different designers.

Intel Larrabee: 8 mask registers (https://www.cs.indiana.edu/~achauhan/Te ... essors.pdf)

I found this vector processor on GitHub. I’ve run into it a couple of times so I figured I’d post a link. It uses the scalar register set for mask registers (potentially 32 mask registers).
https://github.com/jbush001/NyuziProces ... uction-Set

RISC-V – 1 mask register; the vector extension uses vector register v0 and ~v0 for masking.
Cray 1 – 1 mask register.
Cray X1 – 8 mask registers.

I think having more than a single mask register is useful enough to warrant at least two registers. They added more mask registers after the Cray 1 for a reason I suspect.

The number of mask registers may be related to the complexity of expressions being evaluated.

_________________
Robert Finch http://www.finitron.ca


Wed Nov 13, 2019 3:32 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1299
Ah, indeed, the Cray evolution must be saying something. Then again, RISC-V is a very thoroughly thought-out architecture.


Wed Nov 13, 2019 4:51 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 937
Location: Canada
Quote:
Then again, RISC-V is a very thoroughly thought-out architecture.
While it’s very thoroughly thought-out, I think its hands are tied by the need to keep instructions 32-bit. It seems like it could benefit a lot if only there were a few more bits available. The next available width, 48 bits, is undesirable. There isn’t room in 32 bits to support some features of a vector instruction, like a mask register spec, while at the same time having 3R instructions and not using up too much of the opcode space. In contrast, for nvio the author decided to use a wider instruction (40-bit) because just a few more than 32 bits would help a lot. But how does one implement just a few more than 32 bits? The extra bits allow specification in the instruction of element sizes / precision, rounding modes and mask registers. nvio seems roomier. It’s stuck with a fixed 40-bit instruction though. Code density suffers.
***********
Got up early this morning and decided to add back in the complexity previously removed. Back are the condition registers, link registers and vector mask registers. In the morning I think I can conquer any level of complexity. At the end of the day I can’t make things simple enough. Now using separate register files for everything. The instruction set is still fluxing around, so I haven’t written a ton of code yet.

When the vector mask register is included, an instruction may require information from up to five register sources (including the mask). A mask acts like a predicate and requires reading the target register in addition to the source registers. 3 sources + 1 target + 1 mask reg all have to be read.
With vector mask registers present in the architecture the author is left wondering if the mask registers could be utilized in other ways. Mask registers support logical operations between them, and a few other operations like population count and find-first-one. Though they are intended to mask vector operations, they don’t have to be used exclusively that way.
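
The mask operations mentioned above (logical operations, population count, find-first-one) are easy to model with integers in Python. The helper names are hypothetical, and numbering bits from the least significant end for find-first-one is an assumption. The last helper shows one non-vector use: walking a set of flags without scanning every bit position.

```python
def popcount(mask: int) -> int:
    """Number of set bits (active lanes)."""
    return bin(mask).count("1")

def find_first_one(mask: int) -> int:
    """Index of the lowest set bit, or -1 for an empty mask."""
    return (mask & -mask).bit_length() - 1

def active_lanes(mask: int) -> list:
    """List the set bit positions, lowest first."""
    lanes = []
    while mask:
        lanes.append(find_first_one(mask))
        mask &= mask - 1   # clear the lowest set bit
    return lanes

m0 = 0b1011_0000
m1 = 0b0011_1100
both = m0 & m1             # logical AND between two masks
print(popcount(both), find_first_one(both), active_lanes(m0 | m1))
```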

_________________
Robert Finch http://www.finitron.ca


Thu Nov 14, 2019 3:45 am