AnyCPU - View topic

Page 6 of 9

[ 133 posts ]


nvio

Author	Message
robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Made the condition registers eight bits wide to accommodate an exception flag, and potentially a couple of other flags such as odd and parity. Most integer memory load and ALU instructions set condition register results (odd, negative and zero can be set). If an exception occurs during the execution of an instruction, then the exception status flag will be set. This can then be tested by a branch instruction. Note that vector instructions don’t set a condition result register. It’s difficult to see what the meaning would be behind setting a condition register (conditions for which vector element?). Instead the field is used to specify the vector mask register to use. Branch unit instructions also don’t set a condition result. Made the return instruction return to one of two different link addresses based on the exception flag in a condition register. If the exception flag is set, the return is to Lk2; otherwise the return is to Lk1. So, there are really two potential return addresses for a call instruction. One return address is the instruction after the call; the other is the exception handler for the code block containing the call. The call instruction implicitly sets Lk1; Lk2 must be set manually at the start of the code block (try block). _________________ Robert Finch http://www.finitron.ca
Tue Nov 05, 2019 3:12 am
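The two-address return described in the post above can be sketched as a small Python model. This is only an illustration: the class name `Core`, the constant `EXC_FLAG` and its bit position, and the use of a single condition register `cr0` are all assumptions, since the post gives no encoding.

```python
EXC_FLAG = 0x80  # assumed bit position of the exception flag (not given in the post)

class Core:
    """Hypothetical model of CALL/RET with two return addresses."""

    def __init__(self):
        self.cr0 = 0   # one eight-bit condition register
        self.lk1 = 0   # normal return address, set implicitly by CALL
        self.lk2 = 0   # exception handler address, set manually by software
        self.ip = 0

    def call(self, target, insn_len=1):
        # CALL implicitly sets Lk1 to the instruction after the call.
        self.lk1 = self.ip + insn_len
        self.ip = target

    def ret(self):
        # RET goes to Lk2 if the exception flag is set, otherwise to Lk1.
        self.ip = self.lk2 if self.cr0 & EXC_FLAG else self.lk1


core = Core()
core.ip = 100
core.lk2 = 900          # start of the try block: point Lk2 at the handler
core.call(500)
core.ret()              # no exception: returns to the instruction after the call
normal_return = core.ip

core.ip = 100
core.call(500)
core.cr0 |= EXC_FLAG    # an instruction in the callee raised an exception
core.ret()              # returns to the handler instead
exception_return = core.ip
```

The point of the sketch is that the callee needs no extra code to propagate an exception: the same RET lands in the caller's handler whenever the flag is set.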

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Added more link (code address) registers to the design for a total of eight. The idea is that if the call depth is known, a separate link register can be used for each call depth without requiring the link register to be saved and restored. For instance, a hex number print routine calls a word print, then a byte print, then a nybble print. Each routine is at a different depth. By using a different link register for each routine, they don’t need to be saved. A code address register is also used to hold the target address for computed gotos. The current catch handler address is also stored in a code address register. Also modified the RET instruction to determine the link register to use indirectly from a condition code register (typically Cr0). This lets software select an alternate return address in cases of exception handlers. A pair of compare instructions (immediate and register) were removed as being redundant with a subtract instruction. Following are instruction formats and root opcode map. This is a snapshot of work in progress. Attachment: IFormats1.png (NVIO IFormats page 1) Attachment: IFormats2.png (NVIO IFormats page 2) Attachment: Opcodes.png (NVIO root opcodes) _________________ Robert Finch http://www.finitron.ca
Wed Nov 06, 2019 4:12 am
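To make the one-link-register-per-depth idea concrete, here is a toy interpreter in Python. The instruction tuples (`call`, `ret`, `emit`, `halt`) and the program layout are invented for this sketch and are not nvio encodings; the only thing taken from the post is that there are eight link registers and that each call depth uses its own.

```python
def run(program, entry=0):
    """Execute a toy program; there is no stack, only link registers."""
    lk = [0] * 8            # eight link (code address) registers
    out = []
    ip = entry
    while True:
        op, *args = program[ip]
        if op == "call":    # ("call", link_reg, target)
            reg, target = args
            lk[reg] = ip + 1
            ip = target
        elif op == "ret":   # ("ret", link_reg)
            ip = lk[args[0]]
        elif op == "emit":  # ("emit", text)
            out.append(args[0])
            ip += 1
        elif op == "halt":
            return out


PROGRAM = [
    ("call", 1, 2),         # 0: main calls print_byte, linking through Lk1
    ("halt",),              # 1
    ("call", 2, 5),         # 2: print_byte calls print_nybble (high), via Lk2
    ("call", 2, 5),         # 3: print_byte calls print_nybble (low), via Lk2
    ("ret", 1),             # 4: print_byte returns through Lk1
    ("emit", "nybble"),     # 5: print_nybble does its work
    ("ret", 2),             # 6: print_nybble returns through Lk2
]

result = run(PROGRAM)
```

Because `print_byte` links through Lk1 and `print_nybble` through Lk2, the inner calls never overwrite the outer return address, so nothing has to be saved or restored.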

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Switched the register file from using a 64-entry unified integer / float file to using separate 32-entry register files for integers and floats. Reading up on partitioning of register files, having a unified register file was considered a design issue with the 88000. Using separate register files should make it possible to support more register file updates per clock cycle. If integer and float files are separate, then they can both be written in the same clock cycle using only a single write port for each. Function attributes. Given that the RET instruction may use one of several different link registers, the register to use could be an attribute of the function. The compiler needs to know which register should be linked by a CALL instruction so that it can generate code referencing the correct register. This needs to be specified in the function prototype. C/C++ has a way of defining function attributes with the __attribute__() keyword. One thought is to have a keyword like __inline or __interrupt, but for specifying the linkage register: __linkage1, __linkage2 or __linkage3, for instance. If a __linkage1 routine only calls code using __linkage2 or __linkage3 routines, then it may be considered to be a leaf routine. _________________ Robert Finch http://www.finitron.ca
Fri Nov 08, 2019 4:13 am
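The leaf rule at the end of the post above can be expressed as a check over a call graph. The sketch below is one possible reading, not anything from the nvio compiler: the function name `needs_save`, the dictionary layout, and the decision to look at transitive callees (rather than only direct ones) are all assumptions.

```python
def needs_save(calls, linkage):
    """Return the functions that must save and restore their link register.

    calls:   function name -> list of functions it calls directly
    linkage: function name -> link register number (the __linkageN attribute)

    A function is treated as a leaf with respect to its own link register
    when nothing it can reach, directly or indirectly, links through the
    same register.
    """
    def reachable(f):
        seen, stack = set(), list(calls.get(f, ()))
        while stack:
            g = stack.pop()
            if g not in seen:
                seen.add(g)
                stack.extend(calls.get(g, ()))
        return seen

    return {f for f in linkage
            if any(linkage[g] == linkage[f] for g in reachable(f))}


calls = {
    "print_word": ["print_byte"],
    "print_byte": ["print_nybble"],
    "print_nybble": [],
}
distinct = needs_save(calls, {"print_word": 1, "print_byte": 2, "print_nybble": 3})
clash = needs_save(calls, {"print_word": 1, "print_byte": 2, "print_nybble": 1})
```

With three distinct linkage registers no routine needs a save; reusing register 1 for `print_nybble` forces `print_word` to save its link register, since the nested call would otherwise overwrite it. A recursive routine always lands in the result, because it reaches itself.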

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1782	Re: nvio The multiple link registers seem like interesting new territory - do you know if this has ever been seen before? (Edit: maybe the 1802, in some sense? In some ways it's too simple to qualify!)
Fri Nov 08, 2019 7:58 am

oldben Joined: Mon Oct 07, 2019 2:41 am Posts: 592	Re: nvio Go back to the '50s, use only open subroutines (macros). Closed subroutines are a pain with all that self-modifying code. I think more effort is needed on effective parameter passing for subroutines than on just speeding up the JMPS. Different link registers might spill different numbers of parameter registers to and from the stack.
Fri Nov 08, 2019 6:43 pm

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Link registers could also be called code address registers. So, it’s like a Harvard architecture with separate data and code addresses. Quote: - do you know if this has ever been seen before? I’m sure it’s been thought of or done before. Some architectures have a jump-and-link instruction that allows any register to be used as the link register. Although typically only a single register is assigned the task, there’s no stopping the usage of multiple registers. I have not seen a compiler / code that makes use of multiple link registers before. Perhaps the overhead of storing / restoring a single link register isn’t great enough to justify the additional complexity required in a compiler. I can envision that a sophisticated compiler performing lifetime analysis of vars might just make use of multiple link registers. If functions / methods are defined as private, I think the compiler should be able to figure out where it can use multiple link registers. It can find out which methods are leaves relative to other methods. It’s probably easier to have the linkage register specified by a programmer, however. The interrupted instruction pointer and the instruction pointer are both part of the code address register set. This allows getting at the ip without having to perform a jump operation to perform relative address calculations. A program can use the interrupted instruction pointer to return from a routine using a regular RET instruction, if the machine's state has already been updated appropriately. _________________ Robert Finch http://www.finitron.ca
Sat Nov 09, 2019 10:03 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Contemplating adding a whole new dimension to the basic design. Separate queues could be used at the head of each functional unit, rather than having one massive queue feeding all functional units of the processing core. Currently, things work okay because there is only a single unified register file, so the outputs of the register file feed the queue directly. However, with the use of multiple register files, the output of each register file would have to be multiplexed into the instruction queue. By using separate queues instead, the amount of multiplexing required would be reduced. The queue for integer operations doesn’t need to have register values from float registers for instance. An alternative would be to have slots in a single queue entry to hold argument values for each kind of functional unit. For instance, there would be three registers reserved for integer operations and three more registers reserved for floating point operations as a single queue entry. This would result in a lot of empty register slots, but the design would remain simple. Some wonderment at the utility of eight condition code registers. In a superscalar processor the registers get renamed anyway, so compare and branch sequences are effectively independent of each other even if the same condition register is used. There should be no effect on performance. The AMD / Intel processors get by just fine with a single condition code register. _________________ Robert Finch http://www.finitron.ca
Sun Nov 10, 2019 3:58 am

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1782	Re: nvio It feels to me that a single queue would normally allow you only to issue one instruction per clock. Is that the right picture?
Sun Nov 10, 2019 8:23 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Quote: It feels to me that a single queue would normally allow you only to issue one instruction per clock. Is that the right picture? No. For the single monster queue, multiple entries (up to three in this case) are added to the queue every clock cycle. And as many instructions as are ready to issue are issued every clock, up to the size of the queue, provided there are functional units available. Separate queues mean managing a set of queue pointers for each one. But the queues can be smaller. ******* There are a lot of unused opcode bits for some instructions. One possibility is to have the assembler fill these bits with random data, the idea being to alter the noise characteristic of the processor / program. Added support for a constant prefix instruction. The constant prefix allows using constants up to 53 bits. One drawback is the prefix and instruction must be queued in consecutive clock cycles rather than also allowing queuing during the same clock cycle. The current immediate mode instruction contains a bit indicating if there’s a prefix. If the bit is set, then the queue is searched for the previously queued prefix instruction from which bits 21 to 127 of the constant are formed. This may be extended in the future to allow larger constant formation. Scrapped a good chunk of the architecture tonight in the interest of keeping things simple. I got to thinking about how "simple" some of the eight-bit micros were. Gone are the condition and count registers. Branches are now absolute address mode jumps. I tried to find an 88000 instruction set summary. _________________ Robert Finch http://www.finitron.ca
Mon Nov 11, 2019 4:50 am
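The constant formation described in the post above can be sketched in Python. The field widths here are guesses: the post says the prefix supplies bits 21 and up and that constants reach 53 bits, so the sketch assumes a 21-bit immediate in the instruction and a 32-bit payload in one prefix (21 + 32 = 53). Sign extension is ignored, and the function names are hypothetical.

```python
IMM_BITS = 21      # assumed: immediate field holds constant bits 0..20
PREFIX_BITS = 32   # assumed: one prefix carries the next 32 bits

def split_constant(value):
    """What an assembler might do: split a constant into (prefix, immediate)."""
    if not 0 <= value < (1 << (IMM_BITS + PREFIX_BITS)):
        raise ValueError("constant does not fit in prefix + immediate")
    imm = value & ((1 << IMM_BITS) - 1)
    prefix = value >> IMM_BITS
    return prefix, imm

def form_constant(prefix, imm, has_prefix):
    """What the core might do when the instruction is queued.

    has_prefix models the bit in the immediate-mode instruction that says
    a prefix was queued in the previous clock cycle.
    """
    if not has_prefix:
        return imm
    return (prefix << IMM_BITS) | imm


value = 0x1234_5678_9ABC
prefix, imm = split_constant(value)
rebuilt = form_constant(prefix, imm, has_prefix=True)
```

Under these assumed widths, `rebuilt` equals `value` for any unsigned constant below 2**53, and an instruction without the prefix bit just uses its own 21-bit immediate.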

oldben Joined: Mon Oct 07, 2019 2:41 am Posts: 592	Re: nvio 8-bit CPUs look impressive since they have one data type (the byte) and simple indexing and immediate addressing modes. Most of the time an 8-bit CPU is fetching instruction parameters and the odd data byte, so instruction decoding is often very regular for the first few words of microcode. Other than a push or trap, the first few words of microcode are A: pc -> mar, B: pc=pc+1, read, C: <real decoding>. Classic machines like the PDP-8 simplified: A: efa -> mar, r/w, B: pc+ -> mar, read. RISC machines: A: pc -> mar, pc+, reg a b, read. Most other machines tend to have less regular addressing mode decoding, so things tend to slow down the CPU internally. It all comes down, I think, to the K.I.S.S. idea. Good luck with simplifying the big CPU.
Mon Nov 11, 2019 7:49 pm

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Moved the CHK instruction to be executed as an R3 integer instruction rather than a branch instruction. It was the only instruction under the branch group requiring three register reads. The branch unit now needs only two register reads, simplifying the argument setup for that unit. Moved back to a 64-entry unified register file. Although history has shown that separate integer and float register files are better for performance, the hardware is a bit less complex with a unified file. Also, this design is likely going into an FPGA or other device with standard cells and is not likely to be implemented with custom logic. The FPGA can provide a 64-entry file at the same speed as a 32-entry file since it’s a single LUT regardless of whether 64 or 32 entries are selected. Read performance should not be affected. Used up three unused bits in the instruction bundle to indicate breaks between instructions. These bits are used to serialize the queuing of instructions, primarily for the large constant prefixes. It can take four prefix instructions to specify a 128-bit constant. One clock cycle per prefix instruction is used, so it would take five clocks to queue an instruction with a 128-bit constant. It might seem like it affects performance a lot, but it probably doesn’t as this is the rare case. ***** Studying vector mask registers tonight with the idea of eliminating the special purpose mask registers from the design. The mask registers would require dependency detection logic in the core just like other registers. This uses a fair bit of logic and might increase the size of the register tags. Eliminating the mask registers means using either an integer register or a vector register as a mask. An issue with using an integer scalar register as a mask register is that the number of elements in a vector register may be quite large. Suppose there were 1024 elements in the vector register; then the integer register would have to be 1024 bits wide. An integer register wouldn’t adapt well to changes in the size of vector registers. This is one reason there is a dedicated mask register in the Cray architecture. An issue with using a vector register as a mask is setting all the bits in the elements of the vector register in an efficient fashion, and manipulating the mask in a high-speed fashion. It’s undesirable to use hardware loops to access each vector element in order to manipulate the mask. The author is leaning towards a design that uses an integer scalar register to contain the mask. Although there are issues with this approach, it may be workable with this design, assuming there won’t be more than 128 elements in a vector register. The current design is for 64 elements. Using an integer register allows the existing dependency checking logic to be used. It also allows the full range of integer instructions to be used to manipulate a mask. _________________ Robert Finch http://www.finitron.ca
Tue Nov 12, 2019 5:01 am
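A rough Python sketch of the integer-register-as-mask idea from the post above follows. Only the 64-element vector length comes from the post; the function name `masked_add` is hypothetical, and the behaviour for masked-off elements (they keep the old target value) is an assumption.

```python
VLEN = 64  # elements per vector register in the current design

def masked_add(target, a, b, mask):
    """Vector add under a mask held in an ordinary integer register.

    Bit i of mask enables element i. Disabled elements keep the old
    target value (assumed behaviour), so the target has to be read too.
    """
    return [a[i] + b[i] if (mask >> i) & 1 else target[i]
            for i in range(VLEN)]


a = list(range(VLEN))
b = [1] * VLEN
old = [-1] * VLEN

# Ordinary integer arithmetic builds and edits the mask:
low_four = (1 << 4) - 1                   # enable elements 0..3
all_but_low = ~low_four & ((1 << VLEN) - 1)

r1 = masked_add(old, a, b, low_four)
r2 = masked_add(old, a, b, all_but_low)
```

Since the mask is just a 64-bit integer, shifts, AND, OR and NOT are enough to build or combine masks, which is the attraction the post mentions. The limit is visible here as well: one mask bit per element means the scheme stops working once a vector register has more elements than the integer register has bits.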

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1782	Re: nvio I think I would be asking myself: what kinds of codes use a vector mask? How often is it set up or modified, versus how many times is it used as-is? Do we alternate between two masks, or is there any other high-level observation about mask use? I think the answers might illuminate where the trade-offs are to be had.
Tue Nov 12, 2019 10:18 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Quote: I think I would be asking myself: what kinds of codes use a vector mask? How often is it set up or modified, versus how many times is it used as-is? Do we alternate between two masks, or is there any other high-level observation about mask use? I think the answers might illuminate where the trade-offs are to be had. I think maybe vector code isn’t popular enough, making it difficult to say what uses mask registers. There seems to be variation in the number of mask registers and how they are set up by different designers. Intel Larrabee: 8 mask registers (https://www.cs.indiana.edu/~achauhan/Te ... essors.pdf) I found this vector processor on GitHub. I’ve run into it a couple of times so I figured I’d post a link. It uses the scalar register set for mask registers (potentially 32 mask registers). https://github.com/jbush001/NyuziProces ... uction-Set RISC-V – 1 mask register, vector extension uses vector register v0 and ~v0 for masking. Cray 1 – 1 mask register. Cray X1 – 8 mask registers. I think having more than a single mask register is useful enough to warrant at least two registers. They added more mask registers after the Cray 1 for a reason, I suspect. The number of mask registers may be related to the complexity of expressions being evaluated. _________________ Robert Finch http://www.finitron.ca
Wed Nov 13, 2019 3:32 am

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1782	Re: nvio Ah, indeed, the Cray evolution must be saying something. Then again, RISC-V is a very thoroughly thought-out architecture.
Wed Nov 13, 2019 4:51 am

robfinch Joined: Sat Feb 02, 2013 9:40 am Posts: 2095 Location: Canada	Re: nvio Quote: Then again, RISC-V is a very thoroughly thought-out architecture. While it’s very thoroughly thought-out, I think its hands are tied by the need to keep instructions 32-bit. It seems like it could benefit a lot if only there were a few more bits available. The next available width, 48 bits, is undesirable. There isn’t room in 32 bits to support some features of a vector instruction, like a mask register spec, while at the same time having 3R instructions and not using up too much of the opcode space. Contrasted with nvio, the author decided to use a wider instruction set (40-bit) because just a few more than 32 bits would help a lot. But how does one implement just a few more than 32 bits? The extra bits allow specification in the instruction of element sizes / precision, rounding modes and mask registers. nvio seems roomier. It’s stuck with a fixed 41-bit instruction though. Code density suffers. *********** Got up early this morning and decided to add back in the complexity previously removed. Back are the condition registers, link registers and vector mask registers. In the morning I think I can conquer any level of complexity. At the end of the day I can’t make things simple enough. Now using separate register files for everything. The instruction set is still fluxing around, so I haven’t written a ton of code yet. When the vector mask register is included, an instruction may require information from up to five register sources (including the mask). A mask acts like a predicate and requires reading the target register in addition to the source registers. 3 sources + 1 target + 1 mask reg all have to be read. With vector mask registers present in the architecture, the author is left wondering if the mask registers could be utilized in other ways. Mask registers support logical operations between them, and a few other operations like population count and find-first-one. Though they are intended to mask vector operations, they don’t have to be used exclusively that way. _________________ Robert Finch http://www.finitron.ca
Thu Nov 14, 2019 3:45 am
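The mask-register operations named at the end of the post above (logical operations, population count, find-first-one) are easy to model on plain integers. The Python sketch below is illustrative only: the function names are hypothetical, "first" is assumed to mean the lowest-numbered set bit, and returning -1 for an empty mask is a choice made here, not something the post specifies.

```python
MASK_BITS = 64  # assumed mask width, matching 64 vector elements

def popcount(mask):
    """Number of set bits, i.e. how many elements are enabled."""
    return bin(mask & ((1 << MASK_BITS) - 1)).count("1")

def find_first_one(mask):
    """Index of the lowest set bit, or -1 if the mask is empty."""
    mask &= (1 << MASK_BITS) - 1
    return (mask & -mask).bit_length() - 1

def set_bits(mask):
    """Yield the index of every set bit, lowest first."""
    while mask:
        i = find_first_one(mask)
        yield i
        mask &= mask - 1   # clear the lowest set bit


m1 = 0b1011_0000
m2 = 0b0011_1100
both = m1 & m2             # logical AND between two masks
either = m1 | m2           # logical OR between two masks
indices = list(set_bits(both))
```

Used this way a mask register is just a small bit set: the non-vector uses the post wonders about could include things like tracking which of up to 64 resources are free, with find-first-one picking the next one.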