RISC register r0: Any alternative models of operation?
Author |
Message |
NorthWay
Joined: Thu Jan 17, 2013 4:38 pm Posts: 54
|
(Not sure this is the right forum part, but anyway:)
The classic RISC model hardwires register 0 (or perhaps I should call it r0) to 0. This gives the instruction set a garbage dump for unwanted results, and it lets several instructions be aliased (or macro'd) into variations of some other opcode.
Are there any RISC designs that have deviated from this model? Or other recent CPU designs with plenty of registers? Have there been any designs that hardwire other registers, or change the r0 value depending on which place in the opcode it is used?
It was that last part I was thinking about recently when toying with ideas in my head. Math tells us that there are many operations where the order of the operands does not matter, but there are also some typical computations where it matters absolutely. Thinking about a "mapping" of 68K opcodes down to a RISC (which is what I was doing :-):
NEG: 0 - value (order dependent)
NOT: value XOR -1 (not order dependent)
Would there be any value in letting r0 read as 0 or -1 (probably) depending on where in the opcode it is used? Would it be hard to implement?
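To make the question concrete, here is a minimal Python sketch of a register read that depends on the operand slot. Everything in it is an assumption for illustration: the 32-bit width, the slot names "a" and "b", and the choice that r0 reads as 0 in the first source slot and as -1 in the second.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath, for illustration only


def read_src(regs, index, slot):
    """Read a source operand. Hypothetical rule: r0 is 0 in slot 'a', -1 in slot 'b'."""
    if index == 0:
        return 0 if slot == "a" else MASK
    return regs[index]


def write_dst(regs, index, value):
    """Writes to r0 are discarded (the 'garbage dump')."""
    if index != 0:
        regs[index] = value & MASK


def sub(regs, rd, ra, rb):
    write_dst(regs, rd, read_src(regs, ra, "a") - read_src(regs, rb, "b"))


def xor(regs, rd, ra, rb):
    write_dst(regs, rd, read_src(regs, ra, "a") ^ read_src(regs, rb, "b"))


regs = [0] * 8
regs[5] = 1
sub(regs, 6, 0, 5)  # NEG r5 -> r6, encoded as SUB r6, r0, r5 (0 - value)
xor(regs, 7, 5, 0)  # NOT r5 -> r7, encoded as XOR r7, r5, r0 (value XOR -1)
```

One cost shows up straight away in this sketch: an alias such as "ADD rd, rs, r0" no longer works as a move, because r0 in the second slot reads as -1. Every alias has to put r0 in the slot that gives the constant it needs.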
|
Thu Jan 31, 2013 11:53 pm |
|
|
Dr Jefyll
Joined: Tue Jan 15, 2013 5:43 am Posts: 189
|
Quote: Thinking about a "mapping" of 68K opcodes down to a risc
Hello, NorthWay! I guess you know that Motorola/Freescale has already undertaken something along this line -- the so-called Coldfire instruction set. I suppose it's debatable how "RISC"y Coldfire is, but certainly it's not as "CISC"y as an actual 68k. Perhaps it'll give you some ideas. http://en.wikipedia.org/wiki/Freescale_ColdFire
Quote: Have there been any designs that hardwire other registers, or change the r0 value depending on which place in the opcode it is used?
MSP430 processors are definitely worth studying. With this family, R4 through R15 are general-purpose registers but R0 is the Program Counter and R1 is the Stack Pointer. As for R2 and R3, these don't exist as ordinary registers, but their encodings are used with different meanings according to context; they either access the Status Register or else they cause the CPU to generate constants such as -1, 0, 1, 2, 4 and 8. The MSP430 is an education in the power of simplicity! For example the PC and SP also have the same addressing capabilities as the twelve general registers, which leads to some fundamentally useful combinations. http://en.wikipedia.org/wiki/TI_MSP430
I found the '430 doc a tad frustrating, and despite (or because of?) the drastic simplicity it took a while to wrap my brain around this CPU. But the effort was well worthwhile.
As for the second part of your question, "depending on which place in the opcode it is used" is clearly a useful distinction. For example, R0 as the Source of an operation could mean zero; used as the Destination it could cue an entirely different effect. I'm sure there are many examples, including, as noted, the '430 with its use of pseudo-registers R2 and R3.
I like your insight about operations with two sources. Subtraction allows no leeway in the specification of its sources, but operations such as Addition, AND, OR and XOR have the commutative property, so swapping the two sources has no effect on what's placed in the destination. For example, r3 + r4 -> r15 is the same as r4 + r3 -> r15. These two instructions have the same result but different encodings -- and that implies a partially wasteful use of bits in the opcode. It'd be nice to eliminate the redundancy and assign a new meaning to one of the encodings, but, as you say, implementation might be a problem.
One approach would be to compare the register numbers of the two source operands. If source 1 specifies a higher register number than source 2 then that could have a different meaning than when they're reversed. Although it sounds a bit goofy, it might actually be an acceptable way to eliminate the redundancy and compress more meaning into each opcode! You'd have to evaluate the impact on clock speed, though. There'd probably be a penalty, unless pipelining can somehow hide the delay of the magnitude comparison.
cheers, Jeff http://LaughtonElectronics.com
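The register-number comparison suggested above can be sketched as a small decoder. This is only an illustration in Python, not a real instruction set: the function name, the pairing of ADD with AND under one opcode, and the tie-breaking rule are all assumptions.

```python
def decode_commutative(op_pair, rs1, rs2):
    """Choose one of two commutative operations from the ORDER of the source fields.

    op_pair is a hypothetical (first_op, second_op) tuple sharing one opcode.
    rs1 <= rs2 selects first_op; rs1 > rs2 selects second_op.
    The sources are returned in ascending order, which is harmless
    because both operations are commutative.
    """
    first_op, second_op = op_pair
    op = first_op if rs1 <= rs2 else second_op
    return op, min(rs1, rs2), max(rs1, rs2)


# One opcode, two meanings (the ADD/AND pairing is made up for this example):
print(decode_commutative(("ADD", "AND"), 3, 4))
print(decode_commutative(("ADD", "AND"), 4, 3))
```

A limitation is visible in the tie case: when both source fields name the same register there is only one encoding, so only the first operation of the pair can be expressed for "r3 op r3".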
|
Fri Feb 01, 2013 5:16 am |
|
|
NorthWay
Joined: Thu Jan 17, 2013 4:38 pm Posts: 54
|
Dr Jefyll wrote: MSP430 processors
Now that was an interesting beast. Tastes a lot like a 68K I'd say. I especially liked the #immediate addressing.
I'm just an amateur with an interest in CPU design, but I have been led to believe that in the land of gigahurts it really hurts to drag along PC and SR type registers, and also that multi-stage operations (post/pre-increment type - changing a register value) complicate exception handling (instruction restarting versus intermediate status tracking).
Still, I like 68K style code and have toyed around with a few ideas to "resurrect" it in a RISC model. This needs expanding the RISC model from "two source, one destination" into something like "four (three?) source, two destination". Will need a longer pipeline, and I was thinking that it should be a hazard if the destination registers are the same.
The basic idea is for memory load/store operations to store the calculated address to the second destination register. A second destination could have some value for SR generation for register-register operations?
"LD #-4,r2,r7 ; SUB #4,r2" (aka 'move.l -(a2),d7') becomes "LD #-4,r2,r8,r7". The new code would have to flip back and forth between using r2 and r8 of course.
Hm. Any reason you can't do my LD addressing in a traditional RISC?
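As a rough illustration of the two-destination load described above, here is a Python sketch. The function name, the operand order and the dictionary used as memory are assumptions; the hazard check simply follows the rule that the two destinations must differ.

```python
def ld_writeback(regs, mem, offset, base, addr_dst, data_dst):
    """Model of 'LD #offset,base,addr_dst,data_dst' (hypothetical syntax).

    The calculated address goes to addr_dst and the loaded word to data_dst.
    """
    if addr_dst == data_dst:
        raise ValueError("hazard: both destinations name the same register")
    address = regs[base] + offset
    regs[addr_dst] = address       # second destination: the calculated address
    regs[data_dst] = mem[address]  # first destination: the loaded data


regs = [0] * 16
regs[2] = 0x1000
mem = {0x0FFC: 42}  # sparse memory, word-addressed by byte address

# 'LD #-4,r2,r8,r7': r8 receives r2-4, r7 receives the word at that address.
ld_writeback(regs, mem, -4, 2, 8, 7)
```

Naming the base register as the address destination (addr_dst equal to base) gives the classic pre-decrement behaviour of 'move.l -(a2),d7' in a single instruction.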
|
Fri Feb 01, 2013 5:50 pm |
|
|
Dr Jefyll
Joined: Tue Jan 15, 2013 5:43 am Posts: 189
|
Quote: Now that was an interesting beast. Tastes a lot like a 68K I'd say.
Yeah.... the MSP430 is 68K-ish in some ways. Both chips have a generous number of registers (13 or 17) as compared to a 6502 or even an x86; and they both use multi-cycle instructions, in contrast to the RISC convention of single-cycle execution. But the 68K design leans toward feature-itis whereas the '430 is just lean! The thing's built for ridiculously low power consumption, and it simply doesn't have very many gates. Or very many instructions to learn. Nevertheless, asm programs for the '430 tend to be short and to the point, assisted by the large register set but also by un-RISCy features such as auto-increment address indexing, and operations on memory (i.e., not load-store).
Quote: I have been led to believe that in the land of gigahurts it really hurts to drag along PC and SR type registers
The land of gigahurts just gets crazier and crazier! With desktop CPUs increasingly hitting the wall in terms of clock speed, now the reliance is shifting to complexity instead. Hence we have stuff like Branch Prediction, Speculative Execution and multiple Execution Units (and multiple CPU cores) -- and pipelines a dozen or more stages deep. Not many get to work on designs of that scope, and it's generally a team effort. But there's always a lot an individual can learn. Sorry if I'm getting OT.
The following are two links I thought were excellent:
This first document is a wonderfully complete and highly readable roadmap of the decisions you face as you dream up and flesh out your new architecture. By Ken Chapman of Xilinx. http://www.dc.uba.ar/materias/disfpga/2 ... ollers.pdf
And here are a handful of pdf documents posted by Bruce Jacob of the University of Maryland. I've not been through them all but again the readability is high, despite the fact some of the topic material is non-trivial (e.g., out-of-order execution). http://www.eng.umd.edu/~blj/RiSC/
Quote: This needs expanding the risc model from "two source, one destination" into something like "four(three?) source, two destination". The basic idea is for memory load/store operations to store the calculated address to the second destination register.
Huh -- starting to sound like a VLIW design! As for saving the calculated address: good idea. And I suspect it has also been independently invented by someone else. There's something at least a little bit like that already out there in the mainstream -- maybe someone reading this can remind me of the details.
Quote: Will need a longer pipeline, and I was thinking that it should be a hazard if the destination registers are the same.
I guess if it adds a stage to the pipeline that might not be so bad, as long as the total number of stages isn't too insane. And it might solve your problem of two writes to the same register, if half the operation occurred on a later cycle I mean.
As for cases where a destination register is written twice in the same cycle, that can actually be handy if the hardware has a defined behavior you can rely on. For example years ago I was speculating about a TTL four-port register file IC, the 74172. IIRC, simultaneous writes from two ports to the same register address would result in storage of the logical OR of the two input words -- this was stated in the data sheet. I found the idea intriguing, since in some cases the OR could be used as a free operation, performed as a bonus on top of whatever your ALU might also have done on that same cycle.
cheers, Jeff http://LaughtonElectronics.com
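The "free OR" on a write conflict can be modelled in a few lines of Python. This is a behavioural sketch only, written from the recollection above rather than from the data sheet; the class name and the word count and width defaults are assumptions and are not taken from the 74172.

```python
class OrOnConflictRegFile:
    """Register file where same-cycle writes to one address store their logical OR."""

    def __init__(self, words=8, width=8):  # sizes are placeholders, not the 74172's
        self.mask = (1 << width) - 1
        self.cells = [0] * words

    def clock(self, writes):
        """Apply one cycle of (address, value) writes, one entry per write port."""
        merged = {}
        for address, value in writes:
            # Colliding writes are OR-ed together instead of one port winning.
            merged[address] = merged.get(address, 0) | (value & self.mask)
        for address, value in merged.items():
            self.cells[address] = value


rf = OrOnConflictRegFile()
rf.clock([(1, 0xF0), (2, 0x0F)])      # different addresses: two ordinary writes
rf.clock([(3, 0b0101), (3, 0b0011)])  # same address: cell 3 receives 0b0111
```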
|
Fri Feb 01, 2013 11:18 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2231 Location: Canada
|
Hello,
Some comments:
IBM PowerPC uses R0 as a regular register except for some address formations. I made a design where loads and stores of R0 updated and stored the loop count register instead.
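The PowerPC exception for address formation can be shown with a short Python sketch of the effective-address step for a displacement-form load such as "lwz rD,d(rA)". The function name and the 32-bit mask are assumptions made for illustration; the rule that a base field of 0 means the value zero rather than the contents of R0 is the architectural behaviour being illustrated.

```python
def effective_address(gpr, ra, displacement):
    """EA for a D-form load/store: a base field of 0 contributes literal zero."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + displacement) & 0xFFFFFFFF  # assumed 32-bit addresses


gpr = [0] * 32
gpr[0] = 0x1234  # R0 holds an ordinary value...
gpr[4] = 0x2000

ea_abs = effective_address(gpr, 0, 0x10)  # ...but is ignored as a base: 0x10
ea_rel = effective_address(gpr, 4, 0x10)  # normal base + displacement: 0x2010
```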
It is possible to create a register file with two write ports; it's just twice as expensive resource-wise. Three write ports are possible too, at just thrice the price.
I'm under the impression that auto-inc/auto-dec addressing is hard for a compiler to use. Most of the time the compiler ends up not using it, even when it could. Some newer processors like the MMIX just use add+shift as a single-cycle operation, in order to do index scaling.
68000's one of my favorites.
_________________Robert Finch http://www.finitron.ca
|
Mon Feb 04, 2013 1:25 am |
|
|
NorthWay
Joined: Thu Jan 17, 2013 4:38 pm Posts: 54
|
robfinch wrote: It is possible to create a register file with two write ports, it's just twice as expensive resource wise.
Define expensive in layman's terms please? Just more gates, or slower execution?
|
Tue Feb 05, 2013 4:13 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1807
|
Thanks for those two pointers Jeff - good reading! Cheers Ed
|
Tue Feb 05, 2013 7:07 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2231 Location: Canada
|
It's more than double the gates to support a register file with two simultaneous write ports vs. one with a single write port. One way to implement it is to use two register files plus a tag bit for each register. The tag bit indicates which register file contains the valid copy of the data. Both register files are updated simultaneously, each from its own write port, then the tag bits are set for the written registers. The tag memory is dual ported as well. The same idea works for triple ported register files as well. Pseudo code follows.
if (wr0) regs0[write address 0] <= data0   -- update register file 0 with port 0 data
if (wr1) regs1[write address 1] <= data1   -- update register file 1 with port 1 data

if (wr0 and wr1 and write address 1 = write address 0)
    tag[write address 1] <= true    -- same register, newest value (port 1) is valid
else if (wr0 and wr1)
    tag[write address 0] <= false   -- valid copy is in regs0
    tag[write address 1] <= true    -- valid copy is in regs1
else if (wr0)
    tag[write address 0] <= false
else if (wr1)
    tag[write address 1] <= true

-- readback
output 0 = if tag[read address 0] = true then regs1[read address 0] else regs0[read address 0]
output 1 = if tag[read address 1] = true then regs1[read address 1] else regs0[read address 1]
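For anyone who wants to play with the tag-bit scheme, here is a small behavioural model in Python. It is a sketch under stated assumptions: the class and method names are made up, one call to clock() stands for one clock edge, and the tag convention is that a port 0 write clears the tag (regs0 valid) while a port 1 write sets it (regs1 valid), so the readback selects the right file.

```python
class TwoWritePortRegFile:
    """Two single-write-port register files plus a tag bit per register."""

    def __init__(self, size=32):
        self.regs0 = [0] * size     # written only by port 0
        self.regs1 = [0] * size     # written only by port 1
        self.tag = [False] * size   # False: regs0 is valid, True: regs1 is valid

    def clock(self, wr0=None, wr1=None):
        """One cycle; wr0 and wr1 are optional (address, data) pairs."""
        if wr0 is not None:
            address, data = wr0
            self.regs0[address] = data
            self.tag[address] = False
        if wr1 is not None:
            address, data = wr1
            self.regs1[address] = data
            self.tag[address] = True  # applied last, so port 1 wins a conflict

    def read(self, address):
        """Readback: the tag selects which file holds the current value."""
        return self.regs1[address] if self.tag[address] else self.regs0[address]


rf = TwoWritePortRegFile()
rf.clock(wr0=(3, 11), wr1=(5, 22))  # two different registers in one cycle
rf.clock(wr0=(5, 33))               # port 0 later overwrites r5
rf.clock(wr0=(7, 1), wr1=(7, 2))    # same register: port 1's value is kept
```

In hardware each read port needs its own copy of this selection mux, and that is part of why the cost comes out at more than double.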
_________________Robert Finch http://www.finitron.ca
|
Fri Feb 08, 2013 12:10 am |
|
|
TMorita
Joined: Wed Jul 24, 2013 10:17 pm Posts: 3
|
Renesas SH uses r0 as the only register usable for a two-register addressing mode.
e.g. mov.l @(r0,r1),r2 is equivalent to r2 = *(r0 + r1) ignoring data width.
Only r0 is usable as the index register in that addressing mode.
There are hardwired registers in many RISC processors.
The original MIPS processor has HI and LO registers, which hold the result of a multiply operation. Renesas SH has MACL and MACH, which hold the result of a multiply operation.
Many RISC architectures have a hardwired ALU flags register usually called a condition code register or status register. This includes PowerPC, ARM, and SH.
There are probably other examples.
Toshi
|
Wed Jul 24, 2013 10:23 pm |
|