


[ 203 posts ]
 74xx based CPU (yet another) 

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 177
Location: Huntsville, AL
Another option, similar to the one suggested by Rob Finch, is to adopt an approach like that used by the transputer. The transputer did not allow interrupts or preemption of running processes except at those points in the execution stream where preemption would not require flushing the processor's register stack. Essentially, preemption was only allowed when an assignment to a variable or a branch in the execution stream was made. At those points the contents of the ALU register stack had been fully consumed, so they could be overwritten without first being saved to the stack when an event occurred: an external interrupt request, a communications channel request, or a process preemption timer timeout. (I am sure that it is a bit more complicated than I've described here, but I think I've otherwise captured the gist of the concept fairly accurately.)

_________________
Michael A.


Fri Sep 06, 2019 2:06 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
joanlluch wrote:
I can see that prefixed instructions will indeed cause processor malfunction if interrupted in the middle. But I currently have no idea how hard it is to solve that. I suspect it will be a matter of explicitly detecting such instructions and avoiding interrupts around them. I just hope this doesn't complicate the hardware too much. I still have everything to learn about this particular subject, which is why I naively introduced changes without even thinking about it. Any pointers would be appreciated.

I don't think it should be difficult: if and when you support interrupts, you would most likely have a control register bit to disable interrupts, and that same mechanism would only need to disable interrupts for the one instruction after a prefix. A couple of bits of state at most, I think. Of course once you have interrupts, there's scope for much more extensive test-benching to ensure that every instruction works as it should when interrupted - in a machine with only a few types of instructions that's not so bad. It's a huge pain for a CISC.

Quote:
On the other hand, I wonder what the arm-thumb does with the "long branch with link". It can theoretically be interrupted in the middle of the two halves, I suppose. But this instruction seems to use the "Link register" (LR) as the temporary storage for the address formation, so maybe the instruction can be interrupted just fine without issues because that's one of the interrupt saved registers(?). That's a question for which I do not currently have a clear answer.

Ah, that's quite clever of ARM Thumb. When you have a link register, it's more or less a scratch register except when in use for subroutine calling. And with ARM's idea of having an interrupt context which swaps out a number of registers (reducing or eliminating the spill/fill of an ISR) the LR is safe.

I feel it would be much the same if you had a dedicated SP and a TOS register. Maybe not... if you have a stack, you can push the state of the machine onto it (something the 68k did, at great expense and with some incompatibility), and that state could include the prefix-in-progress. If you had a TOS, I was thinking perhaps you could avoid the stacking action, but on reflection I'm not at all sure.

Another possible approach might be to have a shadow copy of the PC that trails behind, so that a prefix instruction which is interrupted is in effect re-executed when the ISR returns. This might be difficult to get right.

I think it's simplest if prefix instructions are welded to their successors.

Quote:
So maybe, instead of explicitly arranging to avoid interrupts after the prefix instruction, another approach would be to save (in microcode) the Prefix Register as part of the interrupt routine? Would that be excessive overhead?

It might work out to be simple, especially if your status register is wide enough to accommodate the whole prefix value.

Quote:
About your second question on consecutive prefix instructions. With the available width of the immediate field, there will be no need for chaining them by the compiler. However, (excluding maybe interrupt issues) the presence of several consecutive ones would just cause the cpu to execute them in a row with effects only from the last one. That is, storing the 6 bit shifted immediate field in the Prefix Register is independent of any previous prefixes, with every new prefix instruction just overwriting and thus cancelling the effects of the previous one. I mean, the way I plan to implement it, the prefix instruction does not have a cumulative effect on the Prefix Register, it just sets the register with the shifted value of the immediate field.

Indeed, I wasn't thinking that consecutive prefixes would be normal or expected, just that they are a peculiar case. If, for example, a program jumps into the weeds where memory happens to have a million consecutive prefix instructions, it might then be uninterruptible until it gets to a non-prefix word.


Fri Sep 06, 2019 8:31 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 165
Location: Girona-Catalonia
Hi Michael and Ed,

Based on the way I can think about it, I believe that data-stack-based machines probably find it more difficult to support interrupts or multitasking than register-based ones like my processor. Register-based machines can get their state preserved by just saving and restoring those registers, I believe.

I think that the easiest approach would be to disable interrupts for the instruction following the prefix instruction, i.e. to prevent interrupts from triggering just after the prefix. In fact, single instructions requiring several microcode steps (cycles) should not be interrupted until the final step completes, so I suppose it's a matter of extending this kind of behaviour to the particular case of the prefix instruction.

The appearance of many consecutive prefixes can be considered 'undefined behavior', especially regarding interrupts. It must be remembered that the prefix instruction alone is not an actual instruction from the point of view of the assembler and compiler. Prefixed instructions are just 32-bit instructions; the prefixes themselves do not exist as individual instructions. Therefore there's technically no way for the user to place a prefix on its own. To illustrate what I mean, this is the definition of the "Type I2" instruction pattern in the compiler:
Code:
// Type I2
class TypeI2 < bits<5> opcode, dag outs, dag ins, string asmstr, list<dag> pattern>
      : Instruction16< outs, ins, asmstr, pattern>
{
  bits<8> k;    // unbound var
  bits<3> rd;   // unbound var
  let Inst{15-11} = opcode;
  let Inst{10-3} = k;
  let Inst{2-0} = rd;
}

// Type prefixed I2
class TypeP0I2 < bits<5> opcode, dag outs, dag ins, string asmstr, list<dag> pattern>
      : Instruction32< outs, ins, asmstr, pattern>
{
  bits<16> K;   // unbound var
  bits<3> rd;   // unbound var
  let Inst{31-27} = 0b11110;
  let Inst{26-16} = K{15-5};
  let Inst{15-11} = opcode;
  let Inst{10-8} = 0b000;
  let Inst{7-3} = K{4-0};
  let Inst{2-0} = rd;
}

The I2 pattern is made of two classes. The first class defines a 16-bit instruction with an 8-bit immediate field, labeled 'k', in bits 10-3.
The second class defines a 32-bit instruction with a 16-bit immediate field, labeled 'K', that is physically split between bits 26-16 and 7-3.
Of course, the second class refers to the prefixed version of the instructions of the first class, and instruction bits 31-16 are technically the prefix. But the compiler doesn't even have a notion of such a prefix, only of the existence of some 32-bit instructions.

The same approach will be taken in the assembler. Prefixes simply won't show up anywhere except in the length of the immediate fields. For example, in the following code:
Code:
mov  200, r0     // This is a 'normal' instruction taking 1 cycle and 1 word
mov  400L, r1    // This is a 'prefixed' instruction taking 2 cycles and 2 words

So the point I want to make is that prefixes only exist for the purposes of CPU decoding and execution; they are not explicitly present in the assembler and compiler. This implies that, in practice, consecutive prefixes are very unlikely to appear unless they are placed there on purpose by hard-coding the machine code. So I suppose it's totally fine to consider this situation undefined behaviour. [Just for fun: 1 million prefixes at 1 MHz would take 1 second to execute, or just 0.1 seconds at 10 MHz. But program memory on this system holds at most 64K instructions, so no more than 65,536 prefixes would fit in memory; at one cycle each they would execute in about 6.6 ms at 10 MHz, which is hardly noticeable (although of course a non-negligible number of interrupts would get missed).]

About machine state preservation during interrupts, my current idea is to push all registers, including the status register, onto the stack, and to pop them on return from the interrupt. It's probably not the fastest approach, but this is to be implemented with 74xx chips, so I guess it's a good balance between speed and complexity. Provided that instructions (including prefixed instructions) are never interrupted before completion, I think this is conceptually safe, and nothing could go wrong regardless of what any given instruction does (please correct me if I'm wrong), so I probably do not need to test interrupts for every instruction one by one. With this approach, I shouldn't even need to save the Prefix Register or the flag indicating that a prefix is in effect, because that flag will always be clear when an interrupt is taken.


Fri Sep 06, 2019 9:54 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 165
Location: Girona-Catalonia
I got the compiler upgraded and working with the new instruction set, including the prefixed instructions. The more flexible ISA allowed for many simplifications in the compiler and resulted in less code required in some critical functions. The output assembly is also shorter and looks cleaner overall because there's reduced use of intermediate registers.
Not fully tested yet, but I guess I will test it together with the assembler and simulator after I get them upgraded too.


Fri Sep 06, 2019 10:32 pm

Joined: Tue Dec 11, 2012 8:03 am
Posts: 267
Location: California
joanlluch wrote:
Based on the way I can think about it, I believe that data-stack-based machines probably find it more difficult to support interrupts or multitasking than register-based ones like my processor. Register-based machines can get their state preserved by just saving and restoring those registers, I believe.

Stack machines have zero overhead (or very close to it) for servicing interrupts. By the very nature of a stack, the context is already saved. There's no need to move anything elsewhere to preserve it. The ISR just builds upon it, and when done, it cleans up after itself, leaving everything as it found it, and the background program doesn't even have to know it was interrupted.

I kept an advertisement from 1991 for the Silicon Composers SC/Fox Cub SBC that used the Harris RTX2000 stack processor. It routinely does 16 MIPS at 12 MHz, and 50-60 MIPS burst. Those are essentially Forth MIPS too, since its machine language basically is Forth. The 12 MHz RTX2000 did a 64-point FFT 137 times as fast as the 20 MHz 80386 did, and the Sieve benchmark 16 times as fast as the 80386. An interrupt took four clocks, while return-from-interrupt took zero.

At http://wilsonminesco.com/0-overhead_Forth_interrupts/, I show how I service interrupts in high-level ITC Forth on a 65c02 with zero overhead. I've been using this method for 30 years. NEXT (the inner loop) actually gets the first instruction of an ISR faster than it would have gotten to the next instruction in line in the background program had there been no interrupt; so in that sense, it's like a negative overhead. The ISR doesn't need to save anything either.

BigEd wrote:
I think it's simplest if prefix instructions are welded to their successors.

I like that approach—make it so the prefix is part of the instruction, so an interrupt cannot cut in and separate the prefix from the rest of the instruction.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Sat Sep 07, 2019 1:11 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 165
Location: Girona-Catalonia
Hi Garth,
I have read your documents, and there's certainly a lot to learn from them. My CPU is not a stack-based machine but a more conventional one, with a number of registers that must be preserved, so I suppose I have no choice but to accept some interrupt-processing overhead. Some processors can switch to a 'supervisor mode' or other modes that use separate sets of registers, so their context-switching overhead is very small, but that's not my case. My processor is architecturally more similar to the 8-bit processors of the 80s, as it just has a dedicated Stack Pointer register.

The cpu74 will be no different from the 6502 regarding interrupts, except that it potentially has more registers to preserve in interrupt routines, which does mean increased interrupt setup overhead. However, for a number of reasons, the cpu74 should be much faster than the 6502 (and possibly any microprocessor of the early 80s) at the same clock rate, I think more than twice as fast, so that should help.

Regarding actual interrupt performance, of course the relative loss due to processor state preservation is highly dependent on the total time that interrupt routines take to end. I placed that on a spreadsheet and this is what I found:
Attachment: Screen Shot 2019-09-07 at 09.53.37.png (spreadsheet of interrupt load versus interrupt frequency and state-preservation cycles)

The percentage figures show the relative time the CPU is spending on interrupts with respect to the total time.
- I assumed a CPU clock frequency of 10 MHz, and an interrupt routine taking 200 cycles for useful work (i.e. excluding state preservation).
- Columns represent interrupt frequencies of 100 Hz, 1,000 Hz and 10,000 Hz.
- Rows represent different theoretical number of cycles required for state preservation, ranging from 0 to 30 cycles.

So the spreadsheet shows that, based on these simulated figures, state preservation does not in fact account for a big share of the penalty: the figures in the first row are not that much better than those in the last row.

Of course, the higher the interrupt frequency and the shorter the interrupt routine, the greater the negative influence of the state preservation time on overall performance. However, the state preservation code does not take constant time, as it depends on the actual interrupt routine. It's reasonable to expect very short interrupt routines to use few registers, so their state preservation code will be comparatively cheaper than that of more complex interrupt routines, which further minimises the effect of the state preservation time itself.


Sat Sep 07, 2019 7:54 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
joanlluch wrote:
...The more flexible ISA allowed for many simplifications in the compiler and resulted in less code required in some critical functions. The output assembly is also shorter and looks cleaner overall...

A great result!

Just one point, although I don't mean to re-open something which is already settled: an ARM-like idea for shadow registers (or even a z80-like idea) need not increase the hardware complexity too much, if you have a suitable implementation of a register file. You 'just' make the physical register file bigger and adjust the address decode into it to take into account a context bit. (Clearly, a single alternate context has the property of fast switching but doesn't immediately support nested interrupts. That's a tradeoff.)


Sat Sep 07, 2019 9:32 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 165
Location: Girona-Catalonia
BigEd wrote:
Just one point, although I don't mean to re-open something which is already settled: an ARM-like idea for shadow registers (or even a z80-like idea) need not increase the hardware complexity too much, if you have a suitable implementation of a register file. You 'just' make the physical register file bigger and adjust the address decode into it to take into account a context bit. (Clearly, a single alternate context has the property of fast switching but doesn't immediately support nested interrupts. That's a tradeoff.)

That's definitely something to consider. I didn't know the Z80 implemented something like that; I will look into it.
I will decide about that at a later stage, once I better understand the interrupt mechanism and am working on the actual hardware.

From the point of view of the compiler, I think that almost nothing needs to be done. The compiler doesn't even need to know about context switching. I think it's just a matter of telling it not to save and restore any registers on entry to and exit from interrupt functions. Look at this example code:
Code:
volatile int foo;
volatile int bar;
__attribute__((interrupt(0)))
void theInterruptRoutine(void)
{
  foo = bar;
}

The attribute (interrupt) tells the compiler that this is an interrupt routine. The above code gets compiled into this:
CPU74
Code:
   .globl   theInterruptRoutine
theInterruptRoutine:
   push   r0
   ld.w   [&bar], r0
   st.w   r0, [&foo]
   pop   r0
   reti


There are essentially two differences with respect to a normal function:
(1) All registers being used in interrupt routines are saved and restored from the stack. In this case register r0.
(2) The generated return instruction is 'reti', instead of 'ret'

So, getting back to the topic: If I want to have context switching for interrupts in the actual hardware, I believe that just disabling register saves/restores in the compiler for the interrupt routines will do it, and that's just one line of code.


Sat Sep 07, 2019 11:54 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 165
Location: Girona-Catalonia
As a continuation on the subject of context switching for interrupts, I just found a Question & Answer entry on the StackExchange Retrocomputing site with an interesting discussion of Z80 register usage, including interrupts. I'm unsure how well informed some of the answers are, but I quote below an excerpt of the 'selected' answer:

Quote:
The context switching does not tend to happen often enough to use the alternative set of registers only for that; the gains are simply not worth it most of the time. Hence, the good practice of Z80 programming is typically about using as many registers as possible and still use stack for saving registers during the interrupts.

Then the same author explains with examples why the alternate set of registers can be used in more efficient ways.

This is the link: https://retrocomputing.stackexchange.com/questions/7794/did-anyone-ever-use-the-extra-set-of-registers-on-the-z80

I don't really know what to say about that, but intuitively I think the claim above does have some real basis.


Sat Sep 07, 2019 9:04 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
Hmm - it's an interesting question, but I think the answer is very context-sensitive. If the main program has register pressure and/or interrupt latency is not an issue, then the spare registers are best used in main context. But if interrupt latency is important and the main code doesn't have great register pressure, better to use the register banks for an interrupt context. (Several commenters have used the registers for interrupts - thanks for the link.)

This isn't a tradeoff that you can make on ARM in quite the same way, as there really is a context bit (or two) which selects the registers automatically.


Sun Sep 08, 2019 9:27 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 177
Location: Huntsville, AL
The use of the other register banks for interrupt processing comes with its own set of limitations: a limited number of register banks. Limited resources for any particular function, whether interrupts or process contexts, eventually create the need to spill those registers to main memory. Unless the application has limited resource needs, like only one interrupt context or one co-routine, register banks have never proven to be very useful to me.

The four register banks in the 8051 architecture can certainly be used to keep particular process and interrupt contexts. I used them in that way once or twice. However, I invariably returned to using a single register bank because I needed more of the internal RAM to which the register banks were mapped for variables, or I had nested interrupt routines that I needed to support.

In a small embedded application, the register banks can be used as suggested above. They will certainly improve latency. However, as one of the uses described in the Z80 answer suggests, it would have been better to provide more registers. More registers require more select bits in the opcode. Given the small number of opcode bits (8 for the Z80/8080/8085, 8051, etc.), the number of general purpose registers provided is about the maximum that can be provided while still leaving enough opcode space for a reasonable number of addressing modes and operations.

Although the transputer uses a three-level stack for its ALU, it is not a stack machine in the same sense as a Forth machine like the Harris RTX2000 processors. The transputer does have a stack in external memory, the workspace. Because of its unique instruction encoding scheme, and only two addressing modes, the first 16 locations in the workspace can be addressed relative to the workspace pointer in a single 8-bit opcode. Thus, these 16 locations can be thought of as 16 direct registers local to each process or function. This organization of the workspace is somewhat like that of the Texas Instruments TMS9900 microprocessor, itself a single-chip implementation of the Texas Instruments 990 minicomputer.

A better example of a stack machine is the Burroughs B5000 mainframe. That machine had a substantial amount of logic to support its stack-oriented ALU processing paradigm; spilling and refilling of the stack cache registers was performed automatically by the processor's control logic. Its single memory space was supported by a revolutionary virtual memory system in the early 1960s. I think of the B5000 and its derivatives as one of the longest-lived instruction set architectures after the IBM 360 instruction set architecture.

Virtually every major computer architecture feature found in modern processors can be found to have been implemented in one way or another in the first few generations of computers. Nearly every time I find a processor from the 60s, 70s, or even early 80s that I was not previously aware of, I find some new concept to study. It's amazing how much great thought and innovation those early computer architects and designers exhibited. Modern processors are marvels themselves, but their insides are not as exposed as those of some of the older machines.

_________________
Michael A.


Mon Sep 09, 2019 12:25 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
> Virtually every major computer architecture feature found in modern processors can be found to have been implemented in one way or another in the first few generations of computers. Nearly every time I find a processor from the 60s, 70s, or even early 80s that I was not previously aware of, I find some new concept to study. It's amazing how much great thought and innovation those early computer architects and designers exhibited.

Yes, yes, yes!


Mon Sep 09, 2019 9:08 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 165
Location: Girona-Catalonia
I'm not very familiar with the processors of those early times, but I agree that the rate of innovation back then was impressive. It occurs to me that when something new is invented, the people who make it happen also create the most innovation in the shortest period of time. Once the technology matures, I suppose innovation becomes merely incremental until something relatively unrelated is invented, and the cycle can start again.


Wed Sep 11, 2019 9:46 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
Bret Victor gave a good talk (and an amusing one) a few years ago:
https://retrocomputingforum.com/t/bret- ... -video/681


Wed Sep 11, 2019 10:01 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 165
Location: Girona-Catalonia
I've been updating the Assembler to work with the latest instruction set and after solving a couple of difficulties I added some new features.

One interesting aspect of "prefixes" is that all instructions with immediate fields can benefit from them directly. This includes arithmetic instructions, relative jumps, and load/stores with relative offsets, but also calls to absolute addresses and direct-address load/stores. One way to see this is that non-prefixed absolute-address instructions can be regarded as 'zero page' access instructions, with a page size set by their embedded immediate field. This is the latest tweak of the instruction set, adjusted for that:

Attachment: CPU74InstrSetV8.png


The 'call' instruction is now of the "P" type. It has 11 immediate bits which, since the immediate represents a word address, can reach the first 2K words (4 KB) of low program memory without being prefixed. This will be useful for fast subroutine calls to system code, such as arithmetic routines, residing at low memory addresses.

Similarly, the immediate-address load/store instructions of the I2 type have an 8-bit immediate field that, without a prefix, can reach the first 256 bytes of data memory, so maybe this memory area can be reserved for commonly accessed global variables of some kind.

So the current instruction set has a total of 57 'core' instructions which in some cases can be 'prefixed' to extend the range of their immediate fields. The following document is a brief listing of the available instructions alone without referring to encodings or other details.

Attachment: CPU74InstrSetV8Brief.pdf [46.33 KiB]


Wed Sep 11, 2019 10:46 pm