 74xx based CPU (yet another) 

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Hi Ed, I’m not totally sure what you mean by “dual stack”. I think you may be referring to processor modes where different register banks can be used, for example to implement multithreading and fast context switching. But I suppose I will not go that far, and I think the PC would also have to be included in the general register set for that, which I don’t have (not sure about the latter, though...).

Although the SP is now part of the general register file, there are still some instructions that use it as a specialised register, such as “call” and “ret”. Also, there is no ‘pre-decrement’ or ‘post-increment’ general addressing mode, so the ‘push’ and ‘pop’ instructions could not be removed from the previous set. However, all the remaining addressing modes and the interesting ALU operations can now be used on the SP, which previously required special instructions.

About jump tables, I understand that the trick on the 6502 is to push the destination address onto the stack and execute a return instruction. In my case, I have specific instructions for that. If you look at the T9 pattern, you’ll see the “jmp rd” instruction, which performs a jump to the address held in a register. So I can load the register with an address taken from a table and then execute that instruction. The compiler uses this to implement optimised versions of the ‘switch’ statement. The same exists for the “call” instruction, so for example a user program can implement an array of function pointers and call a particular one based on some condition. The compiler will ultimately use the “call rd” instruction to compile that.
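
To make the switch case concrete, this is roughly the kind of C source I mean (the function and the values are made up purely for illustration, they are not from my actual test programs); a dense set of cases like this is what gets lowered to an address table followed by a “jmp rd”:
Code:
#include <stdio.h>

/* A dense switch: the compiler can build a table of label addresses,
   load the selected entry into a register and finish the dispatch
   with the "jmp rd" instruction described above. */
static const char *day_name(int day)
{
    switch (day) {
    case 0:  return "sunday";
    case 1:  return "monday";
    case 2:  return "tuesday";
    case 3:  return "wednesday";
    case 4:  return "thursday";
    case 5:  return "friday";
    case 6:  return "saturday";
    default: return "invalid";
    }
}

int main(void)
{
    printf("%s\n", day_name(3));   /* prints "wednesday" */
    return 0;
}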

Joan


Fri Aug 02, 2019 6:39 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Joan:

I think that Ed is referring to a dual stack architecture like that usually used with Forth.

It appeared to me that the last revision of your ISA supports the use of your general-purpose registers in a manner that allows more than one stack to be created. Multiple stacks, whether you're implementing Forth or not, can provide some useful capabilities for general programming problems.

_________________
Michael A.


Fri Aug 02, 2019 10:37 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Hi Michael,

Ah, thanks for the clarification. I mostly intend to implement an instruction set that is C-compiler friendly. But I’m sure that the Forth language can be implemented as well with the proposed architecture, just maybe not in a purely native way. The current SP register is specialised for subroutine calls and returns, as well as for storing/retrieving callee-saved registers on the stack in the usual way that a C compiler manages them.

To implement a Forth data stack using a general register other than the SP, I suppose I would have to use explicit decrements/increments as necessary to grow and shrink the stack, but I think this should still be relatively efficient, because all stack accesses would still be done through the available indirect addressing modes.
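
As a rough illustration of what I have in mind (plain C with made-up names, not actual CPU74 or Forth code), the data-stack pointer would live in a variable that the compiler keeps in a general register, and every push or pop is an explicit adjust plus an indirect access:
Code:
#include <stdint.h>
#include <stdio.h>

/* A small Forth-style data stack kept apart from the machine stack.
   'dsp' plays the role of a general register dedicated to the data
   stack; each push/pop is an explicit increment/decrement plus an
   indirect access through that register. */
static int16_t dstack[64];
static int16_t *dsp = dstack;          /* grows upward, just for the example */

static void    dpush(int16_t v) { *dsp = v; dsp++; }    /* store, then increment */
static int16_t dpop(void)       { dsp--; return *dsp; } /* decrement, then load  */

int main(void)
{
    dpush(2);
    dpush(3);
    dpush(dpop() + dpop());            /* Forth-style "+" */
    printf("%d\n", dpop());            /* prints 5 */
    return 0;
}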

Many years ago I had a brief introduction to Forth and wrote some simple programs with it, but I’m not currently aware of everything that’s required for a truly efficient implementation. Still, Forth is an interesting language to support on the finished processor, and definitely something I will consider as more of my initial goals are met.

Joan.


Sat Aug 03, 2019 6:25 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Joan:

Forth as a programming language can be a bit difficult. I find it hard to adapt to some of its control structures after many years of using those of C/C++ and Pascal. I suppose I've not tried long or hard enough to master the basics. However, from a different perspective, I find its use of threaded code interesting, particularly indirect threaded code. I've attached a paper by James Bell that describes the use of threaded code on a PDP11 processor. Apparently, the concept of threaded code was used to build the initial Fortran IV compiler offered by Digital Equipment Corporation for the PDP11/20 minicomputer.

In either case, Forth or a threaded code program, the PDP11's deferred register indirect with post-increment addressing mode, @(R++), was used to implement indirect threaded code. In my processor design, I opted to implement a "Forth VM" within the core. Essentially, that logic provides only one register capable of performing the deferred register indirect with post-increment addressing mode of the PDP11.
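
Just to make the mechanism concrete for anyone following along, the inner interpreter of a threaded-code system boils down to something like the following C sketch (a direct-threaded variant with made-up names, not code from the Bell paper or from my Forth port):
Code:
#include <stdio.h>

typedef void (*word_t)(void);

/* Primitive routines; in a real system these are machine-code words. */
static void prim_one(void) { puts("one"); }
static void prim_two(void) { puts("two"); }
static void prim_bye(void);

/* The "thread": a list of routine addresses, as in the Bell paper. */
static const word_t thread[] = { prim_one, prim_two, prim_one, prim_bye };

/* 'ip' plays the role of the register used with "jmp @(Rn)+": NEXT
   fetches the address ip points at, post-increments ip, and jumps. */
static const word_t *ip = thread;
static int running = 1;

static void prim_bye(void) { running = 0; }

int main(void)
{
    while (running) {
        word_t w = *ip++;   /* deferred fetch with post-increment */
        w();                /* transfer control to the routine    */
    }
    return 0;
}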

Unlike the PDP11, I could not easily apply that addressing mode to the existing instructions, so I defined a small number of opcodes to support the "Forth VM" itself, using some of the single-byte undefined opcodes in the 6502/65C02 instruction set. I also thought that the addressing mode might prove useful in general, so I defined 16 other instructions that combine it with the basic instructions of the 6502/65C02.

Over time, I have found my original idea that those 16 instructions would be useful to be misguided. In fact, although I've ported a Forth system, that port did not find many opportunities to use those additional instructions. However, I've been able to use the lda/sta (ip,I++) instructions effectively in my Pascal compiler code generator. I think the basic register indirect operation is the more common requirement for my compiler and other programming needs. The auto-increment feature of that particular addressing mode is not as useful. It appears that you've come to a similar conclusion.

If the performance penalty of using threaded code as discussed in the Bell paper is acceptable, the resulting reduction in the executable image can be substantial. Thinking back on the Fortran IV programs written for the PDP11s of that era, I am surprised at the amount of computation that could be packed into such a limited code space: 24 kW. I have not had time to modify my Pascal compiler's code generator to use threaded code for basic operations like loading, storing, and type conversions, as discussed in the paper. A lot of code space is spent performing these basic operations before and after each function/subroutine call.

Threading the code may result in significant code size reductions. Given that most embedded systems today are not as code constrained as those of the past, packing a program into significantly less space is not likely a priority for most systems today. I continue to think about this problem because I tend to work with FPGAs, whose internal RAM is very limited. The concept still holds some promise for that application domain, and it does not seem too difficult to implement in practice.

In the final analysis, supporting register indirect, (R), and possibly the deferred register indirect, @(R), addressing modes in an ISA is a win-win. The auto-increment and auto-decrement capabilities of the PDP11, useful as they are for stacks, are not as useful in a general programming sense.


Attachments:
File comment: James Bell - Threaded Code (PDP11 Fortran)
Forth_Bell.pdf [283.19 KiB]

_________________
Michael A.
Sat Aug 03, 2019 1:45 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Thanks Michael - yes, it was Forth I was thinking of. One way to think of it is that an inner interpreter (perhaps a Tinybasic or VTL) would be an interesting target, to see how neatly the instruction set fits the need. Of course, anything will be possible, but on some machines it will come out neatly.

On the Callback Tables thread, the key is perhaps the wish to have this facility:
> JSL (JumpTable, X)
so there's a table of addresses somewhere, and you'd like to index into them and use the indexed one as the address to call as a subroutine. One possibility is to push things on the stack and return, but that's only one possible way to do it. It's an indexed dispatch table, but rather than being a computed GOTO it's a computed GOSUB.
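
In C terms, the facility is just an indirect call through an indexed table, something like this (names invented for the illustration):
Code:
#include <stdio.h>

/* An indexed dispatch table: the call goes through table[i], so it is
   a computed GOSUB rather than a computed GOTO. */
static void cmd_load(void)  { puts("load");  }
static void cmd_store(void) { puts("store"); }
static void cmd_quit(void)  { puts("quit");  }

static void (*const table[])(void) = { cmd_load, cmd_store, cmd_quit };

int main(void)
{
    int i = 1;      /* index chosen at run time in a real interpreter    */
    table[i]();     /* indexed, indirect subroutine call: prints "store" */
    return 0;
}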


Sat Aug 03, 2019 4:55 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Hi Michael,
Thanks for your explanation and insights. Reading the document on “threaded code” was also interesting. The way the PDP/11 and VAX/11 instructions could be (ab)used in unusual ways was totally amazing. In this case they apparently placed a “jmp @(R)+” instruction at the end of each routine to cause a jump to the next routine in the list. In my case, to achieve the same, I would have to use 3 instructions and a temporary register. Assuming I use r6 as the pointer and r0 as the temporary, the actual code would look like this:
Code:
ld.w  [r6, 0], r0
add.w  r6, 1, r6
jmp  r0

That’s 3 single-word instructions for the cpu74 instead of the single instruction used on the pdp, so that’s the price for going RISCy, but it would be interesting to compare the number of cpu cycles required on both architectures. It may turn out that, performance wise, both come out very similar. For the cpu74, I expect the first and third instructions to take 2 cycles each, plus 1 cycle for the second one, so a total of 5 cycles.
MichaelM wrote:
In the final analysis, supporting register indirect, (R), and possibly the deferred register indirect, @(R), addressing modes in an ISA is a win-win. The auto-increment and auto-decrement capabilities of the PDP11, useful as they are for stacks, are not as useful in a general programming sense.
I agree with that, but I decided to avoid deferred addressing modes because, although they can save instructions and registers in some cases, I think they can’t really save cpu cycles. I also think that in a number of cases getting the first address into a register and then using simple indirect addressing based on that register is often beneficial. One example is when passing structs by reference: first get the address of the struct from the passed-in pointer (in the same register), then use that register as the base for struct member access.
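
For example, something along these lines (a made-up illustration): the compiler loads the struct pointer into a register once and then reaches every member with a plain register-indirect access, so no deferred mode is needed:
Code:
#include <stdio.h>

struct point { int x; int y; };

/* The struct is passed by reference: the pointer is loaded into a
   register once and both members are then accessed with simple
   [register + small offset] loads from that same base. */
static int manhattan(const struct point *p)
{
    return p->x + p->y;
}

int main(void)
{
    struct point pt = { 3, 4 };
    printf("%d\n", manhattan(&pt));   /* prints 7 */
    return 0;
}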

Joan


Sat Aug 03, 2019 7:55 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
BigEd wrote:
On the Callback Tables thread, the key is perhaps the wish to have this facility:
> JSL (JumpTable, X)
so there's a table of addresses somewhere, and you'd like to index into them and use the indexed one as the address to call as a subroutine. One possibility is to push things on the stack and return, but that's only one possible way to do it. It's an indexed dispatch table, but rather than being a computed GOTO it's a computed GOSUB.

Hi Ed, I am out of my home office until next Wednesday. I think I already understood that case and I tried to reply to it, but maybe I was not clear enough. When I return home, I will try to create an example of that and show how it’s compiled. As you suggest, callback tables are useful for an efficient implementation of interpreters. The key instruction to do so in my architecture is the ‘call’ with a register operand, i.e. the calling address is previously loaded into a register from the callback table and then the call with register operand is executed. This instruction was available already in the very first version of the instruction set, so this case is something I had in mind from the very beginning.

Joan


Sat Aug 03, 2019 8:15 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Ah, yes, indeed, calling to an address in register would be a good piece for the puzzle.


Sat Aug 03, 2019 8:46 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Joan:

Perhaps I'm missing something about your addressing modes, but it appears that your architecture already supports the deferred register indirect that the PDP/11 jmp @(r4)+ instruction represents, with the exception of the register auto-increment operation found in the PDP/11.

If you allowed the operand addressing mode you used for the ld.w [r6,0],r0 instruction to be used with the jmp instruction, then at the top of your subroutine you could perform the increment, add.w r6, 1, r6, and at the bottom you could perform the jump, jmp [r6, 0]. This would save two of the five cycles you estimated above. It might also make callback tables and OOP (Object-Oriented Programming) easier to implement.

_________________
Michael A.


Sun Aug 04, 2019 2:25 am

Joined: Tue Dec 31, 2013 2:01 am
Posts: 116
Location: Sacramento, CA, United States
MichaelM wrote:
The auto-increment and auto-decrement capabilities of the PDP11, useful as they are for stacks, are not as useful in a general programming sense.

... until your pet software creation "needs" a post-increment addressing mode for a JMP instruction operand. Of course there are methods to work around this slight inconvenience, but they're all ... slightly inconvenient. ;-)

Mike B.


Sun Aug 04, 2019 4:21 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
MichaelM wrote:
Joan:

Perhaps I'm missing something about your addressing modes, but it appears that your architecture already supports the deferred register indirect that the PDP/11 jmp @(r4)+ instruction represents, with the exception of the register auto-increment operation found in the PDP/11.

If you allowed the operand addressing mode you used for the ld.w [r6,0],r0 instruction to be used with the jmp instruction, then at the top of your subroutine you could perform the increment, add.w r6, 1, r6, and at the bottom you could perform the jump, jmp [r6, 0]. This would save two of the five cycles you estimated above. It might also make callback tables and OOP (Object-Oriented Programming) easier to implement.

Hi Michael,

One difference between the pdp/vax architectures and my processor is the extent of instruction orthogonality: the cpu74 architecture is constrained in the available instruction encodings because the complete set, including its addressing mode variants, must be represented in single 16-bit words, in some cases with embedded immediate fields. This reduces the possibility of having all possible addressing modes for all types of instructions.

The pdp and vax, on the other hand, had encodings that allowed any combination: on the vax, one byte encodes the instruction opcode and the following bytes represent the operands in perfectly regular patterns (the pdp-11 does much the same within 16-bit words, with a regular mode+register field per operand). I found this page on the internet that summarises it:

https://cs.ccsu.edu/~kjell/cs254/ch08/ch8_11.html

In my case, I had to make choices about which instructions supported which addressing modes, and this resulted in a reduced set of options for the ‘jmp’ and ‘call’ instructions. These instructions only support immediate modes, and jmp/call to the address in a register.

The assembly mnemonic of the pdp/11 jump with deferred auto-increment,
Code:
jmp @(Rn)+
is misleading because it seems to imply a double indirection (as would be the case for moves or alu operations), but in reality it’s just a jump to the address contained in the memory location pointed to by Rn, after which Rn is incremented by 2. Given such semantics, it should have been expressed as
Code:
jmp (Rn)+
but that’s just my opinion and taste anyway.

The case is that even if we omit the post-increment, such an instruction is not available in the cpu74, for the reasons given above. The available instruction for the purpose of jump tables is
Code:
jmp Rn
which does not use indirect addressing, so the indirection must be computed in a previous instruction. Similarly, the post-increment must be performed explicitly.

Now, your proposal of incrementing the pointer at the beginning of the routine only works if you adopt semantics in which the table pointer is one entry off from the supposed return address. This would work, but it would break the usual convention for memory access on processors supporting pre-decrement and post-increment modes, and could therefore disturb target-independent compiler optimisations that make assumptions about it.

Anyway, thanks for your input and I hope that the above makes some sense.


Last edited by joanlluch on Sun Aug 04, 2019 8:03 pm, edited 1 time in total.



Sun Aug 04, 2019 7:37 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Joan:

Great discussion. Thanks.

All designs are a study in trade-offs. One of the self-imposed constraints of your architecture is the single-word instruction. That is as valid a constraint as any, and it leads to the restrictions that you discussed above. Regardless, the architecture that you've specified is pretty nice, and certainly provides a nice set of services with which to build a serviceable processor.

In any case, I am looking forward to further development of your architecture, especially its LLVM compiler component.

_________________
Michael A.


Sun Aug 04, 2019 7:55 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
Hi Michael, I incidentally edited my previous post while you were submitting yours, but it was just to add some clarity, so the essential content remains.

Thanks for your nice comments. At the beginning I wanted to mimic most of the pdp/11 instruction set, including the variable encoding length and full orthogonality. But then I learned about pipelines and understood more deeply the advantages of RISC, and particularly of pure load/store ISAs, so I decided to switch to a constant-length instruction encoding and took my subsequent inspiration from the ARM-Thumb and MIPS-16 architectures instead. The ARM is too complex a processor to emulate, and I could just have adopted the much simpler MIPS-16 instruction set as it is: http://www.ijsrp.org/research-paper-0413/ijsrp-p16126.pdf. I finally went somewhere in the middle, also taking into account and incorporating instructions that the compiler really loves, such as ‘setcc’ and ‘selcc’, and I’m very happy with what the compiler is able to do with the final instruction set. The most unexplored aspect at this time is whether the instructions will be easy to decode in hardware and whether all instructions will be easy to implement in hardware. This is the aspect where I will ask for the most help from experienced forum members, and I hope to eventually get the answers I need to go ahead with this project.
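
As an example of what I mean by compiler-friendly instructions (made-up C, just for illustration): code like the following compiles to straight-line, branch-free sequences when ‘setcc’ (materialise a comparison result as 0/1) and ‘selcc’ (conditional select) are available:
Code:
#include <stdio.h>

/* The comparison result is used as a value: a natural fit for 'setcc'. */
static int is_positive(int a) { return a > 0; }

/* The ternary becomes a conditional select: a natural fit for 'selcc'. */
static int max2(int a, int b) { return a > b ? a : b; }

int main(void)
{
    printf("%d %d\n", is_positive(-3), max2(7, 9));   /* prints "0 9" */
    return 0;
}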


Sun Aug 04, 2019 8:52 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Joan:

What's the point of a hobby if all you do is copy somebody else's designs? You seem to have taken a deep look at all of the various options, and thought about the constraints each of those options imposes on the instruction set, performance, etc. I believe the path you have taken has informed your decisions so that you have achieved a reasonable balance between the various factors that determine / influence a processor architecture. The fact that you've spent so much time with the compiler, studying the idioms it uses to translate HLL expressions into the underlying instruction set, should yield very good results.

On the matter of instruction decoding, the actual instruction encoding that you have selected is not necessarily going to be optimal for decoding with 74xx logic. However, there's nothing to prevent you from reforming / redefining the particular instruction formats you've chosen to date to make that task easier when you get to that phase of the project. In other words, since you intend to only use a compiler, there's nothing to prevent you from moving the various fields around, or even splitting them up, in order to make instruction decoding easier.

Some time ago I concluded that DEC's reliance on 3-bit (octal) structures in its instruction formats was due to the limited number of decoder-type devices in the 74xx logic family that provide more than 3 inputs. If you find that issue too constraining, then you could always allow yourself to use CPLDs for the instruction decoder and keep the rest of your design in 74xx logic.

It's your project, just like the M65C02A is mine. I keep plugging away at it just for fun. I've been working on it off and on for more years than I can remember, but I still get satisfaction from it.

_________________
Michael A.


Sun Aug 04, 2019 10:10 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 328
Location: Girona-Catalonia
I made an update to the Github repo ( https://github.com/John-Lluch/CPU74 ) with the relevant changes to the new instruction set.

This commit includes changes in the Docs folder, with corrections to the instruction set documents, and a new version of the Assembler with cleaned-up code and a lot more comments in the source code.

The major changes have been in the compiler, but I am not posting that source code because pushing a compilable version would require committing the entire LLVM backend infrastructure, which would be a huge amount of files and code, and probably not that interesting anyway because it can be found elsewhere. I shall copy the most relevant files involving the CPU74 backend work, so interested people can get a good enough idea of what's involved, even if it can't be used directly.

The first tests that I performed with the new compiler and assembler, after solving a bunch of bugs, seem to work as expected, and some code generation improvements are present as well. I shall post some examples after I manage to filter out what's not that interesting to show.

I also added a 'Log File' feature to the assembler that helps enormously to quickly see what's going on without having to step all the time with the debugger.

It took a while to get back to the point I was at before I decided on the last ISA changes, but I'm happy I can now go ahead with new stuff.

Joan


Last edited by joanlluch on Mon Aug 12, 2019 8:12 pm, edited 1 time in total.



Mon Aug 12, 2019 3:14 pm