 74xx based CPU (yet another) 
Joined: Fri Mar 22, 2019 8:03 am
Posts: 167
Location: Girona-Catalonia
As a semi-retired software engineer who has worked most of my career on control systems and low-level programming, but who never went down to hardware design, I guess I always had this as something "pending to do" in my life projects list. One day, watching YouTube, I came across the Ben Eater channel, and that flipped the switch: I decided to learn more on the subject, in order to eventually design from scratch and build a 74xx based computer that is able to run common algorithms.

I searched the internet and found that such a thing had (obviously) been done before by others, so I started looking at what challenges these people faced while making their own. One build that particularly impressed me was the C74-6502 processor, made by 6502.org member Drass, whom I have since contacted by email, and who has helped me enormously to clarify basic concepts and get started with my own project.

The specs that I am currently considering are: modified Harvard architecture; 16-bit wide data bus and registers; a RISC instruction set with as much orthogonality as possible; and the ability to run at frequencies from zero (step by step) to as high as possible, hopefully 16 MHz or more. One of the features that I want it to have is a control panel where the processor architecture is fully and clearly depicted, with visual indication of instruction fetching, register/bus values, control signals and so on, so that all the internal workings can be seen and understood for teaching or demonstration purposes when the processor is run in step-by-step mode. I do not aim for a particularly compact design, so in my case the number of ICs or PCBs used is relatively unimportant, as long as they do the job. Of course I don't want to fill a room with IC-packed PCBs either, but I will probably be happy using a 19" rack with plugged-in Eurocards, which should give me both modularity and plenty of room for circuitry.

My work so far has consisted of learning about basic CPU architecture, data buses, critical paths, and the specs of useful 74xx ICs. As a non-native English speaker, I am also training myself on the vocabulary and wording of the several aspects of CPU internals and operation. I have also mostly defined the Instruction Set of the new architecture; more on that later.

Designing an architecture from scratch (obviously based on existing knowledge) may make some aspects simpler or more convenient from the design point of view, but it has its own unique challenges too. The most important one that I can spot now is that there's zero software available for the architecture, so it requires at least an assembler and a compiler if I want to run meaningful code on it. For this I am already fighting (yes, literally fighting) with the LLVM compiler tools to create a suitable backend for my proposed architecture.

Joan


Last edited by joanlluch on Sat Mar 23, 2019 12:10 pm, edited 1 time in total.



Sat Mar 23, 2019 11:04 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 167
Location: Girona-Catalonia
For a period of several years in my career, I developed software for the VAX-11 architecture and the VAX/VMS operating system from Digital Equipment Corporation. I recall those times as some of the happiest of my life. The VAX-11 instruction set was a real joy to program with. For performance reasons, the original architecture was eventually replaced by the Alpha RISC architecture, which was a natural evolutionary step, but I must say I was greatly disappointed when the whole company was killed by x86 based PCs and eventually bought by a "Clone-PC" manufacturer like Compaq. I almost abandoned my career as a software developer because of the sole idea of having to work with what I regarded as an inferior product. Anyway, the Earth has spun a lot since then and I sort of healed from that "trauma" :-)

Recently, I came across the Texas Instruments MSP430 processor, which features a quite elegant, almost orthogonal instruction set. I decided to use it as a basis for my Instruction Set. The assembly language for that processor is compact and elegant because of the various addressing modes that enable instructions to perform ALU operations on memory. An example is "add R12, 2(SP)", which adds the value of R12 to the memory location in the stack at address SP+2. However, although such instructions produce clean and compact assembly code, they still require a number of clock cycles to be performed. In particular, if the processor has pipelined instruction fetching, I found that executing such an operation as a single instruction is generally no better than executing the "Load", the "Add" and the "Store" as separate instructions, provided we have enough registers.
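To make the comparison concrete, here is a minimal Python sketch of both styles. It is only a toy model: the function names, the scratch register R4 and the access counts are assumptions for illustration, not MSP430 timings. It shows that the memory-operand form and the load/store sequence touch data memory the same number of times.

```python
# Toy model: registers and memory are plain dicts holding 16-bit words.
# Names and access counts are illustrative assumptions, not MSP430 timings.

def add_reg_to_mem(regs, mem, src, base, offset):
    """Memory-operand style: add Rsrc to the word at base+offset."""
    addr = regs[base] + offset
    mem[addr] = (mem[addr] + regs[src]) & 0xFFFF
    return 2  # data memory accesses: one read, one write

def load_add_store(regs, mem, src, base, offset, tmp):
    """Load/store style: the same work as three simple steps."""
    addr = regs[base] + offset
    regs[tmp] = mem[addr]                         # LOAD  tmp, offset(base)
    regs[tmp] = (regs[tmp] + regs[src]) & 0xFFFF  # ADD   tmp, src
    mem[addr] = regs[tmp]                         # STORE tmp, offset(base)
    return 2  # still one read and one write

def demo():
    regs_a = {"R12": 5, "SP": 0x100, "R4": 0}
    mem_a = {0x102: 10}
    regs_b, mem_b = dict(regs_a), dict(mem_a)
    acc_a = add_reg_to_mem(regs_a, mem_a, "R12", "SP", 2)
    acc_b = load_add_store(regs_b, mem_b, "R12", "SP", 2, "R4")
    return mem_a[0x102], mem_b[0x102], acc_a, acc_b
```

Both paths leave the same value in memory; the load/store version only costs an extra scratch register and more instruction words.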

So now, I am divided about whether to implement such an (elegant) MSP430-like instruction set, or go directly for a more down-to-the-metal pure Load/Store architecture, where all the ALU operations are performed exclusively on registers.

Joan


Sat Mar 23, 2019 11:46 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1257
A great intro to a very promising project! I like the sound of the machine. I agree that an assembler is very handy - I would add that an emulator is even more handy, for exploring ideas and testing code snippets. As for a compiler, that seems very difficult, but you have software skills, so good luck!

You will find fans of the MSP430 here. I'm not very familiar with it. But my preference is for RISC, for two reasons
- it's simpler, which helps the project progress
- you can start with a multi-cycle unpipelined machine, but with no special obstacles to pipelining later. So that's a road map to an improved performance machine.

As you say, two or even three simple instructions will do the work of one complex one, so there's nothing actually missing from a RISC.

Actually, another advantage is the learning curve for programming: you are going to be the first person learning how to use the machine, so a simple machine makes that easier. Same for a compiler, I would think.


Sat Mar 23, 2019 5:58 pm

Joined: Mon Aug 14, 2017 8:23 am
Posts: 72
Joan,

Thanks for your introduction to your project.

I also like the orthogonal nature of the MSP430 - and I had a lot of fun programming a small interactive interpreter in under 1000 bytes of MSP430 assembly language.

I was planning on building a TTL computer loosely based on the "Hack" design featured in the "NAND to Tetris" teaching course.

I got distracted and ended up buying a kit - called "Gigatron" from an enthusiast based in the Netherlands.

see www.gigatron.io

The kit provides colour graphics and sound, an offline assembler and emulator, and runs Tiny BASIC - and it achieves this with fewer than 40 TTL chips presented on a professional pcb in an attractive case.

It has the performance typical of an early 1980s home computer, with a 16-bit von Neumann virtual machine which runs on top of the 8-bit Harvard architecture. For speed comparison, it can perform a 16-bit addition in 4.5 µs, whilst a 1 MHz 6502 would take 20 µs to perform the same operation.

There is the opportunity to overclock it above its current 6.25 MHz clock - and I am looking at around 16 MHz as the upper limit of the design.

If you are considering embarking on a TTL computer project - there are some good lessons to be learned from the Gigatron design.

1. The ALU and the control unit are great examples of what can be done with a few simple TTL chips.
2. Don't try to use obsolete devices like the 74xx181 ALU or 74xx172 register files.
3. Trying to base it on an existing MCU instruction set will probably result in a lot more ICs.

As much as I like the MSP430 - I think the effort of building a true 16-bit machine with a bank of sixteen 16-bit registers, while retaining all the MSP430 orthogonality, would prove too much for a TTL design.

regards



Ken


Sat Mar 23, 2019 7:06 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 167
Location: Girona-Catalonia
Thank you for your replies,

BigEd, yes, I think it's wiser to go for a pure load/store architecture than trying to emulate a more complex instruction set. The emulator is something that I had in mind, in fact, and I agree it would be useful to test programs before loading them into the real thing so software issues (bugs in the program) can be easily isolated from hardware glitches.

About compilers, I have implemented a few simple ones for proprietary languages in fact, mostly for the knitting industry when I was relatively young. These were fully functional languages including branches and subroutines, but the output code was not fully optimised. More recently I was the main developer of the HMI Draw application on the App store, that features a JIT compiler to RPN notation for expressions entered by users which become part of an expression execution tree of any complexity (similar to what spreadsheets do).

Just as a matter of curiosity, my first attempt at generating compiled code from a text source was on the VAX-11. I would convert a maths expression expressed as a string with operators, math functions, parentheses and so on, into real VAX-11 machine code stored in a byte array. Then I would jump program execution to the beginning of that array by means of an inline assembly instruction to actually execute the code in the array as a subroutine. YES, that worked, and that was possible on the VAX-11 architecture!! The only problem was that if the code in the array was buggy or never returned to the caller, the calling program would hang or crash, or would cause the user process to crash. But that was all. This was easily solved by just logging in again, as the system would never crash unless you attempted such dirty things in supervisor mode. Being able to play with the machine that way was truly amazing, although I guess security was not a major priority in those times...
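A minimal sketch of the same idea in Python, under one loud assumption: instead of real VAX-11 machine code, the expression is compiled into a flat list of operations for an invented stack machine, which is then executed. The function names and the instruction format are hypothetical, and only integers, the four basic operators and parentheses are handled.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def compile_expr(text):
    """Translate an infix expression into a flat list of stack-machine ops."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    code, ops = [], []
    for tok in tokens:
        if tok.isdigit():
            code.append(("PUSH", int(tok)))
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                code.append((ops.pop(), None))
            ops.pop()  # discard the "("
        else:
            # Emit pending operators of equal or higher precedence first
            # (this makes the operators left associative).
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                code.append((ops.pop(), None))
            ops.append(tok)
    while ops:
        code.append((ops.pop(), None))
    return code

def run(code):
    """Execute the compiled list on a simple evaluation stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
            continue
        b, a = stack.pop(), stack.pop()
        if op == "+":
            stack.append(a + b)
        elif op == "-":
            stack.append(a - b)
        elif op == "*":
            stack.append(a * b)
        else:
            stack.append(a // b)  # integer division
    return stack.pop()
```

For example, `run(compile_expr("2 + 3 * (7 - 1)"))` compiles once and can then be executed as often as needed, which is the point of generating code rather than re-parsing the string each time.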

Rather than attempting the impossible task of coding a compiler from scratch (especially a good one), I am now taking the theoretically easier route of creating a custom backend for the existing LLVM compiler. Unfortunately, the available documentation is very scarce and only covers things in a very superficial way. I even purchased the recommended books on the subject, but not even the books get into any useful level of detail. This is a pain because I find myself guessing and performing trial and error, which is never a good approach to anything. So far, I have got the instruction set defined and the compiler backend generating correct sequential code as long as there are no branches, type changes, or subroutine calls. But that's the easiest part. I just hope I don't get stuck on something that prevents me from going ahead.

Regarding RISC or CISC, I would say that it's not necessarily true that compiling for a RISC architecture is easier than for a CISC one. It all depends on the actual flexibility of the instruction set. For example, the VAX-11 was the most compiler-friendly architecture ever, and it was pure CISC. But it's true that most RISC architectures are designed with special consideration for compilers.

I attach the instruction set, with machine codes, that I am considering so far:

Monsonite, Thanks for your input.

It's interesting that a full computer able to execute retro games can be made with such a small number of chips and machine instructions. The concept is really amazing, and to some extent makes my project look totally overkill and unnecessary. But I'm just doing it for fun, so why not?

Still, I will definitely have a close look at the Gigatron architecture, including the chips the author used and the implementation of the several processor units. This is all very interesting.

On the other hand, I wonder if the author of the Gigatron spent a lot more time on software development than on hardware. It's interesting because it looks like he chose a very simple hardware architecture and then adapted a lot of software for it.

I would like to know how he did that. Is the virtual 16-bit machine that you mention a crucial aspect of it? I mean, is he running a software implementation of an existing processor (for example a 6502) on top of his simple hardware architecture, so that he is just running existing 6502 software and games? If so, that would be a very clever idea!

Otherwise, do you know if there's actual C code for the software that he is running, for example the Basic interpreter and the retro games, so that it's relatively easy to port to any architecture provided there's a suitable compiler? That's something that I'm very interested in.

Thanks

Joan


Attachments:
File comment: My CPU 74 - Instruction Set
CPU74InstrSet.pdf [56.54 KiB]
Sun Mar 24, 2019 12:12 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1257
(Might be worth noting there's an existing thread on the Gigatron:
Gigatron - A TTL machine with integrated VGA
)


Sun Mar 24, 2019 9:01 am

Joined: Mon Aug 14, 2017 8:23 am
Posts: 72
Hi Joan,

I have looked at your proposed instruction set - and I can see the clear influences of MSP430 and instruction set orthogonality.

For a couple of years I have been following the subject of Minimal Instruction Set Computers (MISC) - and that is part of the reason how I came to join this forum.

Charles Moore (Forth) spent many years in the 1980s and 90s working on processors that executed a minimal instruction set. Unfortunately very few of his processors achieved commercial success - so without having access to an actual MISC CPU - I turned to the MSP430 as the closest thing to what I wanted - and wondered if there were ways in which some of its instructions could be eliminated, to reduce processor complexity - but without having a drastic effect on its performance or ease of programming. This study is still ongoing - but it did receive some stimulus and ideas from the One Page Computing thread on this forum, where various architectures and instruction sets are explored methodically to see which perform best within certain restrictions.

To gain more performance than the MSP430 I am now looking at using a 400MHz ARM running a simple simulator written in C, to explore various architectures and instructions. The ARM can emulate virtually any reasonable instruction set - and do it fast enough to produce a meaningful performance.

Whilst this may appear an unusual choice, I come from a hardware background, and have very little knowledge of the modern software development environment. So I have chosen the fastest STM32H743 microcontroller that has a low cost development board ($25) - that I can write bare-metal C code for, without being restricted by gigabytes of toolchains and operating systems I do not understand.

I have replied to your Gigatron comments on the separate Gigatron thread.


regards


Ken


Sun Mar 24, 2019 12:24 pm

Joined: Mon Aug 14, 2017 8:23 am
Posts: 72
Joan - your two statements got me thinking about virtual machines.

Quote:
Just as a matter of curiosity, my first attempt at generating compiled code from a text source was on the VAX-11. I would convert a maths expression expressed as a string with operators, math functions, parentheses and so on, into real VAX-11 machine code stored in a byte array. Then I would jump program execution to the beginning of that array by means of an inline assembly instruction to actually execute the code in the array as a subroutine. YES, that worked, and that was possible on the VAX-11 architecture!!


Quote:
Otherwise, do you know if there's actual C code for the software that he is running, for example the Basic interpreter and the retro games, so that it's relatively easy to port to any architecture provided there's a suitable compiler? That's something that I'm very interested in.



The Gigatron virtual machine has just 37 instructions (see table below) - and the opcode of each instruction is simply the jump address, within page 3 of the ROM, of the native assembly routine that executes that virtual instruction.
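The "opcode is the jump address" trick can be sketched in a few lines of Python. This is only an illustration: the opcode values, the three instructions and the single accumulator are invented for the sketch and are not taken from the Gigatron ROM.

```python
# Hypothetical opcode values, invented for this sketch. Each opcode doubles
# as the index ("jump address") of its handler inside one 256-entry page.
LDI, ADDI, SUBI = 0x10, 0x20, 0x30

def run(program):
    """Execute (opcode, operand) pairs and return the 16-bit accumulator."""
    state = {"acc": 0}

    def op_ldi(operand):
        state["acc"] = operand & 0xFFFF

    def op_addi(operand):
        state["acc"] = (state["acc"] + operand) & 0xFFFF

    def op_subi(operand):
        state["acc"] = (state["acc"] - operand) & 0xFFFF

    page = [None] * 256          # stands in for the ROM page of handlers
    page[LDI] = op_ldi
    page[ADDI] = op_addi
    page[SUBI] = op_subi

    for opcode, operand in program:
        page[opcode](operand)    # no decode step: the opcode is the address
    return state["acc"]
```

Because there is no decoding, dispatch costs a single indexed jump; the price is that opcode values are dictated by where the handler routines happen to sit in the page.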

You can see all of these instructions listed in the Gigatron ROM disassembly listing. They are mainly 16-bit arithmetic and logic instructions, conditional branches and load and store operations.

This is very similar to what you describe that you did on the VAX-11, and also what I did in my tiny interpreter that runs on the MSP430.

If your TTL computer were to implement the same virtual machine instructions - using whatever native instructions your architecture has available - then programs such as Tiny BASIC or the games would run on both machines.

As a historical side-note:

When Tiny BASIC was first written around 1976 - it used a virtual machine model which executed a language called IL "Interpretive Language".

The idea was that the VM could readily be implemented in assembly language on whichever CPU - such as 6502, 6800, 1802, 8080 - and the Tiny BASIC was contained in about 350 of the VM instructions as IL bytecodes. The combination of the VM and the bytecode implementation allowed Tiny BASIC to fit into about 3K bytes of memory.

There's a good article about it here - with other links to Tom Pittman's design notes.

http://troypress.com/the-tiny-basic-int ... nd-onions/

For interest, here are the Gigatron VM instructions - listed by frequency of usage in two programs "WozMon" and "Tiny BASIC"

As you can see - some instructions are heavily used, some very rarely. This might give a useful insight into what instructions take priority when designing an ISA.

Attachment:
Capture1.JPG [ 43.38 KiB ]



regards


Ken


Last edited by monsonite on Sun Mar 24, 2019 3:28 pm, edited 3 times in total.



Sun Mar 24, 2019 2:19 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1257
Nice analysis, and an interesting idea to standardise on the opcodes and encodings of the gigatron, at least as a supported subset.

Might be worth noting that the static frequency of opcodes tells you something about code density, and perhaps about desirability, but the dynamic frequency tells you something about the performance. So, for example, a backward branch is probably high for both types of statistic and is well worth having. But a subroutine call/return might appear relatively often in a static analysis, but not be too problematic if replaced by a couple of simpler instructions, from a performance perspective. A CMP might be used quite a lot, but perhaps be replaced by a SUB without too much loss, if there was no room for it.
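A small Python sketch of the distinction, using a made-up five-instruction program (the mnemonics and their behaviour are assumptions for illustration only). The static count looks at the listing; the dynamic count tallies what actually executes.

```python
from collections import Counter

# Hypothetical toy program of (mnemonic, operand) pairs.
PROGRAM = [
    ("LDI", 3),    # 0: counter = 3
    ("CALL", 0),   # 1: executed once (modelled here as a no-op)
    ("SUBI", 1),   # 2: counter -= 1
    ("BNE", 2),    # 3: jump back to index 2 while counter != 0
    ("HALT", 0),   # 4: stop
]

def static_counts(program):
    """How often each mnemonic appears in the listing."""
    return Counter(op for op, _ in program)

def dynamic_counts(program):
    """How often each mnemonic is actually executed."""
    counts, pc, counter = Counter(), 0, 0
    while True:
        op, arg = program[pc]
        counts[op] += 1
        pc += 1
        if op == "LDI":
            counter = arg
        elif op == "SUBI":
            counter -= arg
        elif op == "BNE" and counter != 0:
            pc = arg
        elif op == "HALT":
            return counts
```

Here every mnemonic appears once statically, but the backward branch and the decrement dominate the dynamic count while the call stays at one, which is the pattern described above.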


Sun Mar 24, 2019 2:36 pm

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 177
Location: Huntsville, AL
Joan:

I'll be looking forward to reading about your processor. I have certainly enjoyed reading Drass' posts over on 6502.org and Rob Finch's posts here on anycpu.org, and I look forward to reading your posts here.

I have defined a number of processors over the years, and I think a static analysis of instruction usage frequency, like the one Monsonite provided above, is a good way to focus your resources. My own static instruction frequency analysis for my M65C02A, which I used to determine what additional instructions / addressing modes to provide in order to support stack frame based languages like C and Pascal, was very helpful. In your case, you do not need to be overly concerned about backward compatibility with a particular instruction set architecture like I was, so you can focus on the performance of the most common instructions.

I also think that BigEd makes a good point regarding the dynamic usage frequency of your processor's instructions. It appears that you're also planning to build a simulator, so including a profiler function in that tool may also help determine the dynamic instruction frequency that I think BigEd is referring to above.

In my case, I've recently been trying to port a fig-Forth 1.0 implementation to my M65C02A processor model. Even though the processor is fully backward compatible with the 6502/65C02 (except for some specific behaviors of the JSR/BRK/RTS/RTI instructions, and not being cycle accurate, i.e. all dead cycles have been removed), I opted to enable most of the extensions through prefix instructions which can reasonably be expected to increase the code size. However, the reality is that the code size of the resulting fig-Forth M65C02A-specific implementation is pleasantly smaller than the 6502/65C02-specific implementation by several hundred bytes. And this is the case even though I am still using the 6502/65C02-specific implementations of the multiply and divide routines.

As I was working through the porting issues, one thing really stood out to me. The number of instruction cycles needed to implement the Forth VM in 6502/65C02 machine code is substantially more than with the M65C02A because of the Forth VM that I included within the architecture itself. Instead of requiring tens of cycles to implement the ITC NEXT, ENTER (DOCOLON), EXIT (;S), my processor's Forth VM instructions allow these time critical operations to be implemented in 6, 8, and 10 cycles respectively.

I have a fairly complete implementation of the ISA for an FPGA. However, I've had some serious issues with the verification of that implementation. Testing complex machines like the M65C02A within the FPGA simulation environment is very time consuming. So I've been using the py65 environment to develop and fine tune the M65C02A architecture. The process of getting the fig-Forth VM ported to the architecture helped me find a conceptual error. I can correct that error in the processor model much more easily, and once everything is working, I can apply the changes to the FPGA model. Although working with the M65C02A processor model in the py65 environment does not have the same sense of completeness as in the FPGA, it is a far friendlier environment to work with. As a matter of fact, I modified the processor model over the past few days to include a tracing function for Forth programs. That tracing function helped me validate most of the Forth words between the 8-bit and the 16-bit implementations, and identify the specific word in the 16-bit implementation that was preventing me from getting both implementations to work correctly.

Good luck with your project, and I look forward to your posts.

BTW: py65 can be coerced to support any processor architecture. Given your apparent SW background, it may be a good framework to use to get a model for your processor going quickly.

_________________
Michael A.


Sun Mar 24, 2019 6:48 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 920
Location: Canada
Hi, it sounds like you’ve got an interesting project happening. I always like reading about the TTL stuff and would like to read more about it.

I like to work on more complex projects and gave up on the notion of using discrete ICs for most of the logic as being too impractical and requiring too many resources that I don't have. I'm a fan of FPGAs, where one can fit the equivalent of hundreds or thousands of ICs onto a single chip and it's just a small board. I've got a largish project (for a hobbyist) going with the FT64 system.
Some of my favorite projects to follow along are the TTL 6502, or relay based computers filling a whole room.

_________________
Robert Finch http://www.finitron.ca


Mon Mar 25, 2019 2:54 am

Joined: Fri Mar 22, 2019 8:03 am
Posts: 167
Location: Girona-Catalonia
Hi Ken,

Thanks for the update on the Gigatron. The subject you bring up about instruction frequency is interesting. Actually, this is what (at the time) justified the switch from CISC to RISC. However, I would say that once you are on a RISC machine, such studies can become biased in favour of a particular subset of available instructions, while missing opportunities for new instructions that are not available. Compiler technology also has a huge influence on this.

For example, most current RISC architectures are pure Load/Store systems, and thus load/store instructions are used almost as often as ALU operations. However, on earlier pure CISC systems (I always think of the VAX-11 when talking about "purity") simple load/store instructions between memory and registers were used much less frequently in compiler generated assembly. This is because memory was more often accessed directly from instructions thanks to the large number of available addressing modes and Instruction Set orthogonality.

So my take on this has been to observe the kind of instructions that modern processors implement, and to choose a complete set based on what I expect the compiler to generate. I originally aimed at more orthogonality and at allowing ALU operations on memory (like the MSP430), but then I figured out that from the point of view of performance, this would be no different from a pure load/store implementation. After removing all the memory+ALU instructions I recovered a lot of encoding slots in the instruction set that I'm now using for instructions that have data embedded in the encoding, such as constants for ALU operations and relative addresses for branches. These types of instructions are also very common in modern RISC architectures.

Joan


Mon Mar 25, 2019 3:06 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 167
Location: Girona-Catalonia
Hi BigEd,

You are right about making the distinction between "code density" and "performance".

The MSP430 is marketed as achieving very good code density, while the AVR architecture looks to me almost the opposite. On the other hand, it appears to me that ARM Thumb beats them all, with a slight improvement over the MSP430. This is in part because many ALU instructions operate on three registers. However, this comes with a significant loss of orthogonality, which makes things harder for compilers. Anyway, that's just my own subjective perception; I have not read any study on the subject or performed one myself. I compared these three architectures because they all have mostly constant width 16-bit instruction opcodes.

I suspect that with my proposed instruction set I will be somewhat worse than the ARM and the MSP430 in this respect, but noticeably better than the AVR. This is the price to pay if I want to keep easy-to-decode instructions adhering to defined bit templates.

I will be able to estimate relative code density more precisely as I progress with the compiler backend, because it is then easy to compile the same code for different architectures and compare the outputs.

Joan


Mon Mar 25, 2019 3:26 pm

Joined: Mon Aug 14, 2017 8:23 am
Posts: 72
Hi Joan,

If you have not read the "One Page Computing" thread - there is a convenient link here https://revaldinho.github.io/opc/

This discusses the instruction sets of 8 different processor designs of different architectures.

It starts with a simple accumulator design, then load/store, then a MISC - each with a 16-bit address and data bus. Later designs are 24-bit and 32-bit.

There is plenty of rich information here.


Ken


Mon Mar 25, 2019 3:36 pm

Joined: Fri Mar 22, 2019 8:03 am
Posts: 167
Location: Girona-Catalonia
Hi Michael,

As I replied to BigEd, although an instruction frequency study is interesting and a way to get ultimate performance, I have so much to do before I get something real that I guess I am just choosing my instruction set based on previous experience and on what's easier to compile and then decode.

Of course, pseudo-code based languages like Forth can improve density enormously, in the same way that load/store architectures do just the opposite. You are doing a great job, but I'm totally new to this, I don't really know anything about FPGAs, and to me just being able to design a working processor using raw IC circuitry is already a major milestone, and the only thing that I want to do. I don't really want to go beyond Logisim to test hardware concepts, to the extent that the program allows.

About py65, I didn't know about that either. But I think that I will probably feel more comfortable writing a small C or C++ program to simulate the processor. I'm not particularly keen on the Python programming language, and I feel that I may have a lot more control if I produce my own code to test the processor in a language I am comfortable with. It should not be that difficult, because after all it's just a single-line parser for the assembly file and an execution table.
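As a rough sketch of the "single-line parser plus execution table" structure (written in Python only for brevity; the same shape carries over to C or C++), assuming an invented two-operand syntax with eight 16-bit registers and no branches. None of the mnemonics or register names below come from the CPU74 instruction set.

```python
def parse_line(line):
    """Parse one line such as 'add r1, r2' into (mnemonic, [operands])."""
    text = line.split(";", 1)[0].strip()   # drop comments and whitespace
    if not text:
        return None
    mnemonic, _, rest = text.partition(" ")
    operands = [tok.strip() for tok in rest.split(",") if tok.strip()]
    return mnemonic.lower(), operands

def value(regs, operand):
    """An operand is either a register name or an integer literal."""
    return regs[operand] if operand in regs else int(operand, 0)

def op_mov(regs, dst, src):
    regs[dst] = value(regs, src) & 0xFFFF

def op_add(regs, dst, src):
    regs[dst] = (regs[dst] + value(regs, src)) & 0xFFFF

def op_sub(regs, dst, src):
    regs[dst] = (regs[dst] - value(regs, src)) & 0xFFFF

# Execution table: mnemonic -> handler(regs, dst, src)
EXEC = {"mov": op_mov, "add": op_add, "sub": op_sub}

def simulate(source):
    """Run straight-line code and return the final register file."""
    regs = {f"r{i}": 0 for i in range(8)}
    for line in source.splitlines():
        parsed = parse_line(line)
        if parsed is None:
            continue
        mnemonic, (dst, src) = parsed
        EXEC[mnemonic](regs, dst, src)
    return regs
```

Adding an instruction means adding one handler and one table entry; branches would additionally need a program counter and a label pass, which this sketch leaves out.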

Joan


Mon Mar 25, 2019 3:56 pm