Suite-16 (formerly Bitslice using currently available TTL)
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
Hi Joan,
I learnt a lot from Marcel's Gigatron project. This was a good introduction to TTL computers. Marcel taught me the value of using the Arduino as a development tool - but also as a programmable interface device to handle things like PS/2 and software loading.
I'm not currently planning any integrated VGA system as there is in the Gigatron. I would struggle to get my design to achieve much better than the 1/4 VGA resolution of the Gigatron - so this is not a high priority.
I'm thinking of using a Gameduino as the external graphics card. With a bit of tweaking the Gameduino can produce a 1024 x 768 full colour display - and not consume all of the TTL processor resources.
Meanwhile I have taken a bit of a change in direction in the project, and I am making good progress. The documentation process on Hackaday has helped me focus my ideas.
At first I felt that I had to get the bit-slice hardware designed, and pottered along with that for a while. I took it about as far as I could with "Digital" - then realised that the lack of an instruction set was preventing any further progress.
I realised that my instruction set ideas were too vague, and so I changed course and started to firm up the instruction set.
With the basic instruction set in place I could then put a very simple simulator together. This has been my focus for the last few days.
By writing some code to run on the instruction simulator, I could then get a feel for whether my choice of instructions is suitable.
So far I am making good progress and just this morning I have proven that my CALL and RETURN mechanism is working correctly.
I can then go back to my printnum routine and rewrite it with a couple of subroutines which will make the code a lot more elegant.
I'm enjoying the challenge of coding on an unfamiliar simulated cpu, and I hope to have my "1000 byte" SIMPL interactive language running by the end of the weekend. When I get that working - and see whether it fits into 1000 bytes of codespace (like it does on the MSP430) - I will probably have written enough code to have confidence in my instruction set.
Then it will be time to go back to thinking about the hardware - because I will have to make sure that I can actually implement all of the instructions in real hardware.
I think this is probably the normal development cycle for a TTL cpu - you work for a while on one aspect of it - say simulation - to make some progress, and then you have to go back to the hardware and bring that along a bit.
I'm sure it is an iterative process - and you can't get everything working immediately.
Sometime in the near future, I am going to transcribe my simulator into Verilog - and explore the design running on an FPGA. This has been one of the long-term aims of the project - but I now have the confidence to tackle it.
regards
Ken
|
Wed Oct 23, 2019 11:24 am |
|
|
yeti
Joined: Fri Oct 20, 2017 7:54 pm Posts: 8
|
monsonite wrote: Sometime in the near future, I am going to transcribe my simulator into verilog - and explore the design running on an FPGA.
Why not start with Verilog-74xyzzy TTLs? E.g.: https://github.com/TimRudy/ice-chips-verilog
Maybe on this platform (verilog, verilator, iverilog) you'll get more co-workers faster?
_________________ "Stay OmmmMMMmmmPtimistic!" — yeti "Logic, my dear Zoe, merely enables one to be wrong with authority." — The 2nd Doctor "Don't we all wait for SOMETHING-ELSE-1.0?" — yeti
|
Wed Oct 23, 2019 11:38 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1807
|
monsonite wrote: ... just this morning I have proven that my CALL and RETURN mechanism is working correctly.
Great!
Quote: I'm enjoying the challenge of coding on an unfamiliar simulated cpu, and I hope to have my "1000 byte" SIMPL interactive language running by the end of the weekend. ... I think this is probably the normal development cycle for a TTL cpu - you work for a while on one aspect of it - say simulation - to make some progress, and then you have to go back to the hardware and bring that along a bit.
It's good to hear about your progress, and to see a machine come to life piece by piece.
|
Wed Oct 23, 2019 2:21 pm |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
I'm starting to make some good progress with Suite-16 coding, and enjoying the challenge of every new routine. Since I got the CALL and RETURN instructions working yesterday, I have been able to rewrite my decimal number print routine using a much cleaner subroutine approach.
I wasn't happy with my first attempt at printnum - it had a large block of code basically repeated 4 times. I have coded this functional block into a subroutine called decimate, which is called 4 times with a different decimation factor (10,000, 1000, 100 or 10). This simplifies the code, makes it a lot easier to read, and shortens the routine from 84 words to just 39.
My latest log is here https://hackaday.io/project/168025-suit ... -printnum2
I have put the latest code (simulator, decimal number entry and printing routines) on my GitHub https://github.com/monsonite/Suite-16
There are versions with and without register debug.
Incidentally, my simplified ascii to 16-bit integer routine is 32 words long. When I last coded the same function a couple of years ago in MSP430 assembler, the code length was 25 words. (In each case the instruction word is 16 bits.) I'm happy that, despite the limitations of my cpu's addressing modes and the lack of inter-register instructions and byte instructions, I'm achieving a very similar code density.
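To show the shape of the decimate approach outside Suite-16 assembly, here is a minimal C sketch, assuming decimate works by repeated subtraction (a natural choice on a CPU without a divide instruction, though the post does not say how it is done). The function names follow the post; the string-buffer output and the zero-padded 5-digit format are assumptions for illustration, not the code in the repository.

```c
#include <stdint.h>

/* Hypothetical sketch: emit one decimal digit by repeated subtraction.
   Returns the remainder so the caller can pass it to the next call. */
uint16_t decimate(uint16_t value, uint16_t factor, char *digit_out)
{
    char digit = '0';
    while (value >= factor) {   /* count how many times factor fits */
        value -= factor;
        digit++;
    }
    *digit_out = digit;
    return value;
}

/* Print a 16-bit value as a 5-digit, zero-padded decimal string.
   buf must hold at least 6 chars (5 digits + terminator). */
void printnum(uint16_t value, char *buf)
{
    value = decimate(value, 10000, &buf[0]);
    value = decimate(value, 1000,  &buf[1]);
    value = decimate(value, 100,   &buf[2]);
    value = decimate(value, 10,    &buf[3]);
    buf[4] = (char)('0' + value);   /* the units digit is what remains */
    buf[5] = '\0';
}
```

For example, printnum(65535, buf) fills buf with "65535", and printnum(42, buf) with "00042".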
|
Thu Oct 24, 2019 4:28 pm |
|
|
joanlluch
Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia
|
I wonder if you have considered implementing the printnum function as a loop iterating over the power-of-ten factors. The values (10000, 1000, 100 ...) can be arranged in memory and accessed by index. The 'decimate' subroutine can be avoided as well, because the code can simply be put in the loop body. I believe this should result in shorter code, even if we account for the constant factors stored in memory, but I'm more used to thinking in terms of C than raw assembler, so maybe I am wrong. I implemented the function as described in C and compiled it for both the MSP430 and my CPU74. Excluding the 8 memory bytes (4 words) used to store the power-of-ten values, the function takes 16 instructions and 20 words on the MSP430, and 15 instructions and 17 words on my CPU74 architecture. Considering that this is compiler-generated code, I am sure it can be reduced even further.
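Joan's table-driven alternative might look like this in C. This is a hypothetical sketch, not her code (which isn't shown in the thread); `printnum_loop` and `pow10_table` are made-up names, and the output format simply mirrors a 5-digit printnum.

```c
#include <stdint.h>

/* Powers of ten kept in a table in memory (4 words of data). */
static const uint16_t pow10_table[4] = {10000, 1000, 100, 10};

/* Loop-based printnum sketch: the decimate body is inlined in the loop.
   Writes 5 zero-padded digits plus a terminator into buf. */
void printnum_loop(uint16_t value, char *buf)
{
    for (int i = 0; i < 4; i++) {
        char digit = '0';
        while (value >= pow10_table[i]) {   /* repeated subtraction */
            value -= pow10_table[i];
            digit++;
        }
        buf[i] = digit;
    }
    buf[4] = (char)('0' + value);   /* units digit */
    buf[5] = '\0';
}
```

The trade-off is the one being discussed here: the loop removes the call overhead and the four call sites, at the cost of four words of table data and an indexed memory load on each pass.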
|
Thu Oct 24, 2019 9:22 pm |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
Joan,
Interesting thought, but I did do some fairly aggressive optimisation. Because I have plenty of registers, once I have loaded the various constants into them, there is very little need to involve memory.
I counted the MSP430 instructions as 16, but this translates to 25 words, as not all instructions are single word - especially where you need to load a register with a 16-bit constant.
I must say that, having coded two or three principal routines in asm, you quickly gain experience, and it also starts to show up deficiencies in the instruction set.
I think I will introduce an 8-bit ADD and SUB to the accumulator - so that a small constant can be placed into the payload area, and used to modify the accumulator without having to use another register or the next word in memory as the data source. This would be useful for testing ascii characters, or converting digits to their ascii values etc.
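As a sketch of how a simulator might decode such an instruction, assume the opcode sits in the upper byte and the 8-bit payload in the lower byte. The opcode values, the names `OP_ADI`/`OP_SBI` and the `Cpu16` struct are all hypothetical, and the payload is zero-extended before being added to or subtracted from the accumulator R0.

```c
#include <stdint.h>

/* Hypothetical encoding: high byte = opcode, low byte = 8-bit payload. */
enum { OP_ADI = 0x0A, OP_SBI = 0x0B };   /* opcode values are made up */

typedef struct {
    uint16_t r[16];   /* r[0] is the accumulator */
} Cpu16;

/* Execute one immediate instruction; returns 0 if the opcode isn't handled. */
int exec_immediate(Cpu16 *cpu, uint16_t instr)
{
    uint8_t opcode  = (uint8_t)(instr >> 8);
    uint8_t payload = (uint8_t)(instr & 0xFF);   /* zero-extended constant */

    switch (opcode) {
    case OP_ADI: cpu->r[0] = (uint16_t)(cpu->r[0] + payload); return 1;
    case OP_SBI: cpu->r[0] = (uint16_t)(cpu->r[0] - payload); return 1;
    default:     return 0;
    }
}
```

Adding '0' (0x30) to a digit value in R0 turns 7 into the ascii character '7' without using another register or the next word of memory.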
I also need to get my relative branching working - so that I can write relocatable code. At the moment I only have absolute branching which is a bit of a limitation.
I am sure that there are many ways in which I can improve the instruction set, making more use of the 8-bit payload byte - but the question is, will I be able to implement it all in TTL if it becomes too complex?
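Relative branching can reuse the same payload byte as a signed offset. Here is a minimal sketch, assuming word addressing and an offset measured from the instruction after the branch; both are assumptions, and Suite-16's eventual convention may differ. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical relative branch: the 8-bit payload is a signed offset
   added to the address of the following instruction, so the same code
   works wherever it is loaded. */
uint16_t branch_target(uint16_t pc_of_branch, uint16_t instr)
{
    int8_t offset = (int8_t)(instr & 0xFF);           /* sign-extend payload */
    uint16_t next_pc = (uint16_t)(pc_of_branch + 1);  /* word-addressed */
    return (uint16_t)(next_pc + offset);
}
```

Because only the distance is encoded, a branch at 0x1000 with payload 0xFE lands one word before itself, just as the same instruction at 0x0100 does, which is what makes the code relocatable. The cost is a reach of -128 to +127 words.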
|
Fri Oct 25, 2019 8:44 am |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
Today I have been focusing on the overall structure for my SIMPL interpreter, and got most of the plumbing in place so that it will actually do useful stuff.
SIMPL is Serial Interpreted Minimal Programming Language - it's a tiny, extensible, Forth-like language that uses Reverse Polish notation and single ascii characters as commands and variables. Think of it as the Tiny-BASIC of the Forth world.
I like it because it's very compact and can be coded in fewer than 1k bytes of memory.
In short - an incoming character, read from the text input buffer, indexes a look-up table that contains the start addresses of the primitive functions.
I created a new instruction to directly access a look-up table using the accumulator contents as an index, and then jump to the address stored in the table.
For example, if the interpreter encounters "p" in the input buffer, it will use the ascii code for p (0x70) to index into an array, and pick up the 16-bit address of the printnum routine.
The interpreter will then jump to printnum and execute the code there printing the value on the top of the stack as a 5 digit decimal number, before jumping back to NEXT, which fetches the next character from the input buffer.
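The dispatch mechanism described above can be sketched in C with a table of function pointers indexed by the character code. This is an illustration under assumptions, not the Suite-16 code: the `Vm` struct, `simpl_init`, `simpl_run` and the choice to capture output in a buffer are hypothetical, and only the 'p' primitive is filled in.

```c
#include <stdint.h>

#define STACK_DEPTH 16

/* Minimal state for a SIMPL-like interpreter (hypothetical layout). */
typedef struct {
    uint16_t stack[STACK_DEPTH];
    int sp;            /* number of items; the top is stack[sp - 1] */
    char out[32];      /* captured output, standing in for the serial port */
    int out_len;
} Vm;

typedef void (*PrimFn)(Vm *);

/* Default handler for characters with no primitive assigned. */
static void prim_nop(Vm *vm) { (void)vm; }

/* 'p': print the top of the stack as a 5-digit decimal number. */
static void prim_print(Vm *vm)
{
    if (vm->sp == 0 || vm->out_len + 5 > (int)sizeof vm->out) return;
    uint16_t v = vm->stack[vm->sp - 1];
    for (uint16_t f = 10000; f > 0; f /= 10)
        vm->out[vm->out_len++] = (char)('0' + (v / f) % 10);
}

static PrimFn dispatch[128];   /* one entry per 7-bit ASCII code */

void simpl_init(void)
{
    for (int c = 0; c < 128; c++) dispatch[c] = prim_nop;
    dispatch['p'] = prim_print;   /* 'p' is 0x70 */
}

/* Inner interpreter: fetch a character, index the table, call, repeat. */
void simpl_run(Vm *vm, const char *input)
{
    for (; *input != '\0'; input++) {      /* NEXT */
        unsigned char c = (unsigned char)*input;
        if (c < 128) dispatch[c](vm);
    }
}
```

In C the "jump, then jump back to NEXT" becomes a call and return; on Suite-16 the same table look-up feeds a jump instead.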
SIMPL is an implementation of a 16-bit virtual stack machine - so most arithmetic and logical operations are done on the top two items on the stack.
I now have the means to perform ADD, SUB, AND, OR, XOR on the top two stack entries (and INVert, INC and DEC on the top item) just by entering the relevant ascii symbol into the text input buffer.
Other stack operations such as DUP, DROP, SWAP and OVER will follow in due course.
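A minimal C sketch of those stack operations, assuming a 16-entry stack of 16-bit cells with Forth-style ordering (so "5 3 -" leaves 2). The operator characters (+ - & | ^ for ADD, SUB, AND, OR and XOR, ~ for INVert, ')' for INC and '(' for DEC) are assumptions for illustration; the post does not list which ascii symbols SIMPL actually uses.

```c
#include <stdint.h>

#define DEPTH 16

typedef struct {
    uint16_t s[DEPTH];
    int sp;              /* number of items; the top is s[sp - 1] */
} DStack;

int push(DStack *st, uint16_t v)
{
    if (st->sp >= DEPTH) return 0;
    st->s[st->sp++] = v;
    return 1;
}

/* Apply a one-character operator (symbols are hypothetical).
   Returns 0 on stack underflow or an unknown symbol. */
int apply_op(DStack *st, char op)
{
    if (op == '+' || op == '-' || op == '&' || op == '|' || op == '^') {
        if (st->sp < 2) return 0;
        uint16_t b = st->s[--st->sp];     /* pop the top item */
        uint16_t a = st->s[st->sp - 1];   /* second item */
        uint16_t r;
        switch (op) {
        case '+': r = (uint16_t)(a + b); break;
        case '-': r = (uint16_t)(a - b); break;
        case '&': r = a & b; break;
        case '|': r = a | b; break;
        default:  r = a ^ b; break;
        }
        st->s[st->sp - 1] = r;            /* result replaces second item */
        return 1;
    }
    if (st->sp < 1) return 0;
    uint16_t *t = &st->s[st->sp - 1];     /* unary ops act on the top */
    switch (op) {
    case '~': *t = (uint16_t)~*t; return 1;
    case ')': *t = (uint16_t)(*t + 1); return 1;
    case '(': *t = (uint16_t)(*t - 1); return 1;
    default:  return 0;
    }
}
```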
I have decimal input and output routines and the means to print strings and memory DUMP to the terminal.
So far I'm up to about 190 (16-bit) words of codespace - so plenty of time and room to get things running this weekend.
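The decimal input side can be sketched the same way. This is a hypothetical C version (the name `ascii_to_u16` and the end-pointer parameter are assumptions): it multiplies by ten with two shifts and an add, on the assumption that the target CPU has no multiply instruction, and it wraps modulo 2^16 just as a 16-bit register would.

```c
#include <stdint.h>

/* Accumulate decimal digits as value = value * 10 + digit, stopping at
   the first non-digit. x10 is done as (x << 3) + (x << 1). */
uint16_t ascii_to_u16(const char *s, const char **end)
{
    uint16_t value = 0;
    while (*s >= '0' && *s <= '9') {
        value = (uint16_t)((value << 3) + (value << 1) + (*s - '0'));
        s++;
    }
    if (end) *end = s;   /* where parsing stopped, e.g. at a command char */
    return value;
}
```

Returning the stop position lets an interpreter carry on and treat the next character (such as 'p') as a command.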
|
Fri Oct 25, 2019 11:01 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1807
|
> I created a new instruction... Luxury! I see why you might feel moved to do that, but how bad was the alternative? Hopefully not even half a dozen instructions.
|
Sat Oct 26, 2019 7:29 am |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
Hi Ed,
It did actually involve quite a few instructions (half a dozen, in fact), because of current limitations in the instruction set.
Suite-16 currently uses only an 8-bit jump address, which is stored in the payload section of the instruction. If we extend this to a 16-bit jump, the target address will be held in the word following the jump instruction - where we can manipulate it. We can use the accumulator to overwrite this target address, so we can effectively jump to an address that is held in the accumulator.
This currently has to be done as a two-stage process, sometimes called a trampoline jump.
Let's assume that the accumulator holds 0x70 (the letter p), and we want to jump to address 0x0100 (the start of the printnum routine), which is held in the lookup table. We use the indirect register addressing mode to index into the table, using register R1 as a pointer. Our trampoline will be placed at locations 0x80 and 0x81.
Code:
ST R0, R1      // R1 now contains 0x70
LD R0, @R1     // R0 contains 0x0100
SET R1, 0x81   // The trampoline's target address location
ST R0, @R1     // store 0x0100 at location 0x81
JMP 0x80       // Jump to 0x80 where the trampoline jump instruction is located
0x0080: JMP      // 16-bit jump
0x0081: 0x0100   // Placed there from accumulator via R1
This method is quite clunky, and it takes 6 instructions to direct the program flow to the printnum routine. Not the sort of overhead that I really want in the inner interpreter. So I added a new instruction, JMP @R0:
Code:
ST R0, R1    // R1 now contains 0x70
LD R0, @R1   // R0 contains 0x0100
JMP @R0      // Program jumps to address 0x0100
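To make the difference concrete, here is a hypothetical C model of the two sequences, with memory as an array of 16-bit words and the lookup table indexed directly by the ascii code, as in the example above (a table entry at 0x70 holding 0x0100, and the trampoline at 0x80/0x81). The function names are made up; only the addresses come from the post.

```c
#include <stdint.h>

#define MEM_WORDS 0x200

uint16_t mem[MEM_WORDS];   /* simulated word-addressed memory */

/* Semantics of the 16-bit JMP at address pc: the target is the next word. */
static uint16_t exec_jmp16(uint16_t pc)
{
    return mem[pc + 1];
}

/* Trampoline: look up the target, patch it into the operand word of the
   JMP at tramp_addr (self-modifying code), then execute that JMP. */
uint16_t trampoline_jump(uint16_t table_index, uint16_t tramp_addr)
{
    uint16_t r0 = mem[table_index];   /* LD R0, @R1 (R1 = table_index) */
    mem[tramp_addr + 1] = r0;         /* SET R1, tramp_addr+1 ; ST R0, @R1 */
    return exec_jmp16(tramp_addr);    /* JMP tramp_addr, whose JMP then runs */
}

/* With JMP @R0 the looked-up address becomes the new PC directly. */
uint16_t indirect_jump(uint16_t table_index)
{
    uint16_t r0 = mem[table_index];   /* LD R0, @R1 */
    return r0;                        /* JMP @R0 */
}
```

Both return the same new PC; the difference is the memory write and the extra jump the trampoline needs on every dispatch.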
|
Sat Oct 26, 2019 7:53 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1807
|
Thanks for elaborating! (My understanding is that before index registers were invented, modifying the code ahead was very normal. A lack of indexed jump seems like a similar case. It will probably always feel near-unmaintainable to modern sensibilities, but there it is. An 8-bit jump address might be a bit of a constraint, though, depending on how many routines you need to reach and how large they are. You can, I suppose, JMP to a JMP and bunny-hop your way to clear space.)
|
Sat Oct 26, 2019 8:07 am |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
Ed,
I did have some reservations about entering the dark lair of "self modifying code".
When I looked at EDSAC simulation a couple of years back, it was the first lesson you learned - in order to get anything useful done.
I now need to look at the new instructions and just make sure I can actually implement them in TTL.
Modifying a line of code in the simulator to gain a new feature could quite easily end up with a lot more TTL chips, more datapaths and more control signals.
|
Sat Oct 26, 2019 8:24 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1807
|
Very good point, to keep an eye on the implementation. (It's not the same thing, exactly, but hoglet noticed that the T80 core has begun to get larger, as the maintainers add the various bits of undocumented behaviour. What's not happening is them getting the right side-effects from their implementation, instead they find themselves adding mechanisms to act more and more like the Z80.)
|
Sat Oct 26, 2019 9:45 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1807
|
(Oh, of course, if you happen to have a return-from-subroutine kind of behaviour, again something missing in early computers, you can construct a suitable address and then return to it. Or indeed, if your PC is an addressable register, that makes it very easy to perform a computed jump.)
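BigEd's return trick can be sketched in C, assuming a CALL that pushes the address of the next word onto a return stack and a RET that pops it into the PC. The post doesn't say how Suite-16 stores return addresses, so the `Cpu` struct and the stack depth here are assumptions, and there are no overflow checks.

```c
#include <stdint.h>

#define RSTACK_DEPTH 8

typedef struct {
    uint16_t pc;
    uint16_t rstack[RSTACK_DEPTH];   /* hypothetical return stack */
    int rsp;                         /* number of saved addresses */
} Cpu;

/* CALL: save the return address (the next word) and jump to target. */
void op_call(Cpu *c, uint16_t target)
{
    c->rstack[c->rsp++] = (uint16_t)(c->pc + 1);
    c->pc = target;
}

/* RET: pop the saved address into the PC. */
void op_ret(Cpu *c)
{
    c->pc = c->rstack[--c->rsp];
}

/* Computed jump: push an arbitrary address, then "return" to it. */
void computed_jump(Cpu *c, uint16_t address)
{
    c->rstack[c->rsp++] = address;
    op_ret(c);
}
```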
|
Sat Oct 26, 2019 9:46 am |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
Previously I had been simulating the Suite-16 cpu on an MSP430 Launchpad board with FRAM.
I noted that despite it being a 16-bit processor, the performance was not so good, so I have swapped over to a Nucleo STM32H743 board which has a 400MHz ARM processor.
I'm still using the Arduino IDE to develop code - because it has a useful timing function micros() which returns the number of microseconds since the program was started. With this I can get fairly accurate timing information from my simulator.
I have used one of the spare opcodes to allow the instruction count and the elapsed time to be output to the terminal.
I have set up a simple loop that loads R0 with 32767 and repeatedly decrements it until it reaches zero. I then print out the instruction count and the elapsed number of microseconds.
Based on the "count down from 32767" loop, my Suite-16 simulator is running at about 8 million simulated instructions per second.
That's about 66% of the speed I'm hoping the TTL cpu will run at.
Based on the 400MHz clock on the Nucleo board, I estimate that the simulator, written in C, takes about 50 ARM clock cycles (very roughly 50 ARM instructions) to execute each simulated Suite-16 instruction.
I tried exactly the same code on the MSP430 which is a nominal 16MHz. Unfortunately the FRAM only works at 8MHz, so that slows it down considerably to about 75,000 simulated instructions per second.
So I tried a 16MHz Arduino with an 8-bit AVR ATmega328 and the results were much improved to nearly 139,000 instructions per second.
This suggests that the AVR is approximately 59 times slower than the ARM.
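A countdown benchmark of this kind can be sketched as a tiny fetch-decode-execute loop in C. The four opcodes and their encoding are hypothetical stand-ins, not the Suite-16 instruction set, and clock() from <time.h> replaces the Arduino micros() call so the sketch runs on a desktop.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical encoding: top 4 bits opcode, low 12 bits operand. */
enum { OP_HALT = 0x0, OP_SET = 0x1, OP_DEC = 0x2, OP_JNZ = 0x3 };

/* Run from address 0 until HALT; returns the number of instructions executed. */
unsigned long run(const uint16_t *mem, uint16_t *r0_out)
{
    uint16_t pc = 0, r0 = 0;
    unsigned long count = 0;
    for (;;) {
        uint16_t instr = mem[pc];
        int op = instr >> 12;
        uint16_t operand = instr & 0x0FFF;
        if (op == OP_SET) {            /* SET R0, value in next word */
            r0 = mem[pc + 1];
            pc = (uint16_t)(pc + 2);
        } else if (op == OP_DEC) {     /* DEC R0 */
            r0--;
            pc++;
        } else if (op == OP_JNZ) {     /* jump to operand if R0 != 0 */
            pc = r0 ? operand : (uint16_t)(pc + 1);
        } else {
            break;                     /* HALT (or unknown opcode) stops */
        }
        count++;
    }
    *r0_out = r0;
    return count;
}

/* Simulated instructions per second, with clock() standing in for micros().
   Returns 0 if the run was too short for the clock to register. */
double instr_per_second(const uint16_t *mem)
{
    uint16_t r0;
    clock_t t0 = clock();
    unsigned long n = run(mem, &r0);
    clock_t t1 = clock();
    double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (double)n / secs : 0.0;
}
```

The program {0x1000, 32767, 0x2000, 0x3002, 0x0000} loads R0, then runs DEC and JNZ 32767 times each, for 65,535 instructions in total. At about 8 million simulated instructions per second, a 400 MHz clock works out to 400 / 8 = 50 clock cycles per simulated instruction.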
|
Mon Oct 28, 2019 1:26 pm |
|
|
monsonite
Joined: Mon Aug 14, 2017 8:23 am Posts: 157
|
This week there has not been so much obvious progress with Suite-16, but I have been tying up a few loose ends.
It became obvious from my early coding ventures that the instruction set was lacking the means to handle 8-bit values without going through some rather clumsy register operations. To remedy this I have added a couple of 8-bit immediate instructions, ADI and SBI, where an 8-bit immediate value contained in the payload area (the lower 8 bits of the instruction register) can be added to or subtracted from the accumulator R0. This has proven really useful when handling strings of 8-bit ascii characters, and in the hex to decimal conversion routine.
The other bit of news is that Frank Eggink (twitter @frankeggink) has created a table of instructions allowing the TASM32 table-driven assembler to be used with Suite-16. https://github.com/frankeggink/Suite-16 ... t.asm?ts=4
Frank has already been able to code up some meaningful assembly language programs using TASM32 - and he has spotted a few bugs and deficiencies in my instruction set, which together we are setting about fixing.
Today I have finally got around to writing a hexadecimal entry routine, which allows hexadecimal numbers (0 to FFFF) to be entered from the serial terminal and then displays them as decimal numbers. Using Chuck Moore's Forth philosophy of starting with a prototype routine to get something working, and then iteratively refining the routine, making it simpler and shorter, I got the hexadecimal routine down from 52 to 34 words. This was principally because part of the routine could be factored out as a subroutine, considerably shortening the code.
Tomorrow I hope to turn this into a simple hex-loader, so that the object file from the assembler can be loaded into the simulated processor using a serial terminal emulator such as TeraTerm.
It's been small steps this week, but I'm slowly starting to create a useful set of utilities.
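The factoring described above might look like this in C. It is a hypothetical sketch: `hex_digit` plays the role of the factored-out subroutine and uses only subtractions (the kind of step SBI makes cheap), and `parse_hex` accepts up to four digits, stopping at the first non-hex character. Neither name comes from the Suite-16 repository.

```c
#include <stdint.h>

/* Convert one ASCII hex digit to 0-15 using only subtractions.
   Returns -1 if c is not a hex digit. */
int hex_digit(char c)
{
    int v = c - '0';                  /* like SBI '0' */
    if (v >= 0 && v <= 9) return v;
    v -= 7;                           /* 'A'..'F' now map to 10..15 */
    if (v >= 10 && v <= 15) return v;
    v -= 0x20;                        /* 'a'..'f' now map to 10..15 */
    if (v >= 10 && v <= 15) return v;
    return -1;
}

/* Parse up to 4 hex digits (0 to FFFF), stopping at the first non-hex char. */
uint16_t parse_hex(const char *s)
{
    uint16_t value = 0;
    for (int i = 0; i < 4; i++) {
        int d = hex_digit(s[i]);
        if (d < 0) break;             /* also stops at the terminator */
        value = (uint16_t)((value << 4) | d);
    }
    return value;
}
```

For example, parse_hex("1a2") returns 0x1A2 (418), which a routine like printnum would then display in decimal.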
|
Sat Nov 02, 2019 10:52 pm |
|