AnyCPU - View topic - 74xx based CPU (yet another)

Author	Message
joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another)
oldben wrote: 16 bit access could be done as a macro for the few cases where you need to fake a structure from a byte array, like a disk directory structure. #define read16(x) ((unsigned char)(x)[0] | ((unsigned char)(x)[1] << 8)) Ben.
Hi Ben, that's a good idea. Actually it occurs to me that I can add 16 bit load/store 'pseudo' instructions to the compiler, so that the compiler can do its thing as if these instructions actually existed, and then transform them into byte load/stores and shifts/ors during machine instruction selection. I think this should benefit from target independent compiler optimisations, and would still be transparent for users. At least for 16 bit array elements it can be done like this.
In the other cases, namely scalar variables and structs, I think it's still better to just align everything to 32 bits and perform all non-byte load/stores with 32 bit load/store instructions.
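A minimal sketch of the byte-wise 16 bit access discussed above, written as functions instead of a macro. It shows the same lowering a 16 bit load/store pseudo instruction would go through: two byte accesses combined with a shift and an or. Little-endian order is an assumption taken from the quoted macro (the second byte goes into the high half); the names `read16` and `write16` as functions, and the use of `uint8_t` buffers, are illustrative and not taken from the CPU74 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16 bit value from a byte buffer, low byte first (assumed order). */
static inline uint16_t read16(const uint8_t *p)
{
    assert(p != NULL);
    /* Two byte loads, one shift, one or. */
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Store a 16 bit value as two bytes, low byte first (assumed order). */
static inline void write16(uint8_t *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
}
```

Because the accesses are byte-sized, the buffer does not need any particular alignment, which is the point of using this for things like a disk directory entry overlaid on a byte array.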
Mon Nov 09, 2020 3:45 pm

joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another)
I have now been able to run the "8 queens" code example in the simulator, with the improved instruction semantics. The example and previous tests were discussed around this post on the forums: http://anycpu.org/forum/viewtopic.php?f=8&t=447&p=5409&hilit=queens#p5409
As a reminder, these are the results that I posted at the time, with the total number of executed instructions and cycles required to find all the possible solutions to the queens problem.
Code:
    Executed instruction count: 1321973
    Total cycle count: 1762834
    Elapsed time: 2.46086 seconds
Now, I got the following result:
Code:
    Executed instruction count: 1296252
    Total cycle count: 1733090
    Elapsed simulation time: 2.70064 seconds
    Calculated execution time at 1MHz : 1.73309 seconds
    Calculated execution time at 8MHz : 0.216636 seconds
    Calculated execution time at 16MHz: 0.108318 seconds
So it looks like the simulator is slightly slower, because the split instruction bitfields are harder to decode in software, but the actual CPU74 code is slightly faster as it takes fewer instructions and fewer cycles. Going by the cycle count (1762834 before, 1733090 now), it is faster by 1.7%. I would have expected a bit better, but that's more than nothing. Code size is also reduced.
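The "calculated execution time" lines and the 1.7% figure follow directly from the cycle counts. A small sketch of that arithmetic, with hypothetical helper names (`exec_seconds`, `percent_saved`) that are not part of the simulator:

```c
#include <assert.h>
#include <stdint.h>

/* Wall-clock time for a given cycle count at a given clock frequency. */
static double exec_seconds(uint64_t cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles / clock_hz;
}

/* Percentage of cycles (or instructions) saved going from 'before' to 'after'. */
static double percent_saved(uint64_t before, uint64_t after)
{
    assert(before > 0 && after <= before);
    return 100.0 * (double)(before - after) / (double)before;
}
```

For example, `exec_seconds(1733090, 16e6)` gives the 16 MHz figure from the log, and `percent_saved(1762834, 1733090)` gives the cycle saving of roughly 1.7%. The same helper applied to the instruction counts (1321973 and 1296252) gives a little under 2%.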
Mon Nov 09, 2020 4:07 pm

joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another)
Went back to the Logisim simulator model, and updated it with the new instruction semantics described in my previous posts. This essentially involved inserting an "SHL1" circuit in the BUS_B path to the ALU, and adding a control signal to activate it. As commented earlier, this does not add any meaningful delay because the data path is only affected by the 0.25 ns max propagation delay of the 74CBT3257 switches. All the updated Logisim drawings are available from here: https://github.com/John-Lluch/CPU74/tree/master/Docs/LogisimDocsV12
As the critical path still remains in the "Fetch" stage, I also replaced the "RegPC" circuit to make it faster. The problem with this module is that program memory must be accessed not only with the PC as the address register, but also independently. The ISA provides the "load from program memory" instruction to do so. However, when data is read from program memory, the PC must keep its value, because program execution must eventually continue from where it was. The old circuit based on 74AC161 incrementers was capable of that: https://github.com/John-Lluch/CPU74/blob/master/Docs/LogisimDocsV10/RegPC.png
It worked in the simulator, but it is not as fast as it can be, because it selects either "PMAR" or "PCMem" with a 74AC74 flip flop which is clocked at the beginning of the cycle. This adds the delay of the 74AC74 to the delay of the 74CBT3345 or the 74AC574, which, added to the 45 ns of the memory, gives a critical path that allows exactly 16 MHz. It's just enough for my goal, but since I was able to improve the Decode-Execute path to a better figure, I wanted to attempt that for the Fetch stage too. So the new circuit uses an explicit incrementer (made as a carry skip adder around 74AC283 adders), with a 74AC273 register for the PC.
Now, the PMAR is always connected to the memory address inputs and updated every single cycle. Normally, both the PC and the PMAR are updated simultaneously at the clock edge, with either 'PC+1' or the INPUT. This is fast because program memory receives the new address as soon as the PMAR is updated. In the case of memory read cycles, the PC is simply not given any clock pulse. The circuit needs more components, but it is faster and the control circuitry is also simpler. This is the direct link: https://github.com/John-Lluch/CPU74/blob/master/Docs/LogisimDocsV12/RegPC.png
Dieter (ttlworks on the 6502 forum) has generously drawn a block diagram showing both the old and new circuits (new circuit is on top), which is much easier to understand than my crude Logisim model files (thanks for that, Dieter!):
Attachment: RegPC1.png
Now the critical paths of the Fetch and the Decode-Execute stages are very balanced, with a top clock frequency well above my 16 MHz goal, as shown in the updated Timing Chart diagram: https://github.com/John-Lluch/CPU74/blob/master/Docs/TimingChartV12.png
Joan
[In the following days/weeks, I will work on making the Logisim simulation actually run code. Provided there are no major bugs in the model, this will involve creating the PLA arrays for the instruction decoder, and a lot of testing]
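The update rules of the new RegPC block can be written down as a small behavioural model. This is only a sketch of what the post describes, not a transcription of the Logisim circuit: the type and function names are made up, and it assumes that during a program memory read the PMAR is loaded from INPUT while the PC is left unclocked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Behavioural model of the new RegPC block (illustrative names). */
typedef struct {
    uint16_t pc;    /* program counter register (the 74AC273 in the post) */
    uint16_t pmar;  /* program memory address register, always drives memory */
} RegPC;

typedef enum {
    FETCH_NEXT,   /* sequential fetch: PC and PMAR both take PC+1 */
    FETCH_INPUT,  /* branch or call: PC and PMAR both take INPUT */
    PM_READ       /* load from program memory: PMAR takes INPUT, PC unclocked */
} RegPCOp;

/* Apply one clock edge. 'input' is ignored for FETCH_NEXT. */
static void regpc_clock(RegPC *r, RegPCOp op, uint16_t input)
{
    assert(r != NULL);
    switch (op) {
    case FETCH_NEXT:
        r->pc = (uint16_t)(r->pc + 1u);
        r->pmar = r->pc;
        break;
    case FETCH_INPUT:
        r->pc = input;
        r->pmar = input;
        break;
    case PM_READ:
        r->pmar = input;  /* PC gets no clock pulse and keeps its value */
        break;
    }
}
```

In this model a `FETCH_NEXT` after a `PM_READ` reloads the PMAR from PC+1. Whether the real control logic resumes from PC or PC+1 at that point is not stated in the post, so treat that detail as an assumption of the sketch.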
Fri Nov 13, 2020 6:37 pm

oldben Joined: Mon Oct 07, 2019 2:41 am Posts: 810	Re: 74xx based CPU (yet another) I suspect the fastest version would not have PC, but next instruction field in the opcode, like some of the very early machines. {OP}{DATA}{NEXT}. Ben.
Fri Nov 13, 2020 9:53 pm

joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another) oldben wrote: I suspect the fastest version would not have PC, but next instruction field in the opcode, like some of the very early machines. {OP}{DATA}{NEXT}. Ben. Well, I see a problem with that approach which is the instruction encoding length. On a 16 bit address machine this implies that every instruction requires 16 additional bits just to store the next instruction address. Given that most instructions are executed in memory sequence, I don't really see the advantage of that. This also complicates conditional branches I suspect...
Tue Nov 24, 2020 10:55 pm

joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another)
At some point I realised that I would need a functional test suite for the CPU74 architecture. This is something I've been delaying, but I finally put some work into it. With the Logisim model almost ready for testing, the test suite will come in handy to debug any issues. I have it now half-finished and posted here: https://github.com/John-Lluch/CPU74/tree/master/Test-Suite
I got my inspiration in part from the Klaus Dormann 6502 suite, but I am writing it as a C source file instead. The tests are however essentially written in assembly, so there are a lot of 'asm' statements embedded in the main structure of the C code (the .s file is just the output of the compiler).
Tue Nov 24, 2020 11:09 pm

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1835	Re: 74xx based CPU (yet another) Good move! Things always take a leap forward when you have a test suite - and once you've got it, it's not too hard to extend it.
Wed Nov 25, 2020 9:01 am

joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another)
I completed the CPU74 Functional Test Suite, and successfully ran it both in the software simulator and in the Logisim logic model. For reference, this is the test suite output with the 'log' option enabled:
Code:
    /Users/joan/Documents-Local/Relay/CPU74/Simulator/DerivedData/Simulator/Build/Products/Release/c74-sim
    preMovCmp ...pass
    preTestCall ...pass
    preShortBranch ...pass
    branchAddress ...pass
    callAddress ...pass
    branchCondtion ...pass
    branchCondtion32 ...pass
    prefixEdge ...pass
    byteShiftsAndExtensions ...pass
    bitShifts ...pass
    stackFrame ...pass
    loadStoreOffset ...pass
    loadStoreIndex ...pass
    loadStoreAddress ...pass
    selectAndSet ...pass
    addSubNegTest ...pass
    addSubTest32 ...pass
    andOrXorNotTest ...pass
    Executed instruction count: 16349
    Total cycle count: 20783
    Elapsed simulation time: 0.117987 seconds
    Calculated execution time at 1MHz : 0.020783 seconds
    Calculated execution time at 8MHz : 0.00259787 seconds
    Calculated execution time at 16MHz: 0.00129894 seconds
    Program ended with exit code: 0
The test suite source code is pushed here: https://github.com/John-Lluch/CPU74/blob/master/Test-Suite/TestUnits.c
Tests cover all available instructions and addressing modes, including edge cases and a number of use case scenarios, in a 2K long machine program. However, due to the relative slowness of the Logisim model, the tests do not go as far as iterating over all the possible operand values of any given instruction. The tests can however be updated in the future to cover all possible values. For example, the 16 bit arithmetic test can be updated to iterate over all possible values of both 16 bit operands, which would represent about 4 thousand million (2^32) additions. At the effective speed of the real processor that would still be a totally reasonable wait.
The microinstruction decoding table that runs on the Logisim model looks like this: https://github.com/John-Lluch/CPU74/blob/master/Simulator/LogisimSupport/DecoderRomTruthTableV10_Full.txt
I have also pushed the Logisim model to the GitHub repo, so it can be found under the "Logisim" folder.
EDIT: I thought it could be interesting to post a visual clue about the way the Logisim model looks while running the test suite. So I recorded a quick video (a straight recording of my computer screen with my phone), and posted it to YouTube: https://youtu.be/eN_2K4hNeW8 (sorry for the low video quality)
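To put a rough number on the exhaustive 16 bit sweep mentioned above: two 16 bit operands give 2^32 operand pairs, and the wait depends on how many cycles the test loop spends per pair. The sketch below only does that arithmetic. The cycles-per-case value is a hypothetical parameter, since the post does not say how long one iteration of such a loop would take on CPU74.

```c
#include <assert.h>
#include <stdint.h>

/* Number of operand pairs for an exhaustive test of a two-operand
 * operation on 'bits'-wide operands: 2^(2*bits). */
static uint64_t exhaustive_cases(unsigned bits)
{
    assert(bits <= 31);
    return (uint64_t)1 << (2 * bits);
}

/* Estimated run time of the sweep. 'cycles_per_case' is a guess supplied
 * by the caller (loop overhead plus the operation plus the check). */
static double sweep_seconds(uint64_t cases, double cycles_per_case, double clock_hz)
{
    assert(clock_hz > 0.0 && cycles_per_case > 0.0);
    return (double)cases * cycles_per_case / clock_hz;
}
```

With an assumed 10 cycles per case at 16 MHz, `sweep_seconds(exhaustive_cases(16), 10.0, 16e6)` comes out at around 45 minutes, which fits the "reasonable wait" remark. A different loop cost scales the result linearly.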
Tue Dec 01, 2020 11:37 am

oldben Joined: Mon Oct 07, 2019 2:41 am Posts: 810	Re: 74xx based CPU (yet another)
Testing a design is really two parts. The first part is checking that the logical blocks work correctly. That is where simulation is useful. The second part is when you have a fault in the hardware and you need a process to go from what works to what is not working. That is something that takes a lot of creative thinking, with simple test programs. That is why the front panel was part of a machine until the late 1970's. I found having one is good for catching stupid mistakes in a design that goes through a lot of revisions. Having spent the wee hours of the morning debugging floating point I/O routines, with the HALT instruction I got to see why the thing was not working. A typo in a temp variable was the problem: saved to temp1 but later loaded temp2. Ben. Now off the computer and to bed.
PS: Real fancy front panels could do all kinds of things, like setting break points with a running program. Having a single step software trap is also useful.
Tue Dec 01, 2020 1:38 pm

joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another)
Hi Ben. You are making good points. In the model I already have clock circuitry that allows it to run step by step, and two clock frequencies that can be switched on the fly without glitches. This is the clock switching circuit in the Logisim model:
Attachment: Clock.png
The actual input 'clock' signals are represented as small square waves on the left. The clock signal that goes to the processor is the 'CLK-PH' signal on the right side. There's a "slow/fast" switch, a "run" switch, and a "tick" button. The "halt" input is connected to a control signal provided by the 'halt' instruction, so it stops the clock output in sync with it.
I suppose that on the real thing (in case it is ever made) I will also have some way to pick up the current program address and instruction, probably by means of a small Arduino card polling the buses, so I can debug things while running it step by step or at a very slow frequency.
Wed Dec 02, 2020 9:50 am

oldben Joined: Mon Oct 07, 2019 2:41 am Posts: 810	Re: 74xx based CPU (yet another)
Having a larger TTL design with a front panel, halt just clears the run flip/flop. The microcode switches from decoding the IR register to decoding the front panel inputs. The clock is always running. The advantage is that I can display the registers when halted or in real time. Defining the SWR as an I/O device saves me from having more connections to and from the ALU. As it is, a 100 pin connector (.125" pitch) is just ample for the mother board.
Wed Dec 02, 2020 9:26 pm

joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another)
I decided to push this thing a bit more and implemented a 3 stage pipelined version of the processor (instead of 2 stages). This is something that I had in mind for some time, and with the functional test suite in place, it looked like the right time to attempt it. The pipeline consists of the following:
Attachment: Pipeline.png
It's a relatively classic implementation of a 3 stage pipeline.
- Load/stores still take 2 cycles. During the first cycle the address is calculated and stored in an internal register. During the second cycle the actual write or read from memory is performed.
- Taken branches now use 3 cycles (instead of 2). This is because the pipeline now has a two cycle execution latency: by the time the branch is determined to be taken, there are already 2 following instructions being processed.
- Subroutine calls and returns are also affected by the two cycle latency. They now take 4 cycles instead of 3 cycles.
- Read after Write (RAW) data hazards can appear for the general purpose registers and the SP: they are detected and solved in the usual way.
- The decoder PLA is identical, except that some control signals are registered to be used one cycle later.
The Logisim model is implemented and working, it fully passes the Test Suite, and the simulation already feels 50% faster. The differences are the following:
- After decoding, control signals related to the ALU and write back are registered to be used on the following cycle.
- The ALU has registered inputs, so that the simultaneous decoding of the following instruction does not interfere with the execution of the current one.
- The register file has hazard detection circuitry. It simply forwards the value on the ALU output bus to the ALU inputs if a register collision is detected.
The critical path analysis shows that it now runs at 30 MHz.
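The hazard handling described in the last bullet can be sketched in a few lines. This is a generic model of register forwarding under stated assumptions, not the actual CPU74 circuit: the register count, the struct layout and the name `read_operand` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 8  /* hypothetical register count, for illustration only */

typedef struct {
    uint16_t regs[NUM_REGS]; /* register file contents */
    bool     wb_pending;     /* instruction in execute will write a register */
    unsigned wb_reg;         /* destination register of that instruction */
    uint16_t alu_out;        /* value currently on the ALU output bus */
} RegFile;

/* Operand read for the instruction being decoded. If it reads the register
 * that the instruction in execute is about to write (a RAW hazard), the
 * ALU output is forwarded instead of the stale register file value. */
static uint16_t read_operand(const RegFile *rf, unsigned src)
{
    assert(src < NUM_REGS);
    if (rf->wb_pending && rf->wb_reg == src)
        return rf->alu_out;
    return rf->regs[src];
}
```

The comparison of `wb_reg` against `src` plays the role of the collision detector, and the early return is the forwarding multiplexer. No stall cycle is needed in this scheme because the forwarded value is already available on the bus when the next instruction needs it.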
This is the timing diagram: https://github.com/John-Lluch/CPU74/blob/master/Docs/TimingChart-P.png
Compared with the previous 2 stage pipeline version, the following performance enhancements apply:
- Normal instructions (1 cycle): 30 MHz / 16 MHz -> 87.5% faster
- Load/stores (2 cycles): 30 MHz / 16 MHz -> 87.5% faster
- Taken branches (3 cycles vs 2 cycles): (30/16) * (2/3) -> 25% faster
- Subroutine calls and returns (4 cycles vs 3 cycles): (30/16) * (3/4) -> 40% faster
Considering that on average 20% of instructions are branches and 60% of them are taken (so 12% of all instructions are taken branches), that 5% of instructions are call/returns and 30% are load/stores, and that the remaining 53% take a single cycle, that results in the following:
2 stage pipeline at 16 MHz: 1/(0.12*2 + 0.05*3 + 0.3*2 + 0.53) = 0.66 instructions per clock pulse -> 10.5 million instructions/second
3 stage pipeline at 30 MHz: 1/(0.12*3 + 0.05*4 + 0.3*2 + 0.53) = 0.59 instructions per clock pulse -> 17.8 million instructions/second
for an overall speed improvement of 17.75 / 10.53 = 1.686 -> 68.6% faster.
This is very good news considering that the implementation differences are really minimal. The top Logisim circuit looks like this after it completed all the tests: https://raw.githubusercontent.com/John-Lluch/CPU74/master/Docs/LogisimDocs-P/Main.png
That's pretty good and I'm really pleased with it! That said, there's still some room for further improvement, which I will disclose in another post. The extra gains in this case would not be that spectacular, but not minor either. Unfortunately, they would come with a non-negligible amount of complexity, so it remains to be seen if they will be worth the effort. What I have now is probably a good balance between performance and complexity, and it is already MUCH better than my initial goals, so it may be a good candidate for the final design after all.
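The throughput estimate above is a weighted average of cycles per instruction over the instruction mix, divided into the clock frequency. A minimal sketch of that calculation follows, using the shares quoted in the post (12% taken branches, 5% call/returns, 30% load/stores, 53% single cycle). The struct and function names are illustrative.

```c
#include <assert.h>

/* One value per instruction class: either a share of the mix
 * (fractions summing to 1) or a cycle count. */
typedef struct {
    double taken_branch;
    double call_ret;
    double load_store;
    double single;      /* everything else, including untaken branches */
} PerClass;

/* Average cycles per instruction: sum of share * cycles over all classes. */
static double average_cpi(const PerClass *mix, const PerClass *cycles)
{
    double total = mix->taken_branch + mix->call_ret
                 + mix->load_store + mix->single;
    assert(total > 0.999 && total < 1.001);  /* shares must add up to 100% */
    return mix->taken_branch * cycles->taken_branch
         + mix->call_ret     * cycles->call_ret
         + mix->load_store   * cycles->load_store
         + mix->single       * cycles->single;
}

/* Million instructions per second at a given clock and average CPI. */
static double mips(double clock_mhz, double cpi)
{
    assert(cpi > 0.0);
    return clock_mhz / cpi;
}
```

With cycle counts {2, 3, 2, 1} at 16 MHz for the 2 stage design and {3, 4, 2, 1} at 30 MHz for the 3 stage design, the average CPI rises from 1.52 to 1.69, and the ratio of the two `mips` results is the overall speed-up. Changing the mix (for instance a program with more taken branches) shows how sensitive the gain is to branch-heavy code.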
Sat Dec 05, 2020 3:25 pm

rj45 Joined: Sat Nov 28, 2020 4:18 pm Posts: 123	Re: 74xx based CPU (yet another)
So, I have been watching this thread for over a year now. Thank you so much for sharing your ideas and your design process! I have been inspired by your work to also build a 16-bit CPU. My journey has been a bit different, but perhaps I will share some of that in a separate thread. But your realizations have become my realizations as well, and have sparked lots of good thought and research. So thank you! (BTW I have not copied your design, just been heavily influenced by it, I hope that is okay.)
So, it's awesome to see you implementing a pipeline, and that speed boost is impressive. I also implemented a pipeline, but I went with a 5 stage design that I am now regretting. I wish I had gone with a 3 stage like you from the beginning, but the amount of rework would, I think, be too high. I guess it's easier to add pipeline stages than to remove them. Anyway, with two extra stages there's a lot more forwarding required, and I fear there may be too many pipeline registers to make implementing in TTL practical. But we will see.
One thing I am now thinking about is OS support. Things like virtual memory, memory protection, supervisor mode, etc. Do you have any thoughts about how that would look with your CPU?
Sun Dec 06, 2020 2:23 pm

BigEd Joined: Wed Jan 09, 2013 6:54 pm Posts: 1835	Re: 74xx based CPU (yet another) Welcome rj45! It'll be interesting to hear about your inventions. Joan: thanks for sharing the results from repipelining your machine. Very educational I think, and a good result too, to see just a few dents in cycle count but a healthy improvement in cycle time.
Sun Dec 06, 2020 3:59 pm

joanlluch Joined: Fri Mar 22, 2019 8:03 am Posts: 328 Location: Girona-Catalonia	Re: 74xx based CPU (yet another)
Hi rj45. Thanks for your comments. It's ok to get inspiration from this, if that's useful in some way. I got my inspiration from several sources too. I started by studying the 16 bit TI MSP430 processor, and looked at the 8 bit AVR processors. From the latter, I got the idea of the 'carry' instructions, enabling all data widths to be processed (including comparisons) by a smaller width ALU. The idea of prefixed immediate values was given to me by someone in this forum, and later on I found that the same kind of thing was used by the RISC-V "compressed" instruction set, although with a different name. While implementing the compiler I realised the importance of conditional instructions other than branches, and that made me look at the ARM and the ARM Thumb. That helped to connect the missing dots. For the hardware implementation I'm getting strong influence from the work of 6502.org forum member Drass (particularly his 20 MHz C74-6502). He also helped me to understand basic hardware concepts, as I was a total noob when I started this.
About OS support, I'm still undecided. When I started this I just wanted to run "space invaders" and "basic", so no OS was required, really. But at this time I am quite confused about what to do. I have looked at this: https://pdos.csail.mit.edu/6.828/2020/xv6/book-riscv-rev1.pdf https://github.com/mit-pdos/xv6-riscv but to be honest I have to learn everything about operating systems at their core, so I'm really very far from being able to use that. I also believe that a 16 bit address space alone is not enough for anything relatively serious, such as a proper operating system with virtual memory and memory protection, so maybe I will just implement my own BASIC interpreter and that would be it.
Sun Dec 06, 2020 5:11 pm
