| 
    
 
 
 
	
			
	
	 Some thoughts on the Z80 and other microprocessors 
        
        
            | Author | Message |  
			| BigEd 
					Joined: Wed Jan 09, 2013 6:54 pm
 Posts: 1843
   | I was reading one of a short series of personal takes on the history of some well-loved microprocessors - see full list below - and mulling over comments on the Z80.
 One thing about the Z80 which is difficult for those familiar with the 6502 (and its spiritual antecedent, the 6800) is that the micro is clocked much faster than the memory. It's not the case that a 4MHz Z80 has four times the performance of a 1MHz 6502, although these were widely encountered as the standard speeds of the respective parts. But neither is it the case that the performance difference exactly tracks the memory speed (it takes 3 ticks of the Z80 clock for a memory access cycle).
 So here's my thought, in the hypothetical situation of designing a Z80 as a successor to the 6502:
 - the 6502 aimed to be an incremental improvement on the 6800 and hit a lower price, so it's smaller and simpler. It keeps up performance by being more cycle-efficient.
 - the Z80 had a larger transistor budget and could reach a higher clock speed, but had only the same memory bandwidth. What to do? Have more registers, to decrease memory traffic, and a more complex instruction set, to make more use of each byte fetched.
 Similarly, we can view the ARM design as a reaction to needing to design a very simple machine, therefore a simply-decoded instruction format with fixed-length instructions, and to make maximum use of memory bandwidth by having lots of registers. (It also has a feature of making good use of faster access for sequential addresses.)
 Anyhow, here are the articles, written by 'litwr' who is a (silent) member here:
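 To put rough numbers on the clock-versus-memory point made earlier in this post, here is a back-of-envelope sketch. It assumes the figure quoted above (3 Z80 clock ticks per memory access) and the usual rule of thumb of one memory access per clock on the 6502; it ignores the Z80's longer opcode-fetch cycle and any wait states. The function name is illustrative only.

```python
def memory_accesses_per_second(clock_hz, clocks_per_access):
    """Upper bound on memory accesses per second for a given CPU clock."""
    return clock_hz / clocks_per_access

# 3 clocks per access is the figure from the post; 1 clock per access
# for the 6502 is an assumed rule of thumb, not a measured value.
z80 = memory_accesses_per_second(4_000_000, 3)
mos6502 = memory_accesses_per_second(1_000_000, 1)

ratio = z80 / mos6502
print(f"Z80 @ 4 MHz:  {z80:,.0f} accesses/s")
print(f"6502 @ 1 MHz: {mos6502:,.0f} accesses/s")
print(f"memory bandwidth ratio: {ratio:.2f}x, well short of the 4x clock ratio")
```

 Under these assumptions the 4MHz part has only about a third more raw memory bandwidth, which is why extra registers and denser instructions matter so much for it.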
 
 |  
			| Fri Jan 04, 2019 5:40 pm |  |  
		|  |  
			| litwr 
					Joined: Thu Mar 08, 2018 10:24 am
 Posts: 13
   | The 6800 and 6502 are very different chips.  The 6800 has two accumulators and full support for signed arithmetic, but it has only one index register and poor addressing modes, and it works with memory more slowly than the 6502.  The 6800 uses big-endian byte order and the 6502 little-endian.
 IMHO if the 6502 could support wait states, then we could add a 6-byte data cache (like the B, C, D, E, H, L registers of the 8080/8085/Z80) and an instruction prefetch queue of about 4 bytes.  For memory access we could provide a burst mode.  Such an enhanced 6502 @4MHz could use memory @1MHz and provide performance maybe about that of a Z80 @10MHz. The belated Z80 upgrade, the Z800/Z280, which appeared only in 1986, uses a 256-byte cache and the mentioned burst mode... Maybe the R800 has an even larger cache.
 The Z80 has only a few actually powerful instructions, like LDIR.  Similar instructions were added for the 65816.  The 6502 has more complex addressing modes than the Z80.  However, several of the Z80's instructions can be a real challenge to implement with RISC-like one-clock timing.
 IMHO it is much easier to design a super-6502 that is compatible with the 6502 but 5-7 times faster at the same frequency.  We need to move zero page into a register file and widen the data bus.  The latter allows the use of pipelining.  Indeed, we also need cycle optimization like that done in the 4510.
 
 
 
    							Last edited by litwr on Mon Jan 07, 2019 8:47 pm, edited 2 times in total. 
 
 |  
			| Mon Jan 07, 2019 7:38 pm |  |  
		|  |  
			| BigEd 
					Joined: Wed Jan 09, 2013 6:54 pm
 Posts: 1843
   | Good to hear from you! My thinking on the 6502 as a successor to 6800 is that the MOS team came directly from the 6800 team in Motorola and that the 6502 chip floorplan bears quite a resemblance to the 6800. So, the team as a whole took some of their knowledge with them, and wanted - indeed needed - to make something "better" than the 6800. You are of course right that the architecture and instruction set of 6502 takes some different directions from the 6800. The pin-compatibility of the 6501 is another aspect, which justifies making a close comparison.
 It would be interesting to consider a 6502 which uses more transistors to improve performance - as you say, some tiny data cache and/or instruction buffer might make quite a difference. But I think the microarchitecture would need to be quite different to make use of the facilities, so we'd be talking about a full re-implementation and not just a tweak. My feeling is that the 6502 team didn't have the deep understanding of computer architecture to be able to do this. Indeed, very few people would, at that time, and they probably worked for IBM.
 
 
 |  
			| Mon Jan 07, 2019 8:16 pm |  |  
		|  |  
			| litwr 
					Joined: Thu Mar 08, 2018 10:24 am
 Posts: 13
   | I don't think that making a better 6502 is so difficult.  The mentioned Z800/Z280 implemented the burst mode and a higher clock frequency for the CPU than for memory.  For example, the R800 in the MSX TurboR works at 28 MHz but uses 7 MHz when accessing memory.  The Z80 has many more instructions than the 6502, and that makes implementing them all quickly very difficult. It can be easier for the 6502's ISA. IMHO just moving ZP to a register file gives about a 2 times performance boost.
 [6501] It is interesting to me that, according to Bill Mensch's Oral History, the idea of making the 6501 cheaper and more effective was rejected because it would have caused a minor incompatibility with the 6800's pin layout.  IMHO they really teased Motorola a bit and got a reaction...
 
 
 |  
			| Tue Jan 08, 2019 10:06 am |  |  
		|  |  
			| litwr 
					Joined: Thu Mar 08, 2018 10:24 am
 Posts: 13
   | I have updated the materials about processors.  A lot has been added, especially to the 6502, Z80, 68k, and x86 chapters.  A completely new chapter about the IBM/360 compatible mainframes has also been added - https://litwr.livejournal.com/3576.html
 As usual, I will be glad to get any new information about the processors in question.  Thank you
 
 |  
			| Fri Sep 04, 2020 7:48 am |  |  
		|  |  
			| barrym95838 
					Joined: Tue Dec 31, 2013 2:01 am
 Posts: 116
 Location: Sacramento, CA, United States
   | I'm about 40 years late with this aimless musing, but I think that turning the 6502 zero-page into a separate and "pure" internal register file  (disjoint from RAM addresses $00xx) might have been an attractive method to save a cycle or two for the zp zp,x zp,y (zp,x) and (zp),y addressing modes.  It would have added a couple of thousand transistors to the die, but AIUI that was not beyond the capabilities of the late 1970s.  The biggest drawback would have been breaking existing programs that took advantage of the zp/abs overlap by performing a JMP or JSR to $00xx after depositing self-modifying executable code there.  Assemblers would also have to be modified to discern between register file accesses and normal RAM accesses in the $00xx range (sta !$20 vs. sta $0020).  Ah, the wanderings of a weary mind ... 
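 The "save a cycle or two" idea can be tabulated with a minimal sketch. The base figures are the documented NMOS 6502 cycle counts for a load without page crossing (zp,y applies to LDX). The savings are purely hypothetical numbers chosen to match the post's estimate: one cycle for the direct modes and two for the indirect modes, which read two pointer bytes from zero page. They do not describe any real chip.

```python
# Documented NMOS 6502 load timings, no page crossing (zp,y is LDX zp,y).
BASE_CYCLES = {"zp": 3, "zp,x": 4, "zp,y": 4, "(zp,x)": 6, "(zp),y": 5}

# HYPOTHETICAL savings if zero page were an on-chip register file.
ASSUMED_SAVING = {"zp": 1, "zp,x": 1, "zp,y": 1, "(zp,x)": 2, "(zp),y": 2}

def cycles_with_register_file(mode):
    """Cycle count for one addressing mode under the assumed savings."""
    return BASE_CYCLES[mode] - ASSUMED_SAVING[mode]

for mode, base in BASE_CYCLES.items():
    new = cycles_with_register_file(mode)
    print(f"{mode:7s} {base} -> {new} cycles ({base / new:.2f}x)")
```

 Even under these generous assumptions the per-instruction gain is well under 2x, so the overall benefit would depend on how zero-page-heavy a given program is.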
 
 |  
			| Sat Sep 05, 2020 5:03 am |  |  
		|  |  
			| litwr 
					Joined: Thu Mar 08, 2018 10:24 am
 Posts: 13
   | barrym95838 wrote: I'm about 40 years late with this aimless musing, but I think that turning the 6502 zero-page into a separate and "pure" internal register file  (disjoint from RAM addresses $00xx) might have been an attractive method to save a cycle or two for the zp zp,x zp,y (zp,x) and (zp),y addressing modes.  It would have added a couple of thousand transistors to the die, but AIUI that was not beyond the capabilities of the late 1970s.  The biggest drawback would have been breaking existing programs that took advantage of the zp/abs overlap by performing a JMP or JSR to $00xx after depositing self-modifying executable code there.  Assemblers would also have to be modified to discern between register file accesses and normal RAM accesses in the $00xx range (sta !$20 vs. sta $0020).  Ah, the wanderings of a weary mind ...
 Thank you very much. Indeed it is a drawback, but IMHO only a few programs used such tricks. IMHO moving zp into registers and doing the 4510-style optimizing could have made the 6502 2-3 times faster.  Pipelining could have made it 2-3 times faster too.  So we could have had a 4-9 times faster 6502 by the mid 80s. It would also have been worth adding a memory copy command like the one made for the 65816, BIT #n, and maybe several others. The overall effect could have been more than a 10-fold increase in performance.
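 The 4-9 times figure in the reply above is simply the product of the two estimated ranges. A minimal sketch of that arithmetic follows, assuming the two gains are fully independent and multiply cleanly, which is an optimistic assumption since the optimizations may overlap. The variable names are illustrative.

```python
def combine(a, b):
    """Multiply two (low, high) speedup ranges, assuming independent gains."""
    return (a[0] * b[0], a[1] * b[1])

zp_and_4510_style = (2, 3)  # estimate from the post
pipelining = (2, 3)         # estimate from the post

total = combine(zp_and_4510_style, pipelining)
print(f"combined estimate: {total[0]}x to {total[1]}x")
```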
 
 |  
			| Sat Sep 05, 2020 5:45 pm |  |  
		|  |  
			| Garth 
					Joined: Tue Dec 11, 2012 8:03 am
 Posts: 285
 Location: California
   | litwr wrote: IMHO moving zp into registers and doing the 4510 style optimizing could have made the 6502 2-3 times faster.
 My 65816 Forth runs 2-3 times as fast as my 6502 Forth at a given clock rate, partly since it can handle 16 bits at a time, and partly because of the added instructions and addressing modes.  The 816's greater code density allows making many more of the words primitives (ie, written in assembly language), whereas on the '02, this would just take too many instructions and too much memory.
 Quote: Pipelining could have made it 2-3 times faster too.
 The 6502 does have minor pipelining, making for few dead bus cycles.  Newcomers are sometimes baffled by the fact that for example ADC #<a_constant> takes only two cycles, or in Z80 lingo, two T-states.
 Quote: It was also worth [adding] a memory-copy command like that that was made for the 65816, BIT #n, and maybe several more others.
 The 816's MVP and MVN are nice, although it seems like they could have cut out a cycle or two per byte moved and still kept the interruptibility.  Perhaps they could have cut out yet a couple more cycles per byte if they had added a couple of registers for the bank numbers so they didn't have to be reloaded every time, and that would fix the self-modifying code requirement too.  As it is, the bank numbers are operands in the MVP and MVN instructions; so unless those are fixed, you'll have to use SMC.
 The 65c02 does have a BIT#.  For the many improvements of the 65c02 over the 6502, see http://wilsonminesco.com/NMOS-CMOSdif/ .  The '816 is of course a logical upgrade with a bazillion benefits, even if you ignore the upper 8 bits of the 24-bit address.  About having ZP as registers, the '816 allows putting ZP (now called "Direct Page") anywhere in the first 64K of the memory map.  It doesn't even have to start on a page boundary (although it's one cycle more efficient if you do start it on a page boundary).  One implication is that each task can have its own ZP (or DP).
 Another is that you can make the DP overlap the stack area, allowing all the DP addressing modes to be used in the stack as well (although the '816 also adds nice stack-relative addressing modes).
 _________________
 http://WilsonMinesCo.com/ lots of 6502 resources
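 For a sense of what the block-move instructions discussed above buy, here is a rough cycle comparison. It assumes 7 cycles per byte for the 65816's MVN/MVP and a plain 6502 copy loop of LDA abs,X / STA abs,X / INX / BNE with no page crossings. Setup instructions are ignored on both sides, and the simple loop only handles up to 256 bytes. The function names are illustrative.

```python
# Per-instruction cycle counts for the assumed 6502 loop (no page crossings).
LOOP_CYCLES = {"LDA abs,X": 4, "STA abs,X": 5, "INX": 2, "BNE (taken)": 3}
MVN_CYCLES_PER_BYTE = 7  # 65816 MVN/MVP

def copy_cycles_6502_loop(n_bytes):
    """Cycles for the simple X-indexed copy loop (at most 256 bytes)."""
    if not 1 <= n_bytes <= 256:
        raise ValueError("the 8-bit index loop handles 1..256 bytes")
    per_byte = sum(LOOP_CYCLES.values())
    return n_bytes * per_byte - 1  # final BNE falls through: 2 cycles, not 3

def copy_cycles_mvn(n_bytes):
    """Cycles spent inside MVN/MVP itself, excluding register setup."""
    return n_bytes * MVN_CYCLES_PER_BYTE

n = 256
print(f"6502 loop: {copy_cycles_6502_loop(n)} cycles")
print(f"65816 MVN: {copy_cycles_mvn(n)} cycles")
```

 On these assumptions the block move is about twice as fast per byte as the naive loop, which leaves room for the further per-byte savings suggested in the post.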
 
 |  
			| Sat Sep 05, 2020 7:19 pm |   |  
		|  |  
			| oldben 
					Joined: Mon Oct 07, 2019 2:41 am
 Posts: 853
   | 8 bitters have only opcode space for one data size: byte operands. Not much more can be improved on them. Playing around with 9 and
 10 bit CPUs (FPGA) you still only have a microcontroller-style computer:
 index reg, stack pointer, accumulator, and program counter.
 We all know that 32KB of memory was never practical for a general-purpose
 computer. Computers started out as 36 bits, and chopping them down to 32
 bits I believe was a step backwards. You got 18 bits for addressing and room
 for 8 or 16 registers all in one opcode word. Floating point had good size.
 Ben.
 
 
 |  
			| Sat Sep 05, 2020 8:18 pm |  |  
		|  |  
			| litwr 
					Joined: Thu Mar 08, 2018 10:24 am
 Posts: 13
   | oldben wrote: 8 bitters have only opcode space for one data size: byte operands. Not much more can be improved on them. Playing around with 9 and
 10 bit CPUs (FPGA) you still only have a microcontroller-style computer:
 index reg, stack pointer, accumulator, and program counter.
 We all know that 32KB of memory was never practical for a general-purpose
 computer. Computers started out as 36 bits, and chopping them down to 32
 bits I believe was a step backwards. You got 18 bits for addressing and room
 for 8 or 16 registers all in one opcode word. Floating point had good size.
 Ben.

 IMHO the idea of modes can easily solve the aforementioned data type problem. We can just switch the 6502 to 16-, 32-, or even 64-bit mode, and this automatically turns the accumulator into one of the proper size.  The 65816 uses this idea quite well.  Another example is the x86.
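 The mode idea can be illustrated with a toy model in which the same add operation is reused and a mode setting decides how wide the accumulator is. This is only a sketch of the concept, not an emulation of the 65816 or x86: the class and method names are hypothetical, and truncating the value on a width change is a simplification (the 65816, for instance, keeps the high byte when switching to 8-bit mode).

```python
class Accumulator:
    """Toy accumulator whose width is selected by a mode setting."""

    def __init__(self, width_bits=8):
        self.width_bits = width_bits
        self.value = 0
        self.carry = 0

    def set_width(self, width_bits):
        # Simplification: truncate to the new width on a mode switch.
        self.width_bits = width_bits
        self.value &= (1 << width_bits) - 1

    def adc(self, operand):
        """Add with carry; the same 'opcode' works at any width."""
        mask = (1 << self.width_bits) - 1
        total = self.value + (operand & mask) + self.carry
        self.carry = 1 if total > mask else 0
        self.value = total & mask
        return self.value

acc8 = Accumulator(8)
acc8.value = 0xFF
acc8.adc(1)   # wraps around in 8-bit mode and sets carry

acc16 = Accumulator(16)
acc16.value = 0xFF
acc16.adc(1)  # fits in 16-bit mode, no carry

print(hex(acc8.value), acc8.carry, hex(acc16.value), acc16.carry)
```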
 
 |  
			| Thu Sep 24, 2020 6:03 pm |  |  
		|  |  
			| oldben 
					Joined: Mon Oct 07, 2019 2:41 am
 Posts: 853
   | But then that makes it almost the same as a 9 bit cpu, if you need a mode byte. Time to design a 36 bit cpu.   The best 8 bitter is the Hitachi 6309; it has everything but a large address space.
 
 |  
			| Thu Sep 24, 2020 6:48 pm |  |  
		|  |  
			| robfinch 
					Joined: Sat Feb 02, 2013 9:40 am
 Posts: 2405
 Location: Canada
   | I'm tempted to go for a 40 or possibly 48 bitter. It's appealing to have more than 32 bits but less than 64. g-core, a 52-bit machine with 13-bit bytes, turned out not too bad. Works okay in sim, but not in an FPGA.
 Maybe a 48-bit machine with 12-bit bytes, then a 3x12bit 36-bit ISA.
 _________________Robert Finch   http://www.finitron.ca 
 
 |  
			| Fri Sep 25, 2020 4:26 am |   |  
		|  |  
			| oldben 
					Joined: Mon Oct 07, 2019 2:41 am
 Posts: 853
   | LS TTL gets me a .625 us memory cycle, with 74Fxx 2 x the speed. Due to hardware limitations the FPGA version has a .6 us memory cycle, as it has to read byte-sized memory. This gets me into the 286 era, 1975 to 1990 for speed, for a home brew 36 bit cpu.
 Ben.
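 The cycle times quoted above convert to memory-cycle rates as follows. This is a small sketch that takes "2 x the speed" to mean half the cycle time, which is an assumption about the intended meaning; the function name is illustrative.

```python
def cycle_time_to_mhz(cycle_us):
    """Convert a memory cycle time in microseconds to cycles per microsecond (MHz)."""
    return 1.0 / cycle_us

ls_ttl = cycle_time_to_mhz(0.625)     # LS TTL figure from the post
f_ttl = cycle_time_to_mhz(0.625 / 2)  # assumed: 74Fxx halves the cycle time
fpga = cycle_time_to_mhz(0.6)         # FPGA figure from the post

for name, mhz in [("LS TTL", ls_ttl), ("74Fxx", f_ttl), ("FPGA", fpga)]:
    print(f"{name:7s} {mhz:.2f} M memory cycles/s")
```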
 
 
 |  
			| Sat Sep 26, 2020 5:49 pm |  |  
		|  |  
			| hanso 
					Joined: Mon Oct 19, 2020 2:09 pm
 Posts: 3
   | I read the articles on microprocessors and found them interesting! Some remarks on the VAX-11 architecture. I was heavily involved in the internals of VAX/VMS from 1983, as a teacher at Learning Services and as a system programmer, all while working at DEC.  Working on Wolfpack, fond memories
 The VAX-11 instruction set was designed in parallel with the operating system VMS, and many constructs (multiprogramming, high-level language support) hint at that. It also had to be PDP-11 compatible (hence Virtual Address eXtension to the -11), which makes it more complicated still.  The result may seem very extended, but it led to a system that not only performed well but also supported general libraries with calling conventions shared among the programming languages. Case, call, array processing, call stack frames, virtual memory, paging: all in the instruction set. It was the first time at DEC that software engineers were involved from the beginning, in the creation of VAX/VMS.  So the VAX-11 should be seen as an essential part of the whole: VAX/VMS.
 
 |  
			| Mon Oct 19, 2020 2:32 pm |  |  
		|  |  
			| oldben 
					Joined: Mon Oct 07, 2019 2:41 am
 Posts: 853
   | I had less happy times with the VAX around the same time frame. "Not in Computer Science, no VAX for you. We have too many users as is."
 I suspect in 5 years it was replaced
 with PCs for the computer department.
 Ben.
 
 
 |  
			| Mon Oct 19, 2020 8:33 pm |  |  
 
	