 Stuck between a hard place and a 16-bit rock. 

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 62
DockLazy wrote:
drogon wrote:
The RISC-V version is much simpler, as that's not needed. The shift is *2 on the '816 but *4 on the RV, but that's fine if the RV has a barrel shifter (1 cycle each). If we naively assumed 1 cycle per instruction on the RV side then it's 6 cycles vs. 29, even at 2 cycles per instruction, 24 vs. 29. -snip-

So clock for clock RISC-V is interpreting (simple) bytecodes roughly as fast as a 68k or 8086 can execute native instructions! I suppose I shouldn't really be surprised, that was kind of the point of the move to RISC.


Hm. I'd never really thought of it that way, but yes.

DockLazy wrote:
Looking at the RISC-V code you posted, ignoring the 32-bit registers for a sec but assuming a 16-bit ALU, my computer would have the same instruction count except for "inc regPC". This would be 3 instructions: low add, carry check, and add CC result to high. A 32-bit full add would be 4 instructions.


A shortfall (in my opinion) of the 65816 is the inability to load an 8-bit value from RAM into a 16-bit register and zero the top 8 bits. This is why the system runs in 16-bit memory mode (so I waste a cycle loading the next byte up because the memory is only 8 bits wide) and then masks the 16-bit value (wasting another 3 cycles on the AND). Even if I dropped into 8-bit mode I'd still have to zero the top 8 bits, and the instructions to drop into 8-bit mode and back again take even more time. RISC-V has a load-unsigned-byte instruction that zeros the top 3 bytes, so it saves time there. I do feel that's important for bytecode interpreters.

If your RAM is 16 bits wide then you'll obviously not need 2 (memory) cycles for the load, but being able to load a byte into a > 8-bit register with the top bits zeroed is a good thing IMO.
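
To make the difference concrete, here is a minimal Python sketch of the two ways of fetching a bytecode described above. The memory contents and function names are made up for illustration; only the "16-bit load then mask" and "load unsigned byte" ideas come from the post.

```python
# Hypothetical byte-addressed memory holding a few bytecodes.
MEMORY = bytes([0x7F, 0xA5, 0x12, 0xFF])

def fetch_816_style(addr):
    """16-bit little-endian load followed by a mask, as on the '816 in
    16-bit memory mode: two byte reads, then an AND with 0x00FF."""
    word = MEMORY[addr] | (MEMORY[addr + 1] << 8)  # the second read is wasted
    return word & 0x00FF

def fetch_lbu_style(addr):
    """Single byte load that arrives already zero-extended,
    in the spirit of RISC-V's lbu."""
    return MEMORY[addr]
```

Both return the same value; the point is that the first needs an extra memory read and an extra ALU operation on every bytecode fetch.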

The bytecode VM I'm using is really designed to execute BCPL programs efficiently. There are instructions to access global variables and stack variables, and these range from short offsets (one byte) to long offsets (up to 4 bytes), so fetching unaligned data is also key - more so when loading a 32-bit constant that may not be word aligned. There are some quite complex instructions - e.g. the switch instructions - designed to execute switch and case statements, although these need compiler help to generate the jump index tables that are in-line with the switch opcodes. (One variant is linear like if/then/else, the other a binary tree - eat your heart out, microcode!!!) From what I gather, the instruction set was designed, then implemented in the compiler, then the output was analysed and some instructions dropped, some changed, etc. over many, many iterations to make the most frequently used instructions the shortest and most efficient to run - so the instruction set is really designed to run the compiler, but it works well for everything else.
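
As a rough illustration of the two switch variants, here is a Python sketch. The table layout (a list of case/target pairs) and the addresses are hypothetical and are not the real Cintcode encoding. The binary variant is written as a binary search over a sorted table rather than an explicit tree, which is a swapped-in technique that makes the same kind of comparisons.

```python
def switch_linear(table, default, value):
    """Linear variant: compare against each case in turn, like if/then/else."""
    for case, target in table:
        if case == value:
            return target
    return default

def switch_binary(table, default, value):
    """Binary variant: the table must be sorted by case value."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        case, target = table[mid]
        if case == value:
            return target
        if case < value:
            lo = mid + 1
        else:
            hi = mid - 1
    return default

# Hypothetical jump table: (case constant, bytecode address of the case body).
TABLE = [(1, 0x100), (4, 0x120), (9, 0x150), (16, 0x1A0)]
```

With only a handful of cases the linear form is probably the cheaper one; the sorted table starts to pay off as the number of cases grows, which is presumably why the compiler gets to choose.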

I have looked at writing/porting Pascal and C compilers to the bytecode - Pascal might be easier, but then so might writing a BASIC compiler, so who knows.

Cheers,

-Gordon


Wed Mar 08, 2023 1:59 pm

Joined: Sun Mar 27, 2022 12:11 am
Posts: 40
The design for the 32-bit version of this computer was a fairly classic RISC: single cycle per instruction, Harvard, word and sub-word load/stores, fast shifter, no condition codes (the reason for the slow 32-bit math in the 16-bit version; it saves chips and complexity), etc.

The 16-bit version will be almost the same except for the data path being 16-bit; instructions will hopefully still be 32-bit. There are a few things that need to be fixed for the move to 16-bit, such as finding a way to do 32-bit adds in two instructions.
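
For reference, this is roughly what the slow path looks like: a Python sketch of a 32-bit add on a 16-bit datapath with no condition codes, matching the four-instruction count mentioned earlier in the thread. Recovering the carry with an unsigned compare is an assumption about how the "carry check" might be done, and the function name is made up.

```python
MASK16 = 0xFFFF

def add32_on_16bit(a_lo, a_hi, b_lo, b_hi):
    """Add two 32-bit values held as 16-bit halves, without a carry flag."""
    lo = (a_lo + b_lo) & MASK16     # 1: low add, the carry out is lost
    carry = 1 if lo < a_lo else 0   # 2: the sum wrapped if it is smaller than an input
    hi = (a_hi + b_hi) & MASK16     # 3: high add
    hi = (hi + carry) & MASK16      # 4: fold the recovered carry in
    return lo, hi
```

Getting this down to two instructions would need the carry to travel between the two adds some other way, for example an add-with-carry-in that uses a single saved carry bit rather than a full set of condition codes.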

Data access is aligned only. I don't think there is any cheap way of doing unaligned access. If I don't run out of space there should be a funnel shifter available, though.
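
A funnel shifter would let software build an unaligned read out of two aligned ones. A small Python sketch of the idea, assuming 16-bit words, byte addresses and little-endian byte order (all assumptions on my part, as are the names):

```python
MASK16 = 0xFFFF

def funnel_shift_right(hi_word, lo_word, shift):
    """Concatenate two 16-bit words and return the 16-bit window that
    starts `shift` bits up from the bottom (0 <= shift <= 16)."""
    return (((hi_word << 16) | lo_word) >> shift) & MASK16

def load16_unaligned(words, byte_addr):
    """Read a little-endian 16-bit value at any byte address from a
    memory that only supports aligned word accesses."""
    index = byte_addr >> 1
    lo_word = words[index]
    if byte_addr & 1 == 0:
        return lo_word                      # already aligned, one load
    hi_word = words[index + 1]              # second aligned load
    return funnel_shift_right(hi_word, lo_word, 8)

WORDS = [0x2211, 0x4433, 0x6655]  # bytes in memory: 11 22 33 44 55 66
```

So an odd-address read costs two loads plus one funnel shift, instead of two loads, two shifts and an OR.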

Of those compiler choices BASIC would be the most fun.


Thu Mar 09, 2023 12:40 pm

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
I wonder if byte code was too slow to use for a 32-bit CPU design rather than a 16-bit CPU if you wanted an address space > 64KB?
The 8088/8086 comes to mind here. Did UNIX for the 68000 run as 16-bit code or 32-bit code?
BASIC does need the TTY for the real I/O feel, however. :)
Ben.


Thu Mar 09, 2023 5:04 pm

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 62
oldben wrote:
I wonder if byte code was too slow to use for a 32-bit CPU design rather than a 16-bit CPU if you wanted an address space > 64KB?
The 8088/8086 comes to mind here. Did UNIX for the 68000 run as 16-bit code or 32-bit code?
BASIC does need the TTY for the real I/O feel, however. :)
Ben.


I've no idea about Unix on the 68K but I suspect it was 32-bit all the way.

The BBC Micro had a 16-bit BCPL/Cintcode VM - but maybe that's because at that time (early '80s) the BCPL compiler was only good for 16 bits? I don't know. I suspect it might have been a bit slow had it gone to 32 bits - my own use of it at the time showed it to be generally 3-5 times faster than BBC Basic (which supports 4-byte integers), though.

As for BASIC needing a TTY...

https://www.youtube.com/watch?v=Bg-DcjrJ8g0

That good enough for you?

-Gordon


Thu Mar 09, 2023 6:36 pm

Joined: Sun Mar 27, 2022 12:11 am
Posts: 40
Had a look at some historical 16-bit computers to see how they solved this problem.

The PDP-11 had optional segmented virtual memory with 3 separate maps (kernel, supervisor, user). Instruction and data spaces were separated as well, so you could have 64K of instructions and 64K of data without touching the segments.
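
A simplified Python sketch of that style of translation: the top 3 bits of the 16-bit virtual address pick one of eight page address registers, and the register holds a base in 64-byte units. The register contents below are made up, and page length and protection checks are left out.

```python
PAGE_BITS = 13     # 8 KB pages: the displacement is bits 12..0
BLOCK_SIZE = 64    # page address registers count in 64-byte blocks

def translate(pars, vaddr):
    """Map a 16-bit virtual address through one set of eight page
    address registers (PARs)."""
    page = (vaddr >> PAGE_BITS) & 0x7
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    return pars[page] * BLOCK_SIZE + offset

# Hypothetical register contents for one user process.
USER_I_PARS = [0o1000 + 0o200 * n for n in range(8)]  # instruction space
USER_D_PARS = [0o3000 + 0o200 * n for n in range(8)]  # data space
```

The same virtual address lands in different physical memory depending on whether it is an instruction fetch or a data access, which is how a process gets its 64K + 64K: here `translate(USER_I_PARS, 0x2004)` gives 0xA004 while `translate(USER_D_PARS, 0x2004)` gives 0x1A004.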

The Alto initially had 64K x 16-bit words of memory; later versions had 4 banks of 64K x 16-bit words, which wasn't enough for what they were doing.

The Dorado, successor to the Alto, was a bit unusual in that it had a 16-bit datapath but separate 'base' registers. A 28-bit virtual address was assembled by adding a base register to a 16-bit value from the datapath. I get the impression the Dorado suffered a bit from design by committee.


Mon Mar 13, 2023 8:53 am

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 62
DockLazy wrote:
Had a look at some historical 16-bit computers to see how they solved this problem.

The PDP-11 had optional segmented virtual memory with 3 separate maps (kernel, supervisor, user). Instruction and data spaces were separated as well, so you could have 64K of instructions and 64K of data without touching the segments.

The Alto initially had 64K x 16-bit words of memory; later versions had 4 banks of 64K x 16-bit words, which wasn't enough for what they were doing.

The Dorado, successor to the Alto, was a bit unusual in that it had a 16-bit datapath but separate 'base' registers. A 28-bit virtual address was assembled by adding a base register to a 16-bit value from the datapath. I get the impression the Dorado suffered a bit from design by committee.


I think early Prime minis had something similar but I did not enjoy using them, so my memory of them has faded somewhat (i.e. a 16-bit architecture with some form of paging so that the computer as a whole could have many MB but processes were limited to 64KB, or 64KB program and 64KB data).

The question might be why? But that's easy - cost, and expectations of what we were doing with computers in the late 60s to 70s... We didn't quite have the data sets or need for larger programs but things were snowballing - the first Prime I used had 4MB of RAM (a P550) - quite a step up from the 64KB in the Apple II's I'd been using, or the Interdata 7/32 "mini" which I'd also been using (A nice 32-bit system that I wish I'd had more time with).

Going back to your early posts in this thread ... There is a 16-bit TTL computer that's running a Unix out there:
http://www.homebrewcpu.com/
https://www.youtube.com/watch?v=0jRgpTp8pR8

Looks like it's still running and online!


And I have read of someone making a 32-bit RISC-V system in TTL. (Actually, there might be a couple, according to a quick Google search.)

And some folks have created a 6502 in TTL too.

So... It's all possible - just pick your era and favourite style and give it a go...

I've probably said enough about my forays with 16-bit + segmented addresses (i.e. the 65C816) so won't rant on any more, but next time (which is happening) it's a 32-bit RISC-V core...

-Gordon


Mon Mar 13, 2023 9:55 am

Joined: Sun Mar 27, 2022 12:11 am
Posts: 40
drogon wrote:
I think early Prime minis had something similar but I did not enjoy using them, so my memory of them has faded somewhat (i.e. a 16-bit architecture with some form of paging so that the computer as a whole could have many MB but processes were limited to 64KB, or 64KB program and 64KB data).

The question might be why? But that's easy - cost, and expectations of what we were doing with computers in the late 60s to 70s... We didn't quite have the data sets or need for larger programs but things were snowballing - the first Prime I used had 4MB of RAM (a P550) - quite a step up from the 64KB in the Apple II's I'd been using, or the Interdata 7/32 "mini" which I'd also been using (A nice 32-bit system that I wish I'd had more time with).

I haven't heard of either of those machines before. Of those two the segmented memory system of the Prime is something I want to avoid, but the Interdata 7/32 has an interesting and very relevant design for this discussion.

Internally the 7/32 has a 16-bit datapath, registers are 32-bit but split into 16-bit halves (this might only be visible to microcode; I only had a quick look at the schematics), and memory addressing is 20 bits wide. The interesting part is that the 16-bit ALU has a 4-bit adder extension for rapidly calculating memory addresses. Memory references are one of the most frequent instructions, so it's worthwhile spending extra hardware to get the address out as quickly as possible.

The above design of a 16-bit datapath, but with fast address extension, is where I've been heading for a while now. It's just a matter of finding a good fit for all the pieces.
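
In Python terms the idea is something like the sketch below. This is my reading of the approach, not a description of how the 7/32 is actually wired, and it assumes an unsigned 16-bit offset; the function name is made up.

```python
def address_add(base20, offset16):
    """Add a 16-bit offset to a 20-bit base in a single pass: the 16-bit
    ALU handles the low part and a 4-bit extension adder takes the carry."""
    low_sum = (base20 & 0xFFFF) + (offset16 & 0xFFFF)
    low = low_sum & 0xFFFF
    carry = low_sum >> 16                   # carry out of the 16-bit ALU
    high = ((base20 >> 16) + carry) & 0xF   # 4-bit extension adder
    return (high << 16) | low
```

The extension only ever has to add a carry (or a small constant) to 4 bits, so it is cheap compared with a second pass through the 16-bit ALU.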

drogon wrote:
Going back to your early posts in this thread ... There is a 16-bit TTL computer that's running a Unix out there:
http://www.homebrewcpu.com/
https://www.youtube.com/watch?v=0jRgpTp8pR8

Looks like it's still running and online!
-snip-

Magic-1 uses the same 64k+64k per process virtual memory as the other 16-bit minis from the seventies. Bill covers some of the issues he's had with that setup in this video: https://www.youtube.com/watch?v=qOcSnaK0yBw


Tue Mar 14, 2023 9:08 am

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 62
DockLazy wrote:
drogon wrote:
I think early Prime minis had something similar but I did not enjoy using them, so my memory of them has faded somewhat (i.e. a 16-bit architecture with some form of paging so that the computer as a whole could have many MB but processes were limited to 64KB, or 64KB program and 64KB data).

The question might be why? But that's easy - cost, and expectations of what we were doing with computers in the late 60s to 70s... We didn't quite have the data sets or need for larger programs but things were snowballing - the first Prime I used had 4MB of RAM (a P550) - quite a step up from the 64KB in the Apple II's I'd been using, or the Interdata 7/32 "mini" which I'd also been using (A nice 32-bit system that I wish I'd had more time with).

I haven't heard of either of those machines before. Of those two the segmented memory system of the Prime is something I want to avoid, but the Interdata 7/32 has an interesting and very relevant design for this discussion.

Internally the 7/32 has a 16-bit datapath, registers are 32-bit but split into 16-bit halves (this might only be visible to microcode; I only had a quick look at the schematics), and memory addressing is 20 bits wide. The interesting part is that the 16-bit ALU has a 4-bit adder extension for rapidly calculating memory addresses. Memory references are one of the most frequent instructions, so it's worthwhile spending extra hardware to get the address out as quickly as possible.

The above design of a 16-bit datapath, but with fast address extension, is where I've been heading for a while now. It's just a matter of finding a good fit for all the pieces.


My experience with the Interdata 7/32 was as a school pupil - however, it was running a home-written operating system (designed by Edinburgh University) rather than the supplied OS/32. The OS was called Mouses and was written in a language called Imp (Imp77, another Algol-like language from sort of about/before C, but C won that race). So in 78/79 I had access to what was to all intents and purposes a 32-bit multi-user system without any fancy paging or segmentation (that I was aware of), with a usable CLI, editor and compiler (and text layout, etc. programs), and it all just more or less worked. I'd go there one evening a week and there were about a dozen of us school kids using it and pestering the people who wrote the OS and were maintaining it. Great times.

That has some influence on my 32-bit BCPL system - retro hardware and OS, multi-tasking (although not multi-user).

I've not actually looked at the architecture of the 7/32 but reading their paper on bootstrapping it is interesting.

Some Mouses stuff here:
https://history.dcs.ed.ac.uk/archive/os/mouses/

Cheers,

-Gordon


Tue Mar 14, 2023 10:09 am

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 62
I've just looked again at Martin Richards' BCPL pages and found he's still working on the compiler; not only that, it can now output code for a 16-bit Cintcode engine. This appears to be an attempt to re-create the version of BCPL that existed on the BBC Microcomputer in the early '80s.

So writing a 16-bit Cintcode VM might be an interesting option for a 'true' 16-bit system - although some sort of separate I&D or Harvard architecture might not be possible as things like static variables are placed in-line with the normally read-only code... (so the compiler can output 'short' offsets to use them rather than some sort of absolute address).
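
A toy Python sketch of why that clashes with a split I&D design. The two opcodes and their encoding are invented for illustration and are not real Cintcode; the only thing taken from the above is that statics sit in-line with the code and are reached by a short offset.

```python
# Hypothetical two-opcode VM; these are not real Cintcode encodings.
LOAD_STATIC, STORE_STATIC = 0x01, 0x02

def run_one(mem, pc, acc):
    """Execute one instruction. The operand is a one-byte offset
    relative to the address of the operand byte itself."""
    op, offset = mem[pc], mem[pc + 1]
    target = pc + 1 + offset
    if op == LOAD_STATIC:
        acc = mem[target]
    elif op == STORE_STATIC:
        mem[target] = acc   # a data write into the code image
    return pc + 2, acc

# Code and its static cell share one memory image; the cell is the last byte.
mem = bytearray([STORE_STATIC, 3, LOAD_STATIC, 1, 0x00])
```

Both the instruction fetch and the static access go through the same `mem`, so with separate instruction and data memories the store would have nowhere sensible to go unless code space were also writable through the data path.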

http://www.cl.cam.ac.uk/users/mr10/BCPL/bcpl.tgz

-Gordon


Thu Mar 16, 2023 7:36 pm