


 Suite-16 (formerly Bitslice using currently available TTL) 

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
Good to hear about your new collaborator!


Mon Nov 04, 2019 10:11 am

Joined: Mon Aug 14, 2017 8:23 am
Posts: 157
Yes,

Two minds clearly better than one! And Frank's TASM contribution has come at just the right stage in the project.

About 2 weeks after "Hello World!", I think that I have probably hand-assembled enough code to prove what can be done, and also to highlight any deficiencies in the instruction set.

The ADI and SBI instructions have really made operations on the accumulator much easier, and without them I probably would have struggled even more on the hexadecimal entry routine. It would also make sense to have immediate 8-bit versions of AND, OR and XOR working on the accumulator - as there is virtually no hardware overhead for their implementation.
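As a rough illustration of how little is involved, here is a minimal C sketch of immediate AND, OR and XOR acting on the accumulator. The mnemonics ANI, ORI and XRI and the function name imm_logic are placeholders of my own, not part of the posted instruction set, and the sketch assumes the 8-bit immediate is zero-extended to 16 bits.

```c
#include <stdint.h>

/* Hypothetical mnemonics: ANI, ORI and XRI are not in the posted
   instruction set, and no opcode encoding is implied here. */
typedef enum { ANI, ORI, XRI } imm_logic_op;

/* Apply an 8-bit immediate, zero-extended to 16 bits, to the accumulator. */
uint16_t imm_logic(imm_logic_op op, uint16_t acc, uint8_t imm)
{
    switch (op) {
    case ANI: return (uint16_t)(acc & imm);  /* also clears the upper byte */
    case ORI: return (uint16_t)(acc | imm);
    case XRI: return (uint16_t)(acc ^ imm);
    }
    return acc;  /* unknown operation: leave the accumulator unchanged */
}
```

One consequence of zero-extension is that an immediate AND always clears the upper byte, which happens to be handy for masking a character out of a packed word.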

Frank also pointed out the convenience of a SWAP instruction, to assist with byte handling and packing within 16-bit wide memory.

This week, I hope to implement these instructional changes, and include the "PDP-8 style" OPR instructions to allow for shifts, clears, complements on the accumulator. SWAP will probably also be part of this group.
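To make the "PDP-8 style" idea concrete, here is a small C sketch in which each bit of the payload byte enables one operation on the accumulator, and the enabled operations are applied in a fixed order. The bit assignments, the ordering and the names (OPR_CLR, OPR_CPL, OPR_SHL, OPR_SHR, OPR_SWAP, opr) are all assumptions for illustration only, as the real encoding has not been decided.

```c
#include <stdint.h>

/* Hypothetical bit assignments within the 8-bit payload field. */
#define OPR_CLR  0x01  /* clear the accumulator             */
#define OPR_CPL  0x02  /* complement the accumulator        */
#define OPR_SHL  0x04  /* logical shift left by one bit     */
#define OPR_SHR  0x08  /* logical shift right by one bit    */
#define OPR_SWAP 0x10  /* exchange the upper and lower byte */

/* Apply every operation whose bit is set, in the fixed order below. */
uint16_t opr(uint16_t acc, uint8_t payload)
{
    if (payload & OPR_CLR)  acc = 0;
    if (payload & OPR_CPL)  acc = (uint16_t)~acc;
    if (payload & OPR_SHL)  acc = (uint16_t)(acc << 1);
    if (payload & OPR_SHR)  acc = (uint16_t)(acc >> 1);
    if (payload & OPR_SWAP) acc = (uint16_t)((acc << 8) | (acc >> 8));
    return acc;
}
```

With this layout, opr(acc, OPR_CLR | OPR_CPL) loads 0xFFFF in a single instruction, which is the kind of combination the PDP-8 scheme is known for.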

By the end of the week I hope to have finalised the instruction set within the simulator, and begin the process of working towards its implementation in hardware.


Mon Nov 04, 2019 11:01 am

Joined: Mon Aug 14, 2017 8:23 am
Posts: 157
Hi All - and Happy New Year - although we are already more than half way through January.

This post is by way of an update, because I realise that the Suite-16 project has been fairly quiet since early November.

In mid-November I went over to California for a couple of weeks - firstly to attend Forth Day 2019, held at Stanford University near Palo Alto, and then to have a road trip along the Pacific Coast Highway in a 1996 Cadillac Sedan de Ville.....

I did a short presentation on "Forth on the Gigatron TTL Computer", and presented Charles Moore with a Gigatron kit.

Frank, my team-mate, joined the project on November 4th 2019 - and brought along a PC version of the simulator and an assembler based on TASM.

With his software experience and an improved toolchain, plus someone to bounce ideas off, we have been making steady progress in our spare time over the last couple of months.

In the intervening weeks, Frank has written a hex-loader routine to run on the simulator - so that it can be loaded from the hex output from the TASM assembler.

The simulator runs either on a PC under Windows, or on a $25 Nucleo STM32H743 dev board. The latter runs at about 8 million simulated instructions per second - which is close to 2/3rds of the desired speed of the proposed TTL implementation.

For more information on the assembler (TASM) and the software simulation running under Windows on a PC, you might wish to check out my colleague Frank Eggink's repositories:

https://github.com/frankeggink/Suite-16-Assember and https://github.com/frankeggink/Suite-16-Emulator

Recent Progress:

The original aim was to create a novel 16-bit processor, as a learning exercise, and offer it as a simulation running on PC or Nucleo board, an FPGA implementation and as a retro-computer built from real 74xx00 series TTL chips.

The instruction set and C simulator were mostly complete (we thought) by early November 2019, and a few hand-assembled routines were coded up to prove that decimal and hex number input and output could be reliably achieved over a serial interface. They can - and this has been tested at 921600 baud with TeraTerm.

As of mid January, we have a fairly stable simulator with hexloader, which can execute hexadecimal and decimal number entry, conversion and serial printout of numbers and text-strings.
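For anyone who has not written such routines, the underlying conversions are short. The sketch below is host-side C, not the hand-assembled Suite-16 code, and the function names hex_digit, parse_hex16 and format_dec16 are my own. It shows hex entry by shifting in one nibble per digit, and decimal output by repeated division by ten.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one ASCII hex digit to 0..15, or return -1 if it is not one. */
int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Shift hex digits into a 16-bit word; stops at the first non-hex character. */
uint16_t parse_hex16(const char *s)
{
    uint16_t value = 0;
    int d;
    while ((d = hex_digit(*s++)) >= 0)
        value = (uint16_t)((value << 4) | (uint16_t)d);
    return value;
}

/* Write value as decimal text into out (at least 6 bytes); returns the length. */
size_t format_dec16(uint16_t value, char *out)
{
    char tmp[5];          /* digits are produced least significant first */
    size_t n = 0, i = 0;
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    while (n > 0)
        out[i++] = tmp[--n];
    out[i] = '\0';
    return i;
}
```

On a machine without a divide instruction, the decimal routine would more likely subtract powers of ten, but the digit-at-a-time structure is the same.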

The original idea of having the accumulator as the only destination register has proven to be too inflexible, and with the use of a register file of 16 general purpose registers, this restriction is really no longer an issue.

We are actively finding several different ways of using the lower byte of the instruction register - which we call the payload or the immediate byte. If you look at a hex-dump of some of our number conversion routines, this byte is almost always unused - leading to poor code density. I'm looking at using it for 8-bit immediate constants, intra-page addressing, and even extending the memory addressing space to 24-bits.
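Two of these uses can be sketched in a few lines of C. Both functions are assumptions on my part rather than settled design: intra_page_target assumes a page is 256 words and that the payload simply replaces the low byte of the program counter, and extended_address assumes the payload supplies the top 8 bits of a 24-bit address with a 16-bit register supplying the rest.

```c
#include <stdint.h>

/* Branch target within the current 256-word page: keep the upper byte of
   the program counter and replace the lower byte with the payload. */
uint16_t intra_page_target(uint16_t pc, uint8_t payload)
{
    return (uint16_t)((pc & 0xFF00u) | payload);
}

/* Hypothetical 24-bit address: the payload gives bits 23:16 and a 16-bit
   register gives bits 15:0, for 16M addressable words in total. */
uint32_t extended_address(uint8_t payload, uint16_t reg)
{
    return ((uint32_t)payload << 16) | reg;
}
```

The attraction of both schemes is that they cost no extra instruction words, since the payload byte is already fetched with every opcode.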

The simulator is written in C and reduces the 31 instructions down to just two large switch-case statements. I wrote it in this manner in the hope that the conversion from a C simulator to a Verilog implementation for an FPGA would be fairly straightforward.

The whole simulator is only about 75 lines of code, and can be built with the hex-loader already initialised into program memory. Below in the code window is the heart of it - based on two switch-case statements.

The first decodes the 4 bit opcode IR[15:12], and the second decodes IR[11:8] which handles the conditional jumps and some I/O operations.

The 8-bit field IR[7:0] is used as a local branch within the current page as PCM - or as an 8-bit immediate value addr - which is used to modify the accumulator R0. We are currently devising more uses for this underutilised 8-bit immediate field.

Code:

// Opcode Execute
  switch (op) {
  case 0x0:   break;
  case 0x1:   R[n] = M[PC]; PC++; break;  // SET
  case 0x2:   R[0] = R[n];        break;  // LD
  case 0x3:   R[n] = R[0];        break;  // ST
  case 0x4:   R[0] = M[R[n]];     break;  // LD@
  case 0x5:   M[R[n]] = R[0];     break;  // ST@
  case 0x6:   R[0] = M[R[n]]; R[n] = R[n] + 1;  break; // POP with post-increment of pointer Rn 
  case 0x7:   R[n] = R[n] - 1; M[R[n]] = R[0];  break; // PSH with pre-decrement of pointer Rn
  case 0x8:   R[0] &= R[n];       break;  // AND
  case 0x9:   R[0] |= R[n];       break;  // OR 
  case 0xA:   R[0] += R[n];       break;  // ADD
  case 0xB:   R[0] -= R[n];       break;  // SUB
  case 0xC:   R[n] = ~R[n];       break;  // INV
  case 0xD:   R[n] = R[n] - 1;    break;  // DEC
  case 0xE:   R[n] = R[n] + 1;    break;  // INC
  case 0xF:   R[0] ^= R[n];       break;  // XOR
  default: break;
  }

  // Conditional Branches and I/O Group

  A = (int16_t)R[0];   // A is the contents of the accumulator R0 - used for testing for conditional branches

  if (op == 0) { // opcode 0 selects the branch, call and I/O group, decoded by n

    switch (n) {
    case 0x0:  PC = PCM;                 break;   // BRA Branch Always
    case 0x1:  if (A > 0) { PC = PCM; }  break;   // BGT Branch if Greater
    case 0x2:  if (A < 0) { PC = PCM; }  break;   // BLT Branch if Less Than
    case 0x3:  if (A >= 0) { PC = PCM; } break;   // BGE Branch if Greater or Equal
    case 0x4:  if (A <= 0) { PC = PCM; } break;   // BLE Branch if Less Than or Equal
    case 0x5:  if (A != 0) { PC = PCM; } break;   // BNE Branch if Not Equal to zero
    case 0x6:  if (A == 0) { PC = PCM; } break;   // BEQ Branch if Equal to zero
    case 0x7:  PC = M[PC];               break;   // 16-bit JMP
    case 0x8:  R[15] = R[15] - 1; M[R[15]] = PC+1; PC = M[PC]; break;   // CALL (16-bit) use R15 as Return Stack Pointer
    case 0x9:  PC = M[R[15]]; R[15] = R[15] + 1; break;        // RET
    case 0xA:  R[0] = R[0] + addr;       break;   // ADI add the immediate 8-bit contained in the address field
    case 0xB:  R[0] = R[0] - addr;       break;   // SBI subtract the immediate 8-bit contained in the address field                     
    case 0xC:  Serial.write((uint8_t)R[0]);  break; // OUT  - output a character to the Serial port
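    case 0xD:                            // input a character from the Serial port
    {
        while (!Serial.available()) {
          ; // wait for a character to arrive
        }
        char ch = Serial.read();
        Serial.write(ch);                // echo it back
        M[512] = (uint8_t)ch;            // store it at M[512]
        break;
    }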
    case 0xE:  PC = R[0]; R[0] = M[PC]; break;  // JMP @R0   - useful for indexing and table look-up ( curious but useful pipeline effect here)
    case 0xF:  R[0] &= R[0]; break;             // NOP   AND accumulator with itself
    default: break;
    }
  }
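The excerpt above uses op, n and addr without showing where they come from. A minimal sketch of the field extraction, following the IR[15:12], IR[11:8] and IR[7:0] split described earlier, might look like this. The struct and function names (decoded, decode) are my own and may not match the real simulator source.

```c
#include <stdint.h>

/* The three fields of a 16-bit instruction word. */
typedef struct {
    uint8_t op;    /* IR[15:12]: primary opcode                  */
    uint8_t n;     /* IR[11:8]:  register number or sub-opcode   */
    uint8_t addr;  /* IR[7:0]:   payload / immediate byte        */
} decoded;

/* Split an instruction word into its fields with shifts and masks. */
decoded decode(uint16_t ir)
{
    decoded d;
    d.op   = (uint8_t)(ir >> 12);
    d.n    = (uint8_t)((ir >> 8) & 0x0F);
    d.addr = (uint8_t)(ir & 0xFF);
    return d;
}
```

For example, the word 0x0A05 decodes to op 0, n 0xA and addr 5, which the second switch statement treats as ADI with an immediate of 5.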




Finalising the Instruction Set

The initial concept was to have a trivial instruction set loosely based on Steve Wozniak's "Sweet 16" 16-bit virtual machine that ran on the early 6502 based Apple machines.

We have a reduced instruction set of just 31 basic instructions, operating on a bank of 16, 16-bit registers. We are looking to utilise a paged addressing scheme which will allow up to a 24-bit wide address space to be accessed, or 16M words.

The instruction set has evolved slowly over the months - especially when we started using it to try to do real text and number handling routines.

Some new instructions have been added to make it much easier to code with, more compact and much more flexible than the original ISA concept.

We are now finalising the instruction set before we progress with the next phase of the project which will be the implementation of the processor, memory and serial UART within an open-source FPGA development board.

We have decided to go straight to FPGA at this stage, as it will be the fastest route to having a couple of stable prototypes for thrashing out the hardware and software.

I also think that an FPGA implementation might be a bit more relevant to many of today's enthusiasts.

The TTL retro-computer can come at a later stage, once we are further along with the project. It's not a cop-out, it's just leaving the hardest and most time consuming part until later in the project.

Also, I will probably be more useful to the modern world with FPGA and Verilog skills rather than TTL design skills...... ;)

Whilst FPGA hardware can achieve near miracles in logic design, I am keeping in mind that I ultimately wish to implement Suite-16 as a TTL Retro-Computer - so I am really trying to keep things simple and not too far removed from what can be done efficiently using a minimum of TTL ICs.

Some Results:

On the 400MHz STM32H743 Nucleo board (about £25), the simulator runs at about 8 million simulated instructions per second.

https://hackaday.io/project/168025-suit ... g-suite-16

Using TeraTerm and Frank's hex-loader, you can send a hex file to the STM32H743 simulator at 921600 baud.


Fri Jan 17, 2020 8:39 pm

Joined: Sun Jul 05, 2020 9:00 pm
Posts: 17
I have a suggestion. Why not implement a carry-select adder if you are doing it in TTL? In that case, for the upper nybble (if 8-bit, or the upper nybbles if 16-bit), you can use an additional adder and a multiplexer. That way, the upper nybble is added for both carry conditions and the carry bit from the low nybble selects which of the 2 already-added outputs is used. So one upper nybble adder has the carry-in tied to Vcc and the other to Ground. The carry-out from the low nybble drives the multiplexer. All other connections are the same as before, just with 2 adders.
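The arrangement described above (two upper-nybble adders with fixed carry-ins and a multiplexer, usually called a carry-select adder) can be modelled in C to check that the logic gives the same answer as an ordinary add. This is a behavioural sketch only, with names of my own (nibble_result, add4, carry_select_add8), and it says nothing about timing or about which TTL parts to use.

```c
#include <stdint.h>

typedef struct {
    uint8_t sum;    /* 4-bit sum        */
    uint8_t carry;  /* carry out, 0 or 1 */
} nibble_result;

/* Behavioural model of one 4-bit adder with carry-in. */
static nibble_result add4(uint8_t a, uint8_t b, uint8_t cin)
{
    uint8_t t = (uint8_t)((a & 0x0F) + (b & 0x0F) + (cin & 1));
    nibble_result r = { (uint8_t)(t & 0x0F), (uint8_t)(t >> 4) };
    return r;
}

/* 8-bit carry-select add; bit 8 of the result is the final carry out. */
uint16_t carry_select_add8(uint8_t a, uint8_t b, uint8_t cin)
{
    nibble_result lo  = add4(a, b, cin);
    nibble_result hi0 = add4((uint8_t)(a >> 4), (uint8_t)(b >> 4), 0); /* carry-in tied to Ground */
    nibble_result hi1 = add4((uint8_t)(a >> 4), (uint8_t)(b >> 4), 1); /* carry-in tied to Vcc    */
    nibble_result hi  = lo.carry ? hi1 : hi0;  /* multiplexer driven by the low carry-out */
    return (uint16_t)((hi.carry << 8) | (hi.sum << 4) | lo.sum);
}
```

Both upper-nybble sums are computed without waiting for the low nybble, so in hardware only the multiplexer sits between the low carry-out and the final result.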


Sun Jul 05, 2020 9:15 pm

Joined: Tue Dec 18, 2018 11:25 am
Posts: 43
Location: Hampshire, UK.
An ingenious idea, but I wonder if the two extra ICs would be faster than a single 74F182 Look-ahead Carry Generator IC.


Sun Jul 05, 2020 9:45 pm

Joined: Sun Jul 05, 2020 9:00 pm
Posts: 17
Well, it depends on how fast you need the upper nybble to be stable. That is essentially Dieter's idea. So use 3 '182s and a multiplexer to add a byte. And both possibilities for the high nybble are added at the same time, with the multiplexer to select which high adder goes on the bus, reducing propagation time. So even for the standard Gigatron, this could possibly increase stability for faster speeds since it would reduce carry propagation times across 2 fast adders.

That is the problem even with CLA adders. You still have to ripple the carry between chips. So you have to wait the time of the low nybble and then the time of the high nybble to settle. But if you take both conditions into account, you can do switching faster than the full propagation time.
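The timing argument can be put into a very simplified model. The sketch below assumes every 4-bit stage has the same delay t_add, that a ripple design waits for each stage in turn, and that a select design pays one adder delay plus one multiplexer delay per additional stage. The function names and this model are my own simplification, and any figures fed into it should come from the datasheets of the parts actually used.

```c
/* All delays are in nanoseconds and stages must be at least 1. */

/* Ripple between chips: each 4-bit stage waits for the carry from the one before. */
unsigned ripple_delay_ns(unsigned t_add, unsigned stages)
{
    return t_add * stages;
}

/* Carry-select: all stages add in parallel, then one mux delay per later stage. */
unsigned select_delay_ns(unsigned t_add, unsigned t_mux, unsigned stages)
{
    return t_add + t_mux * (stages - 1);
}
```

With placeholder figures of 10 ns per adder and 4 ns per multiplexer, two stages come out at 20 ns rippled against 14 ns selected. The gap only matters if the multiplexer really is much faster than the adder's carry path.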


Mon Jul 06, 2020 8:48 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 585
With TTL, ripple carry is good for 16 bits or less. 74Fxxx parts are still around, with about 10 ns carry delay per 4 bits.
For an FPGA, what is faster really comes down to floorplanning. To get full speed you need to be able to access the hardware directly, and FPGA companies do not permit that.
The other factor is that external memory is often SDRAM that is 16 bits wide. That tends to make external memory access a tad slower for random and wide data. A simple design (6502/Z80) that emulates a classic machine (Apple II, CP/M) often runs at the old speeds, so a faster adder is not needed.
Pepino is a nice starter FPGA, but it is out of stock at the moment.


Mon Jul 06, 2020 6:54 pm

Joined: Sun Jul 05, 2020 9:00 pm
Posts: 17
B.Bibby wrote:
An ingenious idea, but I wonder if the two extra ICs would be faster than a single 74F182 Look-ahead Carry Generator IC.


I don't know. I forgot about the carry generator chip. That might work better for higher numbers of bits. But at just 8, using 3 adders and a fast mux can be faster since you only have the switching delay instead of extra propagation time, reducing the latency skew between the 2 nibbles.

There used to be a 16-bit ALU chip, and they could be chained for 32 or higher bits, but the carry generator chip speeds things up and can help your speed if you want to go wider.


Sat Nov 12, 2022 6:09 pm