AnyCPU - View topic - GF-RV16 - an experimental 16-bit RISC-V ISA

robfinch

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2423
Location: Canada

Re: GF-RV16 - an experimental 16-bit RISC-V ISA

I have been following along on your project. I like the priority you have given to testing; testing is a lot of the work. I am used to working with FPGAs, but I think CPLDs are similar.

I have found that large elsif trees tend to synthesize into long chains of priority logic, which costs a lot of hardware and is slow. Synthesis may or may not optimize this away, depending on how good a job the tool does. It is often better to use a case statement where possible, as that tends not to generate cascaded priority logic. Verilog’s case statement (casez) supports don’t-care bits. I suspect the same is available in VHDL.

Fewer micro-code entry points could be used, and some of the lower order micro-code address bits would not need to be specified, if micro-code routines were aligned on power-of-two address boundaries. Could four bits of the opcode be used directly as part of the micro-code address? It would save on the decode logic. If ROMs (EPROMs) are used for micro-code they tend to be available in larger sizes, so some of the low order bits can be wasted.

To get rid of some of the cases, you could have all the conditional branch instructions branch to the same two micro-code addresses (true and false), using the branch outcome as one bit of the micro-code address. The condition would then be an input to the module that determines the branch outcome. For ALU operations the same micro-code address could be used as well, with the operation code as input to the module.

_________________
Robert Finch http://www.finitron.ca

Sun Nov 09, 2025 3:41 pm

oldben

Joined: Mon Oct 07, 2019 2:41 am
Posts: 866

Re: GF-RV16 - an experimental 16-bit RISC-V ISA

gfoot wrote:

> I hadn't thought of branching in the microcode itself. I think I want to avoid it though because I prefer to reduce the number of different sources that something like the microcode address pointer gets loaded from. Maybe that's something to come back to later though.

> I did look for instructions that were identical to the final sequences of other instructions, but "jump" was the only one I spotted - being the same as the last few microcode entries for some of the branch instructions.

I have my microcode in a large CPLD, and use WinCUPL. Anything is better than Verilog or VHDL, as I like to work at the gate logic level, e.g. a.d=a&c#b&!c; a.clk=clk1;
What I have done is write a microcode assembler that reads a text file and creates the control logic file.
An @ toggles pass-through mode: text is copied straight to the output, and when not passing text through, the assembler expects microcode to assemble.
When done, it appends the logic tables for the ROM lookup to the output. I use a 256x16 ROM.
This way I can keep my microcode definitions separate from the microcode decode logic.
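As a rough illustration of the @ toggle only (this is not the real micode.exe), here is a minimal Python sketch. It assumes that a line whose first non-blank character is @ flips the mode and that the file starts in assemble mode; both are inferred from the listing rather than confirmed, and the name `split_sections` is made up for the example.

```python
from typing import List, Tuple


def split_sections(text: str) -> Tuple[List[str], List[str]]:
    """Split assembler input into (microcode_lines, passthrough_lines).

    Assumption: a line whose first non-blank character is '@' flips the
    mode, and the file starts in microcode (assemble) mode.
    """
    microcode: List[str] = []
    passthrough: List[str] = []
    passing = False
    for line in text.splitlines():
        if line.lstrip().startswith("@"):
            passing = not passing  # the '@' line itself is consumed
            continue
        (passthrough if passing else microcode).append(line)
    return microcode, passthrough


# Tiny made-up input in the style of the listing.
source = """\
#000
    PC 2 Y      MAR
@
IR.CK       = ACK;
@
    AC OP WRD   IR RD
"""
micro, logic = split_sections(source)
print(len(micro), "microcode lines,", len(logic), "pass-through lines")
```

A real assembler would then translate the microcode lines into ROM tables and write them after the pass-through text.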

Code:

*
  OCT 17 2025
  18 bit classic computer.
  This is a generic microcode source for an 18 bit computer with
  byte addressable memory.
  This part is microcode assembler (micode.exe) 
*

@  // enter CPLD logic compiler
/*
  OCT 17 - KAL,KAH,BUS
  18 bit classic computer Swr version. Single IRQ trap.
  This is a generic microcode source for an 18 bit computer with
  byte addressable memory. This part contains the pal logic
  and the rom lookup tables. 2901 ALU order code. No restart from HLT
  unless followed by a nop. 3x clock 14.3 MHz ~ .85 uS.
  IRQ included. 74LS189,74H04,74HC14 
 
 
  ######____##
  ___#####____
   ><          CONTROL/MAR
   ########### RD (STROBE)

  QUICK 102 macro cells

  CPU model
     18          9          1
    +---+---+---+---+---+---+
    |           :           | 0 A GP A  E
    +-----------------------+ 
    |           :           | 1 B GP B  F
    +-----------------------+ 
    |           :           | 2 C GP C  G
    +-----------------------+ 
    |           :           | 3 D GP D  H
    +-----------------------+ 


    +---+---+---+---+---+---+
    |           :           | 4 #/Z (PC) I
    +-----------------------+ 
    |           :           | 5 X J
    +-----------------------+ 
    |           :           | 6 Y K
    +-----------------------+ 
    |           :           | 7 S L
    +-----------------------+ 
  


    +-+ 
    | | CF  CARRY FLAG  SET IF CARRY  / A,B,C,D
    +-+                 SET ON SHIFT OUTPUT    
                     

    ALU FUNCTIONS        AC   IX  CC   SHIFT 
    #   OPCODE  FUNC    
    0   ST/STB  STORE    Ae   A  0     0     
    1   ADD/ADC ADD      Bf   B  z     0     
    2   SUB/SBC SUB A-B  Cg   C  s     C     
    3   CAD/CAC SUB B-A  Dh   D  s+z   S     
    4   AND     AND      Pi #/Z  ~c    0     
    5   OR      OR       Xj   X  ~c+z  0     
    6   XOR     XOR      Yk   Y  true  C      
    7   LD/LDB  LOAD     Sl   S  true  S     

     E..L LOAD STORE BASE REGS

     876 543 210 987 654 321      
    +---+---+---+---+---+---+
    |1YO|OOO:AAA|320|XXX:+##| AUTO 
    +-----------------------+  
    |1XO|OOO:AAA|321|XXX:+##| INDEXED 
    +-----------------------+  
    |0YO|OOO:AAA|###|###:###| BYTE/CTL 2 hlt 0 di 1 ei
    +-----------------------+
    |1-O|OOO:AAA|001|###:...| SHIFT 0..7 is limit
    +-----------------------+ counter increments
       
           ST      OP    
     0     SWR     REG%
     1     NOP dsp SFT #
     2     JSR     JCC+ 
     3     LEA     SCC  
     4     R+      R+  
     5     X+#     X+# 
     6     R+      R+  
     7     X+#     X+# 
   
     TRAP PUSH PC, PC = AT 2 
*/

// PROPERTY Atmel {pin_keep = off };
//   PROPERTY Atmel {soft_buffer = on};
// QUICK
 
NAME CTL.PLD ;
PARTNO ;
DATE  2025-10-17;
REVISION ;
DESIGNER ;
COMPANY ;
ASSEMBLY ;
LOCATION  ;
DEVICE F1508PLCC84 ;

PINNODE     = [OP4..1,PSW,SYNC,BUSY,BOUNCE,RS,PAN1,PAN2,GO];
PINNODE     = [LCTR,CC,CF,WCY,SFT,TS,CT,TSFT];
PINNODE     = [SX,LD,TWO,WRD,OP,RA,RX,BYY];
PINNODE     = [Y,AUX,EF,RK,EFC,TSW,BY,WR,RD]; 
PINNODE     = [CTR3..1,CTL,TST,IQ,IRQ,NO,DSP,ODD,YT];
PINNODE     = [IR18..2,TIR,TIN,TMAR,G2..1,TP,RUN,BANK];



PIN 2       = ACK;  // PHASE 2 CLOCK (AP IN)
PIN 4       = K1;   // DATA MULX
PIN 5       = K2;   // DATA MULX
PIN 70      = K3;   // DATA MULX
                        
PIN 8       = CP;   // PHASE 1 CLOCK OUT
PIN 9       = SHO;  // SHIFT OUT -L


PIN 25      = IRQI; // IRQ IN ACTIVE HIGH FROM 74LS14
PIN 1       = CLR;  // RESET ACTIVE HIGH FROM 74LS14

PIN 11      = SH1;  // RAM REG   00 LOAD 01 LOAD UNSIGNED 
PIN 12      = SH2;  //           1X SHIFT
PIN 63      = C18;  // CARRY FROM ALU

PIN 6       = APP;   // ADDRESS CLOCK  
PIN 67      = ALU3; // ALU FUNC 
PIN 65      = ALU2; // ALU FUNC
PIN 64      = ALU1; // ALU FUNC
/*
00- ADD 01- SUB 
100 AND 101 OR 
110 XOR 111 LOAD 
*/
PIN 27      = SWR1; // BOT   ALD  -L 
PIN 28      = SWR2; // BOT+1 EXAM -L
PIN 29      = SWR3; // TOP-1 DEP  -L
PIN 30      = SWR4; // TOP   R/S  -L
PIN 31      = SWR5; // PC/AC  

// UN A SIGNED NODES

PINNODE     = AD0..7; // AD7..AD0 MICRO CODE ROM ADDRESS
PIN         = IR;  // TRUE IF INSTRUCTION FETCH
//PIN         = STOP;
PIN 33      = BI18; 
PIN 34      = BI17; 
PIN 35      = BI16; 
PIN 36      = BI15; 
PIN 37      = BI14;
PIN 39      = BI13;
PIN 40      = BI12; 
PIN 41      = BI11; 
PIN 44      = BI10;
PIN 45      = BI9; 
PIN 46      = BI8;
PIN 48      = BI7; 
PIN 49      = BI6;
PIN 50      = BI5; 
PIN 51      = BI4;  
PIN 57      = YY;  // Y MULX FOR MAR,JAM 0 SUM 1 RAM REG 
PIN 54      = MR;  // MEMORY REQUEST
PIN 55      = MW;  // MEM WRITE 
PIN 56      = MB;  // MEM BYTE  
PIN 52      = DK;   // DCLOCK ACTIVE HIGH READ PANEL SWITCH
                    // DISPLAY MAR
PIN 60      = MAR;  // LOAD MAR REG AND WR,BY FLAGS
PIN 61      = IN;   // LOAD INPUT
PIN 71      = CY0;  // CARRY OUT
PIN 79      = WE_;  // 219 WRITE STROBE 
PIN 68      = NSGN; // SIGN BIT RAM -L
PIN 73      = BOUT; // DATA BUS OUTPUT ENABLE  WR&MR
PIN 74      = RG3;  // 219 RAM AD 3
PIN 75      = RG2;  // 219 RAM AD 2
PIN 76      = RG1;  // 219 RAM AD 1
PIN 77      = RG4;  // 219 RAM AD 4
PIN 58      = EQ;   // EQ FLAG IN
PIN 69      = NODD; // ODD BIT FROM RAM -L
PIN 81      = CI;   // 4X CLOCK 
PIN 83      = CLK;  // PHASE 1 CLOCK (CP IN )


/*
 MICRO CODE STUFF
 ROM A OUTPUTS
 SH,NO,LD,RA,TWO,WRD,OP,WR

          "SX", 0X0000C  3 bit data sign extended
          "WRD",0X00004  word data
          "2"  ,0X00008  constant 2
          "OP", 0X00002  alu opcode
          "SUB",0X00002  subtract
          "WR", 0X00001  WR/rd bus flag
          "AC" ,0X00010  select ac
          "LD" ,0X00020  load operation
          "NO" ,0X00040  no load ram

 BY,RX,XIR,IN,RD,MAR,Y,AUX

          "BY", 0X08000  BY/wrd bus flag
          "Y",  0X00200  select sum/ram to mar
          "IR", 0X03000  fetch
          "IN" ,0X01000  input data
          "CTL",0X00100  control state AUX !RA !RX decode IX2
          "TST",0X00100  test acc and jump if true to 4 AUX AX
          "DSP",0X00100  display mar and read swr AUX RX                      
          "SFT",0X00100  shift a input
          "PC" ,0X00000  pc select
          "MAR",0X00400  load mar
          "IX" ,0X04000  rx index reg select
          "RD", 0X00800  memory request
          "SWR",0X04010  ac/pc switch display
        
*/


IR.D        = TIR;  // instruction fetch
IR.CK       = ACK;

IN.D        = TIN;  // load b input reg
IN.CK       = ACK;

SFT         = TSFT;


MAR.D       = TMAR; // load mar register
MAR.CK      = ACK;

// alu operation
ST          = !IR15&!IR14&!IR13; 
OP4         = IR16;
OP3         = IR15;
OP2         = IR14;
OP1         = IR13;



// front panel switches and irq

PSW.D       = SWR5;
PSW.CK      = CLK;


SYNC.D      = (!SWR1#!SWR2#!SWR3#!SWR4)&!CLR;
SYNC.CK     = CLK;
SYNC.CE     = IR;
SYNC.AR     = CLR;

BUSY.D      = SYNC&!CLR;
BUSY.CK     = CLK;
BUSY.CE     = IR;
BUSY.AR     = CLR;


BOUNCE.D    = BUSY;
BOUNCE.CK   = CLK;
BOUNCE.CE   = IR;
BOUNCE.AR   = CLR;

OK          = SYNC&BUSY&!BOUNCE;

RS.D        = !SWR4&OK;
RS.CK       = CLK;
RS.CE       = IR;
RS.AR       = CLR;

PAN1.D      = OK&(!SWR1#!SWR3);
PAN1.CK     = CLK;
PAN1.CE     = IR;
PAN1.AR     = CLR;

PAN2.D      = OK&(!SWR2#!SWR3);
PAN2.CK     = CLK;
PAN2.CE     = IR;
PAN2.AR     = CLR;
// false panel true running

RUN.D       = RUN & !STOP
            # !RUN & RS;

RUN.CK      = CLK;
RUN.CE      = IR;
RUN.AR      = CLR;


TP          = EF & !AUX & IQ & RUN & IR ;

/*
   MAIN IRQ ENABLE 00 01 10 10 up
                      10 01 00 down
 */


EF.D        = CTL & OP4 
             #!CTL & RUN & EF & !TP;   
EF.CK       = CLK;
EF.CE       = IR;
EF.AR       = CLR;

/* READ IRQ ON FALLING EDGE OF CLOCK */



IRQ.D       = IRQI & RUN & !AUX ;  
IRQ.CK      = CLK;
IRQ.CE      = IR;
IRQ.AR      = CLR;

IQ.D        = IRQ & RUN & !AUX ;  
IQ.CK       = CLK;
IQ.CE       = IR;
IQ.AR       = CLR;


BANK        = !OP1&!OP2&!OP3&OP4
            #  OP1& OP2& OP3&OP4;

// SELECT ALU REGISTER 

RG1          = RA&!RX & IR10 & RUN
            # !RA& RX & IR4;
    
         
RG2         = RA&!RX & IR11 & RUN
            # !RA& RX & IR5;
     
          
RG3         =  RA&!RX & IR12 & RUN
            # !RA& RX & IR6
            # !RA&!RX // PC
            #  RA&RX&PSW;  // PC,AC


// SELECT BANK EFGHIJKL 
RG4         =  RA&!RX & BANK & RUN
            # !RA& RX & NO & IR17;

//  WORD 
ZLD       = !RA&RX  & IR6 & !IR5 & !IR4 & NO & K1 & !K2 &!IR17; 
 
CTL     = AUX & !RA  & !RX ;
TST     = AUX & RA; 




// FRONT PANEL  DK 
DK          = DSP;
DSP.CK      = CI;
DSP.D       = AUX & RX & !CLR & !CP;  
DSP.AR      = CLR;


BOUT        = MW&MR&!CLR;  


// INTERNAL TIMING AND FLIP FLOPS
// RAM STROBE 

WE_         = !(!NO&!CLK&ACK);

G1.D        = !G1&!G2&!CLR;                     
G2.D        = G1&!CLR;
G1.CK       = CI;
G2.CK       = CI;

//  INTERNAL CLOCK GEN 4 PHASE 
//      START HIGH
CP.CK       = CI;
CP.AP       = CLR;
CP.CE       = G2;
CP.D        = !APP;
// SET HIGH ON CLEAR

APP.CK      = CI;
APP.AR      = CLR;
APP.CE      = G2;
APP.D       = CP&!CLR;

// MEMORY CONTROL 
// G BUS BYTE REQUEST 
MB.D        = BY;  
MB.CK       = CLK;
MB.AR       = CLR;
// WR/rd BUS REQUEST
MW.D        = WR;  
MW.CK       = CLK;
MW.AR       = CLR;

// MEMORY REQ STROBE

 
MR.D        = RD; 
MR.CK       = ACK;
MR.AR       = CLR;


STOP        = CTL&YT
            # RUN&!SWR1&SYNC
            # RUN&RS;


LCTR        =  TST &(CC$OP4)// NORMAL TEST
            # TST & OP3&OP2; // TRUE TEST
YT          = IR17;

YY.D        = !RX & Y  //  SELECT RAM FOR OUTPUT PC++,AC++
            #  RX & Y & !YT; // RAM FOR OUTPUT R++ 
YY.CK       = ACK;

CY0.D       = OP4 & OP & !OP3 &  OP2  & CF
            # !OP4 &  OP & !OP3 & OP2   //  SUB OPS
            # OP  & !OP3 &!OP2 &!OP1; 


CY0.CK      = ACK;         

WCY         =  OP & !OP3 & (OP2#OP1) & !IR12; // ABCD ONLY
         

BYY         = RUN&(!IR18#IR8);
    
K1          = WRD;
K2          = TWO;

K3          = K1&!K2&!(BYY&OP); 


// WORD K1' SGN IN K2' 
 

ALU1.CK     = ACK;
ALU2.CK     = ACK;
ALU3.CK     = ACK;
ALU1.D      = LD # ZLD 
            #  OP &OP1;

                       
ALU2.D      = LD # ZLD 
            # OP & !OP3 &!OP2 &!OP1 // SUBTRACT
            # OP & OP2  ;
         
ALU3.D      = LD #ZLD # OP3&OP; 

SH1         = SFT & OP3; // DOWN SHIFT RIGHT
         
// SHIFT 0..7
SH2         = SFT ;
INC         = SFT & !(IR6&IR5&IR4);

SHO         = OP2&OP1&NSGN // SHIFT OUT SIGN
            # OP2&!OP1&CF;
    

TS.D        = NSGN;
TS.CK       = ACK;
//          TEMP SIGN FLAG RAM 18
ODD.D       = NODD;
ODD.CK      = ACK;


CT          = SFT&TS&!OP3      // SHIFT UP, CF = SIGN
            # SFT&ODD&OP3 // SHIFT DOWN, CF = ODD
            #!SFT&!WCY&CF
            #!SFT&WCY&C18;

CF.D        = CT;
CF.CK       = CLK;        
CC          = (
            //  REGULAR CC

             OP1&EQ     
            # OP2&TS
            # OP3&!CF
                       
            );


$REPEAT N   = [7..18]

IR{N}.CK    = CLK;
IR{N}.CE    = IR;

$REPEND

IR4.CK      = CLK;
IR5.CK      = CLK;
IR6.CK      = CLK;


IR4.CE      = IR # INC;
IR5.CE      = IR # INC;
IR6.CE      = IR # INC;


IR4.D       = !BI4& !INC # !IR4&INC;
IR5.D       = !BI5& !INC # (IR5$IR4)&INC;
IR6.D       = !BI6& !INC # (IR6$(IR5&IR4))&INC;

// AC SET ON TRAP
$REPEAT N   = [10..12]
IR{N}.D     = !BI{N}#TP;
$REPEND
// TRAP DON'T CARE
$REPEAT N   = [7..9]
IR{N}.D     = !BI{N};
$REPEND
// TRAP CLEAR FOR CTL OPCODE
$REPEAT N   = [13..18]
IR{N}.D     = !BI{N}&!TP;
$REPEND

$REPEAT N   = [3..1]

CTR{N}.CK   = CLK;
CTR{N}.AR   = CLR;
AD{N-1}     = CT{N};
   
$REPEND

CT1         = CTR1;
CT2         = CTR2;
CT3         = CTR3;
           

CTR1.D      = !LCTR & !IR & !CT1 & !INC ;
           
CTR2.D      = !LCTR & !IR & (CT2$CT1) 
            # TP;
CTR3.D      = !IR & (CT3$(CT1&CT2))
            # LCTR;  

AD3         = !RUN & PAN1 
            #  RUN & IR7 & IR18;
            
       
AD4         = !RUN & PAN2 
            #  RUN & IR8                         
            #  RUN & !IR18;
           
AD5         =  RUN & IR9 
            #  RUN & !IR18;
          
AD6         = !IR15&!IR14&!IR13#!RUN;
// STORE
AD7         = !RUN#!IR18; // PANEL,QUICK

@      
/ START OF MICRO CODE

    #000        / REG%   REGISTER OP'S   
    IX NO       MAR IN   / XXXX
    PC 2 Y      MAR
    AC  OP WRD  IR  RD

    #010        / SHIFT 0..7 1..8 TIMES
                / IR6..4 IS INCREMENTED N-1 TIMES
    AC SFT      MAR / SHIFT A
    PC 2  Y     MAR
    PC          IR RD

    #020    /   JCC
    IX  SX  Y   MAR         
    AC  TST     RD IN    
    PC  2 Y     MAR     
    AC          IR RD
    
    #024   /  MET CC 
    PC LD WRD   MAR 
    PC 2        IR RD


    #030   /    SCC  AC = #
    AC  TST   MAR      
    PC  2 Y     MAR      
    AC  LD      IR RD
    #034   /  MET CC              
    PC  2 Y     MAR 
    AC LD SX    IR RD 
 
    #040   /   R+  
    IX SX Y     MAR   
    PC          RD IN
    PC 2 Y      MAR 
    AC  OP WRD  IR RD

    #050    /   R 
    PC 2 Y      MAR   
    PC          RD IN  
    IX NO WRD   MAR   
    PC          RD IN
    PC 2 Y      MAR
    AC  OP WRD  IR RD

    #060   /   R+  
    IX SX Y     MAR   BY
    PC          RD IN BY
    PC 2 Y      MAR 
    AC WRD OP   IR RD

    #070    /   R 
    PC 2 Y      MAR   
    PC          RD IN  
    IX NO WRD   MAR   BY
    PC          RD IN BY 
    PC 2 Y      MAR
    AC OP WRD   IR RD

/   STORE OPS

    #110 / NOP  (DISPLAY)
    AC          MAR
    DSP         IN
    PC 2 Y      MAR
    PC          IR RD
 
    #100  / READ SWR
    AC          MAR
    DSP         IN
    PC 2 Y      MAR
    AC LD WRD   IR RD
 

    #120  / JSR -S R+      
    IX SX Y     MAR   
    PC          RD IN 
    AC 2 SUB    MAR  WR              
    PC          RD   WR
    PC LD WRD   MAR 
    PC  2       IR RD 

    #130  /LEA 
    PC 2 Y      MAR
    PC          RD IN  
    IX NO WRD   IN MAR 
    PC 2 Y      MAR 
    AC LD WRD   IR RD 

    #140   /    R+  
    IX SX Y     MAR WR
    AC          RD  WR
    PC 2 Y      MAR 
    PC          IR RD

    #150    /    R  
    PC 2 Y      MAR
    PC          RD IN  
    IX NO WRD   MAR WR 
    AC          RD WR 
    PC 2 Y      MAR 
    PC          IR RD

    #160   /    R+  
    IX SX Y     MAR WR BY
    AC          RD  WR BY
    PC 2 Y      MAR 
    PC          IR RD

    #170    /    R   BYTE
    PC 2 Y      MAR
    PC          RD IN  
    IX NO WRD   MAR WR BY
    AC          RD WR BY 
    PC 2 Y      MAR 
    PC          IR RD

    #260        /QUICK
    PC 2 Y      MAR
    AC OP WRD     IR RD

    #360       / CTL  
    PC 2  Y      MAR
    CTL          IR  RD
/   TRAP
/   PUSH PC PC = @(2)

    PC SUB 2    
    AC SUB 2    MAR WR
    PC          RD  WR
    PC 2 LD     MAR
    PC          RD IN
    PC LD WRD          /WRAP AROUND
  


/ FRONT PANEL

    #300   / IDLE  

    SWR         MAR
    DSP         IN      / DATA IN HERE
    AC          
    PC          IR IN   / TOGGLE IN OFF   

    #310   / LOAD ADR  
    PC          LD WRD     
    AC          LD WRD               
    PC         
    PC          IR  
        
    #320   / READ MEM -> AC
    PC 2 Y      MAR          
    AC          RD IN         
    AC LD  WRD  
    AC          IR

    #330   / WRITE MEM -> AC
    AC WRD    LD 
    PC 2 Y    MAR WR
    AC        RD  WR       
    PC        IR

@                                                                                  
/*
                                                                                  
                     S    S   G A     V A C  C G     W V R R R                      
                     H    H C N P K K C C L  L N C   E C G G G                      
                     1    O P D P 2 1 C K R  K D I   _ C 4 1 2                      
                    -------------------------------------------                     
                   / 11   9   7   5   3   1  83  81  79  77  75 \                  
                  /    10   8   6   4   2  84  82  80  78  76    \                 
             SH2 | 12                    (*)                   74 | RG3             
             VCC | 13                                          73 | BOUT            
              IR | 14                                          72 | GND             
                 | 15                                          71 | CY0             
                 | 16                                          70 | K3              
                 | 17                                          69 | NODD            
                 | 18                                          68 | NSGN            
             GND | 19                                          67 | ALU3            
                 | 20                                          66 | VCC             
                 | 21                                          65 | ALU2            
                 | 22                 ATF1508                  64 | ALU1            
                 | 23               84-Lead PLCC               63 | C18             
                 | 24                                          62 |                 
            IRQI | 25                                          61 | IN              
             VCC | 26                                          60 | MAR             
            SWR1 | 27                                          59 | GND             
            SWR2 | 28                                          58 | EQ              
            SWR3 | 29                                          57 | YY              
            SWR4 | 30                                          56 | MB              
            SWR5 | 31                                          55 | MW              
             GND | 32                                          54 | MR              
                  \     34  36  38  40  42  44  46  48  50  52   /                 
                   \  33  35  37  39  41  43  45  47  49  51  53/                  
                     --------------------------------------------                     
                      B B B B B V B B B G V B B B G B B B B D V                     
                      I I I I I C I I I N C I I I N I I I I K C                     
                      1 1 1 1 1 C 1 1 1 D C 1 9 8 D 7 6 5 4   C                     
                      8 7 6 5 4   3 2 1     0                                       

                                                        
*/

@                                                                      
           
/ ALL DONE

Sun Nov 09, 2025 10:52 pm

gfoot

Joined: Sat Oct 04, 2025 10:54 am
Posts: 25

Re: GF-RV16 - an experimental 16-bit RISC-V ISA

BigEd wrote:

> The work involved in generating the extended bits is a little heavier than I'd like - e.g. a "less than" comparison basically involves performing a three-bit subtraction. It is on the critical path, though

Hmm, I did wonder about that. But a three bit subtraction here is just a one bit function of 6 input bits - I wonder to what degree a synthesiser could crunch that down into a small fast implementation. (It's a different problem in CPLD, where we use minterms, than in FPGA, where we have LUTs, and a 64 bit mini-rom can be essentially one unit delay.)

I ran it through a Quine-McCluskey simplifier I've been using to see what it could do - here are the product terms it came up with for comparing two 3-bit values:

Code:

    lo  hi
    00- --1
    00- -1-
    0-- -11
    0-- 1--
    -0- 1-1
    -0- 11-
    --- 111

If "lo < hi" one of these terms will match, and if "lo > hi" then none of them will match. If "lo == hi" then a term may or may not match - it doesn't matter for my purposes so the algorithm is allowed to go whichever way results in fewer product terms in the end.
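The stated property can be checked exhaustively, since there are only 64 input pairs. The sketch below is an illustration rather than part of the original tooling: it assumes each pattern is written most significant bit first, and the helper names are made up.

```python
from typing import List, Tuple

# (lo pattern, hi pattern), copied from the listing above.
# Assumption: most significant bit first; '-' means "don't care".
TERMS: List[Tuple[str, str]] = [
    ("00-", "--1"),
    ("00-", "-1-"),
    ("0--", "-11"),
    ("0--", "1--"),
    ("-0-", "1-1"),
    ("-0-", "11-"),
    ("---", "111"),
]


def matches(pattern: str, value: int) -> bool:
    """True if the 3-bit value fits the pattern."""
    bits = format(value, "03b")
    return all(p in ("-", b) for p, b in zip(pattern, bits))


def less_than(lo: int, hi: int) -> bool:
    """OR of all product terms: the synthesised 'lo < hi' output."""
    return any(matches(pl, lo) and matches(ph, hi) for pl, ph in TERMS)


def check_cover() -> bool:
    """Verify the cover on every pair where the result is not a don't-care."""
    for lo in range(8):
        for hi in range(8):
            if lo < hi and not less_than(lo, hi):
                return False
            if lo > hi and less_than(lo, hi):
                return False
    return True


print("cover is valid:", check_cover())
```

The `lo == hi` cases are deliberately left unchecked, matching the "may or may not match" freedom given to the simplifier.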

This simplifier might still not be doing the best job; it's not something I know a lot about.

I've been using it to analyse the complexity of my microcode (https://raw.githubusercontent.com/gfoot ... rocode.txt). I have a little under 256 distinct microcode addresses, and it looks like about 30 bits' worth of control signal information coming out. Encoding those would require 4 8-bit EPROMs, or 2 16-bit ones. I expect it would all fit in one CPLD. Using ATF22V10s instead, though, is limiting due to the number of product terms supported for each output, and that's what I wanted to analyse.

For each signal that I need, I made my program loop through the microcode and gather the addresses where the signal should be 1, and separately the addresses where I don't care what value the signal takes. It then feeds these into the simplifier and outputs an estimate of how many macrocells it will take to compute the resulting product terms (first column), along with the number of product terms that remain (second column):

Code:

ALUOP0:    2  18
ALUOP1:    3  34
ALUOP2:    3  25
ALUOP3:    1   6
HIGH:      4  42
IF:        4  45
MEMW:      1   2
PCW:       2  15
MARW:      3  31
ENDZ:      3  29
ENDNZ:     3  31
REGW_SRC0: 2  14
REGW_SRC1: 1   2
REGW:      2  22
B_MARNR:   1   7
B_MEMR:    1   3
B_PCR:     2  14
B_REGR:    3  35
B_ZERO:    2  21
REGSEL_W0: 1   6
REGSEL_W1: 1   8
REGSEL_W2: 1   3
REGSEL_R0: 2  15
REGSEL_R1: 1  12
REGSEL_R2: 1   3
BUS_A0:    2  20
BUS_A1:    2  20
BUS_A2:    2  24
BUS_A3:    2  21

Each of these is a single bit. The ones with numbers work together as you'd expect, e.g. REGSEL_W[0..2] form a 3-bit enum value identifying which register to write to (not a register number), and ALUOP[0..3] specify an ALU operation, which takes four bits. These will get decoded further externally.
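The per-signal gathering step described above could look roughly like this. It is a sketch under assumptions: the microcode is modelled as a hypothetical dictionary from address to signal values, with `None` standing for "don't care", and the sample entries are invented, not taken from the real microcode.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical microcode image: address -> {signal: 1, 0, or None for "don't care"}.
Microcode = Dict[int, Dict[str, Optional[int]]]


def gather(microcode: Microcode, signal: str) -> Tuple[List[int], List[int]]:
    """Return (addresses where signal is 1, addresses where it is don't-care).

    A signal missing from a microcode word is treated as don't-care here,
    which is an assumption of this sketch.
    """
    ones = sorted(a for a, word in microcode.items() if word.get(signal) == 1)
    dont_cares = sorted(a for a, word in microcode.items() if word.get(signal) is None)
    return ones, dont_cares


# Invented sample data, only to show the shape of the result.
sample: Microcode = {
    0x00: {"MEMW": 0, "REGSEL_R0": None},
    0x01: {"MEMW": 1, "REGSEL_R0": 1},
    0x02: {"MEMW": 0, "REGSEL_R0": 0},
    0x03: {"MEMW": 1, "REGSEL_R0": None},
}
print(gather(sample, "MEMW"))
print(gather(sample, "REGSEL_R0"))
```

The two lists are exactly what a Quine-McCluskey style simplifier needs as its on-set and don't-care set.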

I thought it was interesting to see which terms were particularly expensive. Something like "HIGH" is fairly randomly set or clear on each microcode instruction, and is never "don't care". "IF" is also often set and never "don't care". I guess ~45 product terms is probably just what it takes to encode a random bit value across the 8-bit input range. Other signals that are sometimes "don't care" are a lot cheaper, and also signals that are frequently zero - like ALUOP3 - are much cheaper as well.

B_REGR is set whenever REGSEL_R[0..2] has a specific value; when REGSEL_R[0..2] are "don't care", B_REGR is zero. B_REGR ends up a lot more expensive than REGSEL_R[0..2] are individually. I did try using a specific value of REGSEL_R[0..2] to mean "don't read a register" instead of using a separate signal (B_REGR) but that made all the bits of REGSEL_R more expensive. I think having an explicit extra bit, and more "don't cares", gives the simplifier more options.

I adjusted the ALUOP enum order to reduce the number of product terms required for its bits - not very methodically but it made a big difference. There may be more opportunities to reorder other enums but it's a bit hard to predict and the numbers depend a lot on the specific microcode content.

Overall though, it looks like if I wanted to decode this using ATF22V10s it'd need about six of them (each one has 10 output pins, and the numbers in the first column add up to about 60 macrocells). I had hoped it could be done with fewer, but it's not too bad, and the EPROM option would require several EPROMs so isn't necessarily any better. EPROMs are also slower.
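The arithmetic behind that estimate can be reproduced from the product-term counts in the listing (the larger number in each row). The rule used below, one macrocell per 12 product terms rounded up, is an assumption: it happens to reproduce every macrocell figure in the listing, but it is not confirmed to be the method the program actually uses.

```python
import math

# Product-term counts copied from the listing above.
PRODUCT_TERMS = {
    "ALUOP0": 18, "ALUOP1": 34, "ALUOP2": 25, "ALUOP3": 6,
    "HIGH": 42, "IF": 45, "MEMW": 2, "PCW": 15, "MARW": 31,
    "ENDZ": 29, "ENDNZ": 31, "REGW_SRC0": 14, "REGW_SRC1": 2,
    "REGW": 22, "B_MARNR": 7, "B_MEMR": 3, "B_PCR": 14,
    "B_REGR": 35, "B_ZERO": 21, "REGSEL_W0": 6, "REGSEL_W1": 8,
    "REGSEL_W2": 3, "REGSEL_R0": 15, "REGSEL_R1": 12, "REGSEL_R2": 3,
    "BUS_A0": 20, "BUS_A1": 20, "BUS_A2": 24, "BUS_A3": 21,
}

TERMS_PER_MACROCELL = 12    # assumption: average of the 8..16-term macrocells
MACROCELLS_PER_DEVICE = 10  # an ATF22V10 has ten output macrocells

macrocells = {
    name: math.ceil(terms / TERMS_PER_MACROCELL)
    for name, terms in PRODUCT_TERMS.items()
}
total_macrocells = sum(macrocells.values())
devices = math.ceil(total_macrocells / MACROCELLS_PER_DEVICE)

print(total_macrocells, "macrocells ->", devices, "devices")
```

This is only a lower bound on the device count, since it ignores the differing macrocell sizes and the need to keep each signal on one device.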

So this is a bit inconclusive at the moment - still, maybe good enough! And an interesting journey with the Quine-McCluskey algorithm. I would like to get this working using simpler devices than CPLDs if possible, but at the same time this would be an easy fit for a CPLD so maybe a future iteration can be based on that.

Mon Nov 10, 2025 1:52 pm

gfoot

Joined: Sat Oct 04, 2025 10:54 am
Posts: 25

Re: GF-RV16 - an experimental 16-bit RISC-V ISA

I haven't had much time lately, but today I did spend a bit of time writing some code to try to pack the microcode decoding into ATF22V10s. Each device has 10 macrocells - two with 8 product terms, two with 10, two with 12, two with 14, and two with 16. My code is probably not optimal, but it does successfully find a packing of the various bits I currently have defined into 5 devices:

Code:

      pin 14     pin 23     pin 15     pin 22     pin 16     pin 21     pin 17     pin 20     pin 18     pin 19

  0   ALUOP0     B_PCR      ALUOP0     B_PCR      !ALUOP1    !ALUOP1    !ALUOP1    !REGSEL_R0 !IF        !IF
  1   B_MARNR    REGSEL_W0  HIGH       REGSEL_R1  HIGH       MARW       HIGH       MARW       ENDZ       ENDZ
  2   REGSEL_W1  ALUOP3     PCW        PCW        A_CONST2   !REGW_SRC0 ENDNZ      REGW       ENDNZ      REGW
  3   REGW_SRC1  REGSEL_W2  A_MAR      A_MAR      REGSEL_R2  A_MARSX    A_CONST0   ALUOP2     A_CONST0   ALUOP2
  4   MEMW       B_MEMR     !B_REGR    B_ZERO     !B_REGR    A_IMM      !B_REGR    A_IMM      A_CONST1   B_ZERO

The leftmost columns are 8-term pins, the rightmost ones are 16-term pins.

Some of the signals require multiple macrocells chained together - these always have to be on the same device as each other of course. There are quite a lot of spare product terms here, but as all of the macrocells are used, in this particular case I think this is the best that can be done. In general I think my packing code is probably not always going to find the best packing though.
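For readers who want to experiment, a simple greedy packer along these lines might look as follows. This is not the packing code described in the post: it is a first-fit sketch that places the largest signals first, keeps every signal on a single device, and ignores any product terms that chaining macrocells together might consume. The function names and the sample counts are illustrative.

```python
from typing import Dict, List, Optional

# Product-term capacity of the ten ATF22V10 macrocells, as listed in the post.
MACROCELL_TERMS = [8, 8, 10, 10, 12, 12, 14, 14, 16, 16]


def pick_cells(free: List[int], needed: int) -> Optional[List[int]]:
    """Choose free macrocells on one device with enough terms, or None.

    Takes the largest cell until a single remaining cell can finish the
    job, then takes the smallest cell that is still big enough.
    """
    pool = sorted(free, reverse=True)
    chosen: List[int] = []
    remaining = needed
    while remaining > 0:
        if not pool:
            return None
        fitting = [c for c in pool if c >= remaining]
        cell = min(fitting) if fitting else pool[0]
        pool.remove(cell)
        chosen.append(cell)
        remaining -= cell
    return chosen


def pack(signals: Dict[str, int]) -> List[Dict[str, List[int]]]:
    """Greedy first-fit packing; returns one {signal: cells} dict per device."""
    devices_free: List[List[int]] = []
    layout: List[Dict[str, List[int]]] = []
    for name, terms in sorted(signals.items(), key=lambda kv: -kv[1]):
        for free, placed in zip(devices_free, layout):
            cells = pick_cells(free, terms)
            if cells is not None:
                break
        else:  # no existing device had room: open a new one
            free, placed = list(MACROCELL_TERMS), {}
            devices_free.append(free)
            layout.append(placed)
            cells = pick_cells(free, terms)
            if cells is None:
                raise ValueError(f"{name} does not fit in a single device")
        for c in cells:
            free.remove(c)
        placed[name] = cells
    return layout


example = {"IF": 45, "HIGH": 42, "B_REGR": 35, "ALUOP1": 34, "MEMW": 2}
for number, device in enumerate(pack(example)):
    print(number, device)
```

Being greedy, this can easily use more devices than necessary; an exhaustive or backtracking search would be needed to guarantee the best packing.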

I made the packing code run the Quine-McCluskey algorithm twice - once coding for 1s and again coding for 0s - and pick whichever resulted in fewer product terms. This is why some signals have a ! before them - it means that in the PLD they'll be calculated inverted, and re-inverted at the output pin.
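The "try both polarities" idea can be sketched with a small Quine-McCluskey implementation. The code below is an illustrative stand-in for the simplifier mentioned in the post, not the original: it generates prime implicants in the usual way but selects them with a greedy cover, so the term count it reports is not guaranteed to be minimal. All function names are made up for the example.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Set, Tuple


def combine(a: str, b: str) -> Optional[str]:
    """Merge two implicants that differ in exactly one fixed bit."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms: Iterable[int], nbits: int) -> Set[str]:
    current = {format(m, f"0{nbits}b") for m in minterms}
    primes: Set[str] = set()
    while current:
        merged: Set[str] = set()
        used: Set[str] = set()
        for a, b in combinations(sorted(current), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= current - used  # anything that could not merge is prime
        current = merged
    return primes


def covers(implicant: str, value: int, nbits: int) -> bool:
    bits = format(value, f"0{nbits}b")
    return all(p in ("-", b) for p, b in zip(implicant, bits))


def minimise(ones: Iterable[int], dont_cares: Iterable[int], nbits: int) -> List[str]:
    """Greedy cover of the on-set using primes built from on-set + don't-cares."""
    primes = prime_implicants(set(ones) | set(dont_cares), nbits)
    uncovered = set(ones)
    chosen: List[str] = []
    while uncovered:
        best = max(sorted(primes),
                   key=lambda p: sum(covers(p, v, nbits) for v in uncovered))
        chosen.append(best)
        uncovered = {v for v in uncovered if not covers(best, v, nbits)}
    return chosen


def best_polarity(ones: Iterable[int], dont_cares: Iterable[int],
                  nbits: int) -> Tuple[bool, List[str]]:
    """Return (inverted?, terms), choosing whichever polarity needs fewer terms."""
    ones, dont_cares = set(ones), set(dont_cares)
    zeros = set(range(1 << nbits)) - ones - dont_cares
    direct = minimise(ones, dont_cares, nbits)
    inverted = minimise(zeros, dont_cares, nbits)
    return (True, inverted) if len(inverted) < len(direct) else (False, direct)


# A 3-bit signal that is 1 everywhere except address 7 is cheaper inverted.
print(best_polarity(ones=range(7), dont_cares=[], nbits=3))
```

In the example, coding for 1s needs three product terms while coding for 0s needs only the single term `111`, so the inverted form wins.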

The code also looks for opportunities to share macrocells between output signals, but doesn't yet apply these optimisations. For example it finds that ENDZ and ENDNZ have 18 product terms in common, and together they currently use four macrocells; if instead one macrocell were allocated to 16 of the common product terms, they could both use that one along with one other, meaning they'd only need three in total. This could probably save 5 or 6 macrocells overall I think.

This is all very dependent on the exact contents of the microcode, and the order in which the instructions appear in the microcode, so changes there could push this over the edge.

We were talking about sparse encodings before, and I think Rob suggested feeding some bits from the encoded instruction straight through, so I also prototyped the effect this would have on the microcode decoder. I currently have about 240 active microcode entries, with 8-bit microcode addresses. To apply Rob's suggestion, I considered using 12-bit microcode addresses, with the top 4 bits coming directly from the encoded instruction, the next 4 coming from the instruction decoder, and the bottom 4 always starting at 0 for each instruction. This spreads the microcode out over the 12 bit range (4096 total addresses) and unused addresses are passed to the Quine-McCluskey simplifier as "don't cares" for all bits.
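The address layout described above can be written down in a few lines. This is a sketch only: the field names, the `routines` table, and its contents are hypothetical, and it simply shows how the three 4-bit fields combine and how the unused addresses become don't-cares.

```python
from typing import Dict, List, Tuple


def microcode_address(instr_bits: int, decoder_bits: int, step: int) -> int:
    """12-bit address: [instruction bits | decoder bits | step counter]."""
    for field in (instr_bits, decoder_bits, step):
        if not 0 <= field < 16:
            raise ValueError("each field is 4 bits wide")
    return (instr_bits << 8) | (decoder_bits << 4) | step


def dont_care_addresses(routines: Dict[Tuple[int, int], int]) -> List[int]:
    """Addresses never reached, given {(instr_bits, decoder_bits): routine length}."""
    used = {
        microcode_address(i, d, s)
        for (i, d), length in routines.items()
        for s in range(length)  # the step counter always starts at 0
    }
    return sorted(set(range(1 << 12)) - used)


# Invented example: two routines of 3 and 5 micro-instructions.
routines = {(0x0, 0x0): 3, (0x1, 0x2): 5}
unused = dont_care_addresses(routines)
print(hex(microcode_address(0x1, 0x2, 0x3)), len(unused))
```

With roughly 240 active entries in a 4096-address space, nearly all addresses end up in the don't-care list, which is what gives the simplifier so much freedom.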

The QM simplifier runs much more slowly, but the result is that a lot fewer product terms are required overall - another example of sparser data allowing simpler decoding. The result then fits into one fewer ATF22V10 device:

Code:

      pin 14     pin 23     pin 15     pin 22     pin 16     pin 21     pin 17     pin 20     pin 18     pin 19

  0   !REGW_SRC0 B_MARNR    MARW       REGSEL_W0  PCW        MARW       ALUOP1     ALUOP1     ALUOP0     IF
  1   !REGSEL_W2 ALUOP3     REGSEL_W1  REGSEL_R2  !ENDZ      !ENDZ      HIGH       HIGH       B_ZERO     !REGSEL_R0
  2   REGW       A_MARSX    !ENDNZ     !B_REGR    !ENDNZ     !B_REGR    A_IMM      A_CONST1   A_MAR      REGW
  3   MEMW       REGW_SRC1  A_CONST0   ALUOP2     A_CONST0   ALUOP2     B_PCR      REGSEL_R1  A_CONST2   B_MEMR

This addressing method would also reduce the complexity of the microcode address counter, as only the low 4 bits would need to be able to count up. And as Rob said, it should make the instruction decoder a bit simpler as well, as it will have fewer bits to output now.

I'm hoping I didn't miss something here - I'll need to get the results into VHDL and add some tests to confirm that it really works.

Does anybody know of freely-available tools that can do this kind of packing automatically? Can VHDL/Verilog do this for you, and write out the final product terms? The code I've written here seems to work OK but if there are good off-the-shelf tools for it then I'd like to try them too.

Sun Nov 16, 2025 3:51 am

gfoot

Joined: Sat Oct 04, 2025 10:54 am
Posts: 25

Re: GF-RV16 - an experimental 16-bit RISC-V ISA

Yes it's probably unusual to bother trying to fit things into multiples of these devices and I can imagine that tools for fitting across devices probably assume that you are at least using more capable devices in the first place!

However, applying known techniques to unusual-shaped problems is something generative AI can be quite good at, so I thought I'd give it a go:
https://gemini.google.com/share/79b521f65e7b

It did OK - its first attempt was pretty good: it immediately stated pretty much all the considerations I had also identified, and was able to split large numbers of product terms across macrocells and fit them into devices, bearing in mind that the macrocells don't all have the same widths of product term inputs.

But it didn't try to keep all the macrocells concerned with each output on the same device, so it would have required cross-device wiring, extra input pins, and more latency. I asked it to change that, and it seemed to start making mistakes which I had to spot and correct a few times. If I didn't already know there was a better solution, I might not have spotted the miscounting error. But it does seem to have ended up with a viable mapping in the end.

It also sounds like if I gave it the product terms it might be able to also find cases where intermediate results can be shared between output terms, though that's another layer of complexity that would be a lot harder to spot errors in.

Edit - looking again it seems to have got the pin assignments wrong, e.g. claiming that pin 23 supports 16 PTs.

Sun Nov 16, 2025 11:05 am
