 [ 171 posts ]  Go to page Previous  1 ... 5, 6, 7, 8, 9, 10, 11, 12  Next
 rf68000 - 68k similar core 
Author Message

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2358
Location: Canada
Dealing with all the different data sizes on a 64-bit machine is problematic. I was just going to follow the x86 style and support only byte or 64-bit ops in 64-bit mode.
The 68k stores floats using 96 bits IIRC (the 68881/68882 extended-precision memory format: 80 significant bits plus 16 bits of padding). I was just going to support only 96-bit precision floats.
There are some extra encodings available for immediate values if the 68020+ versions of the processor are not fully supported.

_________________
Robert Finch http://www.finitron.ca


Fri Jul 12, 2024 4:46 pm WWW

Joined: Thu Jan 17, 2013 4:38 pm
Posts: 56
robfinch wrote:
Chose to put the cpu width bits in the page management table entry.

Ooh! I like your thinking. My ideas for a fantasy-RISC would have encodings for 2x16, 1x32, 3x42[128] and 3x42[128VLIW] (all being subsets of the next one, VLIW apart), but if you don't need to micro-optimize and switch sizes then the MMU could save bits on choosing that.

One of the things I wanted to have the MMU handle was if data are considered big or little endian. And possibly a few things involving what protection level you are running at purely based on the PC.
Oh yeah, while at it, having separate MMUs for instructions and data - hopefully that frees up more page bits and gives more parallelism and performance?


Fri Jul 12, 2024 7:57 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2358
Location: Canada
Quote:
Oh yeah, while at it, having separate MMUs for instructions and data -

Separate MMUs for instructions and data is an interesting idea. The page table entries could be made specific for instructions or data. Might not need all the access rights for each type, so could conserve bits. However, it may be more resource efficient to have just a single MMU.
There is just a single 128-bit bus connecting the core to the outside world, and the MMU is attached to this. It could possibly tell the difference between code and data, and manage each separately.

*****

Sketched out a 68k design using 20-bit instruction parcels. This allows for 16 data and 16 address registers. It also allows an additional operand size ‘.O’ for octabyte. The instruction set remains much the same. Branches have 12-bit or 20-bit displacements available. Added a CLRM – clear multiple registers instruction.

This was mainly just an exercise, no plans to implement. Got me started thinking about RISC-V and implementing a clear multiple registers with that processor. It could be done using a two-deep register file where one depth always contains zeros. When a register is ‘cleared’ the depth pointer could be incremented to point to a zero value on read. When a register is written the depth pointer could be decremented so the actual value is readable.
This would be using the LUT ram to multiplex a zero to the output when the register is clear.
A bitmask could be used to clear selected registers by or'ing the mask with the depth pointers.
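
A rough behavioural model of the idea above, in Python rather than HDL. The class name, method names and the 32 x 64-bit sizing are made up for illustration. The depth pointer is modelled as one bit per register, where 1 selects the always-zero entry and 0 selects the stored value.

```python
class ZeroFlagRegFile:
    """Two-deep register file model: depth 0 holds the real value,
    depth 1 is an always-zero entry. One pointer bit per register."""

    def __init__(self, nregs=32, width=64):
        self.nregs = nregs
        self.value_mask = (1 << width) - 1
        self.values = [0] * nregs   # depth 0: actual storage
        self.depth = 0              # pointer bits, packed into one integer

    def write(self, r, value):
        # A write stores the value and moves the pointer back to depth 0.
        self.values[r] = value & self.value_mask
        self.depth &= ~(1 << r)

    def read(self, r):
        # The pointer bit multiplexes a zero onto the read port when set.
        return 0 if (self.depth >> r) & 1 else self.values[r]

    def clrm(self, regmask):
        # Clear multiple: OR the register mask into the pointer bits.
        self.depth |= regmask & ((1 << self.nregs) - 1)


rf = ZeroFlagRegFile()
rf.write(3, 0x1234)
rf.write(5, 99)
rf.clrm(0b101000)      # clear r3 and r5 in one operation
rf.write(5, 7)         # writing r5 makes it readable again
```

Only the pointer bits change on a clear, so the cost of clearing does not depend on the register width.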

_________________
Robert Finch http://www.finitron.ca


Sat Jul 13, 2024 10:04 am WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 791
Why do you need to clear registers?
Throws in a DCA from an old PDP-8. :)


Sat Jul 13, 2024 10:37 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2358
Location: Canada
Quote:
Why do you need to clear registers?

I have heard that it is good for some security / crypto type apps where registers are cleared on function entry or exit.

Usually several registers need to be cleared. This can be handled using a single bit FF to indicate the register value is zero. So only a single bit need be manipulated rather than all 64.

DCA=?

_________________
Robert Finch http://www.finitron.ca


Sun Jul 14, 2024 1:34 am WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 791
It might be better to clear on function entry, in the case of a trojan subroutine.

The PDP 8 is the classic old school computer.
Copied off the web
12 bit program counter
12 bit ac + link bit
4096 words of memory

Code:
DEC's 1965 PDP-8 Pocket Reference Card
PDP 8 INSTRUCTION LIST


Mnemonic  Code          Operation                 Cycles

BASIC INSTRUCTIONS


 AND      0000  logical AND                           2
 TAD      1000  2's complement add                    2
 ISZ      2000  increment and skip if zero            2
 DCA      3000  deposit and clear AC                  2
 JMS      4000  jump to subroutine                    2
 JMP      5000  jump                                  1
 IOT      6000  in-out transfer                   2 1/2
 OPR      7000  operate                               1

GROUP 1 OPERATE MICROINSTRUCTIONS (1 CYCLE)

                                              Event Time
 NOP      7000  no operation                          1
 CLA      7200  clear AC                              1
 CLL      7100  clear link                            1
 CMA      7040  complement AC                         1
 CML      7020  complement link                       1
 RAR      7010  rotate AC and link right one          2
 RAL      7004  rotate AC and link left one           2
 RTR      7012  rotate AC and link right two          2
 RTL      7006  rotate AC and link left two           2
 IAC      7001  increment AC                          2

GROUP 2 OPERATE MICROINSTRUCTIONS (1 CYCLE)


                                              Event Time
 SMA      7500  skip on minus AC                      1
 SZA      7440  skip on zero AC                       1
 SPA      7510  skip on plus AC                       1
 SNA      7450  skip on non zero AC                   1
 SNL      7420  skip on non-zero link                 1
 SZL      7430  skip on zero link                     1
 SKP      7410  skip unconditionally                  1
 OSR      7404  inclusive OR, switch register with AC 2
 HLT      7402  halts the program                     1
 CLA      7600  clear AC                              1


Mnemonic  Code          Operation                 Cycles

COMBINED OPERATE MICROINSTRUCTIONS


 CIA      7041  complement and increment AC           1
 LAS      7604  load AC with switch register          1
 STL      7120  set link (to 1)                       1
 GLK      7204  get link (and put in AC bit 11)       1
 CLA CLL  7300  clear AC and link                     1
 CLA IAC  7201  set AC = 1                            1
 CLA CMA  7240  set AC = -1                           1
 CLL RAR  7110  shift positive number one right       1
 CLL RAL  7104  shift positive number one left        1
 CLL RTL  7106  clear link, rotate 2 left             1
 CLL RTR  7112  clear link, rotate 2 right            1
 SZA CLA  7640  skip if AC = 0, then clear AC         1
 SZA SNL  7460  skip if AC = 0, or link is 1, or both 1
 SNA CLA  7650  skip if AC /= 0, then clear AC        1
 SMA CLA  7700  skip if AC < 0, then clear AC         1
 SMA SZA  7540  skip if AC <= 0                       1
 SMA SNL  7520  skip if AC < 0 or link is 1 or both   1
 SPA SNA  7550  skip if AC > 0                        1
 SPA SZL  7530  skip if AC >= 0 and if the link is 0  1
 SPA CLA  7710  skip if AC >= 0, then clear AC        1
 SNA SZL  7470  skip if AC /= 0 and link = 0          1


|d|i|g|i|t|a|l| EQUIPMENT CORPORATION
   PRINTED IN U.S.A.    25-5/65
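
To answer the "DCA=?" question in code form, here is a small Python sketch. It models DCA as "store AC to memory, then clear AC", and it checks that the combined operate entries on the card are the bitwise OR of the individual microinstruction codes within a group. The opcodes are copied from the card above; the function and dictionary names are invented for this sketch.

```python
# Octal codes from the reference card above.
GROUP1 = {"CLA": 0o7200, "CLL": 0o7100, "CMA": 0o7040, "CML": 0o7020,
          "RAR": 0o7010, "RAL": 0o7004, "RTR": 0o7012, "RTL": 0o7006,
          "IAC": 0o7001}
GROUP2 = {"SMA": 0o7500, "SZA": 0o7440, "SPA": 0o7510, "SNA": 0o7450,
          "SNL": 0o7420, "SZL": 0o7430, "SKP": 0o7410, "OSR": 0o7404,
          "HLT": 0o7402, "CLA": 0o7600}


def combine(group, *mnemonics):
    """OR together operate microinstructions taken from one group."""
    code = 0
    for m in mnemonics:
        code |= group[m]
    return code


def dca(memory, addr, ac):
    """DCA: deposit the 12-bit AC into memory[addr], return the cleared AC."""
    memory[addr] = ac & 0o7777
    return 0


memory = [0] * 4096
ac = dca(memory, 0o200, 0o1234)   # memory[0o200] holds the old AC, ac is now 0
cia = combine(GROUP1, "CMA", "IAC")   # complement and increment AC
las = combine(GROUP2, "CLA", "OSR")   # load AC with switch register
```

So DCA is a store that leaves the accumulator zeroed, which is why it came to mind in a discussion about clearing registers.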


Sun Jul 14, 2024 8:55 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2358
Location: Canada
The PDP8 is pretty impressive for 1965.

I started working on a more contemporary architecture: 32 x 64-bit regs and 8 x 8-bit condition registers.

_________________
Robert Finch http://www.finitron.ca


Mon Jul 15, 2024 6:46 am WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 791
What FPGA and tool chain are you using?


Tue Jul 16, 2024 3:31 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2358
Location: Canada
Quote:
What FPGA and tool chain are you using?
xc7a200t FPGA and free Vivado toolset. The 68k will fit into a much smaller FPGA.

The newer design, while similar to a PowerPC, uses 16-bit instruction parcels. Subroutine calls store the return address on the stack rather than in a link register, so there are no link registers. Some instructions like RTS, RTI, and SC are only 16 bits. Others, like CMP with a 64-bit immediate, take up 80 bits. Subroutine calls are 48 bits to support a 42-bit routine address. Branches are 32 bits, supporting a 21-bit branch target range.
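
The sizes quoted above can be turned into parcel counts and "bits left over" with a few lines of Python. This is just arithmetic on the figures in the post. How the leftover bits are divided between opcode, register and condition fields is not stated, so the sketch does not guess at that.

```python
PARCEL_BITS = 16


def parcels(total_bits):
    """Number of 16-bit parcels an instruction of this size occupies."""
    assert total_bits % PARCEL_BITS == 0
    return total_bits // PARCEL_BITS


def spare_bits(total_bits, payload_bits):
    """Bits remaining once the immediate, address or displacement is placed."""
    return total_bits - payload_bits


cmp_parcels = parcels(80)           # CMP with a 64-bit immediate
cmp_spare = spare_bits(80, 64)      # bits left for everything else
call_spare = spare_bits(48, 42)     # subroutine call, 42-bit address
branch_spare = spare_bits(32, 21)   # branch, 21-bit target field
```

With these numbers the CMP form is five parcels with 16 bits besides the immediate, a call has 6 bits besides the address, and a branch has 11 bits besides the target field.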

The CPU has only a single mode of operation. To handle other modes of operation multiple cores will be used.

_________________
Robert Finch http://www.finitron.ca


Wed Jul 17, 2024 3:46 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2358
Location: Canada
Put a modicum of work into this project defining micro-ops for a superscalar version. The micro-ops are about 70 bits wide.
The definition is for a 2r1w risc processor.
Working on how to handle all the flags updates.

_________________
Robert Finch http://www.finitron.ca


Tue May 06, 2025 6:04 am WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 791
Just what is a micro-op in this case?

"It is more fun to talk with someone who doesn't use long, difficult words but rather short, easy words like, 'What about lunch?'"
—Winnie the Pooh


Tue May 06, 2025 3:25 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2358
Location: Canada
Quote:
Just what is a micro-op in this case?

A micro-op is similar to an instruction, but may contain other signals convenient to operate parts of the CPU. I think they are called micro-ops to avoid confusion with the term 'instructions'. A micro-op is really a RISC processor instruction used internally by modern CPUs.

The micro-op structure for the rf68000 looks like:
Code:
typedef struct packed
{
   logic [2:0] count;
   logic [2:0] num;
   logic flgdep;               // 1=dependency on flags
   updpat_e updpat;            // reg update pattern
   updpat_e fupdpat;           // flags update pattern
   logic [1:0] sz;
   logic [31:0] imm;
   logic Rs2wl;
   logic [4:0] Rs2;
   logic [4:0] Rs1;
   logic [4:0] Rd;
   logic [6:0] opcode;
} uop_t;

68000 instructions get mapped onto multiple micro-ops (RISC instructions).
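
As a sanity check on the "about 70 bits" figure mentioned earlier, the field widths of uop_t can be summed in Python. The width of updpat_e is not given anywhere in the thread. The sketch assumes it is the minimum needed for the five UPD_ names that appear in the thread (UPD_ALL, UPD_NONE, UPD_BYTE, UPD_WORD, UPD_LONG), and the real enum may well be wider.

```python
from math import ceil, log2

# Widths of the plain logic fields, read off the uop_t struct above.
fields = {"count": 3, "num": 3, "flgdep": 1, "sz": 2, "imm": 32,
          "Rs2wl": 1, "Rs2": 5, "Rs1": 5, "Rd": 5, "opcode": 7}
fixed_bits = sum(fields.values())

# Assumption: updpat_e needs only enough bits for the names seen in the thread.
upd_names = ["UPD_ALL", "UPD_NONE", "UPD_BYTE", "UPD_WORD", "UPD_LONG"]
updpat_bits = ceil(log2(len(upd_names)))

# Two updpat_e fields: updpat and fupdpat.
total_bits = fixed_bits + 2 * updpat_bits
```

Under that assumption the plain fields come to 64 bits and the two update-pattern fields add 3 bits each, which lands on 70.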


The micro-op structure for the StarkCPU currently looks like this:
Code:
typedef struct packed {
   logic v;         // valid
   logic exc;      // exception eg. bad register
   logic [2:0] count;    // how many micro-ops make up the instruction
   logic [2:0] num;      // which micro-op this is
   logic [1:0] xRs3;     // extended register specifications
   logic [1:0] xRs2;
   logic [1:0] xRs1;
   logic [1:0] xRd;
   logic [3:0] xop4;     // extended instruction spec.
   instruction_t ins;     // an instruction that is from a subset of the ISA's instruction set.
} micro_op_t;



It has an instruction plus other fields to extend the register set and indicate which micro-op of the instruction it is part of.
Note the instruction field is from a subset of the instructions for StarkCPU. It was defined for convenience.

A sample instruction structure looks like:
Code:
typedef struct packed
{
   logic zero;
   logic op;
   fround_t rm;
   logic [4:0] Rs3;
   logic [4:0] Rs2;
   logic cr;
   logic [4:0] Rs1;
   logic [4:0] Rd;
   logic [5:0] opcode;
} fma_inst_t;

That is the structure for an FMA instruction. There are about 50 different instruction formats, all combined into a union called 'instruction_t'.
A micro-op could also be defined like this (including the fields directly):

Code:
typedef struct packed {
   logic v;         // valid
   logic exc;      // exception eg. bad register
   logic [2:0] count;    // how many micro-ops make up the instruction
   logic [2:0] num;      // which micro-op this is
   logic [1:0] xRs3;     // extended register specifications
   logic [1:0] xRs2;
   logic [1:0] xRs1;
   logic [1:0] xRd;
   logic [3:0] xop4;     // extended instruction spec.
   logic zero;
   logic op;
   fround_t rm;
   logic [4:0] Rs3;
   logic [4:0] Rs2;
   logic cr;
   logic [4:0] Rs1;
   logic [4:0] Rd;
   logic [5:0] opcode;
} fma_micro_op_t;

The 'instruction_t' abstracts things.

I was just working on the micro-op mapping for the rf68k. It looks like (there are thousands of lines of code):
Code:
// Handles mapping:
//   ADD,SUB,AND,OR,EOR
task tAlu;
input opcode_e opc;
input instruction_t ir;
input ndx_t ir2;
input ndx_t ir3;
input ndx_t ir4;
input [15:0] ir5;
input updpat_e updpat;
output uop_t [4:0] uop;
output [2:0] icount;
begin
   icount = 3'd1;
   uop[0] = {$bits(uop_t){1'b0}};
   uop[1] = {$bits(uop_t){1'b0}};
   uop[2] = {$bits(uop_t){1'b0}};
   uop[3] = {$bits(uop_t){1'b0}};
   uop[4] = {$bits(uop_t){1'b0}};
   uop[0].updpat = UPD_ALL;
   uop[1].updpat = UPD_ALL;
   uop[2].updpat = UPD_ALL;
   uop[3].updpat = UPD_ALL;
   uop[4].updpat = UPD_ALL;
   uop[0].fupdpat = UPD_NONE;
   uop[1].fupdpat = UPD_NONE;
   uop[2].fupdpat = UPD_NONE;
   uop[3].fupdpat = UPD_NONE;
   uop[4].fupdpat = UPD_NONE;
   uop[1].num = 3'd1;
   uop[2].num = 3'd2;
   uop[3].num = 3'd3;
   uop[4].num = 3'd4;
   case({ir.add.d,ir.add.m})
   4'b0000:   // Dn
      begin
         uop[0].count = 3'd1;
         uop[0].opcode = opc;
         uop[0].sz = ir.add.sz;
         uop[0].Rd = mapDn(ir.add.Dn);
         uop[0].Rs1 = mapDn(ir.add.Dn);
         uop[0].Rs2 = mapDn(ir.add.Xn);
         case(ir.add.sz)
         2'b00:   uop[0].updpat = UPD_BYTE;
         2'b01:   uop[0].updpat = UPD_WORD;
         2'b10:   uop[0].updpat = UPD_LONG;
         default:   ;
         endcase
         uop[0].fupdpat = updpat;
      end
   4'b0001:   // An
      begin
         uop[0].count = 3'd1;
         uop[0].opcode = opc;
         uop[0].sz = ir.add.sz;
         uop[0].Rd = mapDn(ir.add.Dn);
         uop[0].Rs1 = mapDn(ir.add.Dn);
         uop[0].Rs2 = mapAn(ir.add.Xn);
         case(ir.add.sz)
         2'b00:   uop[0].updpat = UPD_BYTE;
         2'b01:   uop[0].updpat = UPD_WORD;
         2'b10:   uop[0].updpat = UPD_LONG;
         default:   ;
         endcase
         uop[0].fupdpat = updpat;
      end
   4'b0010:   // (An)
      begin
         uop[0].count = 3'd2;
         uop[0].opcode = fnLoadop(ir.add.sz);
         uop[0].sz = ir.add.sz;
         uop[0].Rd = 5'd17;
         uop[0].Rs1 = mapAn(ir.add.Xn);
         uop[1].opcode = opc;
         uop[1].sz = ir.add.sz;
         uop[1].Rd = mapDn(ir.add.Dn);
         uop[1].Rs1 = mapDn(ir.add.Dn);
         uop[1].Rs2 = 5'd17;
         case(ir.add.sz)
         2'b00:   uop[1].updpat = UPD_BYTE;
         2'b01:   uop[1].updpat = UPD_WORD;
         2'b10:   uop[1].updpat = UPD_LONG;
         default:   ;
         endcase
         uop[1].fupdpat = updpat;
      end
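
The three addressing-mode cases in the excerpt can be mimicked with a small Python model, which may make the cracking easier to follow. It is a sketch, not a translation of the real task. The bodies of mapDn and mapAn are not shown in the thread, so the mappings below (D0-D7 to 0-7, A0-A7 to 8-15) are assumptions. The mode argument stands for the 3-bit m field with the d bit taken as 0. Register 17 is the scratch register the (An) case uses.

```python
from dataclasses import dataclass

TEMP_REG = 17  # scratch register number used by the (An) case in the excerpt


@dataclass
class Uop:
    opcode: str
    rd: int = 0
    rs1: int = 0
    rs2: int = 0
    count: int = 0   # set on the first micro-op only, as in the excerpt
    num: int = 0     # position of this micro-op within the instruction


def map_dn(n):
    return n         # assumed mapping, not taken from the thread


def map_an(n):
    return 8 + n     # assumed mapping, not taken from the thread


def crack_alu(opc, mode, dn, xn):
    """Map an ALU instruction 'opc <ea>,Dn' onto one or two micro-ops."""
    if mode == 0b000:    # Dn source: a single register-register op
        return [Uop(opc, map_dn(dn), map_dn(dn), map_dn(xn), count=1)]
    if mode == 0b001:    # An source: same shape, source is an address register
        return [Uop(opc, map_dn(dn), map_dn(dn), map_an(xn), count=1)]
    if mode == 0b010:    # (An) source: load into the scratch register, then operate
        load = Uop("LOAD", TEMP_REG, map_an(xn), count=2, num=0)
        alu = Uop(opc, map_dn(dn), map_dn(dn), TEMP_REG, num=1)
        return [load, alu]
    raise NotImplementedError("other addressing modes are not in the excerpt")


uops = crack_alu("ADD", 0b010, dn=3, xn=1)   # models ADD (A1),D3
```

The memory-operand form is the one that shows why a micro-op count is needed: one 68000 instruction becomes a load followed by a plain two-source ALU op.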

_________________
Robert Finch http://www.finitron.ca


Wed May 07, 2025 7:41 am WWW

Joined: Mon Oct 07, 2019 2:41 am
Posts: 791
Back in the old days it was called microcode, as far as the 68000 was concerned.
The 68000 is a big CISC, just to have 64 KB programs. Sadly RISC is the other way: 64 MB programs.


Wed May 07, 2025 3:56 pm

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1832
In principle, an instruction can be broken into several micro-ops, each of which is executed with the usual machinery (pipeline or more complex) - but also, in some implementations two or more micro-ops can be fused into one. In some implementations two instructions could be fused into one micro-op.
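
A toy Python illustration of the fusion direction, for contrast with cracking. A compare followed by a conditional branch is merged into a single entry. The tuple layout and the names cmp, bcc and cmp_bcc are invented for this sketch and are not taken from any particular core.

```python
def fuse(ops):
    """Merge each adjacent (cmp, bcc) pair into one fused operation."""
    out = []
    i = 0
    while i < len(ops):
        is_pair = (i + 1 < len(ops)
                   and ops[i][0] == "cmp"
                   and ops[i + 1][0] == "bcc")
        if is_pair:
            # Keep the operands of both halves in the fused entry.
            out.append(("cmp_bcc",) + ops[i][1:] + ops[i + 1][1:])
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out


stream = [("add", "d0", "d1"), ("cmp", "d0", "d2"), ("bcc", "loop")]
fused = fuse(stream)
```

Here three incoming operations leave the fuser as two, so the back end has one fewer entry to track.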

So it's not quite like microcode.

See Agner Fog's writings on Intel-style microarchitectures for more: https://www.agner.org/optimize/microarchitecture.pdf


Wed May 07, 2025 5:37 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2358
Location: Canada
Tonight's toy - eight networked rf68000s in a loop:
Attachment:
rf68000_network_loop.png

Nodes look like:
Attachment:
rf68000_compute_node.png

Core 2 is able to flash the system LEDs. Had the cores running at 100 MHz, but reduced it to 50 MHz. The text display output is not working correctly. Not sure yet if it is software or hardware. The CPU gets to the Delay3s routine after performing some other initializations.
The FPGA is about 66% full. Each rf68000 takes 12,000 LUTs without floating-point, 24,000 LUTs with floating-point.
Code:
...
   movec.l  coreno,d0            ; get core number
   cmpi.b   #2,d0
   bne      start_other
   move.b   d0,IOFocus           ; Set the IO focus in global memory
   if HAS_MMU
      bsr   InitMMU              ; Can't access anything till this is done
   endif
   bsr      InitIOPBitmap        ; not going to get far without this
   bsr      InitSemaphores
   bsr      InitRand
   bsr      RandGetNum
   andi.l   #$FFFFFF00,d1
   move.l   d1,_canary
   movec    d1,canary
   bsr      Delay3s              ; give devices time to reset
   move.l   #0,$FD000000
   move.l   #0,$FD000004
   move.l   #0,$FD000008
   move.l   #0,$FD00000C
   move.l   #0,$FD000010
   move.l   #0,$FD000014
   move.l   #0,$FD000018
   move.l   #0,$FD00001C
   bsr      clear_screen
...



_________________
Robert Finch http://www.finitron.ca


Thu May 08, 2025 2:14 am WWW