


FISA64

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I started a new 64-bit FPGA RISC processor project called FISA64. It is being written in Verilog.

5-stage overlapped pipeline with multi-cycle memory ops (IF - instruction fetch, RF - register fetch, EX/MEM - execute/memory, WB - writeback, TL - tail)
64 x 64-bit registers
1 address mode (scaled indexed with optional post increment/decrement)
40-bit instructions
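A rough skeleton of the stage flow, just as an illustration (this is a sketch, not the actual core; the names are made up, and fetch, decode and the datapath are omitted):

Code:
// Rough illustration only - not the FISA64 source. Five stage registers
// advance in lockstep, and the whole pipe holds while a multi-cycle
// memory op is outstanding.
module fisa64_pipe_skel(
    input clk,
    input rst,
    input mem_busy                // high while a memory op is in flight
);
    reg [39:0] ir_if, ir_rf, ir_ex, ir_wb, ir_tl;   // 40-bit instructions
    reg        v_if,  v_rf,  v_ex,  v_wb,  v_tl;    // stage valid bits
    reg [63:0] regs [0:63];       // 64 x 64-bit register file (unused here)

    wire advance = !mem_busy;     // stall everything while memory completes

    always @(posedge clk)
        if (rst)
            {v_if, v_rf, v_ex, v_wb, v_tl} <= 5'b0;
        else if (advance) begin
            // each stage hands its instruction to the next
            {ir_tl, ir_wb, ir_ex, ir_rf} <= {ir_wb, ir_ex, ir_rf, ir_if};
            {v_tl,  v_wb,  v_ex,  v_rf } <= {v_wb,  v_ex,  v_rf,  v_if};
            v_if <= 1'b1;         // ir_if would come from the i-cache (not shown)
        end
endmodule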

_________________
Robert Finch http://www.finitron.ca


Fri Dec 26, 2014 6:46 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Attachment:
File comment: FISA64 - Pipeline diagram
FISA64 Pipeline.gif
FISA64 Pipeline.gif [ 54.34 KiB | Viewed 13632 times ]

_________________
Robert Finch http://www.finitron.ca


Fri Dec 26, 2014 6:49 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Attachment:
File comment: FISA64 - ISA
FISA64b.pdf [163.8 KiB]
Downloaded 609 times

A copy of the ISA as it stands starting out.

_________________
Robert Finch http://www.finitron.ca


Fri Dec 26, 2014 6:53 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
I'll follow this with interest!


Fri Dec 26, 2014 10:37 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I think I’ve figured out how to do an address increment / decrement during a load / store instruction without requiring two register write ports. One might think two write ports are required, one for the memory data and one for the address-register update; however, the two updates don’t have to take place at the same time. Part of the secret is an extra stage after write-back. The incremented address is fed to the write-back stage first, where the memory result would go if the memory op were complete. The write-back stage is then advanced once, so the address update ends up in the TL stage. The TL stage allows the address update to be bypassed to subsequent instructions.

The core isn't complete enough yet to run simulations, so I don't know whether it'll work, but that's the theory.
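In sketch form, the theory looks something like this (invented signal names, not the real write-back logic): while the memory op is still busy, the address update borrows the single write port; when the data arrives, the data takes the port and the update moves on into TL, from where it is bypassed.

Code:
// Rough sketch of sharing one register-file write port between the load
// data and the auto inc/dec address update. Names are made up.
module wb_tl_skel(
    input             clk,
    input             mem_busy,    // multi-cycle memory op still outstanding
    input             mem_done,    // memory data arrives this cycle
    input      [5:0]  mem_rt,      // destination register of the load
    input      [63:0] mem_data,
    input             upd_valid,   // an auto inc/dec update waiting in WB
    input      [5:0]  upd_rt,      // base register being updated
    input      [63:0] upd_addr,    // the incremented / decremented address
    output reg        wr_en,       // the single register-file write port
    output reg [5:0]  wr_rt,
    output reg [63:0] wr_data,
    output reg        tl_valid,    // TL stage, bypassed to younger instructions
    output reg [5:0]  tl_rt,
    output reg [63:0] tl_data
);
    always @(posedge clk) begin
        if (mem_busy && !mem_done) begin
            // memory still working: the address update borrows the write port
            wr_en   <= upd_valid;
            wr_rt   <= upd_rt;
            wr_data <= upd_addr;
            tl_valid <= 1'b0;
        end else begin
            // data arrives: it takes the port, the update moves on into TL
            wr_en   <= mem_done;
            wr_rt   <= mem_rt;
            wr_data <= mem_data;
            tl_valid <= upd_valid;   // still visible for bypass from TL
            tl_rt    <= upd_rt;
            tl_data  <= upd_addr;
        end
    end
endmodule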

_________________
Robert Finch http://www.finitron.ca


Fri Dec 26, 2014 7:23 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Got an initial synthesis for the core done. It looks to be about 14,000 logic cells. Getting ready for simulations.
I decided to add memory indirect addressing modes. Memory indirect addressing modes are useful for storing pointers in memory when the pointer is too wide to fit into a processor register. A good example of usage is the 6502, which has only eight-bit registers. In the case of FISA64, 128-bit pointers are stored in memory, allowing a 128-bit address space. Of course, for the FPGA sample only 32 bits are used. FISA64 uses only a 32-bit bus to memory, so loading a 128-bit pointer uses a four-word burst access.
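Gathering the pointer over the bus would look something like the sketch below (made-up bus signal names, not the actual bus interface):

Code:
// Sketch only: a 128-bit memory-indirect pointer assembled over the
// 32-bit bus as a four-beat burst, one word per acknowledge.
module ptr_burst(
    input              clk,
    input              rst,
    input              start,       // begin fetching the indirect pointer
    input              ack,         // the bus delivers one 32-bit beat
    input       [31:0] dat_i,
    output reg [127:0] ptr,         // the assembled 128-bit pointer
    output reg         done
);
    reg [1:0] beat;
    always @(posedge clk) begin
        done <= 1'b0;
        if (rst || start)
            beat <= 2'd0;
        else if (ack) begin
            ptr[beat*32 +: 32] <= dat_i;     // drop this beat's word in place
            if (beat == 2'd3) done <= 1'b1;  // fourth word completes the pointer
            beat <= beat + 2'd1;
        end
    end
endmodule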

_________________
Robert Finch http://www.finitron.ca


Sun Dec 28, 2014 10:06 pm

Joined: Tue Jan 15, 2013 5:43 am
Posts: 189
128-bit addresses?? :shock: That is a forward-looking specification! They say the need for address space grows continually, simply to meet the needs of progress. Still, I wonder how long the power-of-two thing will be worth clinging to. The need for address bits grows more or less linearly over time, whereas the doubling of address widths is an exponential thing -- and each doubling is more expensive than the one before. Although each doubling buys more time than the one before, and will pay off eventually, I wonder whether at some point the cost and the time frame will become absurd. After 64 bits, perhaps the market might welcome a step to 96 bits, say, rather than going straight to 128. I realize it's a challenge for the ISA architect to elegantly and efficiently manage non-power-of-two addresses. But maybe that's a challenge worth, uh, addressing! I'm babbling; sorry! :roll:

Quote:
Memory indirect addressing modes are useful to store pointers to memory when the pointer is too wide to fit into a processor register.
Hmmm. Whether it's to 96 bits or 128, to go beyond 64 you do need an answer for this. Is it not feasible to use multiple registers -- a pair -- to contain the wide pointer (rather than multiple memory locations)?

Finally, an incidental question. How do you prevent Carry propagation delays in the 64-bit ALU from hobbling the clock speed? Do the tools and FPGA fabric adequately solve this problem for you, or will you be explicitly coding remedial measures?

cheers,
Jeff

_________________
http://LaughtonElectronics.com


Mon Dec 29, 2014 5:46 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
They say the need for address space grows continually, simply to meet the needs of progress

IIRC, the rate of growth is something like 2 bits per year, so 128-bit addressing would probably last 50-60 years.
Apparently I don't RC: the growth rate seems to be more like one bit every five years (based on some quick net research), so a 128-bit space should last hundreds of years - the 64 extra bits beyond a 64-bit space, at roughly five years per bit, work out to something like 320 years.

Quote:
128-bit addresses?? :shock: That is a forward-looking specification!

I have to admit it seems like overkill to me. It was late at night and I was looking for a reason for memory indirect addressing. I think the brain works on memory indirect addressing, so I wanted to support it with the processor. It's supported in processors like the 6502/6809/680x0 and others.

Quote:
Hmmm. Whether it's to 96 bits or 128, to go beyond 64 you do need an answer for this. Is it not feasible to use multiple registers -- a pair -- to contain the wide pointer (rather than multiple memory locations)?

Entirely feasible. It would require an extra read port on the register file, though, and the pair would have to be implicit - registers N and N+1 - so that extra bits aren't required in the instruction to specify the register pair. There would also need to be a way to increment a pointer across two registers, but that's needed for memory indirect addressing anyway. I was also thinking of using 128/96-bit addressing with a 32-bit CPU; that would require four registers for a pointer.
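Incrementing a pointer across the pair is just a carry from the low register into the high one; roughly (a sketch, not part of the design):

Code:
// Sketch: bump a 128-bit pointer held in the implicit pair Rn / Rn+1.
module pair_inc #(parameter STEP = 64'd16) (   // e.g. step by 16 bytes
    input  [63:0] lo_in,          // Rn, low half of the pointer
    input  [63:0] hi_in,          // Rn+1, high half
    output [63:0] lo_out,
    output [63:0] hi_out
);
    wire carry;
    assign {carry, lo_out} = lo_in + STEP;   // 64-bit add, capture the carry out
    assign hi_out          = hi_in + carry;  // propagate into the upper register
endmodule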

Quote:
Finally, an incidental question. How do you prevent Carry propagation delays in the 64-bit ALU from hobbling the clock speed? Do the tools and FPGA fabric adequately solve this problem for you, or will you be explicitly coding remedial measures?

I guess I could say the tools solve it adequately for me; I put up with a lower maximum clock frequency. A 64-bit design is going to be somewhat slower in an FPGA than a 32-bit one because of the extra routing involved. I've not been trying to get maximum performance, but minimum development time instead. Maybe I should use a faster adder. Another solution is to go with a 32-bit computing engine.
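For the "faster adder", one textbook option is a carry-select arrangement, sketched below - though the FPGA's dedicated carry chain is often already the fastest thing available, so it might not actually win anything:

Code:
// Sketch of a 64-bit carry-select adder: compute the upper half twice,
// once per possible carry-in, and pick when the low carry is known.
module add64_csel(
    input  [63:0] a,
    input  [63:0] b,
    output [63:0] sum
);
    wire        c_lo;
    wire [31:0] s_lo, s_hi0, s_hi1;
    assign {c_lo, s_lo} = a[31:0]  + b[31:0];
    assign s_hi0 = a[63:32] + b[63:32];           // assumes carry-in of 0
    assign s_hi1 = a[63:32] + b[63:32] + 32'd1;   // assumes carry-in of 1
    assign sum   = {c_lo ? s_hi1 : s_hi0, s_lo};  // select once c_lo settles
endmodule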

_________________
Robert Finch http://www.finitron.ca





Mon Dec 29, 2014 9:27 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
64 address bits seem to cater for some sixteen billion gigabytes - unless I missed something, that should be enough for now. So on a 32-bit machine a double-access pointer is very useful, but on a 64-bit machine it seems unnecessary. Auto-incrementing is of course useful - which is to say, some support for pointers might be good even if the pointers are single-access sized.

Cheers
Ed


Mon Dec 29, 2014 9:39 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Started working on memory management for FISA64. I'm not planning to use either segmentation or paging for a 64/128-bit address space; instead I'm looking at a tagged memory management system. The idea is that paging and segmentation are unnecessary for a large address space: it should be possible for most applications to find enough contiguous memory to run. I have memory divided into 64kB lots, each of which has a single owner tag. Memory can be shared, since an app can have a number of owner tags associated with it.

Attachment:
File comment: FISA64 Memory Management
FISA64 – Memory Management.gif
FISA64 – Memory Management.gif [ 23.06 KiB | Viewed 13569 times ]
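In rough terms, the check on each access would be something like the sketch below (a sketch of the scheme as described above, not actual code from the design; the field widths and names are made up):

Code:
// Sketch: one owner tag per 64kB lot, indexed by the upper address bits.
// An access is allowed when the lot's tag matches a tag the running task
// holds. A real version would check against the task's whole set of tags,
// and the owner table would be loaded by the operating system.
module tag_check #(parameter TAGW = 16, LOTBITS = 10) (
    input  [31:0]      addr,      // 32-bit address in the FPGA sample
    input  [TAGW-1:0]  task_tag,  // a tag held by the running task
    output             allowed
);
    reg [TAGW-1:0] owner [0:(1<<LOTBITS)-1];       // owner tag of each 64kB lot
    wire [LOTBITS-1:0] lot = addr[16 +: LOTBITS];  // addr / 64k selects the lot
    assign allowed = (owner[lot] == task_tag);
endmodule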

_________________
Robert Finch http://www.finitron.ca


Wed Dec 31, 2014 8:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Got bored :| with the classic 4-stage pipeline, and decided to try coding a superscalar version of the FISA64 processor. The superscalar version is primarily intended for simulation.
Same basic register set, same instruction set (a bit different than posted earlier).
3-way superscalar: max 3 instructions at a time, fetch 3 instructions at once, queue 0 to 3 instructions, execute 0 to 4 or 5 instructions at once (2 ALU, 1 FP, 1 memory)
16-entry instruction window.
Planning on clocking the register file write four or five times faster than the rest of the design, in order to get three write updates in one machine cycle.
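Roughly, that write pumping would look like the sketch below (invented signal names; it assumes a fast clock with a phase marker aligned to the core clock, and that the three results stay stable across the core cycle):

Code:
// Sketch: the register file's single write port runs on a faster clock
// and steps through the (up to) three results of one machine cycle.
module pumped_writes(
    input          fast_clk,     // roughly 4-5x the core clock
    input          cpu_phase0,   // marks the first fast tick of a core cycle
    input  [2:0]   we,           // the three results wanting to write
    input  [17:0]  wa,           // three 6-bit addresses, packed
    input  [191:0] wd,           // three 64-bit results, packed
    output reg        rf_we,     // the single physical write port
    output reg [5:0]  rf_wa,
    output reg [63:0] rf_wd
);
    reg  [1:0] slot;
    wire [1:0] next = cpu_phase0     ? 2'd0 :
                      (slot == 2'd2) ? slot : slot + 2'd1;
    always @(posedge fast_clk) begin
        slot  <= next;
        rf_we <= we[next];             // one result per fast tick; re-writing
        rf_wa <= wa[next*6  +: 6];     // the last slot on spare ticks is harmless
        rf_wd <= wd[next*64 +: 64];
    end
endmodule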

_________________
Robert Finch http://www.finitron.ca


Tue Jan 06, 2015 8:24 pm

Joined: Thu Jan 17, 2013 4:38 pm
Posts: 53
Being a fan of the 68K family, I have toyed with ideas of how you could bring it into a RISC setting.
What I ended up with was an instruction set that isn't the usual "2 source, 1 destination", but rather 3+2 (or 4+2 if you have an offset value in the instruction). The second destination register would then hold the EA used for the memory access, plus any pre/post adjustment, or just a pre/post update of a certain register (this was just at the idea stage). The thought was to keep the *2 *4 *8 scaling from the 68K.

For non-memory opcodes the 2nd destination could be used to store condition code flags. For double-width results the 2nd register could hold the other half of the result.

No idea if any of this is feasible... :-)


Wed Jan 07, 2015 12:11 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Being a fan of the 68K family, I have toyed with ideas of how you could bring it into a RISC setting.
What I ended up with was an instruction set that isn't the usual "2 source, 1 destination", but rather 3+2 (or 4+2 if you have an offset value in the instruction). The second destination register would then hold the EA used for the memory access, plus any pre/post adjustment, or just a pre/post update of a certain register (this was just at the idea stage). The thought was to keep the *2 *4 *8 scaling from the 68K.

For non-memory opcodes the 2nd destination could be used to store condition code flags. For double-width results the 2nd register could hold the other half of the result.

No idea if any of this is feasible... :-)


The 68k's one of my favorites as well, and I sometimes play with ideas for extending the processor. One idea I like is to widen the opcode to 18 bits (or more) and double the number of address / data registers.

Quote:
but rather 3+2 (or 4+2 if you

Two destination registers can be handled sequentially or in parallel by doubling the size of the register file.

One version of the register file for FISA64 replicates it 27 times to provide nine read ports and three write ports (9 x 3 = 27). It works out to about 8,000 logic cells.
Attachment:
FISA64 register file.gif
FISA64 register file.gif [ 31.19 KiB | Viewed 13530 times ]
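The general shape of the trick, sketched below (this is not the actual FISA64 file; the "last writer" table is one common way to pick which copy a read port should believe, and the ports are packed into wide buses just to keep the sketch plain Verilog):

Code:
// Sketch of the replication idea. Each of the 3 write ports owns a bank,
// each bank is copied once per read port (3 x 9 = 27 copies), and a small
// "last writer" table records which write port wrote a register most
// recently, so each read port knows which copy to believe.
module regfile_9r3w(
    input          clk,
    input  [2:0]   we,     // write enables, one per write port
    input  [17:0]  wa,     // 3 write addresses, 6 bits each
    input  [191:0] wd,     // 3 write data words, 64 bits each
    input  [53:0]  ra,     // 9 read addresses, 6 bits each
    output [575:0] rd      // 9 read results, 64 bits each
);
    reg [63:0] bank [0:2][0:8][0:63];   // the 27 replicated copies
    reg [1:0]  lvt  [0:63];             // last write port to touch each register

    integer w, r;
    always @(posedge clk)
        for (w = 0; w < 3; w = w + 1)
            if (we[w]) begin
                for (r = 0; r < 9; r = r + 1)
                    bank[w][r][wa[w*6 +: 6]] <= wd[w*64 +: 64];
                lvt[wa[w*6 +: 6]] <= w[1:0];
            end

    genvar g;
    generate
        for (g = 0; g < 9; g = g + 1) begin : rport
            wire [5:0] a = ra[g*6 +: 6];
            assign rd[g*64 +: 64] = bank[lvt[a]][g][a];
        end
    endgenerate
endmodule

Each of the 27 copies maps nicely onto simple FPGA RAM primitives with one write and one read port, which is why the logic cell count stays manageable.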

_________________
Robert Finch http://www.finitron.ca


Sun Jan 11, 2015 2:08 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
This replication is a nice idea.


Sun Jan 11, 2015 10:49 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
This replication is a nice idea

I hope it isn't patented :) .

Just wondering about the value of supporting auto inc/dec addressing. It was fairly straightforward to implement in a pipelined processor, but now, working on a superscalar version, it's not so simple.

Auto inc/dec addressing support in the processor is accomplished by enqueuing the increment / decrement operation as a second instruction in the instruction queue. The memory-op instruction is replicated into a second queue slot, and then the op is changed to a special auto inc/dec op. This is about the easiest approach; otherwise the instruction would require two target registers, which isn't supported by the rest of the processor. But it does make the queue and fetch logic more complex. To simplify the logic, if a memory op with auto inc/dec is present, it is the only instruction queued during that clock cycle. The result is that the processor is probably slower than one without auto inc/dec addressing, but it sometimes uses less code space.
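In rough outline it is something like the sketch below (the opcode field position and the OP_AUPD value are made up for illustration, not the real encoding):

Code:
// Sketch of the enqueue rule described above: a load/store with auto
// inc/dec is the only instruction queued that cycle, split into the
// memory op itself plus a synthetic address-update op.
module enqueue_split(
    input  [39:0] insn,            // instruction at the head of fetch
    input         is_mem_autoinc,  // decodes as load/store with inc/dec
    output [39:0] q_slot0,         // first queue entry this cycle
    output [39:0] q_slot1,         // second queue entry this cycle
    output [1:0]  q_count          // how many entries get queued
);
    localparam [7:0] OP_AUPD = 8'h7E;          // made-up "address update" opcode
    assign q_slot0 = insn;                     // the memory op goes in as-is
    assign q_slot1 = {insn[39:8], OP_AUPD};    // copy with the opcode swapped
    assign q_count = is_mem_autoinc ? 2'd2 : 2'd1;
endmodule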

The next thing to figure out is how to get the memory indirect modes to work.

_________________
Robert Finch http://www.finitron.ca


Wed Jan 14, 2015 2:49 am