 Thor Core / FT64 

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I got the clock gating logic in place.

Made the core more configurable: parts of the core that aren't needed may now be left out. Added a second FPU. I tried running the CPU in a minimal configuration and it ran about 85% as fast as the full configuration. This is due to the test program, which zeroes out a bunch of memory. All the stores cause the processor to stall more in the full configuration. A write buffer might help.
So, I wrote a simple write buffer. With the write buffer, the test ran about 35% faster.
I decided to try the icache hammer test. Well it crashed. I traced the problem to a word in the icache that is being loaded with zeros. I haven’t figured this out yet. The word is in the middle of the cache line, and the rest of the line loads okay.

_________________
Robert Finch http://www.finitron.ca


Mon Sep 17, 2018 3:58 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
I am in favour of write buffers!


Mon Sep 17, 2018 9:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Figured out the middle word problem. It was software: a sync instruction was needed after invalidating the cache to allow preceding store operations to complete. The cache was being loaded from an incompletely updated source.
Code:
   ; Now jump to the test code
      cache   #3,[r1]         ; invalidate the cache

   ; The following is important to allow the last few store
   ; operations to complete before trying to execute code.
      sync

      jal      r29,[r1]
      ldi      r2,#14         ; this is the value that should be returned

I was able to trim some bits off the cache line length. While five words (320 bits) are loaded into the cache line only 288 bits are really needed.
Fixed up the RSB a little bit. It’s slightly better at predicting the return address now, though it still doesn’t work the way it should.

Ran some timing experiments (runs of 601 us)
With a minimal core (1 IDU, 1 ALU, 1 MEM, no FCU, no write buffer, 1 CMT path): 2200 insns
Adding a WB: 2285 insns
Adding bypass networks: 2285 insns
Adding FCU : 2270 insns
Adding 2nd CMT path: 2358 insns
Adding 2nd IDU: 6910 insns
Adding 2nd MEM: 6911 insns
Adding 2nd ALU: 6911 insns

The biggest impact on performance seems to be having a second instruction decoder. With only one decoder the core is limited to one instruction per clock max. Interestingly, with the flow control unit (branch predictor, RSB) enabled, the number of instructions executed actually went down.
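
To make the steps easier to compare, here is a small Python sketch that turns the counts listed above into a percent change per step and a ratio against the minimal core. The counts are copied from the list; the labels, the `step_changes` name and the output layout are only for illustration.

```python
# Instruction counts per 601 us run, copied from the list above.
# The short labels are illustrative, not names used by the core.
runs = [
    ("minimal core", 2200),
    ("+ write buffer", 2285),
    ("+ bypass networks", 2285),
    ("+ FCU", 2270),
    ("+ 2nd CMT path", 2358),
    ("+ 2nd IDU", 6910),
    ("+ 2nd MEM", 6911),
    ("+ 2nd ALU", 6911),
]


def step_changes(runs):
    """Return (label, count, percent change vs previous step, ratio vs first entry)."""
    baseline = runs[0][1]
    previous = baseline
    rows = []
    for label, count in runs:
        percent = 100.0 * (count - previous) / previous
        rows.append((label, count, percent, count / baseline))
        previous = count
    return rows


changes = step_changes(runs)
for label, count, percent, ratio in changes:
    print(f"{label:<20} {count:>5}  {percent:+7.1f}%  {ratio:.2f}x")
```

Each row is compared against the row before it, so the order of the list matters: a feature added earlier or later in the sequence might show a different gain.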

_________________
Robert Finch http://www.finitron.ca


Tue Sep 18, 2018 3:24 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I tried synthesizing the core again after numerous changes and got a bit of a surprise. The core size is down to 62,000 LCs from 210,000 LCs. There are only a couple of reasons I can think of: a) synthesis didn’t work properly (the core sims okay), which is unlikely, or b) the drastic size reduction is due to the implementation of a more efficient instruction decoder. Previously for decode I used a bunch of functions all over the place (a simple first effort). For instance, the IsStore() function was used in about six places. Functions cause hardware to be instanced. I cleaned up the core so that most of the functions are called in only one place per decoder (there are two decoders), and voila, it’s much smaller now.
I think code bloat is still expensive hardware-wise. The core’s small enough now to fit into a smaller FPGA :)
I had to drop the clock gating on functional units; the design couldn’t be placed with all the clock gates.

_________________
Robert Finch http://www.finitron.ca


Wed Sep 19, 2018 2:48 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
That's a big win!

With a design this big, it must be worthwhile to track the synth results on a hierarchical basis - you may lose out on the dissolving of boundaries, but win on the information gained about what's costing you.

Over on 6502.org we once compared a number of 6502 cores, and there was quite a large variation in size - all of them, of course, are meeting the same spec. In fact, a 6x variation.
http://forum.6502.org/viewtopic.php?f=10&t=1673&p=12227


Wed Sep 19, 2018 7:00 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
With a design this big, it must be worthwhile to track the synth results on a hierarchical basis -

I don't think there's a way to track synth hierarchically. I just set individual components as top level and re-synth. It's a PITA to get the synth results that way.
Quote:
Over on 6502.org we once compared a number of 6502 cores, and there was quite a large variation in size - all of them, of course, are meeting the same spec. In fact, a 6x variation.

It shows that it's possible to create different code for the same spec. with different results. Some optimized for size and others for clock rate.

Expanded the number of queue entries to 10, but kept issue logic for only the first eight slots. It’s extremely unlikely that instructions would be able to issue in later slots because they typically depend on instructions in prior slots. It’s a case of diminishing returns on the value of being able to issue from later slots. There is also instruction decoding to do which effectively occupies slots preventing issue.
Expanding the queue by two entries gained about 10% in performance. (7552 instructions got executed versus 6911).
Added a control bit to disable write merging for I/O devices. This is as opposed to having I/O instructions. It might be more efficient to have I/O instructions. There are already volatile loads in the design, so adding stores for I/O wouldn't be a big leap.
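
As a rough illustration of why merging has to be switched off for I/O, here is a toy Python model of a write buffer. It is not the FT64 write buffer: the `WriteBuffer` class, its `merge` flag and the addresses are all hypothetical, and merging is assumed to happen only between whole-word stores to the same address.

```python
class WriteBuffer:
    """Toy write buffer: stores are queued until drained to memory (hypothetical model)."""

    def __init__(self):
        # Each entry is [address, data, mergeable].
        self.pending = []

    def store(self, address, data, merge=True):
        """Queue a store, merging into the newest entry for this address if allowed."""
        if merge:
            for entry in reversed(self.pending):
                if entry[0] == address:
                    if entry[2]:
                        entry[1] = data  # the later store overwrites the queued one
                        return
                    break  # newest entry for this address must not be merged
        self.pending.append([address, data, merge])

    def drain(self):
        """Return the writes that actually reach the bus, in order, and empty the buffer."""
        writes = [(address, data) for address, data, _ in self.pending]
        self.pending.clear()
        return writes


# Ordinary memory: two stores to one address collapse into a single bus write.
ram = WriteBuffer()
ram.store(0x1000, 1)
ram.store(0x1000, 2)
ram_writes = ram.drain()

# Device register (address made up for the example): every store must reach the device.
io = WriteBuffer()
io.store(0xDC0, 1, merge=False)
io.store(0xDC0, 2, merge=False)
io_writes = io.drain()

print(ram_writes, io_writes)
```

For plain memory only the final value matters, so dropping the first write is harmless. For a device register each write can be a separate command, which is what the control bit protects.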

_________________
Robert Finch http://www.finitron.ca


Fri Sep 21, 2018 2:55 am
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
You're right, of course, that there are often tradeoffs of size vs speed: adders and multipliers are obvious examples. But I don't think that's what's going on with the 6502 results. I think it's mostly about coding styles and microarchitectural decisions. I think Arlet's core is both smallest and fastest. (One thing Arlet's core isn't, is very flexible, unlike say a microcoded machine might be. It's built very closely to the way the NMOS 6502 was built. As a possible consequence, it was gratifyingly easy and low-cost to extend it to 65C02.)

I wouldn't want to quibble - I just wanted to note the effect of microarchitecture on synthesis results. When a design seems too large or too slow, it might be possible to do a round of microarchitectural review, and see where the costs are, and what might be done. Clearly, it won't always be possible to improve things.


Fri Sep 21, 2018 4:26 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Rob:

Don't know what particular tool you're using, Vivado or ISE, but under ISE I am able to get a fairly accurate resource utilization report for the module hierarchy. I have to have two sets of process options set. Under the synthesis Process Properties, in the Synthesis Options tab, I set -keep_hierarchy to NO and I set -netlist_hierarchy to Rebuilt. Under the Implement Design Process Properties - MAP Options tab, I check the -detail (Detailed MAP Report) option.

With these settings, I can synthesize and MAP/PAR the design, or just MAP the design, and the Module Level Utilization Report provides a detailed resource utilization report. In my experience, the module resource utilization values reported make sense when used with these settings.

_________________
Michael A.


Fri Sep 21, 2018 11:45 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I searched around briefly in the options and I can't see any that set the map detail. I'm using Vivado 18.2. There probably is an option, as there is a "more options" field to allow you to specify. IIRC I've seen a breakdown of resource usage, just not by hierarchical module. Vivado has tons of reporting so it must be in there somewhere.

I started working again on an Audio/Video core (AVIC128). It has some accelerators for text and graphics blitting, point plot, line and curve drawing. This was a 16-bit core that used block ram for display ram, modified to be a 64-bit core using ddr ram. So there's quite a few changes to it.

_________________
Robert Finch http://www.finitron.ca


Sat Sep 22, 2018 7:09 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Hey, I just found a spot to be able to specify a hierarchical report.
You have to click on reporting strategies, then specify custom reports.
I'm re-running synth now out of curiosity.

_________________
Robert Finch http://www.finitron.ca


Sat Sep 22, 2018 7:18 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Cool. Looking forward to seeing your report.

_________________
Michael A.


Sat Sep 22, 2018 3:44 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added macro support to the assembler. A macro can be defined like the following example:
Code:
 macro mGfxCmd (cmd, dat)
      lh      r3,dat
      ldi      r5,#cmd<<32   
      or      r3,r3,r5
      sw      r3,$DC0[r6]
      memdb
      sw      r0,$DD0[r6]
      memdb
      bra      .testbr@
      dc      0x1234
.testbr@
endm

The assembler substitutes a macro instance number for the ‘@’ symbol, so each expansion gets its own local labels. So,
Code:
       memdb
      bgtu   r2,r4,.0001         ; allow up 24 entries to be in progress   
      
      mGfxCmd (12,fgcolor)
      mGfxCmd (13,bkcolor)
      mGfxCmd (16,_DBGCursorCol)
   

expands into:
Code:
FFFFFFFFFFFC0504 02 00 40 04                          memdb
FFFFFFFFFFFC0508 30 44 50 FF                          bgtu   r2,r4,.0001         ; allow up 24 entries to be in progress   
                                 
                                 
FFFFFFFFFFFC050C 60 60 08 00 00 FD                    lh      r3,fgcolor
FFFFFFFFFFFC0512 27 05 06 00 49 A0 00 00              ldi      r5,#12<<32   
FFFFFFFFFFFC051A 00 00                       
FFFFFFFFFFFC051C 02 A3 8C 25                          or      r3,r3,r5
      sw      r3,$DC0[r6]
FFFFFFFFFFFC0520 02 00 40 04                          memdb
FFFFFFFFFFFC0524 24 06 50 37                          sw      r0,$DD0[r6]
FFFFFFFFFFFC0528 02 00 40 04                          memdb
FFFFFFFFFFFC052C 81 70                                bra      .testbr0
FFFFFFFFFFFC052E 34 12                                dc      0x1234
                           .testbr0
                           
                                 
FFFFFFFFFFFC0530 60 60 18 00 00 FD                    lh      r3,bkcolor
FFFFFFFFFFFC0536 27 85 06 00 49 A0 00 00              ldi      r5,#13<<32   
FFFFFFFFFFFC053E 00 00                       <etc…>


I hope to eventually have the assembler support structured programming constructs too. The ‘if’, ‘ifdef’ and ‘ifndef’ statements are already supported.
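
The expansion shown above can be sketched in a few lines of Python. This is not the assembler's code: `define_macro` and `expand` are made-up names, the macro body is shortened, and it is assumed that the instance number starts at 0 and goes up by one per expansion (the listing only shows `.testbr0`).

```python
import re


def define_macro(text):
    """Parse 'macro NAME (a, b)' ... 'endm' into (name, parameter list, body lines)."""
    lines = text.strip("\n").splitlines()
    header = re.match(r"\s*macro\s+(\w+)\s*\(([^)]*)\)", lines[0])
    if header is None or lines[-1].strip() != "endm":
        raise ValueError("not a macro definition")
    params = [p.strip() for p in header.group(2).split(",") if p.strip()]
    return header.group(1), params, lines[1:-1]


def expand(macro, args, instance):
    """Replace '@' with the instance number, then parameters with arguments."""
    name, params, body = macro
    if len(args) != len(params):
        raise ValueError(f"{name} expects {len(params)} arguments")
    mapping = dict(zip(params, args))
    pattern = None
    if params:
        # One pass over whole words, so an argument is never substituted twice.
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    expanded = []
    for line in body:
        line = line.replace("@", str(instance))
        if pattern is not None:
            line = pattern.sub(lambda m: mapping[m.group(1)], line)
        expanded.append(line)
    return expanded


# Shortened version of the mGfxCmd macro from the post.
source = """
macro mGfxCmd (cmd, dat)
      lh      r3,dat
      ldi     r5,#cmd<<32
      or      r3,r3,r5
      bra     .testbr@
.testbr@
endm
"""

macro = define_macro(source)
first = expand(macro, ["12", "fgcolor"], 0)
second = expand(macro, ["13", "bkcolor"], 1)
print("\n".join(first + second))
```

The sketch substitutes everywhere on a line, including inside comments, which a real assembler would probably want to avoid.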

It took 10 hours to route the entire SoC, and 17,000 routes failed. The device is just a little bit too full when I include the audio / video circuitry. So I’ll have to pare the system down some.

_________________
Robert Finch http://www.finitron.ca


Sun Sep 23, 2018 5:52 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Here is a copy of the utilization report. I can trim the cpu size down some.
Code:
1. Utilization by Hierarchy
---------------------------

+------------------------------------+------------------------------------+----------------+----------------+---------------+----------+---------------+-----------+-----------+--------------+
|              Instance              |               Module               |   Total LUTs   |   Logic LUTs   |    LUTRAMs    |   SRLs   |      FFs      |   RAMB36  |   RAMB18  | DSP48 Blocks |
+------------------------------------+------------------------------------+----------------+----------------+---------------+----------+---------------+-----------+-----------+--------------+
| FT64v5SoC                          |                              (top) | 120081(89.21%) | 100547(74.70%) | 19528(42.27%) | 6(0.01%) | 27049(10.05%) | 28(7.67%) | 18(2.47%) |    21(2.84%) |
|   (FT64v5SoC)                      |                              (top) |       1(0.01%) |       1(0.01%) |      0(0.00%) | 0(0.00%) |      3(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|   uavic1                           |                            AVIC128 |   10791(8.02%) |   10788(8.01%) |      0(0.00%) | 3(0.01%) |  12221(4.54%) |  0(0.00%) |  1(0.14%) |    19(2.57%) |
|     (uavic1)                       |                            AVIC128 |    4788(3.56%) |    4785(3.55%) |      0(0.00%) | 3(0.01%) |  12143(4.51%) |  0(0.00%) |  1(0.14%) |    19(2.57%) |
|     ed10                           |                        edge_det_58 |       2(0.01%) |       2(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     ed2                            |                        edge_det_59 |       3(0.01%) |       3(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     ed5                            |                        edge_det_60 |       1(0.01%) |       1(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     ed6                            |                        edge_det_61 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     ed7                            |                        edge_det_62 |       2(0.01%) |       2(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     u1                             |                         VGASyncGen |    5992(4.45%) |    5992(4.45%) |      0(0.00%) | 0(0.00%) |     71(0.03%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       (u1)                         |                         VGASyncGen |      47(0.03%) |      47(0.03%) |      0(0.00%) | 0(0.00%) |      5(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       u1                           |                            counter |    4788(3.56%) |    4788(3.56%) |      0(0.00%) | 0(0.00%) |     45(0.02%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       u2                           |                         counter_65 |    1157(0.86%) |    1157(0.86%) |      0(0.00%) | 0(0.00%) |     21(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     u2                             |                  AVIC128_VideoFifo |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     u3                             |                        edge_det_63 |       2(0.01%) |       2(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     u4                             |                   VIC128_ShadowRam |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     u5                             |                   VIC128_AudioFifo |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     u6                             |                   VIC128_AudioFifo |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     u7                             |                   VIC128_AudioFifo |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     u8                             |                   VIC128_AudioFifo |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     ucf1                           |                    AVIC128_CmdFifo |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     ued20                          |                        edge_det_64 |       1(0.01%) |       1(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|   ubr1                             |                            bootrom |       5(0.01%) |       5(0.01%) |      0(0.00%) | 0(0.00%) |     19(0.01%) | 16(4.38%) |  0(0.00%) |     0(0.00%) |
|     (ubr1)                         |                            bootrom |       4(0.01%) |       4(0.01%) |      0(0.00%) | 0(0.00%) |     18(0.01%) | 16(4.38%) |  0(0.00%) |     0(0.00%) |
|     u1                             |                        edge_det_57 |       1(0.01%) |       1(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|   ucg1                             |                   NexysVideoClkgen |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|   ucpu1                            |                           FT64_mpu | 108729(80.78%) |  89286(66.33%) | 19440(42.08%) | 3(0.01%) |  13513(5.02%) |  8(2.19%) | 15(2.05%) |     0(0.00%) |
|     ucpu1                          |                               FT64 |  90062(66.91%) |  87027(64.66%) |   3032(6.56%) | 3(0.01%) |  12912(4.80%) |  8(2.19%) | 15(2.05%) |     0(0.00%) |
|       (ucpu1)                      |                               FT64 |  50870(37.79%) |  50870(37.79%) |      0(0.00%) | 0(0.00%) |   9371(3.48%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gAluInst.ualu1               |                           FT64_alu |     168(0.12%) |     168(0.12%) |      0(0.00%) | 0(0.00%) |    176(0.07%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         udiv1                      |                    FT64_divider_41 |      15(0.01%) |      15(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umult1                     |                 FT64_multiplier_42 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb0                    | FT64_multiplier__parameterized2_43 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb1                    | FT64_multiplier__parameterized2_44 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb2                    | FT64_multiplier__parameterized2_45 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb3                    | FT64_multiplier__parameterized2_46 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb4                    | FT64_multiplier__parameterized2_47 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb5                    | FT64_multiplier__parameterized2_48 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb6                    | FT64_multiplier__parameterized2_49 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb7                    | FT64_multiplier__parameterized2_50 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultc0                    | FT64_multiplier__parameterized1_51 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultc1                    | FT64_multiplier__parameterized1_52 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultc2                    | FT64_multiplier__parameterized1_53 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultc3                    | FT64_multiplier__parameterized1_54 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umulth0                    | FT64_multiplier__parameterized0_55 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umulth1                    | FT64_multiplier__parameterized0_56 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDCacheInst.udc1             |             FT64_dcache__xdcDup__1 |      13(0.01%) |      13(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  1(0.14%) |     0(0.00%) |
|         u1                         |                      dcache_mem_40 |       1(0.01%) |       1(0.01%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  1(0.14%) |     0(0.00%) |
|         u3                         |         FT64_dcache_tag__xdcDup__1 |      12(0.01%) |      12(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           (u3)                     |         FT64_dcache_tag__xdcDup__1 |      12(0.01%) |      12(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u1                       |                   FT64_dcache_tag2 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u2                       |                   FT64_dcache_tag2 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[0].iq0 |                           decoder8 |     190(0.14%) |     190(0.14%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[1].iq0 |                         decoder8_6 |     183(0.14%) |     183(0.14%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[2].iq0 |                         decoder8_7 |     162(0.12%) |     162(0.12%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[3].iq0 |                         decoder8_8 |     183(0.14%) |     183(0.14%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[4].iq0 |                         decoder8_9 |     198(0.15%) |     198(0.15%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[5].iq0 |                        decoder8_10 |     194(0.14%) |     194(0.14%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[6].iq0 |                        decoder8_11 |     182(0.14%) |     182(0.14%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[7].iq0 |                        decoder8_12 |     206(0.15%) |     206(0.15%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[8].iq0 |                        decoder8_13 |      26(0.02%) |      26(0.02%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gDecocderInst.genblk1[9].iq0 |                        decoder8_14 |      26(0.02%) |      26(0.02%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gFPUInst.ufp1                |                             fpUnit |      30(0.02%) |      30(0.02%) |      0(0.00%) | 0(0.00%) |     31(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         fpr0                       |                      fpRoundReg_35 |       2(0.01%) |       2(0.01%) |      0(0.00%) | 0(0.00%) |      8(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         u11                        |                           fpDiv_36 |      12(0.01%) |      12(0.01%) |      0(0.00%) | 0(0.00%) |     12(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u2                       |                        fpdivr16_39 |      12(0.01%) |      12(0.01%) |      0(0.00%) | 0(0.00%) |     12(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         u13                        |                          fpSqrt_37 |      16(0.01%) |      16(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u2                       |                           isqrt_38 |      16(0.01%) |      16(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gFPUInst.ufp1__0             |                          fpUnit_15 |      30(0.02%) |      30(0.02%) |      0(0.00%) | 0(0.00%) |     31(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         fpr0                       |                         fpRoundReg |       2(0.01%) |       2(0.01%) |      0(0.00%) | 0(0.00%) |      8(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         u11                        |                              fpDiv |      12(0.01%) |      12(0.01%) |      0(0.00%) | 0(0.00%) |     12(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u2                       |                           fpdivr16 |      12(0.01%) |      12(0.01%) |      0(0.00%) | 0(0.00%) |     12(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         u13                        |                             fpSqrt |      16(0.01%) |      16(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u2                       |                              isqrt |      16(0.01%) |      16(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gICacheInst.uic1             |                     FT64_L1_icache |    1984(1.47%) |     507(0.38%) |   1476(3.19%) | 1(0.01%) |     83(0.03%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         (gICacheInst.uic1)         |                     FT64_L1_icache |      16(0.01%) |      16(0.01%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         tags.u1                    |              FT64_L1_icache_mem_32 |     759(0.56%) |     471(0.35%) |    288(0.62%) | 0(0.00%) |     64(0.02%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         tags.u3                    |       FT64_L1_icache_cmptag4way_33 |    1209(0.90%) |      20(0.01%) |   1188(2.57%) | 1(0.01%) |     19(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           (tags.u3)                |       FT64_L1_icache_cmptag4way_33 |    1194(0.89%) |       6(0.01%) |   1188(2.57%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u1                       |                            lfsr_34 |      15(0.01%) |      14(0.01%) |      0(0.00%) | 1(0.01%) |     19(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       gIDUInst.uid2                |                      FT64_idecoder |     454(0.34%) |     454(0.34%) |      0(0.00%) | 0(0.00%) |     46(0.02%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       ualu0                        |           FT64_alu__parameterized0 |     173(0.13%) |     173(0.13%) |      0(0.00%) | 0(0.00%) |    176(0.07%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         udiv1                      |                       FT64_divider |      16(0.01%) |      16(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umult1                     |                    FT64_multiplier |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb0                    |    FT64_multiplier__parameterized2 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb1                    | FT64_multiplier__parameterized2_21 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb2                    | FT64_multiplier__parameterized2_22 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb3                    | FT64_multiplier__parameterized2_23 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb4                    | FT64_multiplier__parameterized2_24 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb5                    | FT64_multiplier__parameterized2_25 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb6                    | FT64_multiplier__parameterized2_26 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultb7                    | FT64_multiplier__parameterized2_27 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultc0                    |    FT64_multiplier__parameterized1 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultc1                    | FT64_multiplier__parameterized1_28 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultc2                    | FT64_multiplier__parameterized1_29 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umultc3                    | FT64_multiplier__parameterized1_30 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umulth0                    |    FT64_multiplier__parameterized0 |      10(0.01%) |      10(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         umulth1                    | FT64_multiplier__parameterized0_31 |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |     11(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       ubp1                         |               FT64_BranchPredictor |     573(0.43%) |     493(0.37%) |     80(0.17%) | 0(0.00%) |    242(0.09%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       ubtb1                        |                           FT64_BTB |     508(0.38%) |     508(0.38%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  8(2.19%) |  0(0.00%) |     0(0.00%) |
|       udc0                         |                        FT64_dcache |      12(0.01%) |      12(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  1(0.14%) |     0(0.00%) |
|         u1                         |                         dcache_mem |       1(0.01%) |       1(0.01%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  1(0.14%) |     0(0.00%) |
|         u3                         |                    FT64_dcache_tag |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           (u3)                     |                    FT64_dcache_tag |      11(0.01%) |      11(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u1                       |                   FT64_dcache_tag2 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u2                       |                   FT64_dcache_tag2 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       ufb1                         |                      FT64_fetchbuf |  28013(20.81%) |  28013(20.81%) |      0(0.00%) | 0(0.00%) |   1571(0.58%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         (ufb1)                     |                      FT64_fetchbuf |  26215(19.48%) |  26215(19.48%) |      0(0.00%) | 0(0.00%) |    392(0.15%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         ursb1                      |                           FT64_RSB |    1470(1.09%) |    1470(1.09%) |      0(0.00%) | 0(0.00%) |    615(0.23%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         ursb2                      |                        FT64_RSB_20 |     328(0.24%) |     328(0.24%) |      0(0.00%) | 0(0.00%) |    564(0.21%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       uic0                         |                  FT64_L1_icache_16 |    3999(2.97%) |    2522(1.87%) |   1476(3.19%) | 1(0.01%) |    379(0.14%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         (uic0)                     |                  FT64_L1_icache_16 |      16(0.01%) |      16(0.01%) |      0(0.00%) | 0(0.00%) |    291(0.11%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         tags.u1                    |                 FT64_L1_icache_mem |    2777(2.06%) |    2489(1.85%) |    288(0.62%) | 0(0.00%) |     64(0.02%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         tags.u3                    |          FT64_L1_icache_cmptag4way |    1206(0.90%) |      17(0.01%) |   1188(2.57%) | 1(0.01%) |     24(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           (tags.u3)                |          FT64_L1_icache_cmptag4way |    1191(0.88%) |       3(0.01%) |   1188(2.57%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|           u1                       |                            lfsr_19 |      15(0.01%) |      14(0.01%) |      0(0.00%) | 1(0.01%) |     24(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       uic2                         |                     FT64_L2_icache |    1257(0.93%) |    1256(0.93%) |      0(0.00%) | 1(0.01%) |    717(0.27%) |  0(0.00%) | 13(1.78%) |     0(0.00%) |
|         (uic2)                     |                     FT64_L2_icache |     131(0.10%) |     131(0.10%) |      0(0.00%) | 0(0.00%) |    136(0.05%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         tags.u2                    |          FT64_L2_icache_cmptag4way |     803(0.60%) |     802(0.60%) |      0(0.00%) | 1(0.01%) |     67(0.02%) |  0(0.00%) |  4(0.55%) |     0(0.00%) |
|           (tags.u2)                |          FT64_L2_icache_cmptag4way |     796(0.59%) |     796(0.59%) |      0(0.00%) | 0(0.00%) |     43(0.02%) |  0(0.00%) |  4(0.55%) |     0(0.00%) |
|           u1                       |                               lfsr |       7(0.01%) |       6(0.01%) |      0(0.00%) | 1(0.01%) |     24(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         u1                         |                 FT64_L2_icache_mem |     320(0.24%) |     320(0.24%) |      0(0.00%) | 0(0.00%) |    513(0.19%) |  0(0.00%) |  9(1.23%) |     0(0.00%) |
|         u3                         |                        edge_det_18 |       3(0.01%) |       3(0.01%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       uid1                         |                   FT64_idecoder_17 |     226(0.17%) |     226(0.17%) |      0(0.00%) | 0(0.00%) |     46(0.02%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       urf1                         |                FT64_regfile2w6r_oc |     162(0.12%) |     162(0.12%) |      0(0.00%) | 0(0.00%) |     41(0.02%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         (urf1)                     |                FT64_regfile2w6r_oc |     162(0.12%) |     162(0.12%) |      0(0.00%) | 0(0.00%) |     41(0.02%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         urf10                      |                    FT64_regfileRam |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         urf11                      |                    FT64_regfileRam |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         urf12                      |                    FT64_regfileRam |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         urf13                      |                    FT64_regfileRam |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         urf14                      |                    FT64_regfileRam |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|         urf15                      |                    FT64_regfileRam |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       ustmp1                       |                         FT64_stomp |      40(0.03%) |      40(0.03%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     ucrd1                          |                         CardMemory |  17788(13.22%) |    1404(1.04%) | 16384(35.46%) | 0(0.00%) |    237(0.09%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     ummu1                          |                           FT64_mmu |     474(0.35%) |     474(0.35%) |      0(0.00%) | 0(0.00%) |    104(0.04%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       (ummu1)                      |                           FT64_mmu |     474(0.35%) |     474(0.35%) |      0(0.00%) | 0(0.00%) |    104(0.04%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       u1                           |                       FT64_MMURam1 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     upic1                          |                           FT64_pic |      80(0.06%) |      56(0.04%) |     24(0.05%) | 0(0.00%) |    108(0.04%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     upit1                          |                           FT64_pit |     325(0.24%) |     325(0.24%) |      0(0.00%) | 0(0.00%) |    152(0.06%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|   umc1                             |                              mpmc6 |     434(0.32%) |     346(0.26%) |     88(0.19%) | 0(0.00%) |   1059(0.39%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     (umc1)                         |                              mpmc6 |     434(0.32%) |     346(0.26%) |     88(0.19%) | 0(0.00%) |   1059(0.39%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     uddr3                          |                      mig_7series_1 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|   uprg1                            |                             random |      21(0.02%) |      21(0.02%) |      0(0.00%) | 0(0.00%) |    100(0.04%) |  2(0.55%) |  0(0.00%) |     2(0.27%) |
|     (uprg1)                        |                             random |       2(0.01%) |       2(0.01%) |      0(0.00%) | 0(0.00%) |    100(0.04%) |  0(0.00%) |  0(0.00%) |     2(0.27%) |
|     u1                             |                           rand_ram |       4(0.01%) |       4(0.01%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  1(0.27%) |  0(0.00%) |     0(0.00%) |
|     u2                             |                         rand_ram_5 |      15(0.01%) |      15(0.01%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  1(0.27%) |  0(0.00%) |     0(0.00%) |
|   ur2d1                            |                            rgb2dvi |      97(0.07%) |      97(0.07%) |      0(0.00%) | 0(0.00%) |    104(0.04%) |  0(0.00%) |  2(0.27%) |     0(0.00%) |
|     ClockSerializer                |                       OutputSERDES |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     DataEncoders[0].DataEncoder    |                       TMDS_Encoder |      36(0.03%) |      36(0.03%) |      0(0.00%) | 0(0.00%) |     38(0.01%) |  0(0.00%) |  1(0.14%) |     0(0.00%) |
|     DataEncoders[0].DataSerializer |                     OutputSERDES_0 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     DataEncoders[1].DataEncoder    |                     TMDS_Encoder_1 |      30(0.02%) |      30(0.02%) |      0(0.00%) | 0(0.00%) |     32(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     DataEncoders[1].DataSerializer |                     OutputSERDES_2 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     DataEncoders[2].DataEncoder    |                     TMDS_Encoder_3 |      31(0.02%) |      31(0.02%) |      0(0.00%) | 0(0.00%) |     32(0.01%) |  0(0.00%) |  1(0.14%) |     0(0.00%) |
|     DataEncoders[2].DataSerializer |                     OutputSERDES_4 |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      0(0.00%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|     LockLostReset                  |                        ResetBridge |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      2(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|       SyncAsyncx                   |                          SyncAsync |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      2(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
|   uscr1                            |                         scratchmem |       3(0.01%) |       3(0.01%) |      0(0.00%) | 0(0.00%) |     30(0.01%) |  2(0.55%) |  0(0.00%) |     0(0.00%) |
|     (uscr1)                        |                         scratchmem |       3(0.01%) |       3(0.01%) |      0(0.00%) | 0(0.00%) |     29(0.01%) |  2(0.55%) |  0(0.00%) |     0(0.00%) |
|     u1                             |                           edge_det |       0(0.00%) |       0(0.00%) |      0(0.00%) | 0(0.00%) |      1(0.01%) |  0(0.00%) |  0(0.00%) |     0(0.00%) |
+------------------------------------+------------------------------------+----------------+----------------+---------------+----------+---------------+-----------+-----------+--------------+

_________________
Robert Finch http://www.finitron.ca


Sun Sep 23, 2018 5:56 am
Profile WWW

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Results. What do you think about the results for the various modules? I use these reports to give me a sense of where to optimize, and where I've left off some connections. Parsing the tools' warnings can be mind-numbing and extremely error-prone.
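
For anyone who wants to rank the modules in a report like the one above without reading it by eye, here is a minimal Python sketch. The column names are an assumption (the header row is not part of this excerpt; they follow the usual hierarchical utilization layout), and `parse_row` and `rank` are hypothetical helper names, not part of any tool.

```python
import re

# Assumed column order; the report's header row is not shown in this excerpt.
COLUMNS = ["total_luts", "logic_luts", "lutrams", "srls",
           "ffs", "ramb36", "ramb18", "dsp48"]

# Matches cells such as "573(0.43%)" with optional surrounding spaces.
CELL = re.compile(r"^\s*(\d+)\(\s*[\d.]+%\)\s*$")

def parse_row(line):
    """Parse one '| instance | module | n(p%) | ...' row.

    Returns a dict, or None for borders, headers and other non-data lines.
    """
    if not line.lstrip().startswith("|"):
        return None
    cells = line.strip().strip("|").split("|")
    if len(cells) != 2 + len(COLUMNS):
        return None
    counts = {}
    for name, cell in zip(COLUMNS, cells[2:]):
        match = CELL.match(cell)
        if match is None:
            return None
        counts[name] = int(match.group(1))
    raw = cells[0]
    # One space of padding, then two spaces of indent per hierarchy level.
    indent = len(raw) - len(raw.lstrip())
    depth = max(0, (indent - 1) // 2)
    return {"instance": raw.strip(), "module": cells[1].strip(),
            "depth": depth, **counts}

def rank(lines, key="total_luts", top=5):
    """Return the `top` largest rows by `key`, skipping '(name)' self rows."""
    rows = [r for r in map(parse_row, lines) if r is not None]
    rows = [r for r in rows if not r["instance"].startswith("(")]
    return sorted(rows, key=lambda r: r[key], reverse=True)[:top]
```

Note that a parent row already includes its children, so a ranking across all depths counts some LUTs twice; filtering on a single `depth` value avoids that.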

_________________
Michael A.


Sun Sep 23, 2018 3:18 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The breakdown doesn’t show all the modules; perhaps there is an additional option to set.
1) The fetch buffers are quite a bit larger (27000 LUTs) than I would’ve thought. They occupy about 1/3 of the cpu. I’m guessing it’s the 4x decoding of flow control instructions. It could also be the compressed instruction expanders. Results for those two modules are not shown on the report. There needs to be a decode for each fetch buffer to get single-cycle performance on branches. Given that the design doesn’t meet timing, it may be better to rethink the design. Branching could be put off a couple of cycles until after the regular decode, which would eliminate the 4x decode. That would make branches multi-cycle, but hopefully the fmax could be driven up.
2) The Card memory, which is present to accelerate garbage collection, could be eliminated. It’s 17,000 LUTs. It was made of LUTs to get single-cycle performance. I suppose it could be switched to block-ram with its two-cycle latency.
3) The L1-icache memory (made of LUTs) could maybe be larger; it’s only 3% (2kB) of the design. A lot of performance might be gained from a larger L1-icache.
4) The ALU isn’t nearly as big as I thought it would be -> could make all the ALUs the same (BIG) and increase the number of them.
5) The fpu is taking hardly anything; either the stat is wrong or it was configured without an fpu.
6) There are a lot of LUT rams in the mpu and there shouldn’t be.
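
The trade-off in point 1 (multi-cycle branches in exchange for a higher fmax) can be put into rough numbers. This is only a back-of-envelope sketch: it assumes performance scales as fmax divided by average cycles per instruction, and every figure in the example call is made up for illustration, not measured on this core.

```python
def relative_speed(fmax_ratio, base_cpi, branch_fraction, extra_branch_cycles):
    """Speed of the reworked design relative to the current one.

    fmax_ratio          -- new fmax / old fmax (hypothetical)
    base_cpi            -- current average cycles per instruction (hypothetical)
    branch_fraction     -- fraction of executed instructions that are branches
    extra_branch_cycles -- cycles added to each branch by the later decode
    """
    new_cpi = base_cpi + branch_fraction * extra_branch_cycles
    return fmax_ratio * base_cpi / new_cpi

def break_even_fmax_ratio(base_cpi, branch_fraction, extra_branch_cycles):
    """fmax gain needed just to match the current design's speed."""
    return (base_cpi + branch_fraction * extra_branch_cycles) / base_cpi

# Illustrative numbers only: CPI of 1.0, 20% branches, 2 extra cycles each.
needed = break_even_fmax_ratio(1.0, 0.2, 2)
```

With those made-up inputs the clock would have to rise by about 40% before the change pays off; a lower branch fraction or a smaller penalty lowers that bar.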

I’m not too thrilled about decoding on the fetch buffers. I’ve been wondering if it’s possible to add another layer of registers between the fetch and queue stages to help with the fmax. Right now, a lot of work is being done on the (unregistered) output of the fetch stage (register decode / fetch, and instruction dependency logic). I like the idea of moving logic to the other side of the queue because then it can be pipelined. But I don't know how it would be done.

Experimenting with dynamic compressed instructions. They seem to work. It takes only a single opcode to allow the top 256 instructions to be encoded into a single 16-bit instruction parcel. Having an additional lookup table in the fetch path probably isn’t a good idea for performance. The dynamic compressed instructions are a second set of compressed instructions. The first set is a fixed set of compressed instructions.
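
A software model of the dynamic table idea might look like the sketch below. It assumes "top 256" means the 256 most frequent instruction words in a program, and it invents a parcel layout (escape opcode in the low 8 bits, table index in the high 8 bits) and an opcode value purely for illustration; the real FT64 encoding is not given here.

```python
from collections import Counter

DYN_OPCODE = 0x2F  # hypothetical escape opcode, not the real FT64 value

def build_table(program, size=256):
    """Return the `size` most frequent instruction words, most frequent first."""
    return [word for word, _ in Counter(program).most_common(size)]

def compress(program, table):
    """Replace table hits with a 16-bit parcel; leave other words untouched."""
    index = {word: i for i, word in enumerate(table)}
    out = []
    for word in program:
        if word in index:
            out.append(("short", (index[word] << 8) | DYN_OPCODE))
        else:
            out.append(("full", word))
    return out

def expand(parcels, table):
    """Inverse of compress: look each short parcel up in the table."""
    out = []
    for kind, value in parcels:
        if kind == "short":
            if value & 0xFF != DYN_OPCODE:
                raise ValueError("not a dynamic compressed parcel")
            out.append(table[value >> 8])
        else:
            out.append(value)
    return out
```

The table lookup in `expand` is the extra step that would sit in the fetch path, which is the performance concern raised above.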

_________________
Robert Finch http://www.finitron.ca


Mon Sep 24, 2018 3:28 am
Profile WWW