View unanswered posts | View active topics It is currently Wed Sep 18, 2019 1:05 am



Reply to topic  [ 91 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7  Next
 8 bit CPU challenge 
Author Message

Joined: Tue Dec 11, 2012 8:03 am
Posts: 267
Location: California
BigEd wrote:
Garth often mentions that the 6502 stack is not as deeply used as one might suspect. I just ran a Basic timing benchmark, which exercises a lot of the Beeb's Basic interpreter, and was mildly surprised to see that it used as much as 136 bytes of stack. (That will be including IRQ handler usage, probably.)

The applicable page in my 6502 stacks treatise is http://wilsonminesco.com/stacks/enoughstackspace.html which I think is the shortest page there. I might like to quote and link to your comment, especially if you add more information. Do you know what's taking so much stack space?

It's interesting that BASIC needs so much more than Forth which exposes the stacks to the user.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Tue May 30, 2017 8:11 pm
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
(I've put my response over here: http://forum.6502.org/viewtopic.php?f=2&t=4565)


Tue May 30, 2017 8:41 pm
Profile

Joined: Tue Dec 11, 2012 8:03 am
Posts: 267
Location: California
A little optimizing needed there, eh? :lol: :lol:

Yeah, 16 floating-point numbers, and maybe operators too, takes a lot of space.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Tue May 30, 2017 9:13 pm
Profile WWW

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 177
Location: Huntsville, AL
Updated the MiniCPU Architecture diagram above.

_________________
Michael A.


Wed May 31, 2017 12:48 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 920
Location: Canada
I added some information (seriously outdated though but the instruction set description should be okay) about the Butterfly processing core for the eight bit challenge.

http://github.com/robfinch/Cores/blob/master/Butterfly

The core synthesized to 472 LUTs (118 slices) but when I ran imp on it, it reported 411 (103 slices). Just reading the old docs, it looks like it ran @40MHz. (Probably 50+ in a newer FPGA).

The bus has been modified from 16 to eight bits to conform to the challenge.

_________________
Robert Finch http://www.finitron.ca


Wed Jun 07, 2017 9:31 pm
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
Looks like an interesting machine, Rob - 16 bits data and address, word oriented, 16 registers. Not totally unlike the OPC5 from a distance! (Although clearly we've arrived independently, with many differences in the details.) I've taken the liberty of twiddling my fork of your code to make the html doc directly available, at
https://biged.github.io/Cores/butterfly.html
The document file can be viewed online using this URL.


Wed Jun 07, 2017 10:27 pm
Profile

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 177
Location: Huntsville, AL
Very nice indeed.

_________________
Michael A.


Wed Jun 07, 2017 11:49 pm
Profile

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 177
Location: Huntsville, AL
BigEd:

How many fibonacci numbers does the fibonacci test program you guys wrote for your OPC series compute? It looked like 23. Is that correct?

_________________
Michael A.


Fri Jun 09, 2017 1:32 am
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
Yes, it's 23. Here's the memory dump:
Code:
 0000 0001 0001 0002 0003 0005 0008 000d
 0015 0022 0037 0059 0090 00e9 0179 0262
 03db 063d 0a18 1055 1a6d 2ac2 452f 6ff1
 b520 0000 0000 0000 0000 0000 0000 0000

The first two values are state variables for the program, then counting from 1,2,3 we write 23 values to memory, final value being 0xb520 (we stop there for 16 bit reasons.)


Fri Jun 09, 2017 3:20 am
Profile

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 177
Location: Huntsville, AL
Thanks.

_________________
Michael A.


Fri Jun 09, 2017 10:14 am
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 920
Location: Canada
Finally got the Butterfly16 running in the simulator.
To perform Fibonnaci out to 23 entries took 5880 ns or 588 clock cycles (measured in the simulator from the time of reset to the time of the last write to memory assuming single cycle memory).
Code size is 34 bytes.
Code:
    119 0000F900 01 11                  reset:   
    120 0000F902 01 12                         lw      r2,#1
    121 0000F904 00 14                         lw      r4,#0
    122 0000F906                        j0001:
    123 0000F906 00 13                         lw      r3,#0
    124 0000F908 20 23                         add      r3,r2
    125 0000F90A 10 23                         add      r3,r1
    126 0000F90C 00 48 30 10                   cmp      r3,#$8000
    127 0000F910 05 A7                         bgtu   endFibbonaci
    128 0000F912 41 D1                         sw      r1,[r4]
    129 0000F914 02 64                         add      r4,#2
    130 0000F916 20 11                         add      r1,r2,#0      ; move r2 to r1
    131 0000F918 30 12                         add      r2,r3,#0      ; move r3 to r2
    132 0000F91A F5 BE                         bra      j0001
    133 0000F91C                        endFibbonaci:
    134 0000F91C 41 D1                         sw      r1,[r4]
    135 0000F91E 43 D2                         sw      r2,2[r4]
    136 0000F920 45 D3                         sw      r3,4[r4]
    137 0000F922                        endFibbonaci2:
    138 0000F922 FF BE                         bra      endFibonnaci2

With fixes synth size is 489 LUTs or 123 slices.

_________________
Robert Finch http://www.finitron.ca


Fri Jun 09, 2017 8:39 pm
Profile WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
(Might be just worth noting: in our case we wanted also to test call and return, so we used a subroutine call for the addition. That increases the byte count and cycle count, both for the 6502 baseline and for the OPC version.)


Fri Jun 09, 2017 8:47 pm
Profile

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 177
Location: Huntsville, AL
Well that was fun. Still need an assembler, but given the small number of opcodes, I was able to hand assemble and test my first Fibonacci series computation routine for the MiniCPU. From initialization to initialization is 810 cycles. Initialization of the workspace pointer, XP, the loop count (23), and the first two Fibonacci numbers (F0 = 0, and F1 = 1) is 24 cycles (17 instructions). Total number of instructions is 28 in the Fibonacci loop thus the CPI of program is approximately ((810 - 24) / 23) / 28 = 1.22 cycles per instruction.

Pretty pleased with the overall result. The compact encoding used for instructions, the workspace relative storage, and the simple load/store architecture appears to be efficient enough for the calculation of the first 23 Fibonacci numbers (after initialization of the recurrence).

In the routine tested, the loop count is held in location Mem[XP-1], and the new Fibonacci number if constructed in {Mem[XP+1], Mem[XP]} from the previous two Fibonacci numbers stored in F[n-1] = {Mem[XP+3], Mem[XP+2]} and F[n-2] = {Mem[XP+5], Mem[XP+4]}. After each number is calculated, space is allocated on the stack for the next value, and the loop count is updated and written to Mem[XP-1].

The MiniCPU does not have an unconditional branch, so at the end of the loop, a CLC instruction by a BNC xxxx instruction is used to loop back to the top of the loop.

After this little exercise, I am thinking of changing the conditional branch instructions to: BNE, BGE (using N and V), and BNC. These conditional branches are instead of the presently implemented branches: BNE, BPL, and BNC. I am thinking that conditional branching on a zero test, BNE, and on the C flag will work for most programs. I am beginning to think that Alarm Siren may have a valid point regarding having one signed comparison instruction, or at least one, for a two's complement ALU.

Below is a pseudo-listing of the MiniCPU Fibonacci series test program:
Code:
                    --
                    --  Fibonacci Series Generator Program for MiniCPU 8-bit CPU
                    --
0002: 0F    # (1)   _Start: pfx #F
0003: 30    # (1)           ldc #0
0004: 21    # (1)           xab
0005: 0F    # (1)           pfx #F      -- ldc #-1
0006: 3F    # (1)           ldc #F
0007: 22    # (1)           xch
                    --
                    --  Initialize Fibonacci Series: F[0] = 0; F[1] = 1
                    --
0008: 30    # (1)           ldc #0
0009: 63    # (2)           stl 3
000A: 62    # (2)           stl 2
000B: 61    # (2)           stl 1
000C: 31    # (1)           ldc #1
000D: 60    # (2)           stl 0
                    --
                    --  Initialize Loop Counter
                    --
000E: 01    # (1)           pfx #1      -- ldc #23
000F: 37    # (1)           ldc #7
0010: 10    # (1)           nfx #0      -- stl -1
0011: 6F    # (2)           stl 15
                    --
                    --  Computation Loop: F[n] = F[n-1] + F[n-2]
                    --
0012: 10    # (1)   _Loop:  nfx #0      -- ldl  -1      #
0013: 4F    # (2)           ldl 15
0014: A3    # (1)           bne 3       -- bne _Cont    # if(LpCntr <> 0) continue
0015: 24    # (1)           clc
0016: 11    # (1)           nfx #1      -- bnc _Start   # Repeat Program
0017: CA    # (1)           bnc A
                    --
                    --  Decrement Loop Count, allocate space on stack and save Loop Count
                    --
0018: 21    # (1)   _Cont:  xab         -- Put current loop count in B
0019: 30    # (1)           ldc #0
001A: 21    # (1)           xab         -- Put Loop Count in A and 0 in B
001B: 24    # (1)           clc         -- Decrement Loop Count
001C: 27    # (1)           sbc
001D: 10    # (1)           nfx #0      -- ldc #-2      # allocate space for new fibonacci number
001E: DE    # (1)           adj #14
001F: 10    # (1)           nxf #0      -- stl #-1      # Store Loop Count below workspace pointer
0020: 6F    # (2)           stl 15
                    --
                    --  Calculate new Fibonacci number
                    --
0021: 44    # (2)           ldl 4       -- load B with lo(F[n-2])
0022: 21    # (1)           xab
0023: 42    # (2)           ldl 2       -- load A with lo(F[n-1])
0024: 24    # (1)           clc         -- add lo(F[n-1]) + lo(F[n-2])
0025: 26    # (1)           adc
0026: 60    # (2)           stl 0       -- save lo(F[n])
0027: 45    # (2)           ldl 5       -- load B with hi(F[n-2])
0028: 21    # (1)           xab
0029: 43    # (2)           ldl 3       -- load A with hi(F[n-1])
002A: 26    # (1)           adc         -- add hi(F[n-1]) + hi(F[n-2])
002B: 61    # (2)           stl 1       -- save hi(F[n])
                   --
                    --  End of Loop
                    --
002C: 24    # (1)           clc
002D: 11    # (1)           nfx #1      -- bnc _Loop
002E: C3    # (1)           bnc 3

002F:

_________________
Michael A.


Sat Jun 10, 2017 1:12 pm
Profile

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1256
It's been an interesting afternoon - we too have been reconsidering what branches (in our case, predicates) we have available. It looks like we might be adding a test on the sign bit shortly - right now, we support unsigned actions well but signed actions slightly less well.

It also turned out to be a bit inconvenient to have our flags change when we write to PC, so we've special-cased that not to update the flags. Generally, the flags are updated according to the value written to the reg file.

It goes to show that writing and rewriting a few bits of code sheds a lot of light on how convenient a machine is.


Sat Jun 10, 2017 1:53 pm
Profile

Joined: Sat Feb 02, 2013 9:40 am
Posts: 920
Location: Canada
The Butterfly16's load and store byte instructions weren't working so I had to fix the core. Now the core just does a read of $FFFF for the second bus cycle when a byte operation is taking place. Also the core now inserts a dead cycle between every memory access. When placed in a real system, rom access got confused by the lack of a dead cycle and sent back an ack too soon. With fixes the core is now 499 LUTs, still within range for the challenge. The core is also being used to experiment wih on-chip networking.

_________________
Robert Finch http://www.finitron.ca


Mon Jun 12, 2017 1:01 pm
Profile WWW
Display posts from previous:  Sort by  
Reply to topic   [ 91 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7  Next

Who is online

Users browsing this forum: No registered users and 3 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software