Last visit was: Wed Jan 22, 2025 5:40 am
|
It is currently Wed Jan 22, 2025 5:40 am
|
Author |
Message |
Garth
Joined: Tue Dec 11, 2012 8:03 am Posts: 285 Location: California
|
BigEd wrote: Garth often mentions that the 6502 stack is not as deeply used as one might suspect. I just ran a Basic timing benchmark, which exercises a lot of the Beeb's Basic interpreter, and was mildly surprised to see that it used as much as 136 bytes of stack. (That will be including IRQ handler usage, probably.) The applicable page in my 6502 stacks treatise is http://wilsonminesco.com/stacks/enoughstackspace.html which I think is the shortest page there. I might like to quote and link to your comment, especially if you add more information. Do you know what's taking so much stack space? It's interesting that BASIC needs so much more than Forth which exposes the stacks to the user.
_________________http://WilsonMinesCo.com/ lots of 6502 resources
|
Tue May 30, 2017 8:11 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1808
|
|
Tue May 30, 2017 8:41 pm |
|
|
Garth
Joined: Tue Dec 11, 2012 8:03 am Posts: 285 Location: California
|
A little optimizing needed there, eh? Yeah, 16 floating-point numbers, and maybe operators too, takes a lot of space.
_________________http://WilsonMinesCo.com/ lots of 6502 resources
|
Tue May 30, 2017 9:13 pm |
|
|
MichaelM
Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL
|
Updated the MiniCPU Architecture diagram above.
_________________ Michael A.
|
Wed May 31, 2017 12:48 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2233 Location: Canada
|
I added some information (seriously outdated though but the instruction set description should be okay) about the Butterfly processing core for the eight bit challenge. http://github.com/robfinch/Cores/blob/master/ButterflyThe core synthesized to 472 LUTs (118 slices) but when I ran imp on it, it reported 411 (103 slices). Just reading the old docs, it looks like it ran @40MHz. (Probably 50+ in a newer FPGA). The bus has been modified from 16 to eight bits to conform to the challenge.
_________________Robert Finch http://www.finitron.ca
|
Wed Jun 07, 2017 9:31 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1808
|
Looks like an interesting machine, Rob - 16 bits data and address, word oriented, 16 registers. Not totally unlike the OPC5 from a distance! (Although clearly we've arrived independently, with many differences in the details.) I've taken the liberty of twiddling my fork of your code to make the html doc directly available, at https://biged.github.io/Cores/butterfly.htmlThe document file can be viewed online using this URL.
|
Wed Jun 07, 2017 10:27 pm |
|
|
MichaelM
Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL
|
Very nice indeed.
_________________ Michael A.
|
Wed Jun 07, 2017 11:49 pm |
|
|
MichaelM
Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL
|
BigEd:
How many fibonacci numbers does the fibonacci test program you guys wrote for your OPC series compute? It looked like 23. Is that correct?
_________________ Michael A.
|
Fri Jun 09, 2017 1:32 am |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1808
|
Yes, it's 23. Here's the memory dump: Code: 0000 0001 0001 0002 0003 0005 0008 000d 0015 0022 0037 0059 0090 00e9 0179 0262 03db 063d 0a18 1055 1a6d 2ac2 452f 6ff1 b520 0000 0000 0000 0000 0000 0000 0000 The first two values are state variables for the program, then counting from 1,2,3 we write 23 values to memory, final value being 0xb520 (we stop there for 16 bit reasons.)
|
Fri Jun 09, 2017 3:20 am |
|
|
MichaelM
Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL
|
Thanks.
_________________ Michael A.
|
Fri Jun 09, 2017 10:14 am |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2233 Location: Canada
|
Finally got the Butterfly16 running in the simulator. To perform Fibonnaci out to 23 entries took 5880 ns or 588 clock cycles (measured in the simulator from the time of reset to the time of the last write to memory assuming single cycle memory). Code size is 34 bytes. Code: 119 0000F900 01 11 reset: 120 0000F902 01 12 lw r2,#1 121 0000F904 00 14 lw r4,#0 122 0000F906 j0001: 123 0000F906 00 13 lw r3,#0 124 0000F908 20 23 add r3,r2 125 0000F90A 10 23 add r3,r1 126 0000F90C 00 48 30 10 cmp r3,#$8000 127 0000F910 05 A7 bgtu endFibbonaci 128 0000F912 41 D1 sw r1,[r4] 129 0000F914 02 64 add r4,#2 130 0000F916 20 11 add r1,r2,#0 ; move r2 to r1 131 0000F918 30 12 add r2,r3,#0 ; move r3 to r2 132 0000F91A F5 BE bra j0001 133 0000F91C endFibbonaci: 134 0000F91C 41 D1 sw r1,[r4] 135 0000F91E 43 D2 sw r2,2[r4] 136 0000F920 45 D3 sw r3,4[r4] 137 0000F922 endFibbonaci2: 138 0000F922 FF BE bra endFibonnaci2
With fixes synth size is 489 LUTs or 123 slices.
_________________Robert Finch http://www.finitron.ca
|
Fri Jun 09, 2017 8:39 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1808
|
(Might be just worth noting: in our case we wanted also to test call and return, so we used a subroutine call for the addition. That increases the byte count and cycle count, both for the 6502 baseline and for the OPC version.)
|
Fri Jun 09, 2017 8:47 pm |
|
|
MichaelM
Joined: Wed Apr 24, 2013 9:40 pm Posts: 213 Location: Huntsville, AL
|
Well that was fun. Still need an assembler, but given the small number of opcodes, I was able to hand assemble and test my first Fibonacci series computation routine for the MiniCPU. From initialization to initialization is 810 cycles. Initialization of the workspace pointer, XP, the loop count (23), and the first two Fibonacci numbers (F0 = 0, and F1 = 1) is 24 cycles (17 instructions). Total number of instructions is 28 in the Fibonacci loop thus the CPI of program is approximately ((810 - 24) / 23) / 28 = 1.22 cycles per instruction. Pretty pleased with the overall result. The compact encoding used for instructions, the workspace relative storage, and the simple load/store architecture appears to be efficient enough for the calculation of the first 23 Fibonacci numbers (after initialization of the recurrence). In the routine tested, the loop count is held in location Mem[XP-1], and the new Fibonacci number if constructed in {Mem[XP+1], Mem[XP]} from the previous two Fibonacci numbers stored in F[n-1] = {Mem[XP+3], Mem[XP+2]} and F[n-2] = {Mem[XP+5], Mem[XP+4]}. After each number is calculated, space is allocated on the stack for the next value, and the loop count is updated and written to Mem[XP-1]. The MiniCPU does not have an unconditional branch, so at the end of the loop, a CLC instruction by a BNC xxxx instruction is used to loop back to the top of the loop. After this little exercise, I am thinking of changing the conditional branch instructions to: BNE, BGE (using N and V), and BNC. These conditional branches are instead of the presently implemented branches: BNE, BPL, and BNC. I am thinking that conditional branching on a zero test, BNE, and on the C flag will work for most programs. I am beginning to think that Alarm Siren may have a valid point regarding having one signed comparison instruction, or at least one, for a two's complement ALU. Below is a pseudo-listing of the MiniCPU Fibonacci series test program: Code: -- -- Fibonacci Series Generator Program for MiniCPU 8-bit CPU -- 0002: 0F # (1) _Start: pfx #F 0003: 30 # (1) ldc #0 0004: 21 # (1) xab 0005: 0F # (1) pfx #F -- ldc #-1 0006: 3F # (1) ldc #F 0007: 22 # (1) xch -- -- Initialize Fibonacci Series: F[0] = 0; F[1] = 1 -- 0008: 30 # (1) ldc #0 0009: 63 # (2) stl 3 000A: 62 # (2) stl 2 000B: 61 # (2) stl 1 000C: 31 # (1) ldc #1 000D: 60 # (2) stl 0 -- -- Initialize Loop Counter -- 000E: 01 # (1) pfx #1 -- ldc #23 000F: 37 # (1) ldc #7 0010: 10 # (1) nfx #0 -- stl -1 0011: 6F # (2) stl 15 -- -- Computation Loop: F[n] = F[n-1] + F[n-2] -- 0012: 10 # (1) _Loop: nfx #0 -- ldl -1 # 0013: 4F # (2) ldl 15 0014: A3 # (1) bne 3 -- bne _Cont # if(LpCntr <> 0) continue 0015: 24 # (1) clc 0016: 11 # (1) nfx #1 -- bnc _Start # Repeat Program 0017: CA # (1) bnc A -- -- Decrement Loop Count, allocate space on stack and save Loop Count -- 0018: 21 # (1) _Cont: xab -- Put current loop count in B 0019: 30 # (1) ldc #0 001A: 21 # (1) xab -- Put Loop Count in A and 0 in B 001B: 24 # (1) clc -- Decrement Loop Count 001C: 27 # (1) sbc 001D: 10 # (1) nfx #0 -- ldc #-2 # allocate space for new fibonacci number 001E: DE # (1) adj #14 001F: 10 # (1) nxf #0 -- stl #-1 # Store Loop Count below workspace pointer 0020: 6F # (2) stl 15 -- -- Calculate new Fibonacci number -- 0021: 44 # (2) ldl 4 -- load B with lo(F[n-2]) 0022: 21 # (1) xab 0023: 42 # (2) ldl 2 -- load A with lo(F[n-1]) 0024: 24 # (1) clc -- add lo(F[n-1]) + lo(F[n-2]) 0025: 26 # (1) adc 0026: 60 # (2) stl 0 -- save lo(F[n]) 0027: 45 # (2) ldl 5 -- load B with hi(F[n-2]) 0028: 21 # (1) xab 0029: 43 # (2) ldl 3 -- load A with hi(F[n-1]) 002A: 26 # (1) adc -- add hi(F[n-1]) + hi(F[n-2]) 002B: 61 # (2) stl 1 -- save hi(F[n]) -- -- End of Loop -- 002C: 24 # (1) clc 002D: 11 # (1) nfx #1 -- bnc _Loop 002E: C3 # (1) bnc 3
002F:
_________________ Michael A.
|
Sat Jun 10, 2017 1:12 pm |
|
|
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1808
|
It's been an interesting afternoon - we too have been reconsidering what branches (in our case, predicates) we have available. It looks like we might be adding a test on the sign bit shortly - right now, we support unsigned actions well but signed actions slightly less well.
It also turned out to be a bit inconvenient to have our flags change when we write to PC, so we've special-cased that not to update the flags. Generally, the flags are updated according to the value written to the reg file.
It goes to show that writing and rewriting a few bits of code sheds a lot of light on how convenient a machine is.
|
Sat Jun 10, 2017 1:53 pm |
|
|
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2233 Location: Canada
|
The Butterfly16's load and store byte instructions weren't working so I had to fix the core. Now the core just does a read of $FFFF for the second bus cycle when a byte operation is taking place. Also the core now inserts a dead cycle between every memory access. When placed in a real system, rom access got confused by the lack of a dead cycle and sent back an ack too soon. With fixes the core is now 499 LUTs, still within range for the challenge. The core is also being used to experiment wih on-chip networking.
_________________Robert Finch http://www.finitron.ca
|
Mon Jun 12, 2017 1:01 pm |
|
Who is online |
Users browsing this forum: CCBot and 1 guest |
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot post attachments in this forum
|
|