robfinch wrote:

This version of Tiny Basic is port from DSD9 which wasn’t completely debugged. I now have several versions of Tiny Basic, but they are all ultimately derived from Tiny Basic for the 68000. Tiny Basic for the 68k was a port of Palo Alto Tiny Basic found in Dr. Dobb’s Journal. Of course, the source code is considerably different now but the ideas remain the same.

Thanks!

Statistics: Posted by BigEd — Wed Jan 29, 2020 3:28 pm

]]>

Added several custom instructions to the core to support operating system functions. One function is implementing a bitmap of allocated memory pages. Searching and updating the page allocation map is time consuming and requires a couple of hundred lines of code. It’s a lot of bit fiddling which is hard to do without bitmap instructions. So, I turned it into a hardware function which is faster. It reduced the s lines of code to a single instruction.

Alloc:

sub $sp,$sp,#16

sw $ra,[$sp]

sw $s1,4[$sp] ; these regs must be saved

sw $s2,8[$sp]

sw $s3,12[$sp]

; First check if there are enough pages available in the system.

add $v0,$a0,#2047 ; v0 = round memory request

srl $v0,$v0,#11 ; v0 = convert to pages required

lw $t0,NPAGES ; check number of pages available

bleu $v0,$t0,.enough

mov $v0,$x0 ; not enough, return null

bra .noRun

.enough:

; There are enough pages, but is there a run long enough in map space?

sw $s2,$v0 ; save required # pages

mov $a0,$v0

call FindRun ; find a run of available slots

beq $v0,$x0,.noRun

; Now there are enough pages, and a run available, so allocate

mov $s1,$v0 ; s1 = start of run

lw $s3,NPAGES ; decrease number of pages available in system

sub $s3,$s3,$s2

sw $s3,NPAGES

mov $s3,$v0 ; s3 = start of run

.0001:

palloc $v0 ; allocate a page (cheat and use hardware)

beq $v0,$x0,.noRun

mvmap $x0,$v0,$s3 ; map the page

add $s3,$s3,#1 ; next bucket

sub $s2,$s2,#1

bne $s2,$x0,.0001

sll $v0,$s1,#11 ; v0 = virtual address of allocated mem.

.noRun:

lw $ra,[$sp] ; restore saved regs

lw s1,4[$sp]

lw s2,8[$sp]

lw s3,12[$sp]

add $sp,$sp,#16

ret

sub $sp,$sp,#16

sw $ra,[$sp]

sw $s1,4[$sp] ; these regs must be saved

sw $s2,8[$sp]

sw $s3,12[$sp]

; First check if there are enough pages available in the system.

add $v0,$a0,#2047 ; v0 = round memory request

srl $v0,$v0,#11 ; v0 = convert to pages required

lw $t0,NPAGES ; check number of pages available

bleu $v0,$t0,.enough

mov $v0,$x0 ; not enough, return null

bra .noRun

.enough:

; There are enough pages, but is there a run long enough in map space?

sw $s2,$v0 ; save required # pages

mov $a0,$v0

call FindRun ; find a run of available slots

beq $v0,$x0,.noRun

; Now there are enough pages, and a run available, so allocate

mov $s1,$v0 ; s1 = start of run

lw $s3,NPAGES ; decrease number of pages available in system

sub $s3,$s3,$s2

sw $s3,NPAGES

mov $s3,$v0 ; s3 = start of run

.0001:

palloc $v0 ; allocate a page (cheat and use hardware)

beq $v0,$x0,.noRun

mvmap $x0,$v0,$s3 ; map the page

add $s3,$s3,#1 ; next bucket

sub $s2,$s2,#1

bne $s2,$x0,.0001

sll $v0,$s1,#11 ; v0 = virtual address of allocated mem.

.noRun:

lw $ra,[$sp] ; restore saved regs

lw s1,4[$sp]

lw s2,8[$sp]

lw s3,12[$sp]

add $sp,$sp,#16

ret

The FMTK start task operation wasn’t working correctly, but I managed to trace it to bad branch operations. On a hunch that it might be timing related, I inserted an extra stage after register fetch to give more time for register data to appear. And voila, it works now. The tools didn’t report any timing errors, but it seems like there is one. It’s running a fairly fast clock 50MHz and I’m not sure how good the PS is on the little circuit board.

Statistics: Posted by robfinch — Wed Jan 29, 2020 4:01 am

]]>

]]>

I understand your dislike of interpreted languages, I think... They just don't work efficient.

Statistics: Posted by roelh — Tue Jan 28, 2020 9:45 pm

]]>

roelh wrote:

Thanks Joan ! What is so amazing about running interpreted Python on the RPi ?

Well, it's just that any low spec avr based card would do it.

(and that I generally have a greater appreciation for "compiled" languages, but that's just me, of course )

Statistics: Posted by joanlluch — Tue Jan 28, 2020 9:37 pm

]]>

]]>

Statistics: Posted by joanlluch — Tue Jan 28, 2020 9:05 pm

]]>

see: https://hackaday.io/project/167605-kobold-k2-risc-ttl-computer/log/173420-milestone-hello-world

Statistics: Posted by roelh — Tue Jan 28, 2020 8:40 pm

]]>

]]>

robfinch wrote:

I think this project is awesome.

Just to confirm I understand thing: the divider is working one digit at a time as opposed to one bit at a time? And the rr4/rl4, etc are four bit shifts?

Hi Rob, thanks for that.

Yes, you understood it well. The divider works one decimal digit at a time. It's not that different than paper and pencil division that we were all taught in school. The goal of this computer is implementing a decimal floating point calculator, and be able to run a few simple arithmetic algorithms, such as finding the fibonacci series or computing a few digits of pi. This is all what relay computers are able to do and what they did when no better technology was available.

The instruction set is tuned for decimal floating point arithmetic. Decimal digits are represented in BCD taking 4 bits per digit, that's why 4 bit shifts are used quite often (they mean multiply or divide by 10). I have not yet fully decided the entire format of the floating point number, but I know that the mantissa will have 8 digits (represented in 32 bits), and that the exponent will be base 10, possibly limited somehow between -99 and +99, like most scientific calculators. The instruction set allows for arithmetic on larger numbers, with say 12 or 16 digits mantissa or more, but execution time would exponentially increment, so I think that 8 digits is fair enough.

robfinch wrote:

There seems to be a fair bit of code for multiply / divide. How big is the system rom? Is it a standard eprom or diode logic?

I think the code looks longer than what it really is because loops are partially unrolled. For example, the multiply algorithm loops 4 times for a 8 digit multiplication, and same for the division.

The final implementation will be somewhat longer and slightly more expensive because it will have to take the floating point exponent into account. I also need to look at ways to fix precision loss that happens for some combinations of numbers. The division example above, for example, produces 02340385 as the result (mantissa). After the floating point number is normalised by incrementing the exponent, the mantissa will become 23403850, but the exact result should be 23403857, so there's obviously some precision loss in the last digit. At this time I'm unsure whether this is easy (or cheap) to fix.

About memory, well, this is a relay computer with concessions to current technology. One of the concessions is of course the use of modern pcb mount small relays, but I also allow diodes, and standard memory ics, so there's no worries at all about memory capacity. In fact, there's potentially 128 kbytes for program and 128kbytes for data. I will never use that much memory on this computer.

Statistics: Posted by joanlluch — Tue Jan 28, 2020 8:47 am

]]>

Just to confirm I understand thing: the divider is working one digit at a time as opposed to one bit at a time? And the rr4/rl4, etc are four bit shifts?

There seems to be a fair bit of code for multiply / divide. How big is the system rom? Is it a standard eprom or diode logic?

Statistics: Posted by robfinch — Tue Jan 28, 2020 4:52 am

]]>

TinyBasic01.png

Changed the use of CSR’s for mapping and segmentation registers to access via custom instructions. CSR’s aren’t really meant to support arrays of data for things like memory maps. The issue was the need to access the CSR indirectly via another register. This kind of access is not supported in RISCV. It’s the kind of thing better suited to custom instructions.

Ran into issues of core size in the small Artix7-15t FPGA.

Statistics: Posted by robfinch — Tue Jan 28, 2020 4:34 am

]]>

.text

.file "main.s"

# ---------------------------------------------

# main

# ---------------------------------------------

.globl main

main:

# Load a into register r2:r3 (dividend)

mov [&a], r2

mov [&a+1], r3

# Load b into register r4:r5 (divisor)

mov [&b], r4

mov [&b+1], r5

# r0:r1 will become the result

mov 0, r0

mov 0, r1 // R0:R1 will become the result

# We will loop 4 times

mov 4, r6

# but first compute first digit

.LDivFirst:

dsb r2, r4, r2

dsc r3, r5, r3 // Subtract divisor

adt 1, r0 // Only add result digit if no borrow (== carry) is produced

bt .LDivFirst // Subtract one more if no borrow

# For the first digit we shift divisor right (instead of dividend left)

rr4 r4, r5, r4 // lower first

sr4 r5, r5

b .LDivMore // branch to the restoring half of the loop

.LDLoop:

# Shift result one digit left

rl4 r1, r0, r1 // higher first

sl4 r0, r0

# Compute non restoring digit by repeatedly subtracting divisor

.LDivMinus:

dsb r2, r4, r2

dsc r3, r5, r3 // Subtract divisor

adt 1, r0 // Only add result digit if no borrow (== carry) is produced

bt .LDivMinus // Subtract one more if no borrow

# shift dividend left

rl4 r3, r2, r3 // higher first

sl4 r2, r2

# Shift result left, initialize last digit with 9

.LDivMore:

rl4 r1, r0, r1

sl4 r0, r0 // Shift result one digit left

add 9, r0 // Initialize result digit with 9 for restoring

# compute the restoring digit by repeatedly adding divisor

.LDivPlus:

dad r2, r4, r2

dac r3, r5, r3 // Add divisor

sbf 1, r0 // Only subtract if no carry

bf .LDivPlus // Add one more if no carry

# shift dividend left

rl4 r3, r2, r3 // higher first

sl4 r2, r2

# Next digit

sub 1, r6

bf .LDLoop

# store the result

mov r0, [&result]

mov r1, [&result+1]

hlt

# ---------------------------------------------

# Global Data

# ---------------------------------------------

.data

a: .long 0x18465932

b: .long 0x78901234

.comm result,4,2 // reserve space in data memory for result

.file "main.s"

# ---------------------------------------------

# main

# ---------------------------------------------

.globl main

main:

# Load a into register r2:r3 (dividend)

mov [&a], r2

mov [&a+1], r3

# Load b into register r4:r5 (divisor)

mov [&b], r4

mov [&b+1], r5

# r0:r1 will become the result

mov 0, r0

mov 0, r1 // R0:R1 will become the result

# We will loop 4 times

mov 4, r6

# but first compute first digit

.LDivFirst:

dsb r2, r4, r2

dsc r3, r5, r3 // Subtract divisor

adt 1, r0 // Only add result digit if no borrow (== carry) is produced

bt .LDivFirst // Subtract one more if no borrow

# For the first digit we shift divisor right (instead of dividend left)

rr4 r4, r5, r4 // lower first

sr4 r5, r5

b .LDivMore // branch to the restoring half of the loop

.LDLoop:

# Shift result one digit left

rl4 r1, r0, r1 // higher first

sl4 r0, r0

# Compute non restoring digit by repeatedly subtracting divisor

.LDivMinus:

dsb r2, r4, r2

dsc r3, r5, r3 // Subtract divisor

adt 1, r0 // Only add result digit if no borrow (== carry) is produced

bt .LDivMinus // Subtract one more if no borrow

# shift dividend left

rl4 r3, r2, r3 // higher first

sl4 r2, r2

# Shift result left, initialize last digit with 9

.LDivMore:

rl4 r1, r0, r1

sl4 r0, r0 // Shift result one digit left

add 9, r0 // Initialize result digit with 9 for restoring

# compute the restoring digit by repeatedly adding divisor

.LDivPlus:

dad r2, r4, r2

dac r3, r5, r3 // Add divisor

sbf 1, r0 // Only subtract if no carry

bf .LDivPlus // Add one more if no carry

# shift dividend left

rl4 r3, r2, r3 // higher first

sl4 r2, r2

# Next digit

sub 1, r6

bf .LDLoop

# store the result

mov r0, [&result]

mov r1, [&result+1]

hlt

# ---------------------------------------------

# Global Data

# ---------------------------------------------

.data

a: .long 0x18465932

b: .long 0x78901234

.comm result,4,2 // reserve space in data memory for result

The above divides 18465932 by 78901234 and gives 02340385 which is correct.

The algoritm is the "non restoring technique". It repeatedly subtracts the divisor from the divident (or remainder) until the remainder becomes negative. Then, it shifts the divisor right, and adds it to the remainder until it becomes positive again. This is repeated 4 times to complete the 8 digits. The number of subtractions/additions are accumulated on the Result, one digit at a time.

In the code above, it's interesting to note the use of "adt" (conditional add if true), "sbf" (conditional subtraction if false) to prevent the Result digit to accumulate for the addition or subtraction that caused the remainder sign change, as this would produce an incorrect result. Without this instruction, more code would be required and the algorithm would take longer to complete.

The procedure takes a variable number of cycles depending on the dividend and divisor inputs. On average, considering 5 iterations per digit, it takes 176 instruction cycles.

Statistics: Posted by joanlluch — Mon Jan 27, 2020 8:43 pm

]]>

Added a small system memory management unit (SSMMU). It has both segmentation and paging. Segmentation is via 16 segment registers selected with the high order 4-bits of an address like the PowerPC 32-bit machine. After a linear address is generated from segmentation a page map table is used lookup pages mapped into the address space. The system seems to work! Running the system without initializing the mmu resulted in the user address space being limited to 2kB. The size of a page. If not initialized all the pages are mapped to page zero. There’s only 512kB ram in the system. So it’s mapped using 256, 2kB pages. Since only 256 entries per address space are required, the entries are mapped into 64 consecutive CSR’s. The system allows for 16 different maps to be in use at the same time. It takes just one 4kB block ram to implement a mapping table so it’s pretty frugal.

Statistics: Posted by robfinch — Mon Jan 27, 2020 4:29 am

]]>

]]>