View unanswered posts | View active topics It is currently Tue Apr 23, 2024 10:09 pm



Reply to topic  [ 96 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7  Next
 LALU Computer: Lookup Arithmetic Logic Unit 
Author Message
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
For the LALU Computer I have recently been benchmarking a few methods to do 16 bit multiplication in my interpretive language LANG. I tested doing multiplication 3 different ways: 1) A simple addition loop written in LANG, 2) Defining multiplication as consecutive additions in assembly language (pointed to by the operator symbol "*"), and 3) Defining multiplication as bit shifts and additions, also pointed to as the operator symbol "*".

Here are the tests and the results:

16 bit multiplication of the numbers Hx0137 and Hx00D2 performed 1000X

Multiplication as a looped series of simple additions in LANG
#1 H03E8 VSA 0 VSB :A is loop index, B is running sum
#2 VRA 1 - VSA VRA IF GT3 ELSE QUIT
#3 H00D2 DO VRB H0137 + VSB LP :The actual multiplication
#4 0 VSB GT2 :Loop to do multiplication 1000 times
RESULT: 310 Seconds


Multiplication symbol "*" defined in assembly as looped consecutive adding
#1 H03E8 DO H0137 H00D2 * LP
#2 QUIT
RESULT: 23 Seconds

Multiplication symbol '*" defined in assembly as shifts and adds (See attached explanation)
#1 H03E8 DO H0137 H00D2 * LP
#2 QUIT
RESULT: 3 Seconds

We see that the shift and adding method is about 100X faster that simpled looped addition. Attached is a description of the shift and add method from a website about bit-based algos.

A couple ideas to make this faster would be to use lookup tables, or build some hardware to do it. Both interesting options...


Attachments:
multiply.png
multiply.png [ 20.76 KiB | Viewed 5373 times ]
Thu Mar 30, 2023 4:38 pm
Profile WWW

Joined: Sat Sep 03, 2022 3:04 am
Posts: 51
If trying random stuff, Booth multiplier?


Thu Mar 30, 2023 7:43 pm
Profile
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
Hi, I just wanted to give a quick update on LALU. I have been thinking about tweaking the layout of the LALU ROM slightly. As you recall the ROM is setup to calculate the result of an operation between the ACC (Register A) and Register B. A typical operation might be to take the logical AND of A and B and put the result back into A. LALU has 16 allowed operations (EX: Add, Subtract, AND, OR, etc.)

One of the operations is the bitwise NOT operation. NOT is special in that it is an unary operator. (It does not depend on the value of the other Register.) Well I got to thinking of how wasteful that is, and it occurred to me that it would be possible to use Register A to control whatever unary operation I want to be performed on B. I attached a flow diagram of the basic idea.

The more I think about this the more useful operators I come up with to perform on an 8 bit operand in B. For example, bit shifts, parity calculations, conversions, set bit calculations, math lookups, rotates, complements, etc. Maybe you have some suggestions for other ops? There is room for 256 different operations.... There is nothing special about this idea, but it was helpful for me to think of it this way. Notice that by putting the operand in B it is preserved after the computation.


Attachments:
lookups.png
lookups.png [ 12.56 KiB | Viewed 5337 times ]
Fri Apr 14, 2023 8:29 pm
Profile WWW

Joined: Sat Sep 03, 2022 3:04 am
Posts: 51
We discussed this an eternity ago. You are only now rediscovering?
Copy/Paste from 2019 dated brainstorm, probably with errors:
I was planning argument A, subFUNction B. Carry MUX'd an alt-result
rather than waste the alt-result as a do-nothing copy of the argument.

NOT seemed redundant to XOR, so you won't find it listed here.
But somehow included negating SGN (bit7), also redundant to XOR???
Steal whatever ideas appeal, and ignore those that make no sense.
There might have been better versions I can no longer easily find.
Its easy to plan and plan and never settle good enough to burn.
----------------------------------------------------------------------------

Input logic external to the table may clear and invert C to any state.
Duplicate functions preserve C state where other behavior not specified.

Bytewide outputs should occupy 2 of 32 tables within a 16Megabit device.
Other tables from that device are documented separately.

8 Carry bit outputs should occupy a 1Megabit device as documented below.
8 FUN/FWC subfunction inputs take the place of the usual B argument.
8 input bits still exist for A. 1 bit Carry input may drive output mux.
-note- my flag results were a separate lookup, to allow the mux option.
Or make Carry-in True a separate table with flagouts as the alt byte...

Carry False / Carry True
00MMMNNN Swap Bit M with Bit N
01MMMNNN Copy Bit M to Bit N
10000NNN Swap Bit N with Carry
10001NNN Copy Bit N to Carry / Invert Bit N to Carry
10010NNN Copy Carry to Bit N
10011NNN Rotate Right N Steps Through Carry
10100NNN Circle N to MSB also to C / Arithmetic Shift N to C, Fill with MSB
10101NNN Count Zeros including C, If Count>N Flag C
10110NNN Count Ones including C, If Count>N Flag C
10111NNN Count Zeros including C, If Count=N Flag C
11000NNN Count Ones including C, If Count=N Flag C
11001NNN Count Leading Zeros / Count Leading Ones, If Count>N Flag C
11010NNN Count Trailing Zeros / Count Trailing Ones, If Count>N Flag C
11011NNN Count Leading Zeros / Count Leading Ones, If Count=N Flag C
11100NNN Count Trailing Zeros / Count Trailing Ones, If Count=N Flag C
111010NN Increment 2 / Decrement 2*N+4 (4,6,8,10), If Result<00 Flag C
111011NN Increment 2 / Increment 2*N+4 (4,6,8,10), If Result>FF Flag C
11110000 Update Carry (as coerced by input logic, purposely avoided elsewhere)
11110001 Parity including C
11110010 Negate SGN (Consider something else. Already covered by XOR)
11110100 SGN to TWC
11110101 TWC to SGN, If Impossible Flag C
11110110 TWC to ABS
11110111 Negate TWC, If Impossible Flag C (Impossible -128 differs from 0-A's Borrow)
11111000 GRY to ABS
11111001 ABS to GRY
11111010 ABS to CDH
11111011 ABS to CDL
11111100 CDL to ABS, If Impossible Flag C
11111101 0+WR30+C
11111111 1+WR30+C

Glossary:
C = Carry = B = Borrow (output state may differ from input)
M = Bit Position or Quantity M
N = Bit Position or Quantity N
MSB = Most Significant Bit #7
LSB = Least Significant Bit #0
ABS = Absolute 000 to 255
TWC = Two's Complement -128 to 127
SGN = Signed -127 to 127
GRY = Graycode 000 to 255
CDH = Coded Decimal High 0xx to 2xx
CDL = Coded Decimal Low x00 to x99
WR30 = Wolfram's Rule #30, with bookends

Above documents only two of eight flag output bits:
FUN/FWC
The other six output flags belong to:
ADD/ADC
SUB/SWB
RSB/RWB


Sat Apr 15, 2023 3:11 pm
Profile
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
Thanks Ken! I was hoping you would share some of your ideas on this topic. I'm wondering how the Wolfram bookends work? (I don't see bookends mentioned in the Wiki.)

I started thinking about this again while attempting to come up with a 16 bit division method assisted with some lookup table shortcuts. The shift and subtract method of division is getting messy when I code it in assembly. Here is a link to a relevant patent on lookup division.

https://www.freepatentsonline.com/7007058.pdf


Sat Apr 15, 2023 5:00 pm
Profile WWW

Joined: Sat Sep 03, 2022 3:04 am
Posts: 51
Wolfram's rule #30's wants to know 10 bits of the previous generation to output 8, thus bookends.
All repeatable from seed pseudorandom generators are automata whether the cells realize or not...
But how to hide the simulation boundaries? Go really wide and truncate, or loop to the other end?

As for that division patent, I've read three times now and still don't get it.
I simply define fixed point bytes HL.XYZ, then lookup only what's needed.

MDL (A/B) *L.*** #modulo or integer remainder
DVL (A/B) *L.***
DVX (A/B) **.X**
DVY (A/B) **.*Y*
DVZ (A/B) **.**Z

Square roots:
SRL (AB) *L.***
SRX (AB) **.X**
SRY (AB) **.*Y*
SRZ (AB) **.**Z

Multiplication:
MPH (A*B) H*.***
MPL (A*B) *L.***

Fast and easy, but occupies several tables. I have space for 32.
Quartersquares QSH QSL could be single argument subFUNctions,
Might live with only DVL,MDL as full size tables...


Sun Apr 16, 2023 4:27 am
Profile
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
Hi, Just a quick update on the LALU Computer project. I have spent most of my time programming the LALU ROM with new lookup tables in Segment 2. This segment is dedicated to performing unary operations on Register B, as selected by the contents of Register A. (This is per the previous discussion above.)

So far I have added 16 lookup tables with various properties, as summarized in the attached table. Four of the lookup tables are for doing quarter-squares multiplication. One nice use of these lookup tables is generating flags (0 or 1) to indicate if the B Byte is a member of a set. For example, does the byte represent any of the ASCII characters for 0,1,2,...D, E, F ? This is a quick way to determine if the user has keyed-in data that is appropriate for representing a hex number.

A very useful tool that I use for testing the LALU ROM is the tester shown in the photo. With the tester I can select the operation and load Registers A and B to get a visual LED indication of the output. In the photo the operation selected in A is 00000000 for bitwise NOT, and you can see the results of bitwise NOT for the data presented in Register B.

I am still thinking about how I want to do trigonometry with unsigned integers. Another thing I am thinking about is creating an operator that does intermediate math in 32 bit precision, before jumping back to 16 bit. There are some operators in Forth that have somewhat similar behaviour. Such an operator would be especially useful for representing numbers (such as PI) by a multiply and divide. (Multiply by 22 and divide by 7, or Multiply by 245 and divide by 78, etc)

Michael


Attachments:
lookup.png
lookup.png [ 15.03 KiB | Viewed 5284 times ]
test_brd.jpg
test_brd.jpg [ 618.91 KiB | Viewed 5284 times ]
Fri Apr 28, 2023 4:38 pm
Profile WWW

Joined: Sun Oct 14, 2018 5:05 pm
Posts: 62
mmruzek wrote:
Another thing I am thinking about is creating an operator that does intermediate math in 32 bit precision, before jumping back to 16 bit. There are some operators in Forth that have somewhat similar behaviour. Such an operator would be especially useful for representing numbers (such as PI) by a multiply and divide. (Multiply by 22 and divide by 7, or Multiply by 245 and divide by 78, etc)

Michael


Not just Forth, but BCPL has an integer MULDIV operator where it multiplies 2 x 32-bit numbers to form a 64-bit intermediate result which is then divided by the 3rd value to return the result. You can't access the intermediate 64-bit value. It's typically used for scaled fixed-point working... So a 16 -> 32 bit implementation here might be handy for some uses when you really don't want (or need) floating point.

Code:
r := muldiv (a,b,c) // (a*b) / c


-Gordon


Fri Apr 28, 2023 5:16 pm
Profile

Joined: Sat Sep 03, 2022 3:04 am
Posts: 51
Cosines of 2*2*2*2*2*2*3*3*3*5*7=60480 angles favor whole number sectioning.
Continue to wave past the end of quadrant till appendix entries are filled at 65535.

CSL _L.___ #No such table, all but one bit blank. OR(A,B)'s Zero flag is equivalent.
CSX __.X__
CSY __._Y_
CSZ __.__Z

Clean factors would also improve 16 or 32bit accuracy. Nothing magic about 24bits.
By "accuracy", I mean scribe common geometries without noticeable gaps...

-------edit-------

Occurs to me that CSX is pegged at FF for quite a while before coming down.
Could represent 32bit accuracy instead, at cost of checking which range applies.
Three appendix could maybe do something too. But not if its gonna slow much.

Maybe less complicated for Cosine tables to offer floating point results instead...


Sat Apr 29, 2023 3:39 pm
Profile
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
Hello! I haven't done much on LALU since my last post, as I am taking a summer break working on a new project: Radio Astronomy. I've always had a interest in it, and this summer I have setup a radio telescope to do hydrogen line observations. Here is a webpage that describes the project... mostly photos. It would be fun to work in a homebrew computer at some point, who knows! Michael

http://www.mtmscientific.com/hydrogen.html


Attachments:
tau_a_2.jpg
tau_a_2.jpg [ 192.17 KiB | Viewed 5158 times ]
Wed Aug 23, 2023 12:18 pm
Profile WWW

Joined: Sat Sep 03, 2022 3:04 am
Posts: 51
2.5GHz measured 17dBi and good match. No scientific plan, just winging it.
Attachment:
PiroSide.jpg
PiroSide.jpg [ 161.41 KiB | Viewed 5124 times ]

Attachment:
PiroEnd.jpg
PiroEnd.jpg [ 189.74 KiB | Viewed 5124 times ]

Then again, also made a halfdozen other canlike horns that didn't work.
No clue exactly what had gone right or wrong in each case.
Unreproducible with test equipment I lacked at the time...


Mon Aug 28, 2023 4:57 am
Profile
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
Hi, I'm back working on the LALU Stack Computer. Here is a new diagram outlining the organization of data flow in LALU. This computer was built to experiment with stacks. LALU has 3 types of stacks: data, return and keyboard. The data stack is used for computation. The return stack is used for storing addresses for jumps. The keyboard stacks are for text-based program instructions written in an interpretive language created for the purpose called "LANG".

Probably the most unique feature of LALU are the keyboard stacks, of which there are 16. Each keyboard stack is selectable as the input stream. The keyboard stacks share a common stack pointer. The pointer can be reset to zero. The first keyboard stack (Stack Zero) is dedicated to interactive keyboard input. The remaining keyboard stacks hold the text-based program input.

The keyboard stacks can be conceptualized as a stacked cache of LANG program instructions, or threads, that each start at stack pointer zero. Jumps between keyboard stacks are a mechanism for program flow. Jumping inside a keyboard stack is also possible, as will be described in a future post about LANG. Michael


Attachments:
lalu_dia_2.png
lalu_dia_2.png [ 14.14 KiB | Viewed 340 times ]
Fri Dec 22, 2023 1:39 pm
Profile WWW
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
Here is a summary of the current Assembly Language instructions for the LALU computer. The instructions are the usual mix of math, logic, input/output, memory and stack operations. All assembly language instructions are 3 characters. All operands are bytes. The special operation LKP is for "LOOKUP", and is used to access useful tabled data. I will be saying more about how these instructions are assembled to create machine code.


Attachments:
asm_1.png
asm_1.png [ 24.8 KiB | Viewed 331 times ]
asm_2.png
asm_2.png [ 24.16 KiB | Viewed 331 times ]
asm_3.png
asm_3.png [ 64.05 KiB | Viewed 331 times ]
Fri Dec 22, 2023 8:08 pm
Profile WWW
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
The mnemonics of the LALU Assembly language are assembled into machine code using a DOS executable program written in C. (POWERC by MIX Software). I call the assembler WWA, which stands for World's Worst Assembler. The most useful feature of the assembler is the ability to place a 3 character label in the listing to represent an address, allowing reference to that address (by name) for jumps. A label is declared using the colon character (:LAB), and referenced using an asterik (*LAB). Comments in the listing are allowed after a semicolon (;Comment).

The assembler creates a binary file suitable for burning into LALU's ROM memory. The assembler has error detection and checks for undefined mnemonics and labels. There is a one-to-one correspondence between a line in the assembly source code, and the bytes assembled to create the binary. This was the simplest approach to keep my sanity. I've attached a representative page of a source listing that shows some of the features mentioned here.


Attachments:
assembler.png
assembler.png [ 31.07 KiB | Viewed 328 times ]
Fri Dec 22, 2023 8:50 pm
Profile WWW
User avatar

Joined: Sun Dec 19, 2021 1:36 pm
Posts: 72
Location: Michigan USA
My last 3 posts described the stack computer architecture of the LALU computer, the assembly language instructions and the assembler. Now I would like to describe an interpretive language being written as an abstraction layer using these tools.

The goal of the language is to be fast, intuitive and capable. I can think of 3 good examples of these criterion: Forth is fast. Basic is intuitive. And C is capable. Another design decision is the native data size(s) of objects, such as byte, word, etc. Also, a common design approach is to use symbol tables and hash lookups. Additionally we want to make good use of the stack architecture.

I decided to start by making everything as simple as possible, and let the complexity creep in later. The first design decision was to define a single data size: 16 bit word (2 bytes). The second design decision was to make the language interpretive. (Interpretive languages are incredibly easy to write.) The third design decision was to make each text word of the language represent a specific action to be done, and make that action depend on the first letter of the word. Here is an example:

'H0001'

The 'H' indicates this is a hexadecimal number (16 bits) to be loaded onto the data stack. When the parser sees the 'H' a jump is made to a routine in ROM that performs the translation of ASCII characters to a number that is pushed to the data stack.

From here, it almost becomes a word game to create language instructions which use unique first letters. This actually turned out to be pretty easy to do. Here are a few more examples.

'+' Means add the 1st and 2nd 16 bit numbers on the data stack and push the answer back onto the stack.
'PRT' Means print the 16 bit number on the top of the data stack to the LCD display. (Triggered by 'P')
"HELLO" Means print the word HELLO on the LCD display. (Triggered by ", stops when " seen again )
'&' Means perform a logical AND between the 1st and 2nd numbers on the data stack and push back the answer.

A disadvantage of this approach is that it requires the programmer to think in terms of a stack, and generally in Reverse Polish Notation.

Adding 2 numbers and printing the result:

H0002 H0002 + PRT

Those are the basics, but the implementation of this approach for more complex logical constructs will be a new post. I welcome thoughts and ideas about this topic. Thanks. Michael


Sun Dec 24, 2023 12:25 pm
Profile WWW
Display posts from previous:  Sort by  
Reply to topic   [ 96 posts ]  Go to page Previous  1, 2, 3, 4, 5, 6, 7  Next

Who is online

Users browsing this forum: AhrefsBot and 6 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software