Last visit was: Wed Oct 09, 2024 7:05 pm
|
It is currently Wed Oct 09, 2024 7:05 pm
|
LALU Computer: Lookup Arithmetic Logic Unit
Author |
Message |
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
For the LALU Computer I have recently been benchmarking a few methods to do 16 bit multiplication in my interpretive language LANG. I tested doing multiplication 3 different ways: 1) A simple addition loop written in LANG, 2) Defining multiplication as consecutive additions in assembly language (pointed to by the operator symbol "*"), and 3) Defining multiplication as bit shifts and additions, also pointed to as the operator symbol "*".
Here are the tests and the results: 16 bit multiplication of the numbers Hx0137 and Hx00D2 performed 1000X
Multiplication as a looped series of simple additions in LANG #1 H03E8 VSA 0 VSB :A is loop index, B is running sum #2 VRA 1 - VSA VRA IF GT3 ELSE QUIT #3 H00D2 DO VRB H0137 + VSB LP :The actual multiplication #4 0 VSB GT2 :Loop to do multiplication 1000 times RESULT: 310 Seconds
Multiplication symbol "*" defined in assembly as looped consecutive adding #1 H03E8 DO H0137 H00D2 * LP #2 QUIT RESULT: 23 Seconds
Multiplication symbol '*" defined in assembly as shifts and adds (See attached explanation) #1 H03E8 DO H0137 H00D2 * LP #2 QUIT RESULT: 3 Seconds
We see that the shift and adding method is about 100X faster that simpled looped addition. Attached is a description of the shift and add method from a website about bit-based algos.
A couple ideas to make this faster would be to use lookup tables, or build some hardware to do it. Both interesting options...
You do not have the required permissions to view the files attached to this post.
|
Thu Mar 30, 2023 4:38 pm |
|
|
Ken KD5ZXG
Joined: Sat Sep 03, 2022 3:04 am Posts: 51
|
If trying random stuff, Booth multiplier?
|
Thu Mar 30, 2023 7:43 pm |
|
|
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
Hi, I just wanted to give a quick update on LALU. I have been thinking about tweaking the layout of the LALU ROM slightly. As you recall the ROM is setup to calculate the result of an operation between the ACC (Register A) and Register B. A typical operation might be to take the logical AND of A and B and put the result back into A. LALU has 16 allowed operations (EX: Add, Subtract, AND, OR, etc.)
One of the operations is the bitwise NOT operation. NOT is special in that it is an unary operator. (It does not depend on the value of the other Register.) Well I got to thinking of how wasteful that is, and it occurred to me that it would be possible to use Register A to control whatever unary operation I want to be performed on B. I attached a flow diagram of the basic idea.
The more I think about this the more useful operators I come up with to perform on an 8 bit operand in B. For example, bit shifts, parity calculations, conversions, set bit calculations, math lookups, rotates, complements, etc. Maybe you have some suggestions for other ops? There is room for 256 different operations.... There is nothing special about this idea, but it was helpful for me to think of it this way. Notice that by putting the operand in B it is preserved after the computation.
You do not have the required permissions to view the files attached to this post.
|
Fri Apr 14, 2023 8:29 pm |
|
|
Ken KD5ZXG
Joined: Sat Sep 03, 2022 3:04 am Posts: 51
|
We discussed this an eternity ago. You are only now rediscovering? Copy/Paste from 2019 dated brainstorm, probably with errors: I was planning argument A, subFUNction B. Carry MUX'd an alt-result rather than waste the alt-result as a do-nothing copy of the argument.
NOT seemed redundant to XOR, so you won't find it listed here. But somehow included negating SGN (bit7), also redundant to XOR??? Steal whatever ideas appeal, and ignore those that make no sense. There might have been better versions I can no longer easily find. Its easy to plan and plan and never settle good enough to burn. ----------------------------------------------------------------------------
Input logic external to the table may clear and invert C to any state. Duplicate functions preserve C state where other behavior not specified.
Bytewide outputs should occupy 2 of 32 tables within a 16Megabit device. Other tables from that device are documented separately. 8 Carry bit outputs should occupy a 1Megabit device as documented below. 8 FUN/FWC subfunction inputs take the place of the usual B argument. 8 input bits still exist for A. 1 bit Carry input may drive output mux. -note- my flag results were a separate lookup, to allow the mux option. Or make Carry-in True a separate table with flagouts as the alt byte...
Carry False / Carry True 00MMMNNN Swap Bit M with Bit N 01MMMNNN Copy Bit M to Bit N 10000NNN Swap Bit N with Carry 10001NNN Copy Bit N to Carry / Invert Bit N to Carry 10010NNN Copy Carry to Bit N 10011NNN Rotate Right N Steps Through Carry 10100NNN Circle N to MSB also to C / Arithmetic Shift N to C, Fill with MSB 10101NNN Count Zeros including C, If Count>N Flag C 10110NNN Count Ones including C, If Count>N Flag C 10111NNN Count Zeros including C, If Count=N Flag C 11000NNN Count Ones including C, If Count=N Flag C 11001NNN Count Leading Zeros / Count Leading Ones, If Count>N Flag C 11010NNN Count Trailing Zeros / Count Trailing Ones, If Count>N Flag C 11011NNN Count Leading Zeros / Count Leading Ones, If Count=N Flag C 11100NNN Count Trailing Zeros / Count Trailing Ones, If Count=N Flag C 111010NN Increment 2 / Decrement 2*N+4 (4,6,8,10), If Result<00 Flag C 111011NN Increment 2 / Increment 2*N+4 (4,6,8,10), If Result>FF Flag C 11110000 Update Carry (as coerced by input logic, purposely avoided elsewhere) 11110001 Parity including C 11110010 Negate SGN (Consider something else. Already covered by XOR) 11110100 SGN to TWC 11110101 TWC to SGN, If Impossible Flag C 11110110 TWC to ABS 11110111 Negate TWC, If Impossible Flag C (Impossible -128 differs from 0-A's Borrow) 11111000 GRY to ABS 11111001 ABS to GRY 11111010 ABS to CDH 11111011 ABS to CDL 11111100 CDL to ABS, If Impossible Flag C 11111101 0+WR30+C 11111111 1+WR30+C
Glossary: C = Carry = B = Borrow (output state may differ from input) M = Bit Position or Quantity M N = Bit Position or Quantity N MSB = Most Significant Bit #7 LSB = Least Significant Bit #0 ABS = Absolute 000 to 255 TWC = Two's Complement -128 to 127 SGN = Signed -127 to 127 GRY = Graycode 000 to 255 CDH = Coded Decimal High 0xx to 2xx CDL = Coded Decimal Low x00 to x99 WR30 = Wolfram's Rule #30, with bookends
Above documents only two of eight flag output bits: FUN/FWC The other six output flags belong to: ADD/ADC SUB/SWB RSB/RWB
|
Sat Apr 15, 2023 3:11 pm |
|
|
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
Thanks Ken! I was hoping you would share some of your ideas on this topic. I'm wondering how the Wolfram bookends work? (I don't see bookends mentioned in the Wiki.) I started thinking about this again while attempting to come up with a 16 bit division method assisted with some lookup table shortcuts. The shift and subtract method of division is getting messy when I code it in assembly. Here is a link to a relevant patent on lookup division. https://www.freepatentsonline.com/7007058.pdf
|
Sat Apr 15, 2023 5:00 pm |
|
|
Ken KD5ZXG
Joined: Sat Sep 03, 2022 3:04 am Posts: 51
|
Wolfram's rule #30's wants to know 10 bits of the previous generation to output 8, thus bookends. All repeatable from seed pseudorandom generators are automata whether the cells realize or not... But how to hide the simulation boundaries? Go really wide and truncate, or loop to the other end?
As for that division patent, I've read three times now and still don't get it. I simply define fixed point bytes HL.XYZ, then lookup only what's needed.
MDL (A/B) *L.*** #modulo or integer remainder DVL (A/B) *L.*** DVX (A/B) **.X** DVY (A/B) **.*Y* DVZ (A/B) **.**Z
Square roots: SRL (AB) *L.*** SRX (AB) **.X** SRY (AB) **.*Y* SRZ (AB) **.**Z
Multiplication: MPH (A*B) H*.*** MPL (A*B) *L.***
Fast and easy, but occupies several tables. I have space for 32. Quartersquares QSH QSL could be single argument subFUNctions, Might live with only DVL,MDL as full size tables...
|
Sun Apr 16, 2023 4:27 am |
|
|
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
Hi, Just a quick update on the LALU Computer project. I have spent most of my time programming the LALU ROM with new lookup tables in Segment 2. This segment is dedicated to performing unary operations on Register B, as selected by the contents of Register A. (This is per the previous discussion above.)
So far I have added 16 lookup tables with various properties, as summarized in the attached table. Four of the lookup tables are for doing quarter-squares multiplication. One nice use of these lookup tables is generating flags (0 or 1) to indicate if the B Byte is a member of a set. For example, does the byte represent any of the ASCII characters for 0,1,2,...D, E, F ? This is a quick way to determine if the user has keyed-in data that is appropriate for representing a hex number.
A very useful tool that I use for testing the LALU ROM is the tester shown in the photo. With the tester I can select the operation and load Registers A and B to get a visual LED indication of the output. In the photo the operation selected in A is 00000000 for bitwise NOT, and you can see the results of bitwise NOT for the data presented in Register B.
I am still thinking about how I want to do trigonometry with unsigned integers. Another thing I am thinking about is creating an operator that does intermediate math in 32 bit precision, before jumping back to 16 bit. There are some operators in Forth that have somewhat similar behaviour. Such an operator would be especially useful for representing numbers (such as PI) by a multiply and divide. (Multiply by 22 and divide by 7, or Multiply by 245 and divide by 78, etc)
Michael
You do not have the required permissions to view the files attached to this post.
|
Fri Apr 28, 2023 4:38 pm |
|
|
drogon
Joined: Sun Oct 14, 2018 5:05 pm Posts: 62
|
mmruzek wrote: Another thing I am thinking about is creating an operator that does intermediate math in 32 bit precision, before jumping back to 16 bit. There are some operators in Forth that have somewhat similar behaviour. Such an operator would be especially useful for representing numbers (such as PI) by a multiply and divide. (Multiply by 22 and divide by 7, or Multiply by 245 and divide by 78, etc)
Michael Not just Forth, but BCPL has an integer MULDIV operator where it multiplies 2 x 32-bit numbers to form a 64-bit intermediate result which is then divided by the 3rd value to return the result. You can't access the intermediate 64-bit value. It's typically used for scaled fixed-point working... So a 16 -> 32 bit implementation here might be handy for some uses when you really don't want (or need) floating point. Code: r := muldiv (a,b,c) // (a*b) / c -Gordon
|
Fri Apr 28, 2023 5:16 pm |
|
|
Ken KD5ZXG
Joined: Sat Sep 03, 2022 3:04 am Posts: 51
|
Cosines of 2*2*2*2*2*2*3*3*3*5*7=60480 angles favor whole number sectioning. Continue to wave past the end of quadrant till appendix entries are filled at 65535.
CSL _L.___ #No such table, all but one bit blank. OR(A,B)'s Zero flag is equivalent. CSX __.X__ CSY __._Y_ CSZ __.__Z
Clean factors would also improve 16 or 32bit accuracy. Nothing magic about 24bits. By "accuracy", I mean scribe common geometries without noticeable gaps...
-------edit-------
Occurs to me that CSX is pegged at FF for quite a while before coming down. Could represent 32bit accuracy instead, at cost of checking which range applies. Three appendix could maybe do something too. But not if its gonna slow much.
Maybe less complicated for Cosine tables to offer floating point results instead...
|
Sat Apr 29, 2023 3:39 pm |
|
|
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
Hello! I haven't done much on LALU since my last post, as I am taking a summer break working on a new project: Radio Astronomy. I've always had a interest in it, and this summer I have setup a radio telescope to do hydrogen line observations. Here is a webpage that describes the project... mostly photos. It would be fun to work in a homebrew computer at some point, who knows! Michael http://www.mtmscientific.com/hydrogen.html
You do not have the required permissions to view the files attached to this post.
|
Wed Aug 23, 2023 12:18 pm |
|
|
Ken KD5ZXG
Joined: Sat Sep 03, 2022 3:04 am Posts: 51
|
2.5GHz measured 17dBi and good match. No scientific plan, just winging it. Then again, also made a halfdozen other canlike horns that didn't work. No clue exactly what had gone right or wrong in each case. Unreproducible with test equipment I lacked at the time...
You do not have the required permissions to view the files attached to this post.
|
Mon Aug 28, 2023 4:57 am |
|
|
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
Hi, I'm back working on the LALU Stack Computer. Here is a new diagram outlining the organization of data flow in LALU. This computer was built to experiment with stacks. LALU has 3 types of stacks: data, return and keyboard. The data stack is used for computation. The return stack is used for storing addresses for jumps. The keyboard stacks are for text-based program instructions written in an interpretive language created for the purpose called "LANG".
Probably the most unique feature of LALU are the keyboard stacks, of which there are 16. Each keyboard stack is selectable as the input stream. The keyboard stacks share a common stack pointer. The pointer can be reset to zero. The first keyboard stack (Stack Zero) is dedicated to interactive keyboard input. The remaining keyboard stacks hold the text-based program input.
The keyboard stacks can be conceptualized as a stacked cache of LANG program instructions, or threads, that each start at stack pointer zero. Jumps between keyboard stacks are a mechanism for program flow. Jumping inside a keyboard stack is also possible, as will be described in a future post about LANG. Michael
You do not have the required permissions to view the files attached to this post.
|
Fri Dec 22, 2023 1:39 pm |
|
|
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
Here is a summary of the current Assembly Language instructions for the LALU computer. The instructions are the usual mix of math, logic, input/output, memory and stack operations. All assembly language instructions are 3 characters. All operands are bytes. The special operation LKP is for "LOOKUP", and is used to access useful tabled data. I will be saying more about how these instructions are assembled to create machine code.
You do not have the required permissions to view the files attached to this post.
|
Fri Dec 22, 2023 8:08 pm |
|
|
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
The mnemonics of the LALU Assembly language are assembled into machine code using a DOS executable program written in C. (POWERC by MIX Software). I call the assembler WWA, which stands for World's Worst Assembler. The most useful feature of the assembler is the ability to place a 3 character label in the listing to represent an address, allowing reference to that address (by name) for jumps. A label is declared using the colon character (:LAB), and referenced using an asterik (*LAB). Comments in the listing are allowed after a semicolon (;Comment).
The assembler creates a binary file suitable for burning into LALU's ROM memory. The assembler has error detection and checks for undefined mnemonics and labels. There is a one-to-one correspondence between a line in the assembly source code, and the bytes assembled to create the binary. This was the simplest approach to keep my sanity. I've attached a representative page of a source listing that shows some of the features mentioned here.
You do not have the required permissions to view the files attached to this post.
|
Fri Dec 22, 2023 8:50 pm |
|
|
mmruzek
Joined: Sun Dec 19, 2021 1:36 pm Posts: 79 Location: Michigan USA
|
My last 3 posts described the stack computer architecture of the LALU computer, the assembly language instructions and the assembler. Now I would like to describe an interpretive language being written as an abstraction layer using these tools.
The goal of the language is to be fast, intuitive and capable. I can think of 3 good examples of these criterion: Forth is fast. Basic is intuitive. And C is capable. Another design decision is the native data size(s) of objects, such as byte, word, etc. Also, a common design approach is to use symbol tables and hash lookups. Additionally we want to make good use of the stack architecture.
I decided to start by making everything as simple as possible, and let the complexity creep in later. The first design decision was to define a single data size: 16 bit word (2 bytes). The second design decision was to make the language interpretive. (Interpretive languages are incredibly easy to write.) The third design decision was to make each text word of the language represent a specific action to be done, and make that action depend on the first letter of the word. Here is an example:
'H0001'
The 'H' indicates this is a hexadecimal number (16 bits) to be loaded onto the data stack. When the parser sees the 'H' a jump is made to a routine in ROM that performs the translation of ASCII characters to a number that is pushed to the data stack.
From here, it almost becomes a word game to create language instructions which use unique first letters. This actually turned out to be pretty easy to do. Here are a few more examples.
'+' Means add the 1st and 2nd 16 bit numbers on the data stack and push the answer back onto the stack. 'PRT' Means print the 16 bit number on the top of the data stack to the LCD display. (Triggered by 'P') "HELLO" Means print the word HELLO on the LCD display. (Triggered by ", stops when " seen again ) '&' Means perform a logical AND between the 1st and 2nd numbers on the data stack and push back the answer.
A disadvantage of this approach is that it requires the programmer to think in terms of a stack, and generally in Reverse Polish Notation.
Adding 2 numbers and printing the result:
H0002 H0002 + PRT
Those are the basics, but the implementation of this approach for more complex logical constructs will be a new post. I welcome thoughts and ideas about this topic. Thanks. Michael
|
Sun Dec 24, 2023 12:25 pm |
|
Who is online |
Users browsing this forum: CCBot and 0 guests |
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot post attachments in this forum
|
|