Author |
Message |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Decided to use a general-purpose register, r58, as the loop counter instead of having a dedicated loop count register, and restricted loop-counting branches to decrement-and-jump only. This frees up a whole row of opcodes in the instruction set. Logic for branch instructions is simplified a little. It also removes the bypass logic dedicated to the loop counter. Using r58 for the loop counter puts the kibosh on the string instructions, which were using the loop counter to limit iterations.
_________________Robert Finch http://www.finitron.ca
|
Tue Jan 25, 2022 5:34 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Enabled the PUSH instruction to be compiled and assembled. This reduces the size of code. When there is only a small number of registers to be pushed (<4) it is shorter and faster to use the push instruction.
Realized there were two sets of accesses occurring to the same registers and merged the code together. Link registers and code address registers were being handled separately, but link registers are just a subset of code address registers, so there was no need to do this.
_________________Robert Finch http://www.finitron.ca
|
Thu Jan 27, 2022 10:53 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Changing the way the string instructions work and giving them new names. They worked in an unusual fashion because of the need to update only a single register. By micro-coding the instructions it is possible to update more than one register. Calling them block instructions now. Thinking about having the block set instruction able to source data from a random value in addition to a register source.
Added a special opcode for micro-code branching, needed to branch during the block instructions.
Thinking about reducing the size of the instruction pointer to 56 bits plus eight bits for the micro-instruction pointer. The issue is that the micro-instruction pointer needs to be saved and restored across interrupts in addition to the IP.
_________________Robert Finch http://www.finitron.ca
|
Sun Jan 30, 2022 5:28 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Realized there are two components to an instruction address, the instruction pointer and the selector, and the selector is only 32 bits. That means there is potentially "extra" room available in the selector portion of the address to hold a micro-code address, so the IP does not need to be reduced in size. With an eight-bit micro-ip the selector part of the address could expand to 40 bits. The issue is saving and restoring the micro-ip across interrupts. The selector must also be saved, so there is some logic to using that part to store the micro-ip.
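To make the packing idea concrete, here is a minimal Python sketch. It assumes the micro-ip sits in the low eight bits of the 40-bit field with the 32-bit selector above it; that layout and the names pack_selector / unpack_selector are hypothetical and not taken from the Thor sources.

```python
SELECTOR_BITS = 32   # selector width stated in the post
MICRO_IP_BITS = 8    # micro-ip width stated in the post


def pack_selector(selector: int, micro_ip: int) -> int:
    """Combine selector and micro-ip into one 40-bit value (assumed layout)."""
    if not 0 <= selector < (1 << SELECTOR_BITS):
        raise ValueError("selector out of range")
    if not 0 <= micro_ip < (1 << MICRO_IP_BITS):
        raise ValueError("micro_ip out of range")
    return (selector << MICRO_IP_BITS) | micro_ip


def unpack_selector(packed: int) -> tuple:
    """Split a 40-bit value back into (selector, micro_ip)."""
    return packed >> MICRO_IP_BITS, packed & ((1 << MICRO_IP_BITS) - 1)
```

With a layout like this, saving the one 40-bit value on interrupt entry captures both the selector and the micro-ip, which is the point of reusing the selector slot.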
_________________Robert Finch http://www.finitron.ca
|
Thu Feb 03, 2022 3:17 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Found a bug in the LEAVE instruction. It was not de-allocating the stack frame. It needed to be adding 64 to the stack pointer. How things could work as well as they did, I am not sure. Except that the frame pointer was likely restored correctly. I think fixing this fixed an issue where all zeros were being displayed for output when PutTetra() was called. Things are still not working 100%.
_________________Robert Finch http://www.finitron.ca
|
Fri Feb 04, 2022 6:22 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Changed the register set to 128-bits wide so that quad precision floating-point or decimal floating-point values can be contained in a register.
Changed the compare instruction to be more general purpose. It now performs compares for integers and decimal-floats at the same time and returns a bit vector of results. The idea is that there is only a single compare instruction for all data types. The CMPU instruction has been removed from the ISA since compare takes care of both signed and unsigned comparisons.
Bits 0 to 15 of the result are reserved for integer results, bits 16 to 31 are reserved for float results, bits 32 to 47 are reserved for decimal float results and bits 48 to 63 are reserved for posit compare results. All the result bits can be tested with a BBS instruction after the compare.
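A rough Python model of the integer part of such a result vector follows. The four 16-bit field bases come from the description above; the individual bit positions (EQ, NE, LT, GE, LTU, GEU) are assumptions made for illustration, as is the 64-bit operand width.

```python
# Field bases taken from the post; each field is 16 bits wide.
INT_BASE, FLOAT_BASE, DFLOAT_BASE, POSIT_BASE = 0, 16, 32, 48

# Hypothetical bit positions inside the integer field.
EQ, NE, LT, GE, LTU, GEU = range(6)


def to_signed(x: int, width: int = 64) -> int:
    """Interpret the low `width` bits of x as a two's complement value."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def compare_int(a: int, b: int, width: int = 64) -> int:
    """Return a bit vector holding signed and unsigned results together."""
    mask = (1 << width) - 1
    ua, ub = a & mask, b & mask
    sa, sb = to_signed(a, width), to_signed(b, width)
    flags = {
        EQ: ua == ub, NE: ua != ub,
        LT: sa < sb, GE: sa >= sb,      # signed relations
        LTU: ua < ub, GEU: ua >= ub,    # unsigned relations
    }
    result = 0
    for bit, is_set in flags.items():
        if is_set:
            result |= 1 << (INT_BASE + bit)
    return result


def bit_set(result: int, bit: int) -> bool:
    """Model of a branch-on-bit-set test against the compare result."""
    return (result >> bit) & 1 == 1
```

Because both the signed and unsigned relations land in the same vector, a single compare followed by a bit test covers what CMP and CMPU did separately.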
Still stuck on a hardware bug. Trying to display 987654321 displays 98<punctuation chars>. The punctuation chars are in order, so it is almost as if there is an extra bit set somehow. Bit 3 is stuck, I think.
_________________Robert Finch http://www.finitron.ca
|
Wed Feb 23, 2022 4:06 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Added LDH, STH instructions to load and store hexi-byte values but forgot to update the instruction length table with the new instructions, resulting in a crash when software was run. Updated the stack operations to use hexi-byte values instead of octa-byte ones. That means operations like push and pop update the stack pointer by 16 instead of 8.
Flashy LEDs no longer work, but the delay is still there. Clearscreen cleared every other character of the display since registers were set to 128 bits. The default int size is 128 bits now, which means the stride for the display was off. It was designed to work with 64-bit values.
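The every-other-character symptom is what one would expect from stepping through 8-byte display cells with a 16-byte stride. A toy Python model, assuming 8-byte cells and a loop bounded by the size of the display (both assumptions, the real Clearscreen routine is not shown here):

```python
CELL_BYTES = 8  # assumed size of one display cell (64 bits)


def clear_screen(num_cells: int, stride_bytes: int) -> str:
    """Blank cells by stepping through display memory with the given stride."""
    cells = ["X"] * num_cells
    for offset in range(0, num_cells * CELL_BYTES, stride_bytes):
        cells[offset // CELL_BYTES] = " "
    return "".join(cells)
```

Calling clear_screen(8, 8) blanks every cell, while clear_screen(8, 16) leaves every second cell untouched.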
Got flashy LEDs to work again.
Ran into a nasty compiler bug. It had to do with state being maintained from one function to the next when it should not have been.
Switched the default int size to 64 bits. Not all operations are supported at 128 bits yet. Had to modify the compiler's internal types. There was only long and short; now there is long, int, and short.
_________________Robert Finch http://www.finitron.ca
|
Thu Feb 24, 2022 5:16 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Splitting shift operations
Just pondering barrel shifters. It looks like for most designs barrel shifting is done in a single cycle. But for Thor I think it may be necessary to split the shift up. Rather than have a shift requiring two or more clock cycles to complete, my thought is to use two instructions, one to shift by higher order bits of the shift count and a second instruction to shift by the low order bits. Many shifts are by small immediate values. These could then be absorbed by an instruction taking only a single clock.
Four-to-one muxes can be used to shift 0 to 3 bits with a single level of LUT logic. Using two levels of LUT logic a shift from 0 to 15 bits can be done. The whole shift could be done using enough levels of LUTs (e.g. 4 levels for 128-bit shifting), but I do not really want that many levels of LUT logic.
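A small Python model of the two-instruction split, for illustration only. It assumes the low-order instruction takes the bottom two bits of the shift count (one 4:1 mux level) and the high-order instruction takes the rest; where Thor would actually split the count is not stated above, and the function names are made up.

```python
WIDTH = 128
MASK = (1 << WIDTH) - 1
LOW_BITS = 2  # assumed split point: one level of 4:1 muxes


def shift_left_high(value: int, count: int) -> int:
    """First instruction: shift by the high-order bits of the count."""
    high = count & (WIDTH - 1) & ~((1 << LOW_BITS) - 1)
    return (value << high) & MASK


def shift_left_low(value: int, count: int) -> int:
    """Second instruction: shift by the low-order bits of the count (0..3)."""
    return (value << (count & ((1 << LOW_BITS) - 1))) & MASK


def shift_left(value: int, count: int) -> int:
    """A full shift is the two partial shifts applied back to back."""
    return shift_left_low(shift_left_high(value, count), count)


def mux_levels(width: int) -> int:
    """Levels of 4:1 muxes needed for a single-pass shifter of this width."""
    count_bits = (width - 1).bit_length()
    return (count_bits + 1) // 2
```

A shift by a small immediate (0 to 3) needs only shift_left_low, which matches the observation that many shifts could be absorbed by the single-clock instruction.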
Got rid of the pair shifting instructions. The same thing can be accomplished in a more general way using the CARRY instruction.
Added a short 32-bit 2R form for the CMP instruction. Previously a 48-bit instruction was all that was available.
_________________Robert Finch http://www.finitron.ca
|
Fri Feb 25, 2022 4:52 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
I am wondering what to do about long-running computations like decimal-float divide. Decimal-float divide (128-bit) takes around 2,000 clock cycles as I have it implemented. This is probably too long a time to have interrupts suspended. Multiply is not quite as bad, taking around 250 clocks.
Worked out how to perform a decimal division using Newton-Raphson divide. It is about 2/3 as fast as using a dedicated divide instruction, which surprised me as I thought it would be quite a bit slower. I estimate it may take approximately 3,020 clock cycles, but the Newton-Raphson divide is interruptible, which is highly desirable. I am not sure whether to micro-code this or leave it as a compiler routine. It needs about five temporary registers, which should be saved and restored. Micro-coded, it can remain as part of the instruction set. Smells like something that should be tossed, however.
The Newton-Raphson divide approach needs a couple of helper instructions to adjust the exponents and compute the bit shifts. Adjusting the exponents takes about 8 to 10 instructions to do in a general fashion. The divisor needs to be normalized to between 0.5 and 1.0. This involves dividing by 10 and multiplying by a factor (1, 2, 4, or 8) that results in an in-range number. The scaling factor needs to be recorded because it will be needed later to reverse the effect of scaling. A reciprocal estimate function is also needed, so there is a look-up table with 1000 entries for three-digit decimal accuracy. Starting with three digits of accuracy, about four iterations are necessary to get to 34-digit accuracy.
The divide routine looks something like:
Code:
        .code
_DFPDivide128:
        ENTER 80
        STH s0,[SP]
        STH s1,16[SP]
        STH s2,32[SP]
        STH s3,48[SP]
        STH s4,64[SP]
        LDH a1,64[FP]             # get divisor
        LDH a0,80[FP]             # get dividend
        LDH s2,DFTWO              # s2 = constant 2.0 (1 clock)
        DFDIVIDEND_ADJ a0,a0,a1   # a0 = dividend, a1 = divisor (1 clock)
        DFDIVISOR_BITADJ t0,a1    # get bit shift (1 clock)
        DFDIVISOR_ADJ s0,a1       # scale divisor to 0.5 to 1.0 (14 clocks)
        DFRES s0,s0               # r5 = X(0) (4 clocks)
        # Five iterations of Newton-Raphson unrolled
        DFMUL s1,a1,s0            # 250 clocks
        DFSUB s3,s2,s1            # 50 clocks
        DFMUL s0,s3,s0            # 250 clocks
        DFMUL s1,a1,s0
        DFSUB s3,s2,s1
        DFMUL s0,s3,s0
        DFMUL s1,a1,s0
        DFSUB s3,s2,s1
        DFMUL s0,s3,s0
        DFMUL s1,a1,s0
        DFSUB s3,s2,s1
        DFMUL s0,s3,s0
        DFMUL s1,a1,s0
        DFSUB s3,s2,s1
        DFMUL s0,s3,s0
        # Shift zero to three bits to the left, the divisor may have been this many
        # bits too big.
        BEQZ t0,lab1
        SLL t0,t0,#4
        LDH t0,DFONE[t0]          # load 1, 0.5, 0.25 or 0.125
        DFMUL s0,s0,t0            # 250 clocks
lab1:
        DFMUL a0,a0,s0            # 250 clocks
        LDH s0,[SP]
        LDH s1,16[SP]
        LDH s2,32[SP]
        LDH s3,48[SP]
        LDH s4,64[SP]
        LEAVE 32

        .rodata
DFONE:
        .byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0xff,0x25
DFPOINT5:
        .byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80,0xff,0x35
DFPOINT25:
        .byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x40,0xa5,0x21
DFPOINT125:
        .byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x78,0x21
DFTWO:
        .byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0xff,0x29
DFFOUR:
        .byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0xff,0x31
DFEIGHT:
        .byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0xff,0x69
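For anyone who wants to play with the iteration outside the hardware, here is a minimal Python sketch of the same Newton-Raphson reciprocal scheme, X(n+1) = X(n) * (2 - D * X(n)), using the standard decimal module. It is not a model of the Thor instructions: the divisor is scaled by a power of ten into [0.1, 1.0) rather than into 0.5 to 1.0, the three-digit estimate is computed by rounding instead of coming from a look-up table, and the names nr_divide and reciprocal_estimate are made up for this example.

```python
from decimal import Decimal, getcontext

# Work with a few guard digits beyond the 34 digits of a 128-bit decimal float.
getcontext().prec = 40


def reciprocal_estimate(m: Decimal) -> Decimal:
    """Stand-in for the table lookup: 1/m rounded to three significant digits."""
    return Decimal(f"{1 / float(m):.3g}")


def nr_divide(dividend: Decimal, divisor: Decimal, iterations: int = 5) -> Decimal:
    """Divide using only multiply and subtract after the initial estimate."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    sign = -1 if divisor < 0 else 1
    d = abs(divisor)
    exp = d.adjusted() + 1        # d == m * 10**exp with m in [0.1, 1.0)
    m = d.scaleb(-exp)
    x = reciprocal_estimate(m)    # X(0), good to about three digits
    two = Decimal(2)
    for _ in range(iterations):
        x = x * (two - m * x)     # one multiply, one subtract, one multiply
    # Undo the scaling of the divisor, then multiply by the dividend.
    return sign * dividend * x.scaleb(-exp)
```

Each pass roughly doubles the number of correct digits (3, 6, 12, 24, 48), which lines up with the estimate of about four iterations for 34 digits; the listing above unrolls five. The loop body is the same DFMUL / DFSUB / DFMUL triple as in the assembly.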
_________________Robert Finch http://www.finitron.ca
|
Sat Feb 26, 2022 4:34 am |
|
 |
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1814
|
Feels to me like it's best to leave this to a user-code library. That way things like space-time tradeoffs and register usage can be chosen for the application. And it doesn't affect interrupts.
I've a feeling there was a situation once where floating point could not be used in interrupt context. And that sounds like a fine tradeoff to me.
|
Sat Feb 26, 2022 8:03 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Quote: Feels to me like it's best to leave this to a user-code library. That way things like space-time tradeoffs and register usage can be chosen for the application. And it doesn't affect interrupts.
I've a feeling there was a situation once where floating point could not be used in interrupt context. And that sounds like a fine tradeoff to me.
Yeah, I have decided to leave it as a user code library routine, along with square root. The performance of doing it in software is close enough to doing it in hardware that I think it is better as a software routine. As you say, it allows more tradeoffs.
Got the decimal-float reciprocal estimate module written. It returns an estimate of the reciprocal for numbers between 0.1 and 1.0. The estimate comes from a look-up table. It is accurate only to about three decimal digits as there is no interpolation taking place.
Spent too much time trying to get float <-> decimal float routines working. I may leave that up to software too. So the decimal float hardware is pretty basic: add, sub, multiply, reciprocal estimate, int <-> decimal float conversion and compare.
_________________Robert Finch http://www.finitron.ca
|
Mon Feb 28, 2022 4:16 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Thinking of keeping the instructions in the ISA but having them trap to software routines instead of providing hardware. That way it looks like hardware is present. It might make writing software easier.
_________________Robert Finch http://www.finitron.ca
|
Mon Feb 28, 2022 4:24 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Working on Thor2022 now. Decided to reduce the number of registers to 32 from 64. This freed up some opcode bits and allows more instructions to be encoded into 32 bits. It also means that there is less state to worry about.
Made a pretty map of the instruction set for Thor2022. Most of the opcodes have been retained from 2021. Branch instructions can specify a compare function now. There are currently five compare functions: signed integer, unsigned integer, quad float, quad decimal float and posit.
Did a lot of work getting float modules for different sizes of float operations. Previously, there was just a single module for all sizes. For example, the fpAddsub module handled all sizes of operations. This has been broken out now into fpAddsub32, fpAddsub64 and fpAddsub128. The issue was that previously different sizes of operation could not be mixed in the same design because of the way the float package was set up. This meant one could not have doubles and singles in the same design, for instance. There are now separate packages for each size.
_________________Robert Finch http://www.finitron.ca
|
Tue Mar 01, 2022 6:52 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Got rid of the segmentation. With good MMU support segmentation is not required.
Got rid of index scaling for indexed address modes. The compiler was not able to make effective use of it much of the time. Getting rid of the scaling made index mode instructions small enough to fit into 32 bits.
Did some work on the MMU. Planning to use an inverted page table, so wrote an article about its usage.
_________________Robert Finch http://www.finitron.ca
|
Wed Mar 02, 2022 5:48 am |
|
 |
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2263 Location: Canada
|
Updated the bus interface unit (BIU) to use an inverted page table with automatic page table walking. Previously a TLB miss would trap to software. A TLB miss now causes an automatic page table search for the address translation. If the translation is found the TLB is automatically updated, otherwise a page fault occurs. The TLB component from Thor2021 was updated and re-used. The table’s associativity is now a parameter.
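The miss path described above can be sketched in Python as follows. This is only a software model under assumed details: the hash function, linear probing, the (asid, vpn) key and the dictionary standing in for the TLB are all illustrative choices and say nothing about how the Thor2022 BIU actually organizes its table.

```python
class PageFault(Exception):
    """Raised when no translation exists for the requested page."""


class InvertedPageTable:
    """One entry per physical frame; the slot index is the frame number."""

    def __init__(self, num_frames: int):
        self.entries = [None] * num_frames

    def _probe(self, asid: int, vpn: int):
        # Assumed hash; collisions are resolved by linear probing.
        n = len(self.entries)
        start = (vpn ^ (asid * 31)) % n
        for i in range(n):
            yield (start + i) % n

    def map(self, asid: int, vpn: int) -> int:
        """Install a mapping and return the frame that backs it."""
        for slot in self._probe(asid, vpn):
            if self.entries[slot] in (None, (asid, vpn)):
                self.entries[slot] = (asid, vpn)
                return slot
        raise MemoryError("no free frame")

    def walk(self, asid: int, vpn: int) -> int:
        """Search the table the way the automatic walker would on a miss."""
        for slot in self._probe(asid, vpn):
            entry = self.entries[slot]
            if entry is None:
                break
            if entry == (asid, vpn):
                return slot
        raise PageFault((asid, vpn))


def translate(tlb: dict, ipt: InvertedPageTable, asid: int, vpn: int) -> int:
    """TLB hit returns at once; a miss walks the table and refills the TLB."""
    key = (asid, vpn)
    if key in tlb:
        return tlb[key]
    frame = ipt.walk(asid, vpn)   # raises PageFault if nothing is mapped
    tlb[key] = frame
    return frame
```

The first translate call for a mapped page takes the walk path and fills the TLB; later calls hit the TLB directly, and an unmapped page ends in PageFault instead of a software TLB-miss trap.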
It turned out to be easier than I thought it might be to incorporate the inverted page table walking code.
_________________Robert Finch http://www.finitron.ca
|
Thu Mar 03, 2022 4:56 am |
|