 Thor Core / FT64 
Author Message

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Decided to use a general-purpose register, r58, as the loop counter instead of having a dedicated loop count register, and restricted loop-counting branches to decrement-and-jump only. This frees up a whole row of opcodes in the instruction set. Logic for branch instructions is simplified a little bit, and the bypass logic dedicated to the loop counter goes away.
Using r58 for the loop counter puts the kibosh on the string instructions, which were using the loop counter to limit iterations.

_________________
Robert Finch http://www.finitron.ca


Tue Jan 25, 2022 5:34 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Enabled the PUSH instruction to be compiled and assembled. This reduces the size of code. When there is only a small number of registers to be pushed (<4) it is shorter and faster to use the push instruction.

Realized there were two sets of accesses occurring to the same registers and merged the code together. Link registers and code address registers were being handled separately, but link registers are just a subset of code address registers, so there was no need to do this.

_________________
Robert Finch http://www.finitron.ca


Thu Jan 27, 2022 10:53 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Changing the way the string instructions work and giving them new names. They worked in an unusual fashion because of the need to update only a single register. By micro-coding the instructions it is possible to update more than one register. Calling them block instructions now.
Thinking about having the block set instruction able to source data from a random value in addition to a register source.

Added a special opcode for micro-code branching, needed to branch during the block instructions.

Thinking about reducing the size of the instruction pointer to 56 bits plus eight bits for the micro-instruction pointer. The issue is that the micro-instruction pointer needs to be saved and restored across interrupts in addition to the IP.

_________________
Robert Finch http://www.finitron.ca


Sun Jan 30, 2022 5:28 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Realized there are two components to an instruction address, the instruction pointer and the selector, and the selector is only 32 bits. That means there is potentially "extra" room available in the selector portion of the address to hold a micro-code address, so the IP does not need to be reduced in size. With an eight-bit micro-ip the selector part of the address would expand to 40 bits. The issue is saving and restoring the micro-ip across interrupts. The selector must also be saved, so there is some logic to using that part to store the micro-ip.
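A minimal sketch of the packing idea, assuming the micro-ip sits in the eight bits directly above the 32-bit selector. The post does not fix the bit placement, so the layout and the function names here are hypothetical.

```python
from typing import Tuple

SELECTOR_BITS = 32   # selector width given in the post
MICRO_IP_BITS = 8    # micro-ip width given in the post

def pack_selector(selector: int, micro_ip: int) -> int:
    """Combine selector and micro-ip into one 40-bit value.

    Assumed layout: micro-ip in bits 39..32, selector in bits 31..0.
    """
    if not 0 <= selector < (1 << SELECTOR_BITS):
        raise ValueError("selector out of range")
    if not 0 <= micro_ip < (1 << MICRO_IP_BITS):
        raise ValueError("micro_ip out of range")
    return (micro_ip << SELECTOR_BITS) | selector

def unpack_selector(packed: int) -> Tuple[int, int]:
    """Split a packed 40-bit value back into (selector, micro_ip)."""
    selector = packed & ((1 << SELECTOR_BITS) - 1)
    micro_ip = (packed >> SELECTOR_BITS) & ((1 << MICRO_IP_BITS) - 1)
    return selector, micro_ip
```

With a layout like this, saving and restoring the extended selector across an interrupt carries the micro-ip along with it, and no separate save slot is needed.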

_________________
Robert Finch http://www.finitron.ca


Thu Feb 03, 2022 3:17 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Found a bug in the LEAVE instruction. It was not de-allocating the stack frame; it needed to be adding 64 to the stack pointer. I am not sure how things worked as well as they did, except that the frame pointer was likely restored correctly. I think fixing this fixed an issue where all zeros were being displayed for output when PutTetra() was called. Things are still not working 100%.

_________________
Robert Finch http://www.finitron.ca


Fri Feb 04, 2022 6:22 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Changed the register set to 128-bits wide so that quad precision floating-point or decimal floating-point values can be contained in a register.

Changed the compare instruction to be more general purpose. It now performs compares for integers and decimal-floats at the same time and returns a bit vector of results. The idea is that there is only a single compare instruction for all data types. The CMPU instruction has been removed from the ISA since compare takes care of both signed and unsigned comparisons.

Bits 0 to 15 of the result are reserved for integer results, bits 16 to 31 are reserved for float results, bits 32 to 47 are reserved for decimal float results and bits 48 to 63 are reserved for posit compare results. All the result bits can be tested with a BBS instruction after the compare.
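A rough model of the result vector, filling in only the integer field. The field bases (0, 16, 32, 48) come from the description above. The individual bit positions inside the integer field (EQ, LT, LTU) and the helper names are assumptions made for illustration, not the actual Thor encoding.

```python
# Field bases taken from the post.
INT_BASE, FLT_BASE, DFLT_BASE, POSIT_BASE = 0, 16, 32, 48

# Hypothetical bit positions inside the integer field.
EQ, LT, LTU = 0, 1, 2

WIDTH = 128                      # registers are 128 bits wide
MASK = (1 << WIDTH) - 1

def to_signed(x: int) -> int:
    """Interpret a WIDTH-bit pattern as a two's complement value."""
    x &= MASK
    return x - (1 << WIDTH) if x >> (WIDTH - 1) else x

def cmp_int(a: int, b: int) -> int:
    """Return a result vector with the integer field filled in."""
    ua, ub = a & MASK, b & MASK
    sa, sb = to_signed(a), to_signed(b)
    result = 0
    result |= int(ua == ub) << (INT_BASE + EQ)
    result |= int(sa < sb) << (INT_BASE + LT)    # signed less-than
    result |= int(ua < ub) << (INT_BASE + LTU)   # unsigned less-than
    return result

def bbs(result: int, bit: int) -> bool:
    """Model of a branch-on-bit-set test: True when the given bit is 1."""
    return (result >> bit) & 1 == 1
```

Because signed and unsigned outcomes land in separate bits of the same result, one compare followed by a BBS on the appropriate bit covers what CMP and CMPU used to do separately.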

Still stuck on a hardware bug. Trying to display 987654321 displays 98<punctuation chars>. The punctuation chars are in order. So, it is almost as if there is an extra bit set somehow. Bit 3 is stuck, I think.

_________________
Robert Finch http://www.finitron.ca


Wed Feb 23, 2022 4:06 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Added LDH, STH instructions to load and store hexi-byte values but forgot to update the instruction length table with the new instructions, which resulted in a crash when software was run. Updated the stack operations to use hexi-byte values instead of octa-byte ones. That means operations like push and pop update the stack pointer by 16 instead of 8.

Flashy LEDs no longer work, but the delay is still there. Clearscreen cleared every other character of the display since registers were set to 128 bits. The default int size is 128 bits now, which means the stride for the display was off. It was designed to work with 64-bit values.
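The every-other-character symptom can be reproduced with a toy model of the clear loop. This is only a sketch: it assumes one character per 8-byte display cell and a pointer that advances by the size of int on each store. The function name and the "X" marker are made up for the illustration.

```python
CELL_BYTES = 8  # display laid out for 64-bit character cells (from the post)

def clear_screen(num_cells: int, int_bytes: int) -> str:
    """Model a clear loop that steps an int pointer through display memory."""
    cells = ["X"] * num_cells          # "X" marks a cell that was never cleared
    addr = 0
    while addr < num_cells * CELL_BYTES:
        cells[addr // CELL_BYTES] = " "   # assumed: each store lands in one cell
        addr += int_bytes                 # pointer stride follows sizeof(int)
    return "".join(cells)
```

For example, `clear_screen(8, 8)` clears all eight cells, while `clear_screen(8, 16)` returns `" X X X X"`, leaving every second cell untouched.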

Got flashy LEDs to work again.

Ran into a nasty compiler bug, having to do with state being maintained from one function to the next when it should not have been.

Switched the default int size to 64 bits. Not all operations are supported at 128 bits yet. Had to modify the compiler's internal types. There was only long and short; now there is long, int, and short.

_________________
Robert Finch http://www.finitron.ca


Thu Feb 24, 2022 5:16 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Splitting shift operations

Just pondering barrel shifters. It looks like for most designs barrel shifting is done in a single cycle. But for Thor I think it may be necessary to split the shift up. Rather than have a shift requiring two or more clock cycles to complete, my thought is to use two instructions, one to shift by higher order bits of the shift count and a second instruction to shift by the low order bits. Many shifts are by small immediate values. These could then be absorbed by an instruction taking only a single clock.

Four-to-one muxes can be used to shift 0 to 3 bits with a single level of LUT logic. Using two levels of LUT logic, a shift from 0 to 15 bits can be done. The shift could be done using enough levels of LUTs (e.g. 4 levels for 128-bit shifting), but I do not really want that many levels of LUT logic.
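A behavioral sketch of the two ideas, assuming each 4:1 mux level consumes two bits of the shift count. `shift_left_staged` counts the levels a full shifter would need, and `shift_left_split` models the proposed pair of instructions. The split point of four low-order bits is an assumption chosen to match two mux levels; the post does not say where the split would be.

```python
def shift_left_staged(value: int, count: int, width: int = 128):
    """Left shift built from radix-4 stages, one stage per 4:1 mux level.

    Returns (shifted value, number of mux levels used).
    """
    mask = (1 << width) - 1
    levels = 0
    weight = 1                             # bits shifted per step at this level
    while weight < width:
        digit = (count // weight) & 3      # two bits of the count pick 0..3 steps
        value = (value << (digit * weight)) & mask
        weight *= 4
        levels += 1
    return value, levels

def shift_left_split(value: int, count: int, width: int = 128, low_bits: int = 4):
    """Same shift done as two operations: high-order count bits, then low-order."""
    mask = (1 << width) - 1
    low = count & ((1 << low_bits) - 1)
    high = count - low
    value = (value << high) & mask         # first instruction: coarse shift
    return (value << low) & mask           # second instruction: fine shift
```

With `width=4` the staged version needs one level, with `width=16` two, and with `width=128` four, which matches the counts given above. A shift by a small immediate only needs the fine step, so it fits in a single instruction.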

Got rid of the pair shifting instructions. The same thing can be accomplished in a more general way using the CARRY instruction.

Added a short 32-bit 2R form for the CMP instruction. Previously a 48-bit instruction was all that was available.

_________________
Robert Finch http://www.finitron.ca


Fri Feb 25, 2022 4:52 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
I am wondering what to do about long-running computations like decimal-float divide. Decimal-float divide (128-bit) takes around 2,000 clock cycles as I have got it implemented. This is probably too long a time to have interrupts suspended. Multiply is not quite as bad, taking around 250 clocks.

Worked out how to perform a decimal division using Newton-Raphson divide. It is about 2/3 as fast as using a dedicated divide instruction, which surprised me as I thought it would be quite a bit slower. I estimate it may take approximately 3,020 clock cycles, but the Newton-Raphson divide is interruptible, which is highly desirable. I am not sure whether to micro-code this or leave it as a compiler routine. It needs about five temporary registers which should be saved and restored. Micro-coded, it can remain as part of the instruction set. Smells like something that should be tossed, however.

The Newton-Raphson divide approach needs a couple of helper instructions to adjust the exponents and compute the bit shifts. Adjusting the exponents takes about 8 to 10 instructions to do in a general fashion. The divisor needs to be normalized between 0.5 and 1.0. This involves dividing by 10 and multiplying by a factor (1, 2, 4, or 8) that results in an in-range number. The scaling factor needs to be recorded because it will be needed later to reverse the effect of scaling. A reciprocal estimate function is also needed, so there is a look-up table with 1000 entries for three-digit decimal accuracy. Starting with three digits of accuracy, about four iterations are necessary to get to 34-digit accuracy.
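The steps just described can be modeled in software with Python's `decimal` module. This is a sketch under stated assumptions, not the hardware algorithm: the table contents, the 40-digit working precision, the restriction to positive divisors and the function names are all choices made for the illustration.

```python
from decimal import Decimal, getcontext

getcontext().prec = 40   # assumed: a few guard digits beyond the 34 of decimal128

# 1000-entry table: entry i approximates 1 / (i / 1000) to about three digits.
# Entry 0 is unused; only 500..999 are reached after normalization.
RECIP_TABLE = [Decimal(0)] + [
    (Decimal(1000) / Decimal(i)).quantize(Decimal("0.01")) for i in range(1, 1000)
]

def reciprocal(d: Decimal, iterations: int = 4) -> Decimal:
    """Approximate 1/d for d > 0 by table lookup plus Newton-Raphson."""
    if d <= 0:
        raise ValueError("sketch handles positive divisors only")
    k = d.adjusted() + 1
    t = d.scaleb(-k)                   # power-of-ten scaling: t in [0.1, 1.0)
    for f in (1, 2, 4, 8):             # smallest factor that reaches [0.5, 1.0)
        if t * f >= Decimal("0.5"):
            break
    m = t * f                          # normalized divisor
    x = RECIP_TABLE[int(m * 1000)]     # three-digit estimate X(0)
    two = Decimal(2)
    for _ in range(iterations):
        x = x * (two - m * x)          # X(n+1) = X(n) * (2 - m * X(n))
    return (x * f).scaleb(-k)          # undo both scalings: 1/d = X * f * 10**-k

def divide(a: Decimal, b: Decimal) -> Decimal:
    """Quotient a / b computed as a * (1 / b)."""
    return a * reciprocal(b)
```

Each iteration roughly doubles the number of correct digits (3, 6, 12, 24, 48), which is where the figure of about four iterations comes from. Every step is an ordinary multiply or subtract, so the sequence can be interrupted between any two of them.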

The divide routine looks something like:
Code:
   .code
_DFPDivide128:
   ENTER 80
   STH      s0,[SP]
   STH      s1,16[SP]
   STH      s2,32[SP]
   STH      s3,48[SP]
   STH      s4,64[SP]
   LDH      a1,64[FP]                     # get divisor
   LDH      a0,80[FP]                     # get dividend
   LDH      s2,DFTWO                     # s2 = constant 2.0                  1 clock
   DFDIVIDEND_ADJ      a0,a0,a1   # a0 = dividend, a1 = divisor 1 clock
   DFDIVISOR_BITADJ   t0,a1         # get bit shift                        1 clock
   DFDIVISOR_ADJ         s0,a1         # scale divisor to 0.5 to 1.0   14 clocks
   DFRES   s0,s0                           # s0 = X(0)                              4 clocks
   # Five iterations of Newton-Raphson unrolled
   DFMUL   s1,a1,s0                     #                                             250 clocks
   DFSUB   s3,s2,s1                     #                                              50 clocks
   DFMUL   s0,s3,s0                     #                                            250 clocks
   DFMUL   s1,a1,s0
   DFSUB   s3,s2,s1
   DFMUL   s0,s3,s0
   DFMUL   s1,a1,s0
   DFSUB   s3,s2,s1
   DFMUL   s0,s3,s0
   DFMUL   s1,a1,s0
   DFSUB   s3,s2,s1
   DFMUL   s0,s3,s0
   DFMUL   s1,a1,s0
   DFSUB   s3,s2,s1
   DFMUL   s0,s3,s0
   # Shift zero to three bits to the left, the divisor may have been this many
   # bits too big.
   BEQZ  t0,lab1
   SLL           t0,t0,#4
   LDH      t0,DFONE[t0]               # load 1, 0.5, 0.25 or 0.125
   DFMUL   s0,s0,t0                     #                                          250 clocks
lab1:
   DFMUL   a0,a0,s0                     #                                          250 clocks   
   LDH      s0,[SP]
   LDH      s1,16[SP]
   LDH      s2,32[SP]
   LDH      s3,48[SP]
   LDH      s4,64[SP]
   LEAVE   32

   .rodata
DFONE:
   .byte   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0xff,0x25
DFPOINT5:
   .byte   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80,0xff,0x35
DFPOINT25:
   .byte   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x40,0xa5,0x21
DFPOINT125:
   .byte   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x78,0x21
DFTWO:
   .byte   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0xff,0x29
DFFOUR:
   .byte   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0xff,0x31
DFEIGHT:
   .byte   0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xc0,0xff,0x69
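
The estimate of roughly 3,020 clocks can be reproduced from the clock annotations in the routine above. This tally is only a sketch: it counts the annotated instructions, and it assumes the register saves and restores, the branch, and the shift are small enough to ignore.

```python
# Clock counts copied from the comments in the routine.
SETUP = 1 + 1 + 1 + 14 + 4     # LDH DFTWO, DFDIVIDEND_ADJ, DFDIVISOR_BITADJ,
                               # DFDIVISOR_ADJ, DFRES
ITERATION = 250 + 50 + 250     # DFMUL, DFSUB, DFMUL
FINAL_MUL = 250                # DFMUL a0,a0,s0
SCALE_MUL = 250                # optional DFMUL s0,s0,t0 when t0 is non-zero
ITERATIONS = 5                 # the routine unrolls five iterations

def nr_divide_clocks(extra_scale_mul: bool = False) -> int:
    """Estimated clocks for the Newton-Raphson divide routine."""
    total = SETUP + ITERATIONS * ITERATION + FINAL_MUL
    if extra_scale_mul:
        total += SCALE_MUL
    return total
```

`nr_divide_clocks()` gives 3021, and 2000 / 3021 is about 0.66, in line with the "about 2/3 as fast" remark. When the extra scaling multiply is taken the total rises to 3271.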


_________________
Robert Finch http://www.finitron.ca


Sat Feb 26, 2022 4:34 am WWW

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1789
Feels to me like it's best to leave this to a user-code library. That way things like space-time tradeoffs and register usage can be chosen for the application. And it doesn't affect interrupts.

I've a feeling there was a situation once where floating point could not be used in interrupt context. And that sounds like a fine tradeoff to me.


Sat Feb 26, 2022 8:03 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Quote:
Feels to me like it's best to leave this to a user-code library. That way things like space-time tradeoffs and register usage can be chosen for the application. And it doesn't affect interrupts.

I've a feeling there was a situation once where floating point could not be used in interrupt context. And that sounds like a fine tradeoff to me.

Yeah, I have decided to leave it as a user code library routine. That along with square root. The performance of doing it in software is close enough to doing it in hardware that I think it is better as a software routine. As you say, it allows more tradeoffs.

Got the decimal-float reciprocal estimate module written. It returns an estimate of the reciprocal for numbers between 0.1 and 1.0. The estimate comes from a look-up table. It is accurate only for about three decimal digits as there is no interpolation taking place.

Spent too much time trying to get float <-> decimal float routines working. I may leave it up to software too.

So the decimal float hardware is pretty basic. Add, sub, multiply, reciprocal estimate, int <-> decimal float conversion and compare.

_________________
Robert Finch http://www.finitron.ca


Mon Feb 28, 2022 4:16 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Thinking of keeping the instructions in the ISA but having them trap to software routines instead of providing hardware. That way it looks like hardware is present. It might make writing software easier.

_________________
Robert Finch http://www.finitron.ca


Mon Feb 28, 2022 4:24 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Working on Thor2022 now. Decided to reduce the number of registers to 32 from 64. This freed up some opcode bits and allows more instructions to be encoded into 32-bits. It means also that there is less state to worry about.

Made a pretty map of the instruction set for Thor2022. Most of the opcodes have been retained from 2021. Branch instructions can specify a compare function now. There are currently five compare functions: signed integer, unsigned integer, quad float, quad decimal float and posit.

Did a lot of work getting float modules for different sizes of float operations. Previously, there was just a single module for all sizes. For example, the fpAddsub module handled all sizes of operations. This has been broken out now to fpAddsub32, fpAddsub64, fpAddsub128. The issue was that previously different sizes of operation could not be mixed in the same design because of the way the float package was set up. This meant one could not have doubles and singles in the same design, for instance. There are now separate packages for each size.

_________________
Robert Finch http://www.finitron.ca


Tue Mar 01, 2022 6:52 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Got rid of the segmentation. With good MMU support segmentation is not required.

Got rid of index scaling for indexed address modes. The compiler was not able to make effective use of it much of the time. Getting rid of the scaling made index mode instructions small enough to fit into 32 bits.

Did some work on the MMU. Planning to use an inverted page table, so wrote an article about its usage.

_________________
Robert Finch http://www.finitron.ca


Wed Mar 02, 2022 5:48 am WWW

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2101
Location: Canada
Updated the bus interface unit (BIU) to use an inverted page table with automatic page table walking. Previously a TLB miss would trap to software. A TLB miss now causes an automatic page table search for the address translation. If the translation is found the TLB is automatically updated, otherwise a page fault occurs. The TLB component from Thor2021 was updated and re-used. The table’s associativity is now a parameter.
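
A software model of the miss handling described above: a TLB miss triggers a search of the inverted page table, a hit refills the TLB, and a failed search raises a page fault. The page size, table size, hash function and linear probing are all assumptions picked for the sketch, and the class and method names are hypothetical. None of it is taken from the Thor2022 source.

```python
PAGE_BITS = 12       # assumed 4 KiB pages
IPT_ENTRIES = 64     # assumed: one table entry per physical page frame

class PageFault(Exception):
    pass

class Mmu:
    def __init__(self):
        self.ipt = [None] * IPT_ENTRIES   # index = frame number, value = (asid, vpn)
        self.tlb = {}                     # (asid, vpn) -> frame number
        self.walks = 0                    # number of table searches performed

    def _hash(self, asid, vpn):
        return (vpn ^ (asid * 31)) % IPT_ENTRIES   # placeholder hash

    def map(self, asid, vpn):
        """OS side: claim a frame for (asid, vpn), probing from the hash slot."""
        start = self._hash(asid, vpn)
        for i in range(IPT_ENTRIES):
            frame = (start + i) % IPT_ENTRIES
            if self.ipt[frame] is None:
                self.ipt[frame] = (asid, vpn)
                return frame
        raise MemoryError("no free page frames")

    def _walk(self, asid, vpn):
        """Hardware side: search the table for a translation."""
        self.walks += 1
        start = self._hash(asid, vpn)
        for i in range(IPT_ENTRIES):
            frame = (start + i) % IPT_ENTRIES
            entry = self.ipt[frame]
            if entry is None:
                break                     # probe chain ended without a match
            if entry == (asid, vpn):
                return frame
        raise PageFault(f"asid={asid} vpn={vpn:#x}")

    def translate(self, asid, vaddr):
        vpn, offset = vaddr >> PAGE_BITS, vaddr & ((1 << PAGE_BITS) - 1)
        key = (asid, vpn)
        if key not in self.tlb:           # TLB miss: walk, then refill the TLB
            self.tlb[key] = self._walk(asid, vpn)
        return (self.tlb[key] << PAGE_BITS) | offset
```

A second access to the same page is served from the TLB without another walk. An access to an unmapped page raises `PageFault` and leaves the TLB unchanged.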

It turned out to be easier than I thought it might be to incorporate the inverted page table walking code.

_________________
Robert Finch http://www.finitron.ca


Thu Mar 03, 2022 4:56 am WWW