


Reply to topic  [ 775 posts ]  Go to page Previous  1 ... 35, 36, 37, 38, 39, 40, 41 ... 52  Next
 Thor Core / FT64 

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
Are you generating the correct binary code? I am thinking a short constant instruction
might map or decode wrong.


Tue Dec 14, 2021 9:17 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Are you generating the correct binary code? I am thinking a short constant instruction
might map or decode wrong.
That is a good thought, but I am pretty sure the constants are correct. Structured values are used and can be dumped in SIM; it shows, for instance, that ri.imm = 10 (the immediate value of a register-immediate mode instruction). I may try a table lookup next to get past the use of constants and conditional logic. That may fix things for this routine, but I would like to know what is amiss.

Added some hoops for the compiler to jump through: if a BNE instruction compares against zero and the target address is estimated to be close enough, the compiler outputs a BNEZ instruction instead, which is shorter.
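A minimal sketch of that selection rule in C. The BNEZ displacement width (`BNEZ_DISP_BITS`) and the enum names are hypothetical; the post does not give the real field size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical width of the BNEZ displacement field (assumption). */
#define BNEZ_DISP_BITS 11

typedef enum { OP_BNE, OP_BNEZ } BranchOp;

/* True if disp fits in a signed field of 'bits' bits. */
static bool fits_signed(int64_t disp, int bits)
{
    assert(bits > 0 && bits < 63);
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return disp >= lo && disp <= hi;
}

/* Pick the short form only when comparing against zero and the estimated
   displacement is close enough; otherwise fall back to the full BNE. */
static BranchOp select_bne(bool compare_to_zero, int64_t est_disp)
{
    if (compare_to_zero && fits_signed(est_disp, BNEZ_DISP_BITS))
        return OP_BNEZ;
    return OP_BNE;
}
```

Because the displacement is only an estimate at this point, an implementation would typically keep some margin, or let the assembler fall back to BNE if the final displacement does not fit.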

I tried adding a loop in PutNybble() thinking there may be a forwarding issue with ‘n’. The results were the same. Now I am trying a different branch instruction (BNEZ in place of BNE).
Code:
void PutNybble(int n)
{
   int m;

   n = n & 15;
   for (m = 0; m < 100; m++)
      ;
   if (n > 9)
      n = n + 'A' - 10;
   else
      n = n + '0';
   DBGDisplayChar(n);
}
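Since stepping through in SIM works, a host-side reference of the same logic gives a known-good value to compare hardware output against. A minimal sketch; `nybble_to_char` is a hypothetical helper that returns the character PutNybble() would pass to DBGDisplayChar(), with the delay loop left out because it does not affect the result.

```c
#include <assert.h>

/* Host-side reference for the character PutNybble() should pass to
   DBGDisplayChar(): the low four bits of n as an upper-case hex digit. */
static int nybble_to_char(int n)
{
    n = n & 15;
    if (n > 9)
        n = n + 'A' - 10;
    else
        n = n + '0';
    assert((n >= '0' && n <= '9') || (n >= 'A' && n <= 'F'));
    return n;
}
```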


A small hiccup: the register for the BNEZ / BEQZ instruction was not being encoded by the assembler.

After a fix, the results for BNEZ and BNE were the same, so I have moved back to SIM. Stepping through PutNybble() in SIM works flawlessly. I have a feeling it is going to be a while before I figure this one out. It does not seem to be position dependent: the code has moved slightly, with the same results.

Following is the assembler code output:

Code:

F00:0426       _PutNybble:
F00:0427         sub      sp,sp,96
               S01:00000720:  04 FE 1F F4
F00:0428         sto      fp,[sp]
               S01:00000724:  93 FC 1F 00 00 C0
F00:0429         mov      fp,sp
               S01:0000072A:  AA FC 1F F0 17 08
F00:0430         sub      sp,sp,8
               S01:00000730:  04 FE 1F FF
F00:0431         csrrd    t0,r0,12546
               S01:00000734:  0F 06 40 20 06 00
F00:0432         sto      t0,16[fp]
               S01:0000073A:  93 06 1F 02 00 C0
F00:0433         stos     s0,0[sp]
               S01:00000740:  95 96 1F C0
F00:0434         ldo      s0,96[fp]
               S01:00000744:  86 16 1F 0C 00 C0
F00:0435         and      s0,s0,15
               S01:0000074A:  08 96 E5 01
F00:0436       # if (n > 9)
F00:0437         slt      t0,s0,10
               S01:0000074E:  18 86 45 01
F00:0438         bnez     t0,.00101
               S01:00000752:  12 C0 01 E0
F00:0439       # n = n + 'A' - 10;
F00:0440         add      t1,s0,65
               S01:00000756:  04 88 25 08
F00:0441         sub      s0,t1,10
               S01:0000075A:  04 16 C2 FE
F00:0442         beqz     r0,.00102
               S01:0000075E:  10 20 00 E0
F00:0443       .00101:
F00:0444       # n = n + '0';
F00:0445         add      s0,s0,48
               S01:00000762:  04 96 05 06
F00:0446       .00102:
F00:0447       # DBGDisplayChar(n);
F00:0448         sub      sp,sp,8
               S01:00000766:  04 FE 1F FF
F00:0449         sto      s0,0[sp]
               S01:0000076A:  93 96 1F 00 00 C0
F00:0450         jsr      lk1,_DBGDisplayChar
               S01:00000770:  20 C2 70 00 00 00 [R]
F00:0451       .00100:
F00:0452         ldo      s0,0[sp]
               S01:00000776:  86 96 1F 00 00 C0
F00:0453         ldo      t0,16[fp]
               S01:0000077C:  86 06 1F 02 00 C0
F00:0454         csrrw    r0,t0,12546
               S01:00000782:  0F 80 41 20 06 02
F00:0455         mov      sp,fp
               S01:00000788:  AA 7E 1F F0 17 08
F00:0456         ldo      fp,[sp]
               S01:0000078E:  86 FC 1F 00 00 C0
F00:0457         add      sp,sp,104
               S01:00000794:  04 FE 1F 0D
F00:0458         ret   
               S01:00000798:  F2 02

_________________
Robert Finch http://www.finitron.ca


Wed Dec 15, 2021 6:25 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Busted something, so now it is back to the clear-screen stage with no character output. It is crashing on a zero return address. The latest change was for the compiler to use shorter forms of the load and store instructions. I tried backing out the changes, but the result is the same. Then I went through the change history of all the source files to try to identify the change that would cause things to stop working. I hit upon only one thing: the pipeline stall for a load operation had been altered to remove excess stalls. So, I am changing it back to see if it makes a difference.
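For reference, the textbook load-use interlock for a simple in-order pipeline is sketched below. It is not Thor's actual logic; the stage structs and field names are assumptions. It includes the Rc source because stores read Rc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified view of the EX stage (hypothetical fields). */
typedef struct {
    bool valid;    /* stage holds a real (not squashed) instruction */
    bool is_load;  /* instruction reads memory */
    int  rt;       /* destination register, 0 = none (r0 assumed hardwired zero) */
} ExStage;

/* Source registers read by the instruction in decode. */
typedef struct {
    int ra, rb, rc;
} DecodeStage;

/* Stall decode for a cycle when the instruction in EX is a load whose result
   the next instruction needs: the loaded value cannot be bypassed until the
   memory stage has completed. */
static bool load_use_stall(const ExStage *ex, const DecodeStage *dc)
{
    assert(ex != NULL && dc != NULL);
    if (!ex->valid || !ex->is_load || ex->rt == 0)
        return false;
    return dc->ra == ex->rt || dc->rb == ex->rt || dc->rc == ex->rt;
}
```

Trimming a term from a condition like this (for example the Rc check) is one way that removing "excess" stalls can silently let an instruction read a stale value.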



Thu Dec 16, 2021 6:05 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Still stuck at the clear-screen. The link register is being zeroed out somehow.

Bypassing on Rc was incorrect. The write-back result was bypassed ahead of the memory-stage result, and it should have been the other way around. This did not affect very much, as register Rc is not used very often except for stores.
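In a typical in-order pipeline the memory-stage result comes from a younger instruction than the write-back result, so when both target the same register the memory-stage value must win. A minimal sketch of a bypass mux with that priority; the stage structure, field names and the hardwired-zero r0 are assumptions, not taken from the Thor sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     valid;  /* stage holds a live instruction that writes a register */
    int      rt;     /* destination register number */
    uint64_t res;    /* result value produced by that instruction */
} FwdSource;

/* Select the operand for register 'reg'. The memory stage is checked first:
   it holds the younger instruction, so its value supersedes write-back. */
static uint64_t bypass(int reg, uint64_t regfile_val,
                       const FwdSource *mem, const FwdSource *wb)
{
    assert(mem != NULL && wb != NULL);
    if (reg == 0)
        return 0;                 /* r0 assumed to always read as zero */
    if (mem->valid && mem->rt == reg)
        return mem->res;
    if (wb->valid && wb->rt == reg)
        return wb->res;
    return regfile_val;
}
```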
The link register was being updated during the EX stage for branch-and-link instructions; it should have been updated in the write-back (WB) stage. An EX-stage instruction that got invalidated would leave the link register with an incorrect value. This has not affected anything yet, as only exception processing can invalidate an instruction in the EX stage.
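A toy model of why the write belongs in WB: if architectural state such as the link register changes only when an instruction reaches write-back, an instruction invalidated in EX never touches it. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     valid;       /* cleared when the instruction is invalidated */
    bool     is_link;     /* branch-and-link: writes the return address */
    uint64_t return_addr; /* value destined for the link register */
} Inst;

/* Write-back: only instructions that survive to this stage change the
   link register, so one squashed earlier (e.g. by an exception while in
   EX) leaves it untouched. */
static void writeback(const Inst *in, uint64_t *link_reg)
{
    assert(in != NULL && link_reg != NULL);
    if (in->valid && in->is_link)
        *link_reg = in->return_addr;
}
```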



Fri Dec 17, 2021 7:06 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Moved forward past clear-screen again.

Found a bug in the decoder. Register field Rc was sometimes indicated to be a vector register when it should not be. This led to the value of Rc being zero, causing the link register to get zeroed out. This affected mainly store operations. It was tricky to identify as it depended on the bit pattern of the next instruction.

Added the move-to-link (MTLK) and move-from-link (MFLK) register instructions. They are four bytes shorter than using a CSR instruction to access a link register, and since the link register is accessed fairly often, this saves memory space.

With the aforementioned fixes, character output is back!

Added PUSH and POP of one to three registers via micro-code. These instructions are denser than separate load and store instructions; however, for simplicity the compiler does not make use of them.
Also added the ENTER instruction for routine entry, which substitutes for 10 separate instructions.
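To illustrate what a micro-coded PUSH expands into, the sketch below emits one stack adjustment plus one store per register. The 8-byte slot size, downward-growing stack and store order are assumptions, loosely modelled on the `sub sp` / `sto` pairs in the listings above; `Uop` and `expand_push` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { UOP_SUB_SP, UOP_STORE } UopKind;

typedef struct {
    UopKind kind;
    int     reg;    /* register stored (UOP_STORE only) */
    int     offset; /* byte offset from sp (UOP_STORE) or amount subtracted (UOP_SUB_SP) */
} Uop;

/* Expand PUSH of n registers (1..3) into one stack adjustment followed by
   n stores. Returns the number of micro-ops written to out. */
static int expand_push(const int *regs, int n, Uop *out)
{
    assert(regs != NULL && out != NULL);
    assert(n >= 1 && n <= 3);
    int count = 0;
    out[count++] = (Uop){UOP_SUB_SP, 0, 8 * n};
    for (int i = 0; i < n; i++)
        out[count++] = (Uop){UOP_STORE, regs[i], 8 * i};
    return count;
}
```

In this model a three-register PUSH stands in for four separate instructions, which is where the density gain comes from.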



Tue Dec 21, 2021 1:50 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added the LEAVE instruction. LEAVE replaces eight instructions at subroutine exit.
The length of the SGTI instruction was wrong: it was 6 and should have been 4. Strangely, this did not seem to have much impact.
The register form of the SET instructions had not been implemented.



Wed Dec 22, 2021 6:55 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Tried outputting just a series of byte-sized numbers using PutByte() to try to debug where PutNybble() is going amiss. Testing revealed that the second digit was always zero. Studying the assembler code for PutByte() shows that if register variable s0 were somehow zeroed out, the output would be exactly as seen.
Code:
F00:0503       _PutByte:
F00:0504         sub      sp,sp,96
               S01:00000770:  04 FE 1F F4
F00:0505         stos     fp,[sp]
               S01:00000774:  95 FC 1F C0
F00:0506         mov      fp,sp
               S01:00000778:  1D FC 1F 00
F00:0507         sub      sp,sp,8
               S01:0000077C:  04 FE 1F FF
F00:0508         mflk     lk1,t0
               S01:00000780:  5E 06
F00:0509         stos     t0,16[fp]
               S01:00000782:  95 06 1F C2
F00:0510         stos     s0,0[sp]
               S01:00000786:  95 96 1F C0
F00:0511         ldos     s0,96[fp]
               S01:0000078A:  87 16 1F CC
F00:0512       # PutNybble(n >> 4);
F00:0513         sub      sp,sp,8
               S01:0000078E:  04 FE 1F FF
F00:0514         sra      t0,s0,4
               S01:00000792:  02 86 85 10 00 84
F00:0515         stos     t0,0[sp]
               S01:00000798:  95 86 1F C0
F00:0516         jsr      lk1,_PutNybble
               S01:0000079C:  20 42 1C 00 00 00 [R]
F00:0517       # PutNybble(n);
F00:0518         sub      sp,sp,8
               S01:000007A2:  04 FE 1F FF
F00:0519         stos     s0,0[sp]
               S01:000007A6:  95 96 1F C0
F00:0520         jsr      lk1,_PutNybble
               S01:000007AA:  20 42 1C 00 00 00 [R]
F00:0521       .00122:
F00:0522         ldos     s0,0[sp]
               S01:000007B0:  87 96 1F C0
F00:0523         ldos     t0,16[fp]
               S01:000007B4:  87 06 1F C2
F00:0524         mtlk     lk1,t0
               S01:000007B8:  5F 06
F00:0525         mov      sp,fp
               S01:000007BA:  1D 7E 1F 00
F00:0526         ldos     fp,[sp]
               S01:000007BE:  87 FC 1F C0
F00:0527         add      sp,sp,104
               S01:000007C2:  04 FE 1F 0D
F00:0528         ret   
               S01:000007C6:  F2 02
F00:0529          .type   _PutByte,@function
F00:0530          .size   _PutByte,$-_PutByte
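The reasoning can be checked with a host-side model of PutByte(). In the listing, `n` is loaded into s0, `n >> 4` is computed into t0 before the first call, and the second PutNybble() call pushes s0 directly, so only the low digit depends on s0 surviving the first call. The `s0_clobbered` flag is hypothetical and simulates that corruption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static char hex_digit(int n)
{
    n &= 15;
    return (char)(n > 9 ? n + 'A' - 10 : n + '0');
}

/* Model of PutByte(): writes the two characters it would display to out[0..1].
   If s0_clobbered is true, the register variable holding n reads as zero
   by the time the second PutNybble() call is made. */
static void put_byte_model(int n, bool s0_clobbered, char out[2])
{
    assert(out != NULL);
    int s0 = n;                 /* n lives in register variable s0 */
    out[0] = hex_digit(s0 >> 4);
    if (s0_clobbered)
        s0 = 0;                 /* the suspected corruption across the call */
    out[1] = hex_digit(s0);
}
```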

Going with the notion that s0 gets zeroed out somehow, I modified PutByte() to avoid using s0 and use a second register variable instead. Testing this routine shows that PutByte() now works.
Code:
void PutByte(int n)
{
   int m;

   m = n;
   PutNybble(n >> 4);
   PutNybble(m);
}

So s0 somehow gets zeroed out across the calls to PutNybble() and DBGDisplayChar(). I have not figured out how that could happen, as s0 is saved and restored by every function that uses register variables. I need a way to dump s0 before and after a routine to see where it gets altered. I suppose I could also try compiling without register variables to see if things work.



Thu Dec 23, 2021 4:51 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
First try at executing the ENTER instruction. It fails. The first six of its ten micro-coded instructions execute, then the address jumps to 0x08. The only explanation I can think of ATM is that the branch target buffer is supplying the address, so it is being turned off for now.
It also turns out I missed encoding the vector indicator field in a couple of the load / store instructions. Because those instructions were encoded incorrectly, they attempted to access inaccessible memory, causing an exception.
The only part of the ENTER instruction that can fail with an exception is one of the stores to the stack. Provided there is enough stack space, there should not be any exceptions. I am tempted to add a dud store as the first instruction, which would cause an exception if there is no stack space. This would make the ENTER instruction restartable, but would add a memory access to every instance of ENTER. I think it is probably better to check stack space availability periodically.
Now I may add a verify memory operation that checks whether a memory operation is possible without reading or writing memory. This is almost the same as a LEA instruction.
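A sketch of the probe-first idea in C. A "verify" check computes the lowest address ENTER will write, like LEA, and tests it without touching memory; only if it passes are the stores performed. The `Machine` layout, stack limit and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_WORDS 64

typedef struct {
    uint64_t stack_limit;        /* lowest valid stack address (hypothetical) */
    uint64_t base;               /* address of mem[0] */
    uint64_t mem[STACK_WORDS];   /* tiny stand-in for stack memory, 8-byte words */
} Machine;

/* "Verify" access: computes and checks an address as LEA would, but
   neither reads nor writes memory. */
static bool verify_access(const Machine *m, uint64_t addr)
{
    return addr >= m->stack_limit && addr >= m->base &&
           addr < m->base + 8 * STACK_WORDS;
}

/* ENTER-like sequence: reserve 'words' slots below *sp and store vals[] there.
   Probing the lowest address first means that on failure nothing has been
   written and sp is unchanged, so the instruction could be restarted after
   the fault is handled. Assumes the incoming sp is itself valid and aligned. */
static bool enter_probe_first(Machine *m, uint64_t *sp, const uint64_t *vals, int words)
{
    assert(words > 0 && words <= STACK_WORDS);
    assert(*sp <= m->base + 8 * STACK_WORDS);
    uint64_t new_sp = *sp - 8 * (uint64_t)words;
    if (!verify_access(m, new_sp))
        return false;                     /* would fault: no side effects */
    for (int i = 0; i < words; i++)
        m->mem[(new_sp - m->base) / 8 + i] = vals[i];
    *sp = new_sp;
    return true;
}
```

The cost mentioned above shows up here as an extra check on every ENTER; a periodic stack-space check moves that cost out of the common path, at the price of losing this all-or-nothing property.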

Changed the encoding of the long immediate form instructions to make use of four extra bits for scalar instructions. For scalar instructions the vector mask register spec field is not needed, and the bits would otherwise be wasted.
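A sketch of the decoding idea under an assumed layout: an `IMM_BITS`-wide immediate field, with the 4-bit vector mask field supplying four extra high-order bits for scalar instructions before sign extension. The widths and bit order are assumptions, not the documented Thor encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IMM_BITS 28   /* hypothetical width of the base long-immediate field */

/* Sign-extend the low 'bits' bits of v. */
static int64_t sext(uint64_t v, int bits)
{
    assert(bits > 0 && bits < 64);
    uint64_t m = (uint64_t)1 << (bits - 1);
    v &= ((uint64_t)1 << bits) - 1;
    return (int64_t)((v ^ m) - m);
}

/* Vector instructions need the 4-bit mask register spec (vm), so only the
   base field forms the immediate. Scalar instructions reuse those bits as
   the top of the immediate, widening it by four bits. */
static int64_t decode_long_imm(uint64_t imm_field, unsigned vm_field, bool is_vector)
{
    assert(vm_field < 16);
    uint64_t base = imm_field & (((uint64_t)1 << IMM_BITS) - 1);
    if (is_vector)
        return sext(base, IMM_BITS);
    return sext(((uint64_t)vm_field << IMM_BITS) | base, IMM_BITS + 4);
}
```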
The immediate constant was not decoded for the CMPUIL, SLTUIL, SGTUIL and MULUIL instructions, causing the value zero to be used instead. The impact was minimal, as signed values are normally used.

Added the DEFCAT – default catch handler instruction.
DEFCAT reaches into the stack frame of the calling function to get the last registered catch handler address. It then moves this value to the return address fields of the return block so that the current function will return to the catch handler in the caller. The values in registers t0 and t1 are overwritten.
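A data-structure sketch of the behaviour described. The `Frame` layout (a link to the caller's frame, a `catch_handler` field, a return address in the return block) is hypothetical, and the t0/t1 clobber is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack frame: only the fields DEFCAT cares about. */
typedef struct Frame {
    struct Frame *caller;        /* frame of the calling function */
    uint64_t      catch_handler; /* last catch handler registered in this frame */
    uint64_t      return_addr;   /* return block: where this function returns to */
} Frame;

/* Model of DEFCAT: redirect the current function's return into the caller's
   most recently registered catch handler. (The real instruction also
   overwrites t0 and t1, which this model does not represent.) */
static void defcat(Frame *cur)
{
    assert(cur != NULL && cur->caller != NULL);
    cur->return_addr = cur->caller->catch_handler;
}
```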



Sat Dec 25, 2021 3:36 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added the STH (store hexi-byte pair) instruction, which stores a pair of registers as a 128-bit value. This requires only a single bus cycle for aligned data. To support STH, register bypassing had to be expanded to 128 bits for the ‘C’ register port.
Added the STOO (store octa octet) instruction, which stores eight registers as a 512-bit value.
The STOO instruction is set up to store the equivalent of a cache line in a single instruction, but ATM it is micro-coded as four STH instructions. The maximum amount of data that can be transferred by a store operation is 128 bits.
Added the STCTX instruction, which stores all 64 registers to the current context. It is coded as thirty-two STH instructions to avoid needing a two-level micro-code subroutine.
Added the LDOO (load octa octet) instruction, which loads a cache line into consecutive registers.
Added the LDCTX instruction, which loads all 64 registers from the current context. It is micro-coded as eight LDOO instructions.
LDCTX and STCTX need a selector, specified by a selector register, which points to the context save / restore area.
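The counts work out as follows: 64 registers of 64 bits are 4096 bits, or thirty-two 128-bit STH transfers, and a 512-bit STOO is four of them. A sketch of the expansion, assuming registers are paired in order (r0 with r1, r2 with r3, and so on) at consecutive 16-byte offsets from the context base; the pairing, ordering and names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define NREGS     64   /* registers saved by STCTX */
#define STH_BYTES 16   /* one STH moves a 128-bit register pair */

typedef struct {
    int reg_lo;  /* first register of the pair (reg_lo + 1 is the second) */
    int offset;  /* byte offset from the context save area base */
} SthOp;

/* Expand a store of 'count' consecutive registers starting at 'first' into
   STH micro-ops. STOO is count = 8 (four STHs); STCTX is count = 64 (32 STHs). */
static int expand_sth(int first, int count, int base_offset, SthOp *out)
{
    assert(out != NULL);
    assert(count > 0 && count % 2 == 0 && first >= 0 && first + count <= NREGS);
    int n = 0;
    for (int r = first; r < first + count; r += 2) {
        out[n].reg_lo = r;
        out[n].offset = base_offset + n * STH_BYTES;
        n++;
    }
    return n;
}
```

Expanding STCTX straight to STH operations, rather than to eight STOO operations that would each expand again, keeps the micro-code one level deep, which is the reason given above.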



Mon Dec 27, 2021 4:17 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The LEA and LEAX instructions were going to the memory queue, which then just returned the passed address. They did not need to go to the memory queue at all.
For register indirect addressing, the assembler was not encoding the register in the instruction.
With this fix, exceptions now show a character in the top left corner of the screen when they occur.
The current issue is an exception due to a bad stack memory address during the ENTER instruction. The low-order bits are correct, but the higher-order bits seem to be zeroed out. This has me mystified at the moment, as the ENTER instruction is invoked shortly after a LEAVE instruction, which accesses memory in the same manner and seems to be working.
Debugging is tedious work, with system builds taking about ½ hour. Just enough time to watch most of a TV show.



Tue Dec 28, 2021 3:52 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
The EXI instructions, which extend constants, needed to check whether the EXI was valid in the pipeline. An EXI instruction after a branch was causing the first instruction after the branch to have an extended constant when it should not have. This led to a corrupt stack pointer and a memory exception.
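A sketch of the fix, assuming EXI acts as a prefix that supplies the upper constant bits of the next instruction (`LOW_BITS` and the bit split are assumptions). The point is that a squashed EXI, such as one in the shadow of a taken branch, must not leave an extension pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOW_BITS 16  /* hypothetical width of the instruction's own constant */

typedef struct {
    bool     pending;  /* an EXI has supplied upper bits for the next instruction */
    uint64_t upper;    /* the upper constant bits carried by that EXI */
} ExiState;

/* Record an EXI. A squashed (invalid) EXI leaves no extension pending. */
static void exi_issue(ExiState *s, bool exi_valid, uint64_t upper)
{
    assert(s != NULL);
    s->pending = exi_valid;
    s->upper = exi_valid ? upper : 0;
}

/* Form the constant of the following instruction and consume the extension. */
static uint64_t constant_for_next(ExiState *s, uint64_t low)
{
    assert(s != NULL);
    uint64_t k = low & (((uint64_t)1 << LOW_BITS) - 1);
    if (s->pending)
        k |= s->upper << LOW_BITS;
    s->pending = false;
    return k;
}
```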
Added a multi-precision CARRY instruction. The CARRY instruction indicates which of the following instructions have a carry-in or carry-out, covering up to eight following instructions. It works with the ADD, SLL and SRL instructions, among others.
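What CARRY provides in hardware is the usual multi-precision chain, where each limb's add consumes the previous limb's carry-out. In portable C the carry has to be recovered by comparisons, which is the overhead a hardware carry-in/carry-out avoids. A sketch with 64-bit limbs, least-significant first; `mp_add` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 64-bit limbs, least-significant limb first.
   Returns the final carry-out (0 or 1). */
static unsigned mp_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    assert(r != NULL && a != NULL && b != NULL);
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;   /* add carry-in */
        unsigned c1 = s < carry;     /* overflow from adding the carry-in */
        r[i] = s + b[i];
        unsigned c2 = r[i] < s;      /* overflow from adding b[i] */
        carry = c1 | c2;             /* at most one of these can be set */
    }
    return carry;
}
```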



Wed Dec 29, 2021 5:57 am

Joined: Mon Oct 07, 2019 2:41 am
Posts: 593
Do you also and the zero flag , for add/sub with carry?
if (adc#sbc) ZF = (alu()==0) & ZF
else(ZF)=(alu()==0)


Wed Dec 29, 2021 11:29 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
Do you also and the zero flag , for add/sub with carry?
if (adc#sbc) ZF = (alu()==0) & ZF
else(ZF)=(alu()==0)
No.
There are no flags really. To detect a multi-precision zero value, all the result registers would need to be OR'd together. There is a 3-input OR, so results of up to 192 bits can be zero-tested in a single instruction.
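A small C comparison of the two approaches for a 192-bit value held in three 64-bit registers: the flag chaining from the question, and ORing the result registers together as a 3-input OR would. Function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag style: ZF starts from the low limb and each later add-with-carry
   ANDs its own zero test into it. */
static bool zero_by_flag_chain(const uint64_t limb[3])
{
    assert(limb != NULL);
    bool zf = (limb[0] == 0);
    for (int i = 1; i < 3; i++)
        zf = (limb[i] == 0) && zf;
    return zf;
}

/* Flagless style: OR all three result registers together and test once. */
static bool zero_by_or3(const uint64_t limb[3])
{
    assert(limb != NULL);
    return (limb[0] | limb[1] | limb[2]) == 0;
}
```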

Some of the conditions under which a store operation should stall were missing, which led to stores occurring that should not have. Basically, a store stalls if there is an instruction before it that can change the program flow. The current issue appears to be a store operation that fails with a TLB miss due to a bad address, in an operation that was previously working.
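A sketch of the rule as stated: a store waits while any older, still-unresolved instruction could redirect the program flow. The `OlderInst` fields, and treating instructions that may trap as flow-changing, are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool valid;         /* slot holds a live instruction */
    bool changes_flow;  /* branch, jump, return (assumed: also anything that may trap) */
    bool resolved;      /* whether the flow change, if any, has been decided */
} OlderInst;

/* A store may not write memory while an older instruction could still send
   execution elsewhere; otherwise a store on the wrong path becomes visible. */
static bool store_must_stall(const OlderInst *older, size_t n)
{
    assert(older != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (older[i].valid && older[i].changes_flow && !older[i].resolved)
            return true;
    return false;
}
```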



Thu Dec 30, 2021 4:15 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
MULF made it back into the instruction set. The number of pipeline stages for the MUL instructions was optimized to 18 to boost the clock rate. This made MULF more appealing, as it has only 3 pipeline stages; it ends up being about 4 times faster than a full 64x64 multiply.

A pipeline stall needed to be added for the multiply and divide operations. Subsequent instructions were using the results before they were ready, ultimately resulting in an exception.
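One common way to add such a stall is a scoreboard recording, per register, the cycle at which a pending result becomes available; a dependent instruction waits until then. The 18-stage MUL and 3-stage MULF latencies come from the post; the single-issue model, the names and the hardwired-zero r0 are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 64

typedef struct {
    uint64_t ready_at[NREGS];  /* cycle at which each register's value is available */
} Scoreboard;

/* Issue an instruction writing rt with the given latency at cycle 'now'. */
static void sb_issue(Scoreboard *sb, int rt, uint64_t now, unsigned latency)
{
    assert(rt >= 0 && rt < NREGS);
    if (rt != 0)                     /* r0 assumed hardwired zero, never pending */
        sb->ready_at[rt] = now + latency;
}

/* Stall cycles needed before an instruction reading ra and rb can issue at 'now'. */
static uint64_t sb_stall(const Scoreboard *sb, int ra, int rb, uint64_t now)
{
    assert(ra >= 0 && ra < NREGS && rb >= 0 && rb < NREGS);
    uint64_t t = sb->ready_at[ra] > sb->ready_at[rb] ? sb->ready_at[ra] : sb->ready_at[rb];
    return t > now ? t - now : 0;
}
```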

Pages were being mapped in the my_srand() and my_rand() functions. These functions are coded with register parameters, so they need to be leaf routines. The page mapping was moved out of these routines into the mainline.

Modified the compiler to omit ENTER and LEAVE when the function uses only argument registers, which is the case for many small functions.

Added the keyword ‘bool’ to the compiler. Strangely, I had the logic to support bool values already present in the compiler but forgot to make ‘bool’ a keyword.

Made the register form of the SET instructions more powerful by using the third register field of the instruction. SET now writes zero to the result if the condition is false, and Rc otherwise. Previously SET only produced a one or a zero. Rc may also be a small constant, -32 to +32, so the original behaviour can be emulated by using the constant one for Rc.
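The new SET semantics in C terms. Signed less-than is used only as an example condition from the SET family, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register form of SET: result = condition ? Rc : 0. */
static int64_t set_lt(int64_t ra, int64_t rb, int64_t rc)
{
    return ra < rb ? rc : 0;
}

/* Constant form: Rc is a small immediate in the range quoted above. */
static int64_t set_lt_imm(int64_t ra, int64_t rb, int imm)
{
    assert(imm >= -32 && imm <= 32);
    return set_lt(ra, rb, imm);
}
```

With Rc = 1 this reproduces the old one-or-zero behaviour; with Rc = -1 it produces an all-ones mask, which is handy for branch-free selection.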



Sat Jan 01, 2022 2:36 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Had to fix the compiler's output for the hook (?:) operator. It was missing output in some cases.

Toying with the idea of removing most of the decrement-and-branch instructions. They use a fair amount of hardware and a lot of opcode space, and they are not that useful; for instance, the compiler does not make use of them.

I had high and low versions of the LLA instruction, as the instruction may return an address larger than 64 bits. I got rid of the high version by allowing the CARRY instruction access to the higher-order bits.
Code:
CARRY C1,{O}{I}
LLA A0,ES:1040[GP]
ADD A1,R0,R0   # This will also add the carry over from LLA



Tue Jan 04, 2022 3:44 am