 Thor Core / FT64 

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
I was going to suggest tagged memory a few days ago, but a lack of time prevented my posting. Here are two things that may or may not have already been considered in the use of tagged memory.

First, the Burroughs B5xxx series of processors pioneered the concept. They used tags to record the data type of each word in memory. One particular feature that stood out to me was that they were able to tag the data type of a word so that only one arithmetic instruction opcode was necessary to support integer or floating-point operations. In fact, mixed integer and floating-point operations were supported by a single opcode / instruction, and automatic conversions from one type to another were built into the processor / ALU.

Second, perhaps I've missed it, but I've not seen any mention of the new protected pointer types that the newer Intel architectures support. Apparently, they are not well supported by the compiler suites, but Intel has some new pointer types that operate similarly to the old '286/'386 segment register descriptors. In other words, the pointer is represented in memory by more than one word, and the additional words contain information that the processor can use to implement more protections for such things as uninitialized pointers, object reference counting, etc. I only glanced at / skimmed this material several years ago, but it might be pertinent to what you are attempting to do.

In addition, I saw the comment regarding the lack of a GC hardware accelerator. I think that you need to expand on this more. I think that a HW-accelerated GC should be the way to go, especially if you don't want the main core slowing down to perform this function on a regular basis. GC should be a background task handled by an asynchronous HW component of the core so that GC activities do not noticeably affect the processing of most tasks.

_________________
Michael A.


Fri Aug 17, 2018 1:00 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
First, the Burroughs B5xxx series of processors pioneered the concept. They used tags to record the data type of each word in memory. One particular feature that stood out to me was that they were able to tag the data type of a word so that only one arithmetic instruction opcode was necessary to support integer or floating-point operations. In fact, mixed integer and floating-point operations were supported by a single opcode / instruction, and automatic conversions from one type to another were built into the processor / ALU.

Definitely a plus for tagged data; it could save a lot of opcodes. But generally this job has been passed off to the compiler, which knows what data types to use.
Quote:
Second, perhaps I've missed it, but I've not seen any mention of the new protected pointer types that the newer Intel architectures support
I had not heard of those. More to research!

I'm backtracking on the tagging. The problem with a dedicated pointer tag memory is that every single store operation must be mirrored with an RMW operation to the tag memory. This cuts the performance of stores to about 1/3 of what it otherwise would be. It also doubles the code size required for stores.

Representing a pointer with a tag can be done in 64 bits; for sure there’s plenty of room in a 64-bit value. For my purposes I was thinking of setting the upper 24 bits to ‘PTR’ to indicate a pointer. That would allow 40-bit pointers to data, and the odds of a conflict with other numbers would be reduced to about 1 in 16 million. I suppose an exception could be generated if a value conflicted with a pointer. Another possibility is to use a floating-point NaN to indicate a pointer. This would prevent the FPU from digesting a pointer and would allow non-conflicting floating-point values in memory. Ya know, if the top 20 bits were set to 20’hFFF01, for instance, to represent a pointer, that would make it a NaN value and leave 44 bits for the pointer value. The only thing reduced is the range of negative numbers – there’s a hole in the range.
The ALU still needs to calculate the pointer tag, but internal busses would not need to be modified.
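
Spelling that scheme out in plain C (the helper names and masks below are just a sketch of the idea, not FT64 code):
Code:
#include <stdint.h>

#define PTR_TAG    0xFFF01ULL                    // top 20 bits marking a pointer
#define PTR_SHIFT  44
#define PTR_MASK   ((1ULL << PTR_SHIFT) - 1)     // low 44 bits hold the address

static inline uint64_t box_pointer(uint64_t addr)
{
   return (PTR_TAG << PTR_SHIFT) | (addr & PTR_MASK);
}

static inline int is_pointer(uint64_t word)
{
   return (word >> PTR_SHIFT) == PTR_TAG;        // just a shift and a compare
}

static inline uint64_t unbox_pointer(uint64_t word)
{
   return word & PTR_MASK;                       // recover the 44-bit address
}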

The IsPointer() function now looks like:
Code:
naked inline int IsPointer(register int *p)
{
   __asm {
      asr      $r1,$r18,#44                  ; shift right 44: the tag sign-extends into the low bits
      cmp      $r1,$r1,#$FFFFFFFFFFFFFF01    ; compare against the sign-extended $FFF01 tag
      not      $r1,$r1                       ; turn the compare result into a boolean
   }
}

The compiler’s broken: auto incrementing/decrementing a value in a bitfield doesn’t work. The problem is that when the field is dereferenced, the value is placed in a register. The auto increment logic then says “aha, the value’s in a register” and just outputs “add reg, reg, #1” instead of putting the value back in the bitfield. Fixed this one. It takes about a dozen instructions to manipulate the bitfield, so the field in question was changed to a byte to increase performance.
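
For reference, the failing case was an increment of a packed field, something like the following (plain C with made-up field widths). The correct code has to extract the field, add one, and insert it back into the containing word, which is where the dozen or so instructions come from:
Code:
struct Flags {
   unsigned count : 4;      // packed field: every access is an extract/insert
   unsigned other : 28;
};

void bump(struct Flags *f)
{
   f->count++;   // must read the word, update the field, and write it back;
                 // the bug emitted only the add on the extracted value
}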
Pointer comparisons were sometimes using signed compares. More code needed to be added whenever a pointer type was set, to change the compare status from signed to unsigned.

_________________
Robert Finch http://www.finitron.ca


Sat Aug 18, 2018 3:22 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
There's a discussion of these pointer extensions on Wikipedia. It looks like not everyone is convinced that these enhancements are well thought out or compatible with common C/C++ usage idioms. In fact, several competing software-based alternatives may be more effective and efficient.

_________________
Michael A.


Sat Aug 18, 2018 6:11 pm

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Bounds Checking:
The Intel bounds checking registers look similar to a set of bounds checking registers I had in another CPU a few years ago. It could be I heard mention of them back then. IIRC the bounds registers worked with the CHK instruction. There’s still a set of bounds registers for the stack (the SBL and SBU CSRs) in FT64 which check all stack references to make sure they are in range. I have them effectively disabled by setting the bounds to the max range; otherwise they are additional state to be managed on a thread switch.

I think the idea in my case was to optimize the CHK instruction so that registers didn’t have to be loaded with bounds information while the program was running. Instead, the bounds would be set at program start. Most of the bounds to check are statically determined; for instance, is a pointer within the data segment? Hopefully this would reduce the need for registers in functions.
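
In C terms such a check boils down to a single compare against bounds that are fixed at program start, roughly like this (the names here are hypothetical):
Code:
#include <stdint.h>

static uintptr_t data_lo, data_hi;      // set once at program start

void set_data_bounds(void *lo, void *hi)
{
   data_lo = (uintptr_t)lo;
   data_hi = (uintptr_t)hi;
}

// What a CHK against the data segment amounts to: no per-function
// bounds registers needed, just a compare against the fixed limits.
int chk_data(const void *p)
{
   uintptr_t u = (uintptr_t)p;
   return u >= data_lo && u < data_hi;
}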

Hardware GC:
GC appears to be too complex to support entirely in hardware. It looks like it works in phases, for instance determining all roots before moving on to the next phase.
One item I spotted for hardware support is the marking queue. I’m not sure it would help very much though. It could be a hardware queue: a simple store to an address pushes an item, a load pops an item, and the queue status (empty, full) sits at another location. A hardware queue might be able to trim several instructions off the push/pop operations. This is another spot where a generic hardware queue could be used in the system. I mentioned before the use of hardware queueing by the OS to handle the ready / waiting lists.
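
Roughly what the software side of such a queue would look like, assuming a memory-mapped interface (the addresses and status bits below are made up for illustration):
Code:
#include <stdint.h>

#define GCQ_DATA    ((volatile uint64_t *)0xFFDC1000)   // hypothetical MMIO addresses
#define GCQ_STATUS  ((volatile uint64_t *)0xFFDC1008)
#define GCQ_EMPTY   (1ULL << 0)
#define GCQ_FULL    (1ULL << 1)

// One store pushes a mark-queue entry; returns 0 when the hardware queue
// is full so the caller can fall back to a software queue.
static inline int gcq_push(uint64_t item)
{
   if (*GCQ_STATUS & GCQ_FULL)
      return 0;
   *GCQ_DATA = item;
   return 1;
}

// One load pops an entry; returns 0 when the queue is empty.
static inline int gcq_pop(uint64_t *item)
{
   if (*GCQ_STATUS & GCQ_EMPTY)
      return 0;
   *item = *GCQ_DATA;
   return 1;
}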

Tonight, I’m contemplating adding to the CC64 language to provide compiler support for GC.
One issue is that the GC scans entire objects to try and detect pointers. If only there were a way to send a command to the GC to tell it to skip over words while scanning. I was thinking of having the compiler output a special constant field in objects (structures) that indicates what to skip over. The constant would be a second NaN value that specifies the number of words to skip over. The problem is this field would have to be set at runtime. If the field wasn’t properly set, the GC wouldn’t skip the following fields and hence might falsely identify some data as pointers. So code would still work, just with memory leaks.
Code:
typedef struct _tagObject align(64) {
   int magic;               // identifies a managed object (OBJ_MAGIC)
   int size;                // object size
   __gc_skip skip1 {
      __int32 typenum;
      __int32 id;
      __int8 state;         // WHITE, GREY, or BLACK
      __int8 scavangeCount;
      __int8 owningMap;
      __int8 pad1;
      __int32 pad2;
      unsigned int usedInMap;
   };
   struct _tagObject *forwardingAddress;   // set when the object is moved
   void (*finalizer)();                    // called before the object is reclaimed
} __object;
...
P = new __object;
P->skip1 = 3;   // Tell the GC to skip 3 words

Knowing what to skip over while GC’ing would help.

_________________
Robert Finch http://www.finitron.ca


Sun Aug 19, 2018 2:43 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Added a couple of instructions to the instruction set: ISNULL and ISPTR. ISNULL detects whether a register contains a null pointer, which in many cases is not a simple zero. ISPTR detects whether a register contains a pointer. Both instructions are at least two words more compact than the usual instruction mix.

Just trying to get the compiler to recognize when it can use a conditional move instruction as in:
Code:
   return (IsPointer(rg) ? rg : 0);

_________________
Robert Finch http://www.finitron.ca


Mon Aug 20, 2018 3:08 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Got the compiler to do a conditional move for hook operations. The conditional move instruction is used about ten times out of 22,000 instructions. Been thinking of dropping it.

The compiler now considers functions that only contain inline functions to be leaf functions.

Toying with the idea of adding a ‘findptr’ instruction. This instruction would scan a memory region until it finds a pointer value. It’s another function where it’d be nice to have dual result busses as it could return both the value found and an updated search pointer. It’ll have to settle for updating the search pointer. The value can be retrieved by doing a load at the search pointer.
The original Thor had a small set of string instructions. It’s tempting to add several string instructions to FT64.

An example using findptr:
Code:

static naked inline int FindPtr(register int *start, register int *end)
{
   __asm {
      findptr   $v0,$a0,$a1
   }
}

static void GetGlobalRootsInDataArea()
{
   int *ptr, *qtr;
   int *endptr;
   int nn;
   ACB *pACB;

   _SetDataLevel(OL_USER);
   pACB = (ACB *)0;
   pACB->gc_rootcnt = 0;
   ptr = pACB->pData;
   endptr = pACB->pData+(pACB->pDataSize >> 3);
   do {
      nn = FindPtr(ptr,endptr);
      qtr = (int *)ptr[nn];
      if (IsPointer(qtr)) {
         if (!IsNullPointer(qtr)) {
            if (*qtr==OBJ_MAGIC) {
               pACB->gc_roots[pACB->gc_rootcnt] = qtr;
               pACB->gc_rootcnt++;
            }
         }
      }
      ptr += nn + 1;
   } while (ptr <= endptr);
   _SetDataLevel(OL_MACHINE);
}

_________________
Robert Finch http://www.finitron.ca


Tue Aug 21, 2018 1:36 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
A command value for the garbage collector was defined. The NaN code $FFF02xxx… is now generated by the compiler; it indicates to the garbage collector the number of words to skip over while scanning. It’s used in the data segment as the following example shows. Several structures and array variables were declared.
Code:
typedef struct
{
   int a;
} A;

typedef struct
{
   char s[50];
   A a1[25];
   A a2;
   A a3;
   int b;
} B;

typedef struct
{
   int c;
   A *pa;
} C;

char c1;
__int8 i8;

A gblA;
B gblB = {
   "Hello world"
};

int iarray[100][20];

B sarray[100][30];

C cvar;

B s2array[100][30];

Code:
    data
   align   8
   bss
   align   2
public bss _c1:
   fill.b   2,0x00

endpublic
   align   2
public bss _i8:
   fill.b   1,0x00

endpublic
   data
   align   8
   fill.b   5,0x00
   bss
   align   8
   align   8
   dw   $FFF0200000000002; GC_skip
public bss _gblA:
   fill.b   8,0x00

endpublic
   data
   align   8
   align   8
   dw   $FFF0200000000029; GC_skip
public data _gblB:
   dc   72,101,108,108,111,32,119,111
   dc   114,108,100,0
   fill.b   76,0x00
   fill.b   228,0x00

endpublic
   bss
   align   8
   align   8
   dw   $FFF02000000007D0; GC_skip
public bss _iarray:
   fill.b   16000,0x00

endpublic
   align   8
   align   8
   dw   $FFF020000001E078; GC_skip
public bss _sarray:
   fill.b   984000,0x00

endpublic
   align   8
   align   8
                         ; GC_skip
public bss _cvar:
   fill.b   16,0x00

endpublic
   align   8
   align   8
   dw   $FFF020000001E078; GC_skip
public bss _s2array:
   fill.b   984000,0x00

endpublic
   rodata
   align   16
;   global   _i8
;   global   _cvar
;   global   _s2array
;   global   _gblA
;   global   _gblB
;   global   _iarray
;   global   _sarray
;   global   _c1
Notably, the variable cvar can’t be skipped over because it contains a pointer. Also note that a skip command is output only for structures and arrays. This may be improved in the future by grouping together primitive types to skip.
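
On the GC side, consuming those markers while scanning would look something like this in plain C (the tag layout follows the $FFF02 code above; exactly what the count covers is glossed over here):
Code:
#include <stdint.h>

#define PTR_TAG    0xFFF01ULL
#define SKIP_TAG   0xFFF02ULL
#define TAG_SHIFT  44
#define VAL_MASK   ((1ULL << TAG_SHIFT) - 1)

// Scan a region of words, skipping runs flagged by GC_skip markers and
// handing candidate pointers to the marker callback.
static void scan_words(const uint64_t *p, const uint64_t *end,
                       void (*mark)(uint64_t addr))
{
   while (p < end) {
      uint64_t tag = *p >> TAG_SHIFT;
      if (tag == SKIP_TAG) {
         p += (*p & VAL_MASK) + 1;   // step past the marker and the flagged words
      } else if (tag == PTR_TAG) {
         mark(*p & VAL_MASK);        // candidate pointer: queue it for marking
         p++;
      } else {
         p++;                        // plain data
      }
   }
}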

_________________
Robert Finch http://www.finitron.ca


Wed Aug 22, 2018 2:06 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Compiler:
Cleaned up some of the variable initializer code and added the capacity to initialize unions. Initializing a union looks at the constant value’s type and sees if there is a similar type in the union. If not, an error occurs.
Migrated some of the code from global functions to class members.
Found an error in the compiler: the CSE table for an inline function was being deleted after the first use. If the function is inline, the CSE table is no longer deleted -> memory leak.

_________________
Robert Finch http://www.finitron.ca


Thu Aug 23, 2018 2:48 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
More Compiler:
Modifications to the symbol class are in the works. The symbol class is used for all symbols the compiler can process. Symbols include local and global variables and functions and methods. An issue with the symbol class is that it holds fields for all the different types of symbols. There are a lot of fields related only to functions and methods. So a lot of space is wasted in the symbol table. It would be better if the additional function and method information were stored in a secondary class. There is only a single symbol table in the compiler. It supports up to 32,000 symbols in a lexical unit. There is unlikely to be anywhere near that number of functions / methods in a single unit (file).
Moved the additional function / method information out to another class referenced from the symbol class. Set a limit of 3,000 functions in a single file.
There was an error in the compiler where an indicator for a function pointer wasn’t being reset. This caused a dereferencing error sometimes. The error has been in the compiler a long time.
Managed to fix several other compiler bugs.

The assembler is being used to link all the source files together. It’s outputting thousands of lines of hex dumps without source code comments and I don’t know why yet. It’s supposed to be outputting a mixed listing.

_________________
Robert Finch http://www.finitron.ca


Fri Aug 24, 2018 3:02 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Played with the idea of adding string instructions to the ISA. The string instructions would be interruptible, but one would have to test for completion of the operation after the string instruction, and loop back if not completed. Or, disable interrupts while the string instruction is executing.
Code:
.cont:
   strlen   $v0,$a0            ; count characters starting at $a0
   lc       $v1,[$a0+$v0*2]    ; load the character the count stopped on
   bne      $v1,$r0,.cont      ; not the terminator? interrupted, so resume


Also re-thinking dual result busses (because it’s a cool idea), which would allow:
MOV2 – move two registers to two targets (can be used to implement XCHG); does two move instructions in a single instruction
POP (PUSH requires only one target)
LINK
UNLINK
DEMUX
DIVMOD – return both the calculated quotient and remainder (see the sketch after this list)
MUL – return both the high- and low-order products
Dual result busses could be limited to a single ALU, forcing instructions requiring them to be executed on that ALU. I previously removed dual busses from the design, but I had dual busses on both ALU’s.
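
DIVMOD is the clearest case; in C the two results can only be handed back packaged together, which is exactly the cost a second result bus avoids:
Code:
struct divmod_t { long q, r; };

// Quotient and remainder come out of the same divide; a dual result bus
// would let the hardware write both back from one instruction.
struct divmod_t divmod(long a, long b)
{
   struct divmod_t d = { a / b, a % b };
   return d;
}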

Increased the number of interrupt levels to 16 from 8. The plan is to use several levels exclusively for garbage collection. Since these levels can’t be used for anything else, more levels were required. Not all interrupt levels need be used.

Added code to the core to recognize cache preloads. A preload instruction is signified by the target register of a load being r0. For preloads errors are ignored.

_________________
Robert Finch http://www.finitron.ca


Thu Aug 30, 2018 3:14 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
I noticed tonight that there’s a fair number of shift-left instructions followed by an add operation. This occurs typically during the calculation of memory addresses. With 48-bit instructions available I got to thinking that these operations could be combined into a single 48-bit opcode. The instruction would have to take two cycles to get through the ALU rather than cascading both operations in a single cycle. So there’s little if any performance benefit to using a combined instruction. However, it does consume less memory, so there may be some benefit for the cache. Performing a shift operation and a second operation with the same instruction reminds me of an ARM processor.
A right shift followed by an ‘and’ operation can be used to do a bit field extract.
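
Both patterns written out in C, just to show what the fused forms would cover:
Code:
#include <stdint.h>

// Shift-left then add: the usual scaled-index address calculation.
uint64_t index_addr(uint64_t base, uint64_t index)
{
   return base + (index << 3);      // base + index * 8
}

// Shift-right then and: extract a 'width'-bit field starting at bit 'pos'.
uint64_t extract_field(uint64_t word, unsigned pos, unsigned width)
{
   return (word >> pos) & ((1ULL << width) - 1);
}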

_________________
Robert Finch http://www.finitron.ca


Fri Aug 31, 2018 3:20 am

Joined: Wed Jan 09, 2013 6:54 pm
Posts: 1780
They put rather a lot of chip area into their barrel shifter, so it must somehow have seemed worthwhile. It might be that it used up encoding bits which would otherwise be idle. As the relatively low instruction density of ARM is one of its defects, that might have been compelling. They particularly wanted a fast Basic interpreter, I think, and perhaps also a fast 6502 emulation, so that might have been a consideration.

The nice thing about putting a shifter into your machine, and your ISA, is that a less aggressive implementation can take more time, while a more aggressive implementation can use more resources for greater throughput. It gives you a degree of freedom.

Squashing two operations into one opcode maybe helps memory bandwidth, not just by lowering cache pressure but by freeing up cycles for write backs?


Fri Aug 31, 2018 6:29 am

Joined: Wed Apr 24, 2013 9:40 pm
Posts: 213
Location: Huntsville, AL
Quote:
The instruction would have to take two cycles to get through the ALU rather than cascading both operations in a single cycle. So there’s little if any performance benefit to using a combined instruction.
I think that you are not crediting the reduction in memory bandwidth as a measure of improved performance. Somewhere recently I've read that "performance is measured in bandwidth". (Not a direct quote, but I think that it expresses a correct view regarding the source of performance. Reflecting further on the source, I think this perspective was expressed by Seymour Cray.) Reducing the bandwidth requirements for cache / memory allows bandwidth to be preserved / transferred to other critical areas such as your register interfaces. It may also be possible to conditionally execute the other instruction(s) contained in a packed instruction representation in any free pipelines that your architecture may implement. In other words, instead of serial dispatch of the packed instructions through a single pipeline, perhaps a simpler, parallel pipeline could be used to perform simple operations like register-register moves. A pipeline of this sort would support many common operations that are performed in common algorithms. The address generation sequence you describe could be one of those supported by such a configuration. In addition, many signal/image processing algorithms have a similar structure.

_________________
Michael A.


Fri Aug 31, 2018 11:40 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Backtracked and removed the pointer logic in the arithmetic operations. The only arithmetic operations allowed on pointers by the compiler are +, -, ++, and --, and there is no need for extra logic for those. Subtracting two pointers should naturally result in a non-pointer. The only thing to watch out for is overflow during an add causing the pointer status to change to non-pointer, so I added an ‘addp’ instruction to add to a pointer and detect overflow.
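
A sketch of what addp has to guard against, assuming the NaN-style tag in the top bits from earlier (purely illustrative; the real tag handling may differ):
Code:
#include <stdint.h>

#define TAG_SHIFT  44
#define ADDR_MASK  ((1ULL << TAG_SHIFT) - 1)

// Add an offset to a tagged pointer; fail if the add carries out of the
// low address field, which would corrupt the tag and turn the word into
// a non-pointer.
int addp(uint64_t tagged, uint64_t offset, uint64_t *out)
{
   uint64_t addr = (tagged & ADDR_MASK) + offset;
   if (addr & ~ADDR_MASK)
      return 0;                            // overflow detected
   *out = (tagged & ~ADDR_MASK) | addr;    // tag bits preserved
   return 1;
}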

I noticed the conditional move instruction had a lot of extra unused bits in it. It has to be a 48-bit instruction. So I dedicated a couple of bits to specifying other conditional operations. I added an immediate mode to the instruction and a conditional add operation. Conditional add might be useful for a multiply function. There are a couple more opcode slots available for other operations.

Added a bunch of floating-point set instructions.
I’m thinking of adding a new operator to the compiler, ‘??’ (double question). This operator tests whether two floating-point numbers are ordered with respect to each other (i.e., neither is a NaN). Otherwise there isn’t really a way to test for orderedness at a high level.
Code:
if (a ?? b)
    printf("Ordered");
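
In standard C the equivalent test is already spelled isunordered(); the ?? operator would just give it first-class syntax:
Code:
#include <math.h>
#include <stdio.h>

int main(void)
{
   printf("%d\n", !isunordered(1.0, 2.0));   // 1: both operands are ordered
   printf("%d\n", !isunordered(1.0, NAN));   // 0: a NaN is unordered
   return 0;
}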


Added run-time determined (rtd) operations. The operation to be performed is determined at run-time. I’ve previously called these mystery operations.
Code:
op = OP_PLUS;
c = __rtop(op, a, b);
op = OP_GT;
if (__rtop(op,a,b))
    printf("hi");


I've swung back to the pointer tags, using additional words of memory to represent the tag bits.
I'm now working on the data cache. Virtual addresses are multiplied by 33/32 to give an extra hidden word for every 32 words; this is about 3% overhead.
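
Say the hidden word sits after each group of 32 data words (that placement is just an assumption for this sketch); then the mapping works out to:
Code:
#include <stdint.h>

#define GROUP_WORDS 32

// Physical word index of a data word: one extra word is inserted per 32,
// which is the 33/32 scaling.
uint64_t phys_word(uint64_t vword)
{
   return vword + vword / GROUP_WORDS;
}

// Physical word index of the hidden tag word covering that data word.
uint64_t tag_word(uint64_t vword)
{
   return (vword / GROUP_WORDS) * (GROUP_WORDS + 1) + GROUP_WORDS;
}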

_________________
Robert Finch http://www.finitron.ca


Tue Sep 04, 2018 3:39 am

Joined: Sat Feb 02, 2013 9:40 am
Posts: 2095
Location: Canada
Quote:
In other words, instead of serial instruction dispatch of the packed instructions through a single pipeline, perhaps a simpler, parallel pipeline could be used to perform simple operations like register-register moves. A pipeline of this sort would support many common operations that are performed in common algorithms.

I was thinking along those lines, but there's very little (none at the moment) room in the FPGA I'm using. It would be better as a parallel pipeline because then the throughput isn't being affected, just the latency.

Spent some time fixing up minor bugs and stubbing out some minor functionality in order to try synthesizing the RTL code. The code’s synthesizing right now; it could be a while.

_________________
Robert Finch http://www.finitron.ca


Wed Sep 05, 2018 4:54 am