A virtual assembly language / Compiler back-end
Findecanor
Joined: Fri Jan 19, 2024 1:06 pm Posts: 15
I'm developing a "virtual assembly language" for a virtual processor. My language is at a lower level than WebAssembly, but not nearly as low as real assembly language. It could also be called a language for a "compiler back-end", because that is the technology I'm using and I intend to use it as such. It is supposed to produce native assembly code but not to be portable to any native ABI: the ABI is also my own. The intention is first to use it for a WebAssembly runtime, then to write an LLVM back-end that produces this language and have a Linux or FreeBSD distro where all programs are compiled through it. For either of these two cases, compatibility with any existing ABI is not important.

I want to be able to target as many 64-bit CPU instruction set architectures as possible ... as long as they are little-endian. I'm targeting primarily the 64-bit variants of RISC-V, ARM and x86. I have been trying to learn the intricacies of as many instruction set architectures as possible, so as to not paint myself into a corner with any decision. (I have considered supporting big-endian, but it doesn't seem important, and I think code compiled to PPC emulating little-endian could run decently.)

I still have a very long way to go before I have anything to show, because of how complex it is and because I'm working on it alone. The design has been done for a while; I have just figured out many of the algorithms needed after studying modern compiler technology.

Data flow

The actual language is similar to GIMPLE or LLVM-IR, in that it uses a variant of Static Single Assignment form to represent data flow: there is an infinite number of registers, but each register can be assigned in only one place. A label can take arguments, which become new variables defined at that place. This type of representation is common among modern compilers because it has certain characteristics that make compiler passes more manageable. It was also chosen because it is an abstraction over the differences in register files between targets.

Ops are written in function-call syntax with parentheses, "add(%a, %b) => %dst", which allows ops to be nested to avoid temporaries: "add(%a, mul(%c, %d))". Integer width is typically inferred but can also be specified with a dot-suffix on the mnemonic (like Motorola 680x0 and some RISC-V assembly) to enforce type-checking: "mul.w(%a, %b)".

All integer instructions are available for 8, 16, 32 and 64-bit widths, but the type does not encode whether it is signed or unsigned: there are instead different ops. Integers have to be converted between different widths explicitly, with ops that zero-extend, sign-extend or wrap. (To sign/zero-extend without changing type, use the bitfield-extract ops.) A reason to have full support for small integer types is that I intend to support vector operations on the same types. On some targets I expect a vector unit to be absent, or that a vectorised loop will need a scalar prologue and epilogue, so there needs to be support for small integers anyway.

Control flow

Control flow is expressed in high-level `if-then-else`, `loop`, `break` and `continue` statements. So far only structured control flow with blocks is allowed: there is no "goto", but breaks and continues can span multiple levels. That is similar to, but more programmer-friendly than, how control flow is expressed in WebAssembly. (My plan is to add gotos later, when I have figured out how to compile them, but any goto will not be an optimised path.)
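To make that concrete, here is a minimal sketch of a small counting loop. The op syntax, dot-suffixes, `#` literals, `cmp_*` ops and structured control flow are as described above; the exact spelling of blocks, label arguments and calls is my guess for illustration, and print_int is a hypothetical function:

Code:
; sum of the integers 0..n-1, where %n is a function parameter.
; %head takes two arguments: the loop-carried SSA-variables %i and %acc.
loop %head(%i = #0, %acc = #0) {
    if cmp_ge(%i, %n) {
        call print_int(%acc)    ; consume the result while it is still in scope
        break
    }
    continue %head(add.w(%i, #1), add.w(%acc, %i))
}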
Stack variables can also be allocated, per scope like in C, and have to be loaded and stored explicitly. There is also "alloca()" for variable-length stack allocations. If a block is marked "restore-stack", then allocations within it are automatically rescinded when exiting it. Stack variables are automatically initialised to zero. The language is therefore a bit of a hybrid between C and pure SSA form. Both stack variables and SSA-variables (registers) are live only inside the scope in which they are defined. Unlike pure SSA, to pass a value out of a scope you have to pass it by label parameter or store it in memory (a stack variable in an outer scope, or elsewhere).

"if" takes a boolean, which can be a result from a `cmp_{eq|ne|lt|le|gt|ge|hi|hs|lo|ls}`. How this is compiled varies widely. (For instance, the RISC-V targets have six different representations of a boolean under the hood.)

High-level ops

I chose to have some special ops because of how they have to be implemented in different ways on different processors. Sometimes they are their own instructions, sometimes they are conditional instructions, sometimes they use flags, and sometimes helper functions:

Code:
abs
minu, mins, maxu, maxs
ctz, clz, cpop
byteswap, bitreverse
select              ; "conditional move"
bfextu, bfexts, bfins
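A few of these in use, to give the flavour (the operand orders here are illustrative, assuming select(cond, if-true, if-false) and bfext(value, lsb, width)):

Code:
select(cmp_lt(%a, %b), %a, %b) => %min   ; same result as mins(%a, %b)
bfextu(%x, #8, #8) => %field             ; extract an unsigned byte from bits 15:8
bfexts(%x, #0, #16) => %sx               ; sign-extend the low 16 bits, type unchanged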
Integer division is quite a headache ... Signed division, right shift and average are each divided into ops that floor the result and ops that round towards zero. In most ISAs a division rounds towards zero, but a signed right shift rounds down (floors). A right shift that instead rounds towards zero (i.e. a truncating division by a power of two) is often significantly faster than an actual division, but requires different instruction sequences on different target ISAs, so that became its own op. Also, in many programming languages (such as Python) the division operator rounds down, and that can be done in multiple ways, so that is also its own op.

Code:
divu, divz, divf
remu, remz, remf
lsr, asrz, asrf
avg, avgz, avgr     ; avg is the same for unsigned and signed-flooring; avgr always rounds up
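Concretely, with negative operands:

Code:
divz(#-7, #2)   ; => -3   rounds towards zero, like C
divf(#-7, #2)   ; => -4   floors, like Python's //
remz(#-7, #2)   ; => -1   pairs with divz: -7 = -3*2 - 1
remf(#-7, #2)   ; =>  1   pairs with divf: -7 = -4*2 + 1
asrz(#-7, #1)   ; => -3   matches divz by a power of two
asrf(#-7, #1)   ; => -4   the ordinary arithmetic shift right
avgr(#3, #4)    ; =>  4   avg and avgz would both give 3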
Division by zero does not raise a fault: instead the result is zero, with the remainder being the numerator, because that upholds the relation n = q*d + r and it is the behaviour on ARM. Similarly, division of MININT (0b1000...) by -1 has the result MININT. Ops that do trap on both conditions are planned. Ops that trap only on division by zero but not on overflow will not be supported. (The x86-64 target has a trap handler that inspects the machine code that trapped. The RISC-V target has strange code sequences that produce 0 instead of -1 on division by zero. For extra headache, consider also that the above has to be done four times, for four integer widths, and that the code generator merges division and remainder ops with the same operands into one op on those targets that don't have separate instructions.)

Idioms

Some instructions have been left out because they are trivially expressed as combinations of other instructions. The "instruction selection" in the compiler is designed to fold multiple ops into one when it is possible to do so without cost. For example, "and(%a, not(%b))" could get compiled into an "andn" instruction if the target architecture has one. The same goes for instructions that are widening, such as multiplication or "load byte and sign-extend": the type of extension op tells whether this becomes a signed or unsigned instruction.

Left and right shift ops do not mask the shift amount, but the amount is always treated as unsigned. At first that was a very personal choice. If you write "lsr.l(%x, and.l(%amount, #0x3f))", then a single instruction that masks the amount is likely to be emitted. There are also programming languages in which the shift amount has the same width as the integer being shifted, and I thought it better to make the masked amount a special case than the other way around.

Next post: The ABI.
Last edited by Findecanor on Sat Mar 29, 2025 10:17 am, edited 2 times in total.
Tue Sep 03, 2024 4:34 pm
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
A big project - looking forward to the next update
Wed Sep 04, 2024 9:48 am
Findecanor
Joined: Fri Jan 19, 2024 1:06 pm Posts: 15
The intention is that a program written in, or compiled to, the virtual assembly language should exhibit identical behaviour on all targets, even if it is completely bug-ridden. The various different organisations of architectural register files are abstracted by the use of SSA form's infinite registers. However, when the code generator runs out of architectural registers to allocate, it has to make the generated code store ("spill") values to memory and reload ("fill") them later when there is a free register. Normally that place is the local stack frame ... but, but ... you can't have stack frames of different sizes on different targets, and spills and fills to those slots could influence buggy (or malicious) code that accesses that part of the stack frame.

The solution: use two stacks. One hidden "safe stack" for spilling, and one "user stack" for stack allocations. The compiler defines the spills to and reloads from the safe stack, and the program defines every other load and store. Now, if we have a separate stack that is protected, we can also use it to store sensitive data and pointers (such as return addresses) away from malicious code. Stack overflows can then only occur on the "user stack".

This is not a new idea. The only new thing is perhaps using the idea for this purpose. Some CPU architectures have register windows, and some of these (excluding SPARC) save them to a separate stack. It has also been done in pure software ("Code-Pointer Integrity", Kuznetsov et al., 2014) and got implemented in LLVM. Compilation with this feature is turned on by default for some parts of FreeBSD and Google's Fuchsia. Overall, there is practically no performance penalty associated with this scheme (compared to the convention of storing both the stack pointer and the frame pointer on function entry): most of the observed differences are due to different cache locality, sometimes worse, other times better.

However, when done completely in software, the security of the safe stack depends on the safe-stack pointer not leaking and not being too easily guessed. I have decided to go all the way and try to protect it as much as possible, because it creates a foundation for other things that can be done for security. First, the only place where the safe-stack pointer is stored is its register: the safe stack does not even store any pointer to itself. Multiple threads are emulated by having multiple processes with the heap in shared memory, so as to keep their stacks private. And since all code in a program is created by the (now trusted) code generator ... it must be impossible to do an indirect jump/call to an arbitrary location.

That's it for today. I'll write about handles next time.
Last edited by Findecanor on Sat Mar 29, 2025 10:26 am, edited 2 times in total.
Wed Sep 04, 2024 8:58 pm
robfinch
Joined: Sat Feb 02, 2013 9:40 am Posts: 2302 Location: Canada
I am reminded of Java and the Java virtual machine.

Quote:
This is not a new idea. The only new thing is perhaps the initial purpose. Some CPU architectures have register windows, and some of these (excluding SPARC) save these to a separate stack.

M. Alsup's My66000 (comp.arch) has a safe stack. It is something I am trying to integrate into my own design. "Safe stack" seems like a reasonable idea. I have the safe stack pointer register not available to user mode software. It is a hidden register used to implement the ENTER and LEAVE functions. Using it for register spills and reloads is another idea to explore.

If there are enough temporaries, register spills and reloads are rare. Another thing that happens is that all in-use temp registers need to be saved and restored across function calls. Push-multiple / pop-multiple register instructions are handy for this.
_________________
Robert Finch http://www.finitron.ca
Thu Sep 05, 2024 4:38 am
Findecanor
Joined: Fri Jan 19, 2024 1:06 pm Posts: 15
Quote:
I have the safe stack pointer register not available to user mode software. It is a hidden register used to implement the ENTER and LEAVE functions. Using it for register spills and reloads is another idea to explore.

Yet another thing to think about is how to unwind the stack after a longjmp() or after an exception has been raised. Instead of storing a linked list of frame pointers on the stack in every function prologue, I'm storing the user-stack pointer (same register, but a different stack). Thus only the safe stack gets unwound, and that is done by code: an alternative return point at every call site. It is generated by the compiler and can be different for every target. Setjmp/longjmp works like an exception handler: the jmp_buf contains a token, and the handler "rethrows" if there is no matching "setjmp" token in the code.

One thing I'm considering delving into in the future is allowing the stacks to grow on stack overflow. Because the user-stack frames are pointed to from the safe stack, they don't have to be contiguous but could be in different segments, allocated on demand. If the safe stack overflows and there aren't enough free pages below it, it could just be remapped to a new, more spacious location, and only the stack-pointer register would need to be updated.
Sun Sep 08, 2024 10:18 am
Findecanor
Joined: Fri Jan 19, 2024 1:06 pm Posts: 15
Handles
As mentioned above, a program is protected from running arbitrary code. Inductively, function pointers in registers (i.e. SSA-variables) are protected if only compiler-generated code is being run and the compilation system doesn't have a bug that has introduced a vulnerability.
Because function pointers in user-code-visible memory (user stack, heap) can be overwritten, handles are stored there instead. They get verified when loaded, and are then used to look up the raw function pointers.
Of course, the drawback of this is that verifying handles and looking up pointers from them has a cost in code size and speed. Most of the size cost is bit manipulation.
There are a couple of different types of handles, each with its own op:
• Function-pointer handle: get_fnptr(handle_ptr, #fn_type) => fnptr.
• Virtual-table handle: get_vtable(handle_ptr, parent_vtable) => vtable. A vtable contains verified function pointers, so lookup is just a load with a constant offset.
• Function table: get_fn(index, #fn_type) => fnptr.
The result of these ops is either a valid pointer or NULL. The pointer can be checked for NULL, and it would of course fault on use. The function type is a 32-bit integer that is symbolic in the language.
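In use, loading and calling through a function-pointer handle looks roughly like this (the load and call spellings and the #callback type name are placeholders):

Code:
load.q(%slot) => %h              ; read the handle from user-visible memory
get_fnptr(%h, #callback) => %fp  ; verify and translate: NULL if stale or wrong type
if cmp_eq(%fp, #0) {
    ...                          ; report the error
}
call %fp(%arg)                   ; the raw pointer never touches user-visible memory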
get_fnptr() is the most straightforward: function handle → function pointer. It uses a single large table. Each entry contains a raw pointer, a "generation" token, the function type and which "program fragment" (main program or library) the function belongs to. Each handle contains an index and a token. After an entry is looked up by index, the token and function type are checked. It takes one load and one load-pair; the first load is just for bounds-checking the index.
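Roughly what get_fnptr() expands to (the field widths in the handle are made up here, just for illustration):

Code:
bfextu(%h, #0, #32) => %index    ; unpack the entry index from the handle
bfextu(%h, #32, #16) => %token   ; unpack the generation token
; load 1: bounds-check %index against the table size
; load-pair: fetch the entry's raw pointer plus its {token, fn_type} word,
;            compare against %token and the expected #fn_type,
;            and produce NULL on any mismatch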
A "vtable" contains "virtual functions", for languages such as C++ and Java. Every C++ object with any "virtual member function" has at least one vtable, which used to look up functions. When there is multiple inheritance, there are multiple vtables, but also immutable data fields in the table. A vtable's type is represented by its pointer. Vtables are themselves in a inheritance tree, and each vtable's path in the tree from the root, and the length of that path is known at compile time. A vtable's path is stored as vtable pointers in reverse in an array at negative offset from its vtable pointer. To validate a vtable handle, we first lookup a position in an array of pointers in which the vtable is stored Then verify that a parent vtable pointer is stored at the constant "path length" offset from that position. Every pointer in a vtable array is four-byte aligned, i.e. has bits 0,1 = 0. Every data field has bit 0=1. Again, this is also a load and a load-pair: one for bounds-checking, one for type-checking and one for testing "generation token". However, once we have a valid pointer to a vtable, loading a valid function pointer from the vtable is just one load with a constant offset and nothing else. Thus, the cost acquiring the vtable pointer gets amortised over every use. BTW. Rust also uses vtables, but Rust does not store a vtable pointer within an object but passes it alongside the object pointer in another register: Thus, Rust compiled to this virtual assembly language should avoid having to validate vtable handles altogether.
The last one is used for WebAssembly: look up a function by index in the current fragment's function table, and verify the function type. It uses a range within the large table.
There are a few more issues with function-pointer handles:
• A library could be unloaded and a new one loaded, reusing table positions. That is what the "generation token" is for: it is a per-entry counter that gets incremented.
• In C, a comparison between a pointer and a handle must always come out false, and sometimes the wrong one is used to load/store something. Therefore, every handle's bits must form an invalid address. The top bits of the handle are always set to 1, so as to make the handle point to a negative address and thus be invalid.
• A NULL handle should also yield a NULL pointer.
• The handle format must also be the same for all CPU targets. Some bits in a handle may be garbage, and some may be required by validation to have a constant value. If two handles on two platforms have such bits modified in the same way, then either both or neither must still be valid handles.
Last edited by Findecanor on Wed Jan 01, 2025 11:09 pm, edited 1 time in total.
Mon Sep 09, 2024 9:58 am
Findecanor
Joined: Fri Jan 19, 2024 1:06 pm Posts: 15
Sorry for the wall of text above. The complexity serves a purpose other than just code integrity, which I'll get to later.

Fragments and the Global Pointer

Central to the ABI is the concept of a "program fragment" (a term borrowed from classic Mac OS, instead of repurposing "unit", "module" or "object", which could get confusing). Fragment number 0 is the program's code and const-data segments. Other fragments are shared libraries, dynamically loaded libraries (plug-ins) and JIT-compiled programs (on the roadmap, but far ahead). I think that 10 bits for the fragment number should suffice: the largest program I've measured used less than a tenth of 2**10 shared libraries. The "generation token" is used to allow fragment numbers (and thus table entries) to be reused after a plug-in or JIT-compiled program has been unloaded.

Libraries are "semi-pre-linked", with "direct binding" to numeric indices during code generation. There is no runtime hash-table lookup in a global symbol table. When a library upgrade gets installed, it gets type-checked against the previously installed version and the same numeric indices are reused. Code and const-data segments are memory-mapped execute-only and read-only from their binary files. They are position-independent, with all relocation done through tables. I think this scheme is much simpler than how Unix's shared libraries work, and it should not have worse performance either.

There is a designated "Global Pointer" register. Most RISC processors' standard ABIs have one, used to access the program's global variables. On traditional CISC ABIs it is more implicit, but some compilers materialise one in a register anyway, from a PC-relative or fixed address. These days, however, global variables have become unpopular, even considered a "code smell" by many programmers. They are of course still necessary, so I've added only one layer of indirection.

Here, the GP register is also used to resolve handles, reach functions in other fragments and get the TP offset to the fragment's thread-local variables. It points to an entry in a 64K table at a 64K-aligned address. Each entry is 64 bytes (one cache line) and stores the addresses of the fragment's various tables in the program. Thus bits 15:6 of GP form the fragment number (64 KiB / 64 bytes = 2**10 entries), and the lower bits are zero. Other ABIs either do not support multiple programs in the same address space, or they have the overhead of passing a new global pointer (pointing directly or indirectly to global variables) as a hidden parameter to every function call. Here, instead, the local one is materialised on demand, in only those functions that need it.

The ideal would have been hardware support for materialising the global pointer: perhaps a "fragment ID" in the page table, readable from a CSR. I think the designers of The Mill might have something, but when I asked specifically about it they answered "(patent) Not Yet Filed", which ... made me afraid that they might prevent others from designing CPUs with such a feature.

Some modern processors support storing extra bits in the upper part of a pointer, so on those we can store the fragment number in a function pointer. ARM calls it "Top Byte Ignore". AMD ("Upper Address Ignore") and Intel ("Linear Address Masking") have different variants that give 6, 7 or 15 bits. RISC-V requires 7 bits in the RVA23 profile but may support 16. This is a complex mess of different standards, but it is not visible to the program.
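Putting the numbers together (a sketch; the spelling is illustrative):

Code:
; fragment table: 2**10 entries x 64 bytes = 64 KiB, on a 64K-aligned base
;   GP = table_base + (fragment_number << 6)
;   so bits 15:6 of GP = fragment number, bits 5:0 = 0
; materialising GP from the fragment number in the scratch register:
;   GP = table_base | (frag << 6)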
Either way, the fragment number is passed in a designated "function-call scratch" register that can be used for other things once GP has been set. On x86, the "fragment table" could be in the lower 4 GB of the address space, and GP modified by a 16-bit move after a shift or rotate. On ARM, setting GP is two instructions (two "EOR (shifted register)" instructions, or "UBFX; BFI"). On RISC-V it is three or four instructions, depending on the extensions available. As an optimisation, each function could have two entry points, the second for direct calls within a fragment, after GP has already been set by the caller.

Oops. This post got lengthy as well.
Thu Sep 12, 2024 5:49 am
Findecanor
Joined: Fri Jan 19, 2024 1:06 pm Posts: 15
Migration

I've decided to design the system to support another big feature: "migration". The properties of the compiler and ABI I've mentioned in my previous posts should, together with some compiler trickery, make it possible to hibernate a process (or thread) and reawaken it on another machine ... with another ISA! When I get around to it, the biggest challenge for full migration will not be the compiler or ABI but how to interface with the OS: if there are open files, they should remain open when the process is awoken.
There do exist a bunch of previous systems that do this (going back decades), but they have not gone mainstream. One recent example is Popcorn Linux. Some of these systems depend on binary translation from a reference program in one ISA to another. It should be much easier when we have a virtual ISA and formats designed for migration to begin with.
However, there are other uses for the migration feature that I think are simpler to realise and more useful in a larger perspective. One of these is "partial-ISA" thread migration, for when you have processors that run the same base ISA but with different extensions or properties. I think there's a need for software support for such systems.

For example, a couple of years ago Intel released a processor where the P-cores supported AVX-512 but the E-cores did not. This caused problems because operating systems expected all cores in a system to have the same feature set, so Intel sent out a firmware update that disabled AVX-512. (AMD countered by putting AVX-512 functionality on all cores of the next generation of Zen processors instead.) There are similar problems on ARM, where the newest standard requires scalable vector units but, in practice, all units must have the same maximum vector length. Heterogeneity is even bigger in the world of RISC-V SoCs, where different cores implement different instruction set extensions. For example, on the SpaceMiT K1/M1, only half the cores have support for matrix instructions. An upcoming RISC-V SoC, the Sophgo SG2380, has 16 fast cores (4-way OoO, 128-bit vectors) and four slightly slower cores with 512-bit vector length. The long-vector cores are intended for AI but would sit idle when you're not running that. There are also several small SoCs with both RISC-V and ARM cores, intended for running Linux on one and an RTOS on the other...

I think we are going to see even more partial heterogeneity in the future, even on larger server chips, with e.g. some cores more dedicated to I/O and other cores more dedicated to compute. Not ISA-specific, but on the same theme: there have already been research OSes that specialise the microkernel on each core for the applications on that core, and IBM's latest Z processor reconfigures its caches dynamically to adapt to different workloads on different cores.
Other big uses for migration would be for debugging and profiling.
It is well known that there are issues with debugging a "release build" that has undergone optimisation. For example, instructions could have been rescheduled into a different order from the statements in the source code, and variables could be held in registers, or folded into instructions and omitted altogether. We could side-step that problem if we could instead migrate an optimised release build to a "debug build" of the same program and attach a debugger to that.
Another use case would be to migrate a slow-running program to a profiling build and have it collect data for a while, then compile a new binary optimised using the profiling output and migrate the same program to it ... all without having to restart the program and lose the data it is working on. This is something that otherwise only JIT compilers have been capable of.
Migration is easiest to accomplish if the compiler ("code generator") and the runtime environment have been designed for it from the start, and I think it would be silly not to take advantage of that opportunity. I'm adopting an approach that is classic in these systems: designating "migration points" at call sites, where the state of the registers and the stack is well defined. Register usage typically follows a calling convention, and control can be taken by overwriting return addresses on the (safe) stack. Calling conventions typically also reduce the number of registers in use by designating a bunch of them as "temp" / caller-saved.
A typical compiler divides a program into a "control-flow graph" of "basic blocks". Each basic block contains ops/instructions and terminates with a control-flow op (a conditional branch, or an unconditional jump that joins paths or closes a loop, etc.). One simple thing I've done that might be considered unconventional is to make each "call" a block-terminating op as well. When ops are scheduled or folded (many-to-one) into an instruction, that is done only within a block, never over block boundaries: and thus never across calls.
A subset of the SSA-variables that are live after a call are designated the "migration variables" for that call site, and get assigned spill slots on the safe stack. The binary file contains metadata with "stack maps" for them. Unlike in most compilers, this live-variable analysis and stack-map creation is done at an early stage, while compilation is still target-agnostic. It tracks separately which variables are live in debug builds and in release builds, with the former being a superset of the latter. One effect that could add runtime overhead is that a migration point can bump a "debug-live" variable into being "release-live", because we might need to migrate from there to a debug build where that variable is needed. To reduce the set, variables that could be rematerialised simply from other variables (or from GP or TP) are omitted first. This is done only by tracing the ops in the program, not by any algebraic analysis.
Another issue is what to do with callee-saved registers: those would normally be compiled into different sets on different targets. It would be tempting to simply not have callee-saved registers, but that would penalise the general case: some registers would be saved even when they didn't need to be, and when they did, code size at the caller would increase. I've decided to transform them into caller-saved variables during a migration. When the signal comes to (fully) hibernate a thread, we change every return point to point at a "pre-migration" code snippet that only saves the callee-saved registers to the stack and returns; the last return point leads to code that completes the hibernation. After migration, all return points point at "post-migration" return points that load the values back into registers and then resume normally. The post-migration code also rematerialises the live variables that weren't included: the code found during analysis is copied straight in here. Pre- and post-migration code could theoretically also transform code pointers into handles and back, but I think that is better done by special code, especially for code-size reasons.
Partial-ISA migration between cores that share memory involves changing function pointers in tables, and return addresses to point to special code that does the migration.
Sun Sep 15, 2024 9:49 am
Findecanor
Joined: Fri Jan 19, 2024 1:06 pm Posts: 15
I'm sorry, I have not had as much time or energy as I would have wanted lately to work on this project.
I have been working on "liveness analysis": figuring out which SSA-variables are live at each point within a subroutine. This information is important for register allocation, and also for instruction selection.
The main algorithms are pretty standard: dead-code elimination in SSA form, followed by walking the subroutine in reverse order, from each SSA-variable's last use to its definition. The existence of migration points has given me the most headaches, however. Those are visited during the reverse pass... The big issue is that they may cause a formerly debug-live, release-dead variable to be reanimated into release-liveness. And that reanimation can reanimate the parameters of the op that defined it, and so on. If all of that only affected code before the migration point, it would be easy. But the migration point could be inside a loop, so the reanimation can loop as well ... into code that has already been visited. So the code has to patch already-visited state. I don't think this will happen much in practice, but when it does happen, it must of course be correct.
Sun Nov 24, 2024 2:15 pm
Findecanor
Joined: Fri Jan 19, 2024 1:06 pm Posts: 15
I am still at the initial analysis passes, which primarily do liveness analysis. Other programming, work and things in life have taken too much of my time and mental energy.
The language has native 8-bit and 16-bit ops, but RISC-V and ARM don't ... except for some hybrid instructions that have built-in sign/zero-extension of one operand but not the other. Also, on modern x86, 8-bit and 16-bit ops can be more expensive than 32-bit ones, curiously enough. Therefore, the instruction-selection pass at the very end will emit as many 32-bit ops as it can, and extend a register to correct the upper bits only when it matters. For each place where an extended register could be needed, the analysis passes create a list of alternative places to put the extension. The instruction selector will later attempt to fold the extension into one instruction at those places if possible, and insert an actual extension op only if it can't.
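For example (the store and shift spellings are placeholders):

Code:
add.b(%a, %b) => %c     ; can be emitted as a plain 32-bit add on RISC-V/ARM
store.b(%p, %c)         ; upper bits never observed: no extension needed
lsr.b(%c, #1) => %d     ; here the upper bits do matter: a zero-extension of %c
                        ; is folded into an earlier op if possible,
                        ; and emitted before the shift only if not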
The principle is otherwise that the compiler should not do heavy optimisation passes of its own. It should depend on the incoming code being fairly well optimised already, and then only de-optimise when necessary. However, with liveness divided into debug-live and release-live, there can be reason to optimise the latter.
I've however modified the first analysis pass to do "sparse conditional constant propagation" (SCCP), because ... well, I adapted that algorithm for deciding when to sign/zero-extend registers, so why not? It also does some simple strength reduction (such as mul to shl) and reductions to identity (e.g. mul(%a, #1) => %a), because those fit into this framework. This minimises the number of variables in migration sets to only those that are actually live.
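Some of what that pass already catches:

Code:
mul.l(%a, #8) => %b     ; strength-reduced to shl.l(%a, #3)
mul.l(%a, #1) => %c     ; identity: uses of %c become uses of %a
add.l(#2, #3) => %d     ; constant-folded to #5 and propagated onwards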
It is followed by a pass that counts the ops that use an SSA-variable as a parameter (with separate counters for debug and release builds). If the counter is 0, then the op that creates the variable is dead (in the respective build). I've recently found a way to modify a classic algorithm for this so that it also transforms a conditional branch into an unconditional branch when both paths would have led to the same destination. That optimisation will only be done in release builds, though: when debugging a debug build, you might want to put a breakpoint after a conditionally executed block that is otherwise empty, so those can't be removed.
Sat Mar 29, 2025 11:46 am
BigEd
Joined: Wed Jan 09, 2013 6:54 pm Posts: 1821
This sounds really impressive! Are you driving it with a collection of test cases?
Sun Mar 30, 2025 7:38 am