robfinch wrote:
Inventing more compiler goodies; today the ‘auto new’ construct. If the ‘new‘ keyword is preceded by the ‘auto’ keyword then the memory allocated is ‘automatically’ freed at the function return point.
Some earlier higher-level languages such as Java implemented what is known as a "garbage collector", but I think that's not really what you mean to do here. I am aware of at least one way for a language to deal with memory management that is more efficient than garbage collection and equally forgiving for the programmer.
I can describe the case of the latest specification of the Objective-C language and Apple's new Swift language. They implement what is known as "Automatic Reference Counting". Manual reference counting is nothing new and has been used by programmers for decades, but its automated form is a surprisingly late addition to compilers. If I recall correctly, automatic reference counting (ARC) was first introduced for Objective-C in 2011. Broadly, it works like this:
-The compiler only allocates (or creates) objects on the heap, never on the stack. The stack is only used to store intermediate scalar values and object references, or to pass them between functions. This means that all objects are passed by reference, never by value. The latter is explicitly forbidden by the language.
-All objects implicitly inherit a reference count word, along with a runtime type information field (also a word). The reference count (RC) is physically the second word in the object's memory, and this is always the case.
-New objects can only be created through two very well defined mechanisms and not by other means.
*The first mechanism is the 'alloc' keyword. This creates the object in memory and assigns 1 to its RC. The object is uninitialised at this point and can be initialised any time later by calling its 'init' method. This is useful for example to create uninitialised objects that might be passed to a function for initialisation, or to create uninitialised graphs of data.
*The second mechanism is the 'new' keyword. This implicitly allocates and initialises the object in a single step. There are more explicit ways to allocate and initialise objects which involve more programmer responsibility, but I will leave them out of my description for clarity.
- Objects can be assigned or copied.
*Assigning an object involves copying its reference but not its contents. This is conceptually like assigning a C pointer to a variable, but there's a crucial difference: the assigned object gets its RC incremented by 1. Incrementing the RC is referred to as retaining. The assignee variable gets the pointer to the assigned object, so it loses the reference to the object it previously pointed to; that object therefore gets its RC decremented by 1 just before the pointer assignment actually takes place. Decrementing the RC is referred to as releasing an object. In the language, this is just a regular C-style assignment:
Code:
dstObj = srcObj;
It implicitly involves releasing dstObj and retaining srcObj.
*Copying an object involves creating a new object that is a copy of the original one. The new object is physically a different one, so it gets its own RC. When you copy an object, the original keeps its RC intact. The copy is then assigned to a destination variable, and the assignment semantics described above are applied to the assignee. In the language that's like writing this (the syntax is slightly different because you can implement hooks on the 'copy' behaviour, but you get the idea):
Code:
dstObj = copy srcObj;
- Every time an object is released (its RC is decremented) down to 0, the object is automatically deallocated (deleted from memory). In assignments this happens when the object previously held by the assignee had an RC of 1 before the assignment took place. It also happens when variables holding objects with an RC of 1 go out of scope. For example, if you create an object at the beginning of a function and do not transfer it outside the scope of the function, the object will be automatically deleted sometime before the end of the function, possibly at the point where the variable gets its last use.
- As synonyms of assigning we have passing objects to functions, returning objects from functions, putting objects into collections (such as arrays) and so on. The compiler treats all of these exactly the same way as assignments and applies the retain/release mechanism as appropriate.
That's basically it. By just having the compiler adhere to the above rules and the language discourage non-standard ways of dealing with objects (I say 'discourage' because there are always workarounds for cases where you really want more control), memory management becomes greatly simplified and transparent to programmers.
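To make the rules above a bit more concrete, here is a minimal sketch in plain C of roughly what the compiler would be doing behind your back. Everything in it is an assumption made for illustration: the names (Object, obj_new, obj_retain, obj_release, obj_assign, obj_copy_assign) and the live_objects counter are hypothetical and are not the actual Objective-C or Swift runtime. Note also that my obj_assign retains the source before releasing the old destination, which keeps self-assignment safe; the net effect on the counts is the same as described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object layout: type info first, reference count second. */
typedef struct Object {
    const char *type;   /* word 1: stand-in for runtime type information */
    long rc;            /* word 2: reference count (RC) */
    int payload;        /* example instance data */
} Object;

int live_objects = 0;   /* demo bookkeeping only, not part of the scheme */

/* 'new': allocate and initialise in one step; the RC starts at 1. */
Object *obj_new(int payload) {
    Object *o = malloc(sizeof *o);
    assert(o != NULL);
    o->type = "Object";
    o->rc = 1;
    o->payload = payload;
    live_objects++;
    return o;
}

/* Retain: increment the RC. */
Object *obj_retain(Object *o) {
    if (o) o->rc++;
    return o;
}

/* Release: decrement the RC and deallocate when it reaches 0. */
void obj_release(Object *o) {
    if (!o) return;
    assert(o->rc > 0);
    if (--o->rc == 0) {
        live_objects--;
        free(o);
    }
}

/* dstObj = srcObj;  (retain the source, release the old destination) */
void obj_assign(Object **dst, Object *src) {
    obj_retain(src);    /* retain first so that self-assignment is safe */
    obj_release(*dst);
    *dst = src;
}

/* dstObj = copy srcObj;  (the source keeps its RC, the copy gets its own) */
void obj_copy_assign(Object **dst, const Object *src) {
    Object *fresh = obj_new(src->payload);  /* new object, RC 1 */
    obj_release(*dst);
    *dst = fresh;
}
```

With these helpers, creating an object gives it an RC of 1, assigning it to a second variable raises that to 2, and replacing the second variable with a copy drops it back to 1. When both variables go out of scope the compiler would emit the two final obj_release calls, and both objects are freed.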
Furthermore, the compiler optimises retain/release code by removing retain/release pairs and folding retains and releases together wherever it is safe to do so. The result is even more efficient code. For example, if you use a temporary variable to hold an object, chances are very high that no retain/release code will be emitted at all for that object, so the generated code will be identical to a simple C pointer assignment, which may compile to just one or two machine instructions. The compiler will even remove 'copy' code and replace it with simple retain/release code if the source object was declared immutable and it determines that the destination object will end its life unmodified.
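As a rough illustration of the temporary-variable case, the sketch below shows in plain C what a naive translation would emit next to what is left after the retain/release pair is removed. The names (Obj, retain, release, read_naive, read_elided) and the rc_ops counter are hypothetical, and I am assuming the caller already holds a reference for the whole call, which is what makes dropping the pair safe. Deallocation at zero is left out to keep it short.

```c
#include <assert.h>

/* Hypothetical object layout: type info word, then the reference count. */
typedef struct Obj {
    const void *type;
    long rc;
    int value;
} Obj;

long rc_ops = 0;   /* counts retain/release operations, demo only */

void retain(Obj *o)  { o->rc++; rc_ops++; }

/* Deallocation at RC 0 is omitted in this sketch. */
void release(Obj *o) { assert(o->rc > 0); o->rc--; rc_ops++; }

/* Naive translation of: tmp = obj; use tmp; tmp goes out of scope. */
int read_naive(Obj *obj) {
    Obj *tmp = obj;
    retain(tmp);            /* the assignment retains the object */
    int v = tmp->value;
    release(tmp);           /* end of scope releases it again */
    return v;
}

/* After removing the pair: just a plain pointer copy. */
int read_elided(Obj *obj) {
    Obj *tmp = obj;
    return tmp->value;
}
```

Both functions return the same value and leave the RC unchanged; the second one simply never touches the counter.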
It's an overwhelmingly simple, fast and efficient way to deal with memory, and yet it was not implemented in a mainstream compiler until 2011, which is quite surprising to me.
Joan