F# for game development: Reference-counting just isn't enough and actually is harmful

I recently came across a blog entry titled "Why Garbage Collection is not Necessary and actually Harmful".

I used to share the same opinion in my C++ days, when I used to think no better language could be written. A few years later, I have drastically changed my opinion on the subject. I am not going to address all points made against garbage collection, and instead will focus on one issue: The claim that reference-counting is a viable alternative.

Reference-counting is an automatic memory management system where memory allocated to an object is freed when the object is no longer referred to. C++ makes it relatively easy to replace pointers by wrappers that act like pointers, with the addition that reference counts in objects are automatically increased and decreased, as wrappers are created and destroyed.

Reference-counting by itself is flawed as it does not account for cycles. The aim of automatic memory management is to deallocate memory when it is no longer reachable from the program. Reference-counting deallocates memory when it is no longer reachable from reference-counted objects. These two coincide only in the absence of cycles (and provided a number of other conditions are met, e.g. all reference-counted objects are referenced through wrappers)

Edaqa says:

simple reference counting also covers the vast majority of memory management needs. Sweeping collection has a distinct feature of being able to resolve cyclic dependencies. That is when A points to B and B points to A. Reference counting cannot manage this in the general case.

Obviously if you have such a feature you will make use of it. I don’t see however that cyclic dependencies are a significant obstacle in most programs. Such object relationships can normally be resolved, or significantly mitigated, by one of a number of other techniques.

I strongly object to the claim that cyclic dependencies are not a significant obstacle in most programs. It may be so in version 1.0 of a program, but even that is stretching it. A reference-counting based memory recollection can introduce enough chaos into a software product that early retirement may be needed earlier than expected.

Potential cyclic dependencies are hard to detect.

It might seem at first that it should be possible and easy to detect cycles when looking at the class diagram. In practice, detecting loops involving more than 3 steps is in my experience tricky. Maybe not when you first design your classes, but when you or someone else alters it later. A developer who does not have the entire class diagram at his disposal or in his mind simply cannot know whether an addition can introduce cycles or not.

Actual cyclic dependencies are harder to detect.

Cycles at the class level are not necessarily problematic. There are moreover very common. Consider for instance the usual definition of lists:

template<typename T>
struct Node<T> {
   T data;
   struct Node<T>* next;
};

Although it is in theory possible to create a cyclic list, this should never happen in any run of the program. How does one convince oneself that this is indeed the case? The problem is solved for lists through the use of a safe API, but home-made more complex types seldom offer such an API.

There is no easy way to deal with cycles.

In order to avoid memory leaks, cycles must be broken in one way or another. This involves manually decreasing the count of an object, or simply not accounting for a reference.

The problem here is which link to choose in the cycle? Obviously, the link must be part of the cycle. Finding a candidate can be difficult by itself for the reasons mentioned above, and there is another restriction. The candidate link must never be a potential member of a cycle-free path for any run of your program, or you might end up deallocating objects that are still in use.

Update (May 2012): Jon Harrop in the comments point out there are languages that prevent creating cycles, and there are algorithms to detect and collect cycles. In this post I was mostly focused on C++ and the traditional ways of doing ref-counting, namely using shared_ptr.

Conclusion

The real problem with reference-counting and avoiding cycles is that there is no good way to convince oneself of the absence of bugs through static analysis or inspection of source code. Absence of cycles is a dynamic property of program runs, and is therefore difficult to analyse. Testing might help, but to my knowledge, few write and maintain test cases specifically for the purpose of validating reference counting.

What makes matters worse is that it's very easy to mistake reference-counting pointer wrappers for substitutes to classic pointers. Systematically using the wrappers often ends up introducing memory leaks. Fixing them introduces dangling pointers.

Mark-and-sweep is no silver bullet, but at least it's a bullet aimed in the right direction.

F# for game development

Friday, July 8, 2011

Reference-counting just isn't enough and actually is harmful

Posts by labels

Blog Archive