Thursday, March 4, 2010

Array-oriented programming

I have been working for quite some time now on Asteroid Hunter. This has not left me much time for exploration and experimentation with F#, but there is quite a bit a learned during the process anyway.

One of the biggest technical challenges of developing games for the Xbox platform using XNA Game Studio is probably how to keep garbage collection from eating too much performance.

The classical way which I think all developers are using is to preallocate your objects, and recycle them during gameplay. As a complement, one can also use structs instead of classes, as structs do not require garbage collection. The goal here is to never allocate new space, as every megabyte allocated triggers a new garbage collection. Depending on the number of objects the heap contains, a garbage collection can take from a few milliseconds to fractions of seconds, the latter being unacceptable for a game refreshing the screen 60 times per second.

There is another way, and I know no one who uses it. In this approach, it's ok to allocate a lot of memory space, as long as it's used by few objects. In other words, an array of value types, such as floats or structs, is cheap to allocate.

In a first version of the game, I used records to model asteroids, and stored these records in immutable arrays. This approach quickly showed its limits. I modified the code to use arrays of structs, also immutable. It was better, but this method incurred very high copying costs, as changing one field, for instance the position, required to copy the entire struct (composed of a couple more vectors for the speed, orientation, angular velocity...).

In the current version, I no longer have a datatype representing a single asteroid. Instead, I use multiple arrays of vectors, one array for the positions, one array for the speeds, etc. I rely heavily on higher order functions in the Array module, such as, Array.map2 and Array.mapi.

The code below illustrates the kind of programming style I am describing above. Take it with a nip of salt, it was written in quite a rush, and the Dream Build Play deadline left me little (as in "none at all") time for code cleanup.

type AsteroidArrays =
{ pos : Vector3[] ;
speed : Vector3[] ;
wspeed : Vector3[] ;
wpos : Quaternion[] ;
bsph : BoundingSphere[] ;
heat_level : float32[] ;
mass : float32[] ;
radius : float32[] ;
rank : int[] ; // -1 means dead.
heat_rate : float32[] ;
color : Vector3[] }

let integrate (V : Vector3) (dt : float32) (impulses : Pair list) (arr : AsteroidArrays) =
let speed = Array.copy arr.speed
for impulse in impulses do
let idx = impulse.fst
let impulse = impulse.snd
speed.[idx] <- speed.[idx] + impulse

let pos =
Array.map2 (fun pos speed -> pos + dt * speed |> WarpCoord.warp V) arr.pos speed

let rot = (fun wspeed -> new Quaternion(dt * wspeed, 0.0f)) arr.wspeed

let wpos =
Array.map2 (fun wpos rot ->
let x = wpos + Quaternion.Multiply(rot * wpos, 0.5f)

let bsph =
Array.map2 (fun pos radius -> new BoundingSphere(pos, radius)) arr.pos arr.radius

{ arr with
pos = pos ;
speed = speed ;
wpos = wpos ;
bsph = bsph }

When I dived into redesigning my asteroids from arrays of structs to a record of arrays, I was nervous I would never make it in time. When all you have is four days work, you don't want to spend two days on a redesign. Happily, it took less than four hours, which positively surprised me. The F# mode in Visual Studio may lack most of the refactoring tools available in C#, yet rewriting code goes a lot faster than I would have expected.

This immutable approach does not perform too badly compared to the mutable way, as all values typically need to be touched anyway. I haven't actually done any measurements, but I don't expect the immutable approach to be worse than twice as slow as in-place modifications.

I don't know if memory fragmentation could be a problem. The arrays in question are small enough that they are allocated on the regular heap (as opposed to the large object heap), and the remote performance monitor reports zero compactions. Moreover, all my arrays have fixed sizes, which should keep fragmentation at bay.

If you read some of my earlier posts in this blog, you might have guessed that I have my doubts about object-oriented programming. As far as performance-critical parts of the code are concerned, I am now convinced that one should stay away from objects, be it in the form of structs or classes. Whether one can get away with immutable arrays or not (as opposed to mutable arrays) probably depends on how much performance is needed.

That will be all for now, I am planning to write about concurrency and generic space-partitioning in my next posts.


TechNeilogy said...

Interesting. I used to use the parallel array style a lot in C++ where both large data structures and high performance were required. In some applications, you can get real performance gains in this way. This is particularly true where things can be done in "bulk" to the arrays (e.g. loading, saving, data-parallel operations, etc.) I mainly went away from it because it can be somewhat cumbersome to maintain and extend. But for high-performance environments, that would sometimes be worthwhile.

Eunice Zahn said...
This comment has been removed by a blog administrator.
Anonymous said...

I know for a fact that .NET would have to look up the field offset for any object being referenced in an or what not, but I wonder if OCaml had any optimizations?

You can imagine for instance that if the mass value of a 40 byte struct is bytes 8 through 11, could be optimized in such a way that it iterates over the array at an offset of 8 and a step of 40. That would make it nearly just as fast as the array implementation.

Too bad we couldn't possibly have such an optimization in .NET. Would have been really nice.