Wednesday, March 25, 2009

First obstacles and solutions

I've got forward with my asteroids clone on the XBox 360 using XNA, and I started noticing a few issues, not all technical.

First, a 3d space is mostly empty. I did not realize the probability of being hit by an asteroid is pretty low. In order to make the game somewhat interesting, I need a good 1k asteroids in a 100m x 100m x 100m cube. I doubt any player will have the patience and perseverance to shoot each of these 1000 asteroids. I have ideas about solving that problem, but I'm not sure which one to pick yet.

Regarding performance, the game initially ran fine with 100 asteroids, but as I say above, I will need 10 times that amount. The initial bottleneck was collision detection, which was quadratic. I solved this on the PC using space partitioning with an octree.


#light

open Microsoft.Xna.Framework
open System.Collections.Generic

type NodeData<'a> =
{ bbox : BoundingBox ;
sub_nodes : Node<'a> [] }

and Node<'a> =
| Leaf of BoundingBox * 'a []
| Node of NodeData<'a>


let mkBBox (p1 : Vector3) (p2 : Vector3) =
let x1, x2 =
if p1.X < p2.X then (p1.X, p2.X) else (p2.X, p1.X)
let y1, y2 =
if p1.Y < p2.Y then (p1.Y, p2.Y) else (p2.Y, p1.Y)
let z1, z2 =
if p1.Z < p2.Z then (p1.Z, p2.Z) else (p2.Z, p1.Z)

BoundingBox (Vector3 (x1, y1, z1), Vector3 (x2, y2, z2))


let map_parallel (func : 'a -> 'b) (items : 'a[]) =
let results = Array.zero_create (items.Length)
let count = ref items.Length
use notify = new System.Threading.AutoResetEvent(false)

items
|> Array.iteri (
fun i item ->
System.Threading.ThreadPool.QueueUserWorkItem (
fun _ ->
let res = func item
results.[i] <- res
let c = System.Threading.Interlocked.Decrement count
if c = 0 then notify.Set() |> ignore
) |> ignore
)
notify.WaitOne() |> ignore
results


let rec partition (count_limit) (intersect) (items : 'a []) (bbox : BoundingBox) multi_threaded =
if items.Length < count_limit
then
(bbox, items) |> Leaf
else
let in_bbox =
items
|> Array.filter (fun item -> intersect item bbox)

let center = 0.5f * (bbox.Min + bbox.Max)
let pts = bbox.GetCorners ()

let sub_nodes =
pts
|> (if multi_threaded then map_parallel else Array.map) (
fun pt ->
let sub_box = mkBBox center pt
partition count_limit intersect in_bbox sub_box false
)

{ bbox = bbox ;
sub_nodes = sub_nodes }
|> Node


let rec getVisible (view : BoundingFrustum) addVisible (partition : Node<'a>) =
match partition with
| Leaf (bbox, items) ->
if bbox.Intersects (view)
then
items |> Array.iter addVisible
| Node data ->
if data.bbox.Intersects (view)
then
data.sub_nodes
|> Array.iter (getVisible view addVisible)


It's pretty rough code, but there are some interesting bits anyway. In "partition", the function creating the octree, I use multithreading in the top call in an attempt to take advantage of the multiple cores. My timings on the PC show however little global benefit in using multithreading, but I don't know how much "partition" contributes to the overall execution time. It's too early to draw any conclusion, but at least performance is not negatively affected.

Note that I reimplemented map_parallel (which I had introduced in an earlier Tutorial about concurrency) without using asynchronous computations. It's obviously inferior to the older version as it does not handle exceptions, but it has the advantage of actually working on both the PC and the XBox360. The older version causes an "error code 4" on the XBox. I don't know why, I'll try to report that on hubfs with a smaller test case as soon as I get time.

The octree made the PC version run nicely with 1000 asteroids and more, but the XBox360 is still running very slowly, which makes me think that I have finally hit the limitations of its garbage collector. Note that I have not verified that, not being very comfortable with the development kit yet.

Currently, each asteroid is represented using a record, and records get translated to classes by the F# compiler. The records are stored in an array. I wonder if it would work better with structs instead, the idea being that an array of structs should be seen as one object, whereas an array of classes is composed of 1001 objects in this case. As the performance of garbage collection is dependent on the number of objects, but not so much on their size, an array of struct should offer better performance.

The sad thing is that I really liked records, I wish F# allowed the [<Struct>] attribute on records.