So I thought I was ready to post my latest blog entry on building our Custom 2D game engine, Part 2. But after coding things up and running some preliminary performance tests, I wasn't getting the kind of performance that I wanted. This became apparent when I finished the collision handling routine and was only able to get about 180 game objects on screen, moving and colliding with each other before my framerate dropped below 60 frames per second.
Now some of you may be thinking, um...what's wrong with only having 180 sprites on screen? That sounds like quite a few objects to be updating, collision checking, and drawing every single frame! Well, it is and it isn't. I felt like I should be able to get more objects, say around 400 on screen and still get 60fps. Call me crazy, but I wasn't satisfied.
So let the optimization begin!
Only Optimize What's Slow
So the first thing I needed to do was to get some stats on where the engine was spending the most time each frame so I could focus my attention on what was actually slow in the engine. That was easy using BlitzMax's Millisecs() function. I created a few global, two-dimensional arrays, gUpdateTime, gCollisionTime, and gDrawTime to help track the start and stop times for each of these sections of my main loop code. By storing the milliseconds past since the game was started right before calling my updateActors() function and then again after it ran, I had the necessary data to compute the elapsed time in milliseconds, gUpdateTime - gUpdateTime = elapsed update time.
After doing this, it became quickly apparent that the collision checking routine was the bottleneck, clocking in at 18ms with 200 objects on screen - not surprising since there is some pretty heavy math being calculated for the collision checking and response. My first thought was to go into the collision routine and begin optimizing the code there, replacing SINe, COSine, ATAN2, and SQR (square root) functions with look up tables to help speed up the computationally intensive routine that was being run on every object pair.
I'll spare you all the gory details, but after spending hours writing look up table functions and adding a few other optimizations, I was amazed to discover that the newly optimized collision routine was still clocking in at 18ms with 200 objects! WHAT!?! In a moment of disbelief, I commented out every line of code inside the main collision checking loop, basically reducing the collision routine to two nested loops walking through the actor list but doing nothing else...and guess what, still 18ms! And then a light when on inside my tired little brain...
Linked List vs. Array
Apparently a major chunk of time was being spent on simply traversing the linked list that contained all my game objects! In the collision routine I was traversing my gGameObjects linked list not once, but twice, to compare each object against every other object in the list and it appeared that this process was taking much longer than I would have imagined, having written and used linked lists in C years ago and found them to be a fast and efficient way to manage a dynamic collection of objects. But this is BlitzMax and while I've been quite impressed with its performance, I hadn't really made the kinds of demands of it that I was now doing for this commercial project, so a comparison was now called for. I decided then and there to archive my code and rewrite my object handling routines to use a simple array instead of the built in TList linked list functionality.
Within a few hours the rewrite was complete and I was ready to test my assumption that using a simple array instead of a linked list to hold all the game objects should improve the performance across the board. My assumption proved correct and with 200 objects running through the new collision routine it was clocking in at 4ms! Using an array was over 4x as fast as using a linked list! WOW! I had not expected that kind of disparity between the two methods, but I was certainly thrilled with the results! This was much closer to the kind of performance I was wanting from our engine and to think it had been found in the least likely of places - in our object handling methodology.
The New and Improved tGameObject Type
For those of you that are possibly following along and using the code I have posted thus far, you will need to update your code to reflect the changes that I have made in switching from linked lists to arrays for our game objects. Make the following changes to the tGameObject type...
Update the following Field entries:
Field link:Int Field parent:Int Field collidedWith:Int
and add the following field:
The lesson I learned in all this optimization over the past few days is to never assume that what appears as complex is really taking the most time. I was sure that all the math involved in detecting and resolving collisions was the culprit in my slowdown, but in fact, even running the non-look up table version of SIN, COS, ATAN2, and SQR was not where the slowdown was occurring. I also learned to be willing to challenge even my most sure assumptions (linked lists are fast, arrays are old-school) and that sometimes the most basic of tasks (walking through a list) can hide inefficiencies that with a little testing and out-of-box thinking, can be optimized to yield even better results.
Now that we have a better performing foundation, let's get busy putting this game together!