The Art of Efficient Culling

This news post gives, among other things, some insight into the art of efficient culling and how the Drag[en]gine deals with it.

Posted by Dragonlord on Oct 1st, 2010

What is Culling and why does it matter?

You cannot build a high-class game engine, rendering-wise, without dealing with culling in one way or another. Although today's computers are powerful, they are still unable to render complex scenes at an acceptable frame rate and will not be able to do so for quite some time. The limits are actually much lower than one might expect. Culling is used to cope with this problem. In a nutshell, culling honors the old graphics programmer wisdom "the fastest objects to render are those you don't render at all". As simple as it sounds, this is one of the trickiest challenges in graphics programming. While it is rather simple for a human to judge whether something is visible from their point of view, it is an expensive operation for a computer.

Different commercial engines have developed different solutions for handling culling. Each technique, though, works mostly for either indoor scenes or outdoor scenes; no method works well for both. In the past, game developers cheated their way around this problem by limiting their games to either outdoor or indoor scenes and using various game mechanic hacks to hide the transition, for example entering a house or cave in Oblivion. Another problem is static versus dynamic environments. Many methods only work on static environments, but with games getting more dynamic, new solutions are required.

Over the last couple of weeks I spent quite some time porting the old culling system from the Terrain system to the pure Component system and added quite a few improvements at the same time. So here is a little insight into culling in the Drag[en]gine Game Engine.

Frustum Culling

It is astonishing how many small projects forget the very basic idea of frustum culling, yet it is one of the most important steps. What many don't know is that objects outside the view do reduce render speed a lot. Although the graphics card does not draw anything outside the view, it has to do a lot of calculations to figure out that all faces of an object are actually invisible. So the first step is to build a view frustum and to test objects against it. A view frustum is a pyramid representing the view of the camera, with the camera at the tip of the pyramid looking down at its base. Rejecting objects by testing them against the frustum is nice, but there is a catch: the test itself is CPU hungry, especially for triangles. The game engine therefore rejects entire objects by testing their boundary instead of each triangle. Even then it is not fast enough if you have a complex scene, so another step is required. (Frustum culling has been in the game engine since the very beginning.)
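To make the boundary test a bit more concrete, here is a minimal sketch in Python. It is not the Drag[en]gine code. It assumes a camera at the origin looking along +z, a bounding sphere as the object boundary, and the hypothetical helper names `make_frustum` and `sphere_in_frustum`.

```python
import math

def make_frustum(half_fov_x, half_fov_y, near, far):
    """Planes of a symmetric view frustum (assumed camera: origin, looking along +z).

    Each plane is (normal, d). A point p lies on the inner side of a plane
    when dot(normal, p) + d >= 0. All normals are unit length.
    """
    sx, cx = math.sin(half_fov_x), math.cos(half_fov_x)
    sy, cy = math.sin(half_fov_y), math.cos(half_fov_y)
    return [
        ((cx, 0.0, sx), 0.0),       # left side
        ((-cx, 0.0, sx), 0.0),      # right side
        ((0.0, cy, sy), 0.0),       # bottom side
        ((0.0, -cy, sy), 0.0),      # top side
        ((0.0, 0.0, 1.0), -near),   # near plane
        ((0.0, 0.0, -1.0), far),    # far plane
    ]

def sphere_in_frustum(planes, center, radius):
    """Return False only if the sphere is completely outside one plane.

    The test is conservative: a sphere near a frustum corner can be
    reported as visible although it is just outside.
    """
    for (nx, ny, nz), d in planes:
        distance = nx * center[0] + ny * center[1] + nz * center[2] + d
        if distance < -radius:
            return False
    return True
```

One cheap test per object replaces a test per triangle, which is the whole point. The price is that a few objects just outside the frustum slip through, which only costs a bit of speed and never produces a wrong image.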

Octrees

Octrees come to the rescue. An octree is a rather interesting spatial data structure. It encloses a box-shaped piece of space. This space is split up into 8 octants, each of which can be a smaller octree completely filling that octant. As you can imagine, this box-in-box structure grows like a tree, splitting up your game world into nested boxes of various sizes. Objects are then inserted into the smallest box which entirely holds the object. An octree exposes two interesting properties. First, if an octree box is invisible, all objects in this box are invisible. The inverse is not always true, but that's not so important. Second, since every sub-octree is fully contained inside an octree octant, if such an octant is invisible the entire octree below it is invisible. As you can imagine, you can skip large chunks of your game world by testing the octree from the top down to the leaves. Determining the rough visibility using an octree is in fact the most used solution; it works fast and provides good culling. (Octree culling has been in the engine since the very beginning.)

Now is this the solution? Unfortunately not. Octrees can only help to figure out what is potentially in your field of view; they fail to figure out if objects are, for example, hidden behind a wall. But exactly these missed culling opportunities cost a large amount of speed. More solutions are required to get an acceptable speed.
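The insertion rule ("smallest box which entirely holds the object") and the top-down visibility test can be sketched roughly like this. This is an illustration, not the engine's implementation. The class `OctreeNode`, the `max_depth` limit and the `box_visible` callback are assumptions made for the example, and objects are assumed to be axis-aligned boxes that fit inside the root node.

```python
class OctreeNode:
    """Cubic octree node with lazily created children (illustrative sketch)."""

    def __init__(self, center, half, depth=0, max_depth=4):
        self.center = center          # (x, y, z) center of the cube
        self.half = half              # half of the cube's edge length
        self.depth = depth
        self.max_depth = max_depth    # assumed limit to stop subdividing
        self.objects = []             # objects that fit no smaller box
        self.children = [None] * 8

    def _octant(self, lo, hi):
        """Index of the octant fully containing box [lo, hi], or None."""
        index = 0
        for axis in range(3):
            if lo[axis] >= self.center[axis]:
                index |= 1 << axis    # box is in the upper half on this axis
            elif hi[axis] > self.center[axis]:
                return None           # box straddles the splitting plane
        return index

    def insert(self, name, lo, hi):
        """Store the object in the smallest node that entirely holds it."""
        index = None if self.depth >= self.max_depth else self._octant(lo, hi)
        if index is None:
            self.objects.append((name, lo, hi))
            return
        if self.children[index] is None:
            q = self.half / 2
            child_center = tuple(
                self.center[a] + (q if (index >> a) & 1 else -q)
                for a in range(3))
            self.children[index] = OctreeNode(
                child_center, q, self.depth + 1, self.max_depth)
        self.children[index].insert(name, lo, hi)

    def collect_visible(self, box_visible, out):
        """Top-down traversal: an invisible node rejects its whole subtree."""
        lo = tuple(c - self.half for c in self.center)
        hi = tuple(c + self.half for c in self.center)
        if not box_visible(lo, hi):
            return out
        for name, obj_lo, obj_hi in self.objects:
            if box_visible(obj_lo, obj_hi):
                out.append(name)
        for child in self.children:
            if child is not None:
                child.collect_visible(box_visible, out)
        return out
```

In a renderer `box_visible` would be a frustum test against the box. Any conservative box test works, since the traversal only relies on the fact that an invisible box cannot contain visible objects.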

Portal Systems

Portal systems were invented for early games. The basic idea is simple. You take your game world and split it into convex rooms connected by walls and portals. Later on you can test your view frustum against this portal system to figure out which rooms you see. All objects in invisible rooms are then invisible for sure. Back then this technique worked well since game worlds did not have many triangles. That mattered because portal systems were generated by a program (qvis for example) which uses the walls of the game world to split the world into rooms. Obviously, the more complex the world gets, the higher the pre-processing time is going to be, and the amount of data produced skyrockets. Conventional portal systems quickly hit their limits, making them unusable for a game by today's standards unless the world is artificially limited in complexity or developers are forced to place portals manually. Another problem is that, due to the convex nature of portal systems, they only work well for indoor scenes. For outdoor scenes they are hell. Since the Drag[en]gine is supposed to allow seamless transitions between outdoor and indoor, conventional portal systems were not an option... "conventional" ones, that is. Some might remember an old news post where I talked about a portal system. I stepped the system up a bit.
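The basic portal idea (start in the room of the camera and only walk through portals you can actually see) can be sketched as follows. To stay short the sketch is reduced to 2D, where a portal is a line segment and the view is an angular window that gets narrower with every portal passed. The function name `visible_rooms` and the data layout are assumptions for this example and say nothing about how the Drag[en]gine stores its portal systems.

```python
import math

def _angle(eye, forward, point):
    """Signed angle of a point relative to the view direction, plus a
    flag telling whether the point lies behind the camera."""
    dx, dy = point[0] - eye[0], point[1] - eye[1]
    dot = dx * forward[0] + dy * forward[1]
    cross = forward[0] * dy - forward[1] * dx
    return math.atan2(cross, dot), dot <= 0.0

def visible_rooms(portals, start_room, eye, forward, half_fov):
    """Collect rooms reachable through portals inside the view window.

    portals maps a room name to a list of (end_a, end_b, neighbour_room).
    half_fov is assumed to be below 90 degrees (in radians).
    """
    visible = {start_room}
    # Each entry: room, angular window (lo, hi), rooms already on this path.
    stack = [(start_room, -half_fov, half_fov, frozenset([start_room]))]
    while stack:
        room, lo, hi, path = stack.pop()
        for end_a, end_b, neighbour in portals.get(room, []):
            if neighbour in path:
                continue                      # never walk back along the path
            ang_a, behind_a = _angle(eye, forward, end_a)
            ang_b, behind_b = _angle(eye, forward, end_b)
            if behind_a and behind_b:
                continue                      # portal entirely behind the camera
            if behind_a or behind_b:
                new_lo, new_hi = lo, hi       # conservative: keep the window
            else:
                new_lo = max(lo, min(ang_a, ang_b))
                new_hi = min(hi, max(ang_a, ang_b))
            if new_lo >= new_hi:
                continue                      # portal outside the window
            visible.add(neighbour)
            stack.append((neighbour, new_lo, new_hi, path | {neighbour}))
    return visible
```

Everything in a room that never makes it into the returned set can be skipped without looking at a single object inside it.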

So how can a portal system actually be made usable for today's games? The solution is to get away from generated portal systems as well as from using one portal system per game world. No matter how good your portal system generator is, it can never produce optimal solutions since it lacks the understanding of the map geometry that the mapper has. Let's take the ISG HQ building from the game. This building consists of 4 floors with two wings, each one hosting 15+ rooms, as well as a stairs area in the center where you can look down and up between all floors. This is a challenge for a portal system since the shape is highly irregular and holes cause problems with the convex nature of generators. To overcome these problems I went ahead and implemented a manual portal system object. The mapper can now create a portal system mesh in Blender alongside the building.

I can hear a lot of people scream now: "Help, this is complicated and time consuming!" You are right if you try to make a portal system like the ones produced by a generator application. The trick, though, is something else. If you look at the room above you'll notice that the portal system mesh is utterly crude. A room is more or less just a box. But exactly this crudeness is the key. It doesn't matter if you have, for example, a room or hallway with tons of pillars or objects inside. None of this changes the basic visibility. Everything inside the room is in general visible, and everything outside the room is in general invisible. For a mapper it is quite clear what the "logical visibility" of the map looks like, as you tend to design in rooms. All you have to do is place a portal system wall in the middle of a wall segment, no matter what this wall actually looks like. A crude approximation of the visibility is more than enough to fulfill the culling requirements.

Another problem is the convex nature of portal system rooms. In the existing systems rooms have to be convex. This is tricky to handle manually and produces a large number of rooms and portals. To counter this problem, portal system rooms in the Drag[en]gine are allowed to be concave. Using some clever math, this relaxation of existing portal systems can be used to reduce the required complexity of the portal system. Creating such a mesh takes next to no time in most cases. In this example the ISG HQ mesh contains over 40k triangles, not counting doors and various other props. The portal system mesh for the entire building as shown above comes in at less than 3k faces. In most cases this mesh is even much simpler. A complex underground lab with more than 100k triangles can easily be fitted with a portal mesh of less than 2k faces. It therefore takes very little time to create this additional mesh, but the speed gain is tremendous.

Here is an example view of the ISG HQ. I am standing in the hallway of the left wing (on the right in the image) on the first floor, looking towards the stairs area and the right wing. Culling has been disabled, so no portal system magic is done. As you can see, the number of objects in my view is tremendous. Lots of wasted work and a slow frame rate.

The same situation as above, but now with portal system culling enabled. The number of visible objects has dwindled to a number that can easily be rendered. The good thing here is that this kind of visibility detection also works for light shadow calculation, reducing shadow rendering costs a lot. There is another nice property of the portal system in the Drag[en]gine. You can use as many portal systems as you want in your maps. Typically, though, you use one for each major building. The engine is able to use multiple portal systems to produce a proper result. This allows the mapper to create the portal system and to change the mesh later on without having to worry about updating the portal system unless the visibility changes significantly. As a result, the portal system implemented here runs at top speed without any pre-processing time while being simple to produce by hand. But there is more that can be done.

Occluders

Objects outside portal systems are always visible. To deal with them, occluders can be used. The idea of occluders is rather old. In general an occluder is a rectangular shape which blocks the view. They have been used in early games on landscapes to block the player's view, for example through mountains. In the Drag[en]gine occluders are available as an additional trick to cull objects. The mapper can place these shapes manually in the map to give a hint about visibility. The Drag[en]gine simply tests objects against these occluders. This is the last test that happens on the CPU side, and it can remove objects on landscapes, for example behind a mountain or cliffs. The Drag[en]gine has supported occluders for a couple of months. There is, though, one last trick that can be played.
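One common way to test an object against a rectangular occluder is to check whether the object's bounding sphere lies completely inside the "shadow" the occluder casts as seen from the camera. The sketch below shows that idea. It is not necessarily the test the Drag[en]gine uses, and the function name `sphere_hidden_by_occluder` is made up for the example. It assumes a planar, convex occluder with its corners given in order and a camera that is not located in the occluder plane.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(v):
    length = math.sqrt(_dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def sphere_hidden_by_occluder(eye, corners, center, radius):
    """True if the sphere is entirely inside the occluder's shadow volume."""
    count = len(corners)
    centroid = tuple(sum(c[i] for c in corners) / count for i in range(3))

    # Plane of the occluder itself, normal flipped to point away from the eye.
    normal = _unit(_cross(_sub(corners[1], corners[0]),
                          _sub(corners[2], corners[0])))
    if _dot(normal, _sub(centroid, eye)) < 0.0:
        normal = tuple(-n for n in normal)
    if _dot(normal, _sub(center, centroid)) < radius:
        return False              # sphere reaches in front of the occluder

    # One plane per occluder edge, each passing through the eye.
    for i in range(count):
        to_a = _sub(corners[i], eye)
        to_b = _sub(corners[(i + 1) % count], eye)
        side = _unit(_cross(to_a, to_b))
        if _dot(side, _sub(centroid, eye)) < 0.0:
            side = tuple(-s for s in side)   # make the normal point inward
        if _dot(side, _sub(center, eye)) < radius:
            return False          # sphere sticks out past this edge
    return True
```

An object that is only partially behind the occluder counts as visible, so the test never removes something the player could see.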

Occlusion Query

In recent years a new technique arrived in the mass market of graphics cards: the "Occlusion Query". This is an OpenGL extension that allows testing how many pixels an object drawn on the screen affects. In short, it counts the number of object pixels that are in front of whatever has been rendered already. The extension can be used to render a box around an object (without actually rendering to the screen) to check if any pixels would be visible. If the object is invisible the query returns 0, as no pixel is in front of any existing pixel. While this might sound like the solution for culling, it has a performance problem. It requires reading information back "from" the graphics card, and this is slow, to put it kindly. Used in moderate numbers and interleaved smartly with the rest of the rendering, though, occlusion queries can be of benefit. While the CPU culling methods can miss objects hidden behind other geometry, the Occlusion Query is able to detect them. Thanks to the CPU culling methods only a small number of objects have to be tested using a GPU Occlusion Query. In the Drag[en]gine these are lights and complex objects. They are costly enough to justify an extra test, as skipping them helps a lot.
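What the query counts can be illustrated on the CPU with a tiny software depth buffer. This is only a model of the idea and not OpenGL code (on the GPU this corresponds to a GL_SAMPLES_PASSED query). The function `count_passing_samples`, the rectangle layout and the "smaller depth is closer" convention are assumptions for the example.

```python
def count_passing_samples(depth_buffer, rect, depth):
    """Count the pixels of a screen rectangle that pass a 'less than'
    depth test, without writing anything to the buffer.

    depth_buffer is a list of rows with one depth value per pixel.
    rect is (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = rect
    passed = 0
    for y in range(max(y0, 0), min(y1, len(depth_buffer))):
        row = depth_buffer[y]
        for x in range(max(x0, 0), min(x1, len(row))):
            if depth < row[x]:
                passed += 1       # this pixel would be in front
    return passed

def make_scene(width=8, height=8):
    """Empty 8x8 buffer (depth 1.0) with a wall at depth 0.3 in columns 0-3."""
    buffer = [[1.0] * width for _ in range(height)]
    for row in buffer:
        for x in range(4):
            row[x] = 0.3
    return buffer
```

With this scene a bounding rectangle at depth 0.6 that sits completely behind the wall returns 0 and the object can be skipped. A rectangle that pokes out next to the wall returns a non-zero count and the object has to be rendered.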

Putting it all together: The Drag[en]gine

All these methods are now used in the Drag[en]gine to provide efficient culling both for objects in your view and for objects casting shadows for lights. Paired with the double-shadow approach for light sources mentioned in an earlier news post, render speed has improved significantly. It is now possible to render a complex building lit by a sky light with 4-texture shadows as well as 10+ static and dynamic shadow casting lights at between 30 and 60 fps on Radeon HD 4870 class hardware (fully running game). There is still room for more optimizations, but that's left for another news post.

Miscellaneous

Besides these rendering optimizations, a bit of time was left for other things too. One of them is the first implementation of the new texture property "reflectivity". This property defines how much a surface reflects the environment. These two screenshots give an example of how this looks on the ISG HQ viewed by day and by night. As you can see, the reflection is dynamic and properly reflects the current sky configuration. The engine takes care of everything, so just adding this texture property to your skins is enough to get the effect.

Outlook

Getting these optimizations working the way they do now had been on my agenda for quite some time. There's room for improvement, but the results are promising. So stay tuned for next time.