The Drag[en]gine is a fully customizable game engine and game development environment, designed with modularity and extensibility in mind and without requiring expensive licenses.
A technical article about the new shadow code in the Drag[en]gine, and especially about the road that led to it, for those interested.
Posted by Dragonlord on Jun 18th, 2009
An old saying goes that the journey is the goal. Nobody mentioned, though, what obstacles that journey can have. Usually I do not write technical (or, as some people here call them, "developer") articles, but in this particular case I felt an exception was warranted. I went through quite a bit of testing and experimenting with different techniques to arrive at something that is a good foundation to build on. Many of the results might also be interesting for others dealing with similar problems, so I wrapped this journey up in a little (okay, a little large) article. Let's start at the beginning.
Side note: All tests, results and images were made on an ATI Radeon 4870 512MB at a resolution of 1406x848. The target resolution is 1680x1050. The test resolution is slightly lower since the tests were run from inside the IGDE, where some space is taken up by the GUI controls. All images are preview size; some of the full-size originals are 1406x848.
To understand the problem, first a little background on the default OpenGL module of the Drag[en]gine. In general there are two rendering approaches: forward and deferred.
Forward rendering simply renders each object with all lights influencing it in one (or several) passes. This approach suffers from slowdowns and various lighting problems as soon as you try to do something complex or sophisticated. Many simple games use this method since it is quicker to set up and works well for scenes of low complexity.
The second approach, deferred rendering (some call it deferred shading, but this is misleading, as will become clear later in this article), uses two passes. In the first pass all geometry is rendered into buffers containing position, normal, diffuse, specular and other information. In the second pass all the lighting is done. This separation of producing the geometry and lighting it is the main strength of the technique. In particular, the complexity of the scene does not matter for lighting, since lighting always happens in screen space like a post-processing filter. Render time therefore scales with the sum of geometry and light complexity instead of their product. This approach is typically used by complex games, and it is also the technique used by the Drag[en]gine.
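To make "sum instead of product" concrete, here is a deliberately crude counting model. It is an illustration under stated assumptions, not a profile of the Drag[en]gine: forward rendering is assumed to cost one unit per object per light, deferred rendering one unit per object for the G-buffer pass plus one unit per screen-space light pass.

```python
def forward_cost(num_objects: int, num_lights: int) -> int:
    """Toy model: every object is shaded once per light affecting it,
    so the work grows with the product of objects and lights."""
    return num_objects * num_lights


def deferred_cost(num_objects: int, num_lights: int) -> int:
    """Toy model: geometry is written to the G-buffer once per object,
    then each light is one screen-space pass, so the work grows with the sum."""
    return num_objects + num_lights


if __name__ == "__main__":
    for objects, lights in [(10, 2), (100, 8), (500, 32)]:
        print(f"{objects:4d} objects, {lights:3d} lights: "
              f"forward={forward_cost(objects, lights):6d}, "
              f"deferred={deferred_cost(objects, lights):4d}")
```

In this model 100 objects and 8 lights cost 800 units forward versus 108 deferred. What the model leaves out is how many pixels each light pass touches, which is exactly where deferred rendering gets into trouble.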
Newcomers to engine development tend to see how elegant this technique is by design and consider it the holy grail, but the rude awakening comes late. It's like comparing shooting a pistol with firing a rocket launcher. The latter makes a big boom, but if you don't aim properly you are toast instead. The main problem is that this technique promises lots of small lights at next to no cost, plus the benefit of using light volumes to reduce the number of lit pixels. But that is reckoning without the bad guy.
The main problem with lighting in deferred rendering is not the lighting itself but shadow casting. You can have tons of lights in such a system, but shadows are what is going to kill you. This is where high-quality engines set themselves apart from cheaper ones. Nobody wants one dynamic light with everything else more or less static.
A simple solution is to allow only spot lights. The advantage is that you render just one shadow map per light and can even scale it down with distance to avoid wasting too much fill rate. But is lighting with spot lights alone enough? Unfortunately not. A scene with only spot lights is not only boring but also causes a lot of additional problems. So sooner or later we need point lights. And while rendering spot lights is a problem with a relatively easy solution, rendering point lights is not. The usual answer of engine designers in this case is "don't use them". People who know me know that I hate "don't use it". It's like using a cheat in a game when things get hard. So the goal was clear: multiple point lights at reasonable speed at a resolution of 1680x1050. But here comes the bad guy: shadows!
In the Thief series shadows were your friends, but for deferred rendering they are the arch enemy. Rendering point lights by themselves would be cheap, but calculating their shadows is anything but cheap. A quick approach is to use a cube shadow map. The obvious problem is that you have to render the equivalent of 6 spot lights, one per cube face, to build the cube map. This cannot be cheap, and indeed it is not. One dynamic shadow cube map works relatively well, but as soon as you crank up the number of them things get ugly, and playing at 10 FPS or less is simply not acceptable. Besides, the quality is poor, as this early shot from the Epsylon game shows.
Not only slow but also rather pixelated. This shot features 3 point lights, and the frame rate was already bad. So how can one achieve good frame rates with tons of dynamic point lights? This requires quite a bit of hacking in various places. So it was time to grab my sword and fight the monster... but how does one fight shadows?
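The cost is easy to see once you spell out what a cube shadow map is: six 90° shadow renders per light, and at lookup time the face is chosen by the dominant axis of the light-to-fragment vector. Below is a small sketch of both facts using the OpenGL face order (+X, -X, +Y, -Y, +Z, -Z); the function names are illustrative, and ties between equal components are resolved arbitrarily here rather than by any particular hardware rule.

```python
CUBE_FACES = ("+X", "-X", "+Y", "-Y", "+Z", "-Z")  # OpenGL cube map face order


def cube_face(direction):
    """Index of the cube face a light-to-fragment vector falls on:
    the face belonging to its largest absolute component."""
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return 0 if x >= 0 else 1
    if ay >= az:
        return 2 if y >= 0 else 3
    return 4 if z >= 0 else 5


def shadow_renders_per_frame(num_point_lights: int) -> int:
    """Each dynamic point light needs all six faces re-rendered every frame."""
    return 6 * num_point_lights
```

For the 3-light shot above that is already 18 shadow renders per frame, before any actual lighting happens.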
One straightforward answer (which some people here also uphold) is light maps. In the old days of game development anything dynamic in terms of lighting was a pain in the rear. Computers were simply too weak and accelerated graphics boards were still a luxury. A bright mind had the idea to simply prerender lights into a low-resolution texture and blend it in using multi-texturing. The low resolution combined with texture filtering produced smooth results. Cool, so is this already the solution? After all, most of the lights in a scene are static and are simply turned on or off. Again, that is reckoning without the bad guy: shadows.
Light maps have one huge disadvantage: the contributions of various light sources are merged together. What is bliss for speed is a nightmare once shadows are taken into consideration. If you have only one light source, like the sun, this works out, since you can use dynamic shadows and combine them with the baked lighting. But why the hell waste extra fill rate on a light map if you have generated a shadow map anyway? And if you have multiple lights, you have no way of telling how to merge the dynamic shadows with the precalculated shadows. In the Source Engine this works only because shadow casting is heavily faked: shadows are nothing but a transparent overlay. This becomes obvious once two or more objects cast shadows on top of each other. Not only does one object not receive the other's shadow, but the shadow on the ground is doubled. If this is not a problem for your game, all is well and you can go with this technique. But if you want to play with light and shadows and do things a bit more complex, this is not an option at all. Besides, deferred rendering hates light maps by nature; getting them to work is a pain. So we need fully dynamic shadow casting no matter what. But we have only a limited amount of fill rate and a nasty little mini-boss to fight: incorrect lighting.
Lesson learned: Light maps worked in the past, but they don't work anymore nowadays, at least not if you are after dynamic lighting and complex lighting setups. In the worst case you trade the fill rate of shadow maps for the fill rate of light maps, which is worse, while losing dynamic shadows altogether. If you run a deferred rendering system, always go for fully dynamic lighting.
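A toy calculation shows why summed light maps cannot take per-light dynamic shadows. The light values and the occlusion below are made-up numbers for illustration: two lights hit one texel, and a moving actor blocks only the first one.

```python
def per_light_texel(contributions, visibility):
    """Dynamic lighting: each light's contribution is scaled by its own
    shadow visibility (0 = fully shadowed, 1 = fully lit) before summing."""
    return sum(c * v for c, v in zip(contributions, visibility))


def lightmap_texel(contributions):
    """A light map bakes all lights into a single value per texel."""
    return sum(contributions)


lights = [0.6, 0.3]        # hypothetical contributions of light A and light B
dynamic_vis = [0.0, 1.0]   # an actor blocks light A only

correct = per_light_texel(lights, dynamic_vis)   # 0.3: light B still reaches the texel
baked = lightmap_texel(lights)                   # about 0.9: A and B are merged
# From `baked` alone there is no way to know how much of it came from A.
# Scaling the whole value by A's visibility gives 0.0 (too dark);
# ignoring the actor's shadow keeps about 0.9 (too bright).
```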
With dynamic shadow casting there are two major problems: unnatural lighting and over-lighting. First, take a look at the sample images below.
The left image depicts a typical game scene rendered by an engine using hard shadows. The resulting shadows are always black, which is unnatural, since in reality light bounces around and brightens shadows to some degree. The second image shows what happens if we try to cheat in an ambient term to brighten the shadows: suddenly parts of the scene are lit that should not be. This is a huge problem, because a dark room next to a brightly lit one no longer works. Neither solution pleased me for the Drag[en]gine, so I went the hybrid way. The last image shows a combined shadow casting scenario where the scene geometry casts hard shadows, preventing neighboring rooms from being lit improperly, while the shadows of objects are still brightened. Now this might sound like extra work, since this approach needs two shadow maps one way or another. This is true, but as always I have a plan, and this plan was born using a test program.
Lesson learned: Ambient hacks are, as the name suggests, hacks. They won't provide good results unless you work only with sunlight and no indoor areas. It's worth investing time in a hybrid solution.
So the question arises: are there any ways of getting rid of slow cube maps so that this two-stage approach can be used efficiently? To test this I wrote a special test program, filled with complex geometry and actors, to try out point light solutions. Some interesting results emerged.
Scaling up the shadow map dropped the frame rate a lot for scene geometry but not so much for actors. What happened here is rather important for the upcoming optimization trick. Large triangles consume a lot of fill rate while rendering the shadow map, since they touch a lot of pixels. Actors, though, are composed of many small triangles. Now recall the previous result of the combined shadow map with full shadow and penumbra shadow. Is there any way to combine the two? In fact, yes.
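One way to express the hybrid combination per pixel is sketched below. The blending rule and the `object_shadow_floor` parameter are assumptions chosen for illustration, not the Drag[en]gine's actual shader: the scene-geometry visibility is applied as a hard factor, while the actor shadow may only darken the light down to a floor that stands in for bounced light.

```python
def hybrid_light(intensity: float, scene_visibility: float,
                 object_visibility: float, object_shadow_floor: float = 0.3) -> float:
    """Combine two shadow terms for one light at one pixel.

    scene_visibility:  0..1 from the scene-geometry shadow map, applied as-is,
                       so walls fully block light from neighboring rooms.
    object_visibility: 0..1 from the actor shadow map, remapped so an actor's
                       shadow never gets darker than object_shadow_floor
                       (a hypothetical stand-in for bounced light).
    """
    lifted = object_shadow_floor + (1.0 - object_shadow_floor) * object_visibility
    return intensity * scene_visibility * lifted
```

A pixel behind a wall stays completely dark, while a pixel under an actor keeps 30% of the light with these assumed settings.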
I split up the triangles, assigning them to different shadow maps: all large triangles went into a low-resolution cube shadow map and all object triangles into a high-resolution cube map. The speed gain was impressive. Although I rendered two shadow cube maps, one at 512 and one at 2048, I spent no more time than rendering a single 512 shadow cube map, while retaining high quality where it matters: the actors. Still, there is one more trick. Why render the shadow cube for the scene geometry every frame? After all, it stays the same, since terrain meshes in the Drag[en]gine are forced to be static. So the costly process of rendering large triangles into a cube map went away: the low-resolution cube map is rendered once when a light enters the view frustum and discarded once the light has stayed outside for a long enough time span.
Lesson learned: It makes sense to think outside the box sometimes. Hybrid solutions can be very powerful. Try to exploit the strength of one algorithm and compensate for its weakness with another.
Going back to the Drag[en]gine, these interesting results improved the speed, but not by much. 8 point lights on screen with this setup still dropped the frame rate to 12 FPS, without any smoothing of the shadows yet. That ain't gonna work. Cube maps dropped the frame rate brutally no matter how intelligently you use them. The hardware simply can't do them fast enough. We need something else.
Lesson learned: Cube shadow maps are not a good solution for dynamic shadows. Avoid them whenever you can. No matter what tricks you play, they are slow to create in real time, and sampling them is costly compared to a depth texture.
What about batching? So far each light has been rendered separately, as deferred shading in its original form demands. Never listen to the elders; that's good advice to begin with. As outlined at the beginning, fill rate kills deferred rendering implementations, and lights are the prime fill rate killers. The lousy idea of using light volumes in particular helps nothing in the long run. So how do we get things done faster? Batch it up! Modern graphics cards have up to 16 texture image units and quite a large number of uniforms. So let's batch up to 8 lights in one shader run. The test scene with 8 lights should do the trick: all 8 lights can then be rendered in one go. The shader is a large behemoth, but the results are impressive. With batching you lose any chance of using a light volume, but why use such a problematic structure if you can simply reduce fill rate by a factor of 8 or 4? The results at least speak a clear language.
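The caching rule described above (render the static map once when the light enters the view frustum, drop it after the light has been out of view for a while) can be sketched as follows. The class, the `render_static` callback and the frame-count timeout are hypothetical; the article only speaks of "a long enough time span".

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class _LightShadowState:
    static_map: Any = None      # low-resolution map of the static scene geometry
    frames_outside: int = 0     # consecutive frames the light was not visible


class StaticShadowCache:
    """Hypothetical sketch: cache the static-geometry shadow map per light.

    The map is rendered once when a light becomes visible, reused every frame
    while it stays visible, and freed after the light has been outside the
    view frustum for `evict_after` consecutive frames. The high-resolution
    actor map is not handled here; it is re-rendered every frame.
    """

    def __init__(self, render_static: Callable[[int], Any], evict_after: int = 120):
        self.render_static = render_static
        self.evict_after = evict_after
        self.states: Dict[int, _LightShadowState] = {}

    def update(self, light_id: int, in_frustum: bool) -> Optional[Any]:
        state = self.states.setdefault(light_id, _LightShadowState())
        if in_frustum:
            state.frames_outside = 0
            if state.static_map is None:
                state.static_map = self.render_static(light_id)  # expensive, done once
            return state.static_map
        state.frames_outside += 1
        if state.frames_outside >= self.evict_after:
            del self.states[light_id]  # free the map; it is re-rendered on re-entry
        return None
```

The timeout avoids re-rendering the static map every time a light flickers in and out of view at the frustum border.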
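The fill rate argument for batching can be put into numbers. The sketch below counts full-screen passes and uses "G-buffer texels read" as a rough fill rate proxy; the four G-buffer textures and the `light_fn` placeholder are assumptions for illustration.

```python
import math


def light_passes(num_lights: int, lights_per_batch: int) -> int:
    """Full-screen lighting passes needed when one shader run handles
    up to `lights_per_batch` lights."""
    return math.ceil(num_lights / lights_per_batch)


def gbuffer_reads(width: int, height: int, num_lights: int,
                  lights_per_batch: int, gbuffer_textures: int = 4) -> int:
    """Rough fill rate proxy: every pass re-reads the whole G-buffer.
    Four textures (position, normal, diffuse, specular) is an assumption."""
    return light_passes(num_lights, lights_per_batch) * width * height * gbuffer_textures


def shade_pixel(gbuffer_sample, lights, light_fn):
    """One invocation of a batched light shader: the G-buffer sample is
    fetched once and reused for every light in the batch.
    `light_fn(sample, light)` is a placeholder for the per-light lighting math."""
    return sum(light_fn(gbuffer_sample, light) for light in lights)
```

With 8 lights, batching 8 per run turns 8 passes into 1, and batching 4 per run into 2: the factor of 8 or 4 mentioned above.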
First I used a two-pass approach in which I rendered the shadow contribution of 8 lights (two runs with 4 lights each and 8 shadow maps each time) into an intermediate texture. This texture then became the input to the light pass shader processing 8 lights at the same time. Exploiting the SIMD nature of the GPU, a lot of float calculations could be packed cleverly so that they are processed by single SIMD instructions. The result is nice and allows blurring the shadow contribution texture before using it. But oh god, what happened here? Light bleeding all over the place.
Lesson learned: Never use a full-screen blur to make soft shadows. Light bleeding results in rims of light around all objects. It's cheap but looks crap.
So I went back and packed everything into one huge behemoth of a shader doing lighting and shadows of up to 4 lights in one go. I dropped the full-screen blur, since I had a different idea that might save my butt in the long run, so I could save the fill rate of the blur for something better.
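A one-dimensional example shows where those rims come from. The row below is made up: a shadowed foreground object (visibility 0.0) next to a lit background (visibility 1.0). A screen-space blur knows nothing about the depth jump at the silhouette and simply averages across it.

```python
def box_blur(values, radius):
    """Naive screen-space blur: average each pixel with its neighbors
    within `radius`, clamped at the borders, ignoring depth entirely."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


# One screen row: 5 pixels of a shadowed object, then 5 pixels of lit wall.
visibility = [0.0] * 5 + [1.0] * 5
blurred = box_blur(visibility, radius=2)
# The object's edge pixels now receive light they never get
# (blurred[3] is 0.2, blurred[4] is 0.4): a bright rim along the silhouette.
```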
There exists an alternative solution going by the name Dual Paraboloid Shadow Maps. The idea is simple. If you take a flexible square mirror and push in the center, you get a paraboloid. The surface is still a square, just deformed. Using some math you can project a 180° scene onto a square depth texture. Not bad: you trade a cube map for two shadow maps. So is this the winner? After all, the majority of point lights are 180° lights attached to the ceiling or a wall. The backside is not required, so one shadow map should be enough. Unfortunately not, but let's take this one from the beginning.
Paraboloid Shadow Maps suffer from a massive problem which in practice makes them unusable for generic scenes: the bending of space. A straight line in space turns into a curve on the shadow map, but the GPU still rasterizes triangle edges as straight lines between the projected vertices. If the triangle edges are short enough this doesn't matter, but if not, things get ugly: shadows bleed through walls, and light leaks the other way. We do, however, have two shadow groups, one with large triangles and one with small triangles. For small triangles this should be no problem, right? Besides, shadow maps can use hardware PCF out of the box. Sounds like a plan. And it is a plan.
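The "some math" is short. Below is a sketch of the commonly used paraboloid projection, assuming the light looks down +Z, the point lies in the front hemisphere, and depth is stored as linear distance between hypothetical near and far values.

```python
import math


def paraboloid_project(p, near=0.1, far=50.0):
    """Project a light-space point onto the front paraboloid map.

    Assumes the light looks down +Z and p is in the front hemisphere (z >= 0).
    Returns (u, v, depth): u, v lie in [-1, 1] (the 180 degree hemisphere maps
    onto the unit disk) and depth is the linear distance mapped to [0, 1].
    """
    x, y, z = p
    length = math.sqrt(x * x + y * y + z * z)
    dx, dy, dz = x / length, y / length, z / length   # unit direction to the point
    u = dx / (1.0 + dz)
    v = dy / (1.0 + dz)
    depth = (length - near) / (far - near)
    return u, v, depth
```

A direction at angle θ from the light axis lands at radius tan(θ/2) on the map, so the axis maps to the center and the 90° rim of the hemisphere to the edge of the unit disk.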
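The edge-length problem can be measured directly: project the true 3D midpoint of an edge and compare it with the midpoint of the two projected endpoints, which is where the rasterizer puts it. The two test edges below are hypothetical, a long wall edge and a short actor-sized edge at the same place.

```python
import math


def paraboloid_uv(p):
    """Front paraboloid map coordinates of a light-space point (light looks
    down +Z). x / (length + z) equals dx / (1 + dz) for the unit direction."""
    x, y, z = p
    length = math.sqrt(x * x + y * y + z * z)
    return x / (length + z), y / (length + z)


def midpoint_error(a, b):
    """Distance on the map between the projected 3D midpoint of edge a-b
    and the straight-line midpoint the rasterizer would use."""
    mid3d = tuple((ai + bi) / 2 for ai, bi in zip(a, b))
    true_mid = paraboloid_uv(mid3d)
    (ua, va), (ub, vb) = paraboloid_uv(a), paraboloid_uv(b)
    raster_mid = ((ua + ub) / 2, (va + vb) / 2)
    return math.dist(true_mid, raster_mid)


long_edge = midpoint_error((-10.0, -1.0, 1.0), (10.0, -1.0, 1.0))
short_edge = midpoint_error((-0.1, -1.0, 1.0), (0.1, -1.0, 1.0))
```

The long edge is off by roughly 0.32 map units (the whole map spans 2), the short one by well under 0.001: long edges are badly wrong, short ones are fine.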
Lesson learned: Higher shadow map resolutions can improve quality at the cost of speed, but often the quality gain is not that big. 512 and 1024 shadow maps are nearly equally fast, while 2048 is slightly slower. 1024 is usually a good middle ground.
A test scene: a room with 8 point lights, filled with tons of plants casting masked shadows, and a dragon just for the sake of having one. No more cube shadow maps, and soft shadows for free. The left image uses a shadow map size of 1024, the right one 2048. In fact, huge shadow maps are not required this way. At least the dragon likes it less, too.
Again, the shadow map size is shown in the upper left corner. Personally I prefer the 512 shadow since it's the softest of all. Some people, though, like harder shadows. No problem, since the player can set this option himself: a typical shadow quality option like most games have nowadays. So have we won now? Unfortunately not.
While this works as expected for dynamic objects, it fails for the level geometry with its large triangles. A quick solution is to tessellate the scene geometry before sending it to the card, so that triangle edges are never long enough to cause trouble. In general this works, but the number of triangles quickly becomes rather large. That is not a problem for the second shadow map covering the large terrain, though, since it is static and rendered only once when the light enters the view frustum. Unfortunately, the larger issue now is shadow edge artifacts. They are so ugly I won't show an image.
Lesson learned: Paraboloid Shadow Maps are a nice way to deal with ceiling and wall lights, since they cover 180° by definition. They only work well, though, for finely tessellated actors. World geometry has to be tessellated a lot, which is not suitable for dynamic lights.
This situation stinks. Paraboloid Shadow Maps would yield the speed we want: on the test setup using 8 lights with dynamic actors moving around, the frame rate is at 52 FPS. Now that's music to my ears. Not yet the 60 I aimed for at the start, but close. Just the artifacts are a nuisance. This is when I played around a bit with the shaders, going with my gut feeling. What if we crank up the PCF for the terrain shadow map with a bit of cheating? This should make things smoother. So I gave it a try and applied a 25-tap PCF, but instead of using 1-pixel steps I scaled the step size with the distance from the light source.
That's not bad. It looks quite good for a blatant hack found by luck. In fact, without doing much, I gained soft shadows that widen away from the light source. It's not physically correct, but it doesn't look bad. A bit more testing with the new setup.
The soft shadows are clearly visible when the light source moves away. Soft shadows nearly for free. It costs a bit of speed, dropping to roughly 47 FPS for the 8-light setup, but that's a good start. That's when I went for the home run.
Lesson learned: Never underestimate a good hack. The simplest ideas can be the most interesting ones. Don't be afraid of trying out the unexpected.
So how about using the same hack for the dynamic shadow map for actors as well? This should result in similar soft shadows without eating too much speed.
Now that looks sweet, doesn't it? Again, the 512 shadow resolution would be my favorite since I prefer soft shadows over hard ones. This, too, could be chosen by the player. The speed cost is a bit higher here, but maybe there is a solution to this as well; more on that later.
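Pre-tessellation can be as simple as repeatedly splitting the longest edge. The sketch below uses longest-edge bisection as one possible scheme, not necessarily what the Drag[en]gine does; note that this naive per-triangle version can create T-junctions between neighboring triangles, which a production tessellator would avoid by splitting shared edges consistently.

```python
import math


def subdivide(tri, max_edge):
    """Recursively split a triangle at the midpoint of its longest edge
    until no edge is longer than `max_edge`. Returns a list of triangles
    with the original winding order preserved."""
    a, b, c = tri
    candidates = [
        (math.dist(a, b), (a, b, c)),
        (math.dist(b, c), (b, c, a)),
        (math.dist(c, a), (c, a, b)),
    ]
    length, (p, q, r) = max(candidates, key=lambda e: e[0])  # p-q is the longest edge
    if length <= max_edge:
        return [tri]
    m = tuple((pi + qi) / 2 for pi, qi in zip(p, q))  # midpoint of the longest edge
    return subdivide((p, m, r), max_edge) + subdivide((m, q, r), max_edge)
```

Every halving of the allowed edge length roughly quadruples the triangle count, which is why this only pays off for the static map that is rendered once.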
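The hack itself fits in a few lines. Below is a CPU-side sketch of a 5x5 (25-tap) PCF whose tap spacing grows with the fragment's distance from the light. The nearest-texel lookups, the `base_step`, `reference_distance` and `bias` parameters and the 2D-list shadow map are assumptions for illustration; on the GPU each tap would be a hardware depth compare.

```python
def pcf_soft(shadow_map, u, v, fragment_depth, light_distance,
             base_step=1.0, reference_distance=1.0, radius=2, bias=0.005):
    """Percentage-closer filtering over a (2*radius+1)^2 kernel (25 taps for
    radius=2) whose tap spacing scales with the distance from the light.

    shadow_map: 2D list of stored depths, indexed [row][col].
    (u, v):     texel coordinates of the fragment in the shadow map.
    Returns the lit fraction in [0, 1].
    """
    step = base_step * light_distance / reference_distance  # wider kernel farther away
    rows, cols = len(shadow_map), len(shadow_map[0])
    lit = 0
    taps = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            col = min(max(int(round(u + dx * step)), 0), cols - 1)  # clamp to edge
            row = min(max(int(round(v + dy * step)), 0), rows - 1)
            if fragment_depth - bias <= shadow_map[row][col]:
                lit += 1  # this tap sees no closer occluder
            taps += 1
    return lit / taps
```

Near the light the kernel covers only a few texels and the shadow edge stays crisp; farther away the same 25 taps spread out and the penumbra widens, which is the "soft shadows widening away from the light source" effect described next.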
This shadow odyssey made quite a few stops, but the results are not to be sneezed at. To give an idea of how demanding shadows are for the GPU, here are a few figures from my tests:
8 lights, no shadows => 73 fps
8 lights, objects (shadow map), terrain (cube map), full-screen blur => 40 fps
8 lights, objects (shadow map), terrain (shadow map), PCF only => 52 fps
8 lights, objects (shadow map), terrain (shadow map), soft shadows => 30 fps
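Frame rates are not linear, so converting the figures above to frame times makes the cost of each shadow setup easier to compare. A small helper, using only the numbers listed:

```python
def frame_time_ms(fps: float) -> float:
    """Milliseconds spent on one frame at the given frame rate."""
    return 1000.0 / fps


BASELINE_FPS = 73  # 8 lights, no shadows


def shadow_cost_ms(fps: float) -> float:
    """Extra frame time compared with the no-shadow baseline."""
    return frame_time_ms(fps) - frame_time_ms(BASELINE_FPS)


RESULTS = {  # figures from the list above
    "objects (shadow map), terrain (cube map), full-screen blur": 40,
    "objects (shadow map), terrain (shadow map), PCF only": 52,
    "objects (shadow map), terrain (shadow map), soft shadows": 30,
}

if __name__ == "__main__":
    for setup, fps in RESULTS.items():
        print(f"{setup:60s} {fps:3d} fps = {frame_time_ms(fps):5.1f} ms "
              f"(+{shadow_cost_ms(fps):4.1f} ms for shadows)")
```

On top of roughly 13.7 ms for lighting without shadows, this works out to about 11.3 ms of shadow cost for the cube map variant, 5.5 ms for PCF only and 19.6 ms for soft shadows.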
The PCF-only approach is quite fast. Its artifacts are the problem, though. The soft shadow approach eats some speed, but the results are interesting. The following images were made using the soft shadow version.
Here are only the shadows, without the objects casting them. In contrast to other game engines with unnaturally hard shadows, these are pleasing to the eye. The faster PCF version gives similar results but stinks for the terrain.
The same but with the objects.
The entire test room. Only a few objects are visible here since an occlusion trick is used to remove objects which are clearly hidden, a fast test that improves render speed. The player would never see the room like this, so for all valid places the player can be, the removal of objects is correct. The boxes are the 8 lights. The room is roughly 30-40 m long, so all lights influence the entire screen: a tough nut for deferred rendering.
The above soft shadow technique unfortunately uses up to 25 taps into shadow maps. That's quite a large number, and reducing it without losing too much of the soft shadow fun would be nice. One solution might be Variance Shadow Maps. The idea is relatively simple: instead of storing only the depth, you also store the squared depth alongside it. This requires a color texture instead of a depth texture, which allows filtering that is impossible with depth textures. Using a probabilistic bound, the shadowing can then be estimated. The good thing about this technique is that you can blur the shadow map beforehand. For the static shadow map this might be a solution; it is still to be tested, though. After all, VSM suffers from light bleeding, and the solutions to this problem are costly.
Another solution could be the nVidia PCF extension as used in Hellgate London. This technique is costly, though, and might not improve the speed.
After all, this is not so bad. Not every light source needs to cast dynamic shadows. All of them cast static shadows, but nobody is going to notice if every second light doesn't cast a dynamic shadow.
So here we are now. Without the soft shadow extension, eight 180° point lights with dynamic shadows at a high resolution run at roughly 50 FPS, or at 30 FPS with soft shadows. There's still room for improvement, but that's already a nice result for a couple of weeks of hacking around. Hopefully this also contained a couple of ideas for others, since point light shadows are a problem for any game engine. Once the Drag[en]gine is out, you can take a closer look at the code if you want.
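The probabilistic calculation in Variance Shadow Maps is usually the one-sided Chebyshev inequality over the two filtered moments. A sketch, with a hypothetical `min_variance` clamp against numerical noise; averaging the stored (d, d²) pairs stands in for whatever blur or filter is applied to the map:

```python
def filtered_moments(depths):
    """Average the stored (d, d^2) pairs, as a blur or texture filter would."""
    n = len(depths)
    return sum(depths) / n, sum(d * d for d in depths) / n


def vsm_visibility(moment1, moment2, fragment_depth, min_variance=1e-5):
    """Variance shadow map lookup via the one-sided Chebyshev bound.

    moment1 = filtered E[d], moment2 = filtered E[d^2].
    Returns an upper bound on the fraction of light reaching the fragment.
    """
    if fragment_depth <= moment1:
        return 1.0                                    # in front of the mean occluder: lit
    variance = max(moment2 - moment1 * moment1, min_variance)
    diff = fragment_depth - moment1
    return variance / (variance + diff * diff)
```

The light bleeding mentioned above is visible here: for a filter region that is half occluder at depth 0.3 and half lit surface at depth 1.0, a fragment at depth 0.8 gets a visibility of about 0.84 although only half of the region actually lets light through.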