At some point in the development of KROMAIA, we realized that graphics were our main bottleneck when using hundreds of objects. This is not Ogre3D's fault: the shaders we use are quite "expensive" (and even more so with all the objects we want to use). At that time we were updating everything every frame: if the game was running at 100 frames per second (FPS), physics and graphics were both updated 100 times a second. But graphics (mainly shaders) took most of that time (more than 60% of the total frame time).
Note that the following diagrams only try to show the calling order, never the real time scale between blocks. Control frames, including physics calculations, take less than 10% of the total graphics time.
The first solution we tried (single threaded) was to limit the number of graphic updates per second and let physics update at a higher framerate. We limited graphic updates to 60 times per second as we thought it should be enough, but controls and physics could clearly benefit from a higher update frequency.
We need some synchronization to make this work. We update physics whenever possible and check whether it is time to update graphics. On a graphics update frame, we update physics (as always), synchronize positions and orientations (for example) with graphics (the Ogre scene graph) and then update graphics to render the current state.
This way, we update graphics at a fixed framerate and physics can be updated much more often.
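The single-threaded scheme above can be sketched roughly as follows. This is only an illustration, not our actual code: GraphicsLimiter, LoopStats and runSingleThreaded are hypothetical names, the clock is simulated so the result is deterministic (a real loop would read a timer such as std::chrono::steady_clock), and the physics, synchronization and render steps are reduced to counters and comments.

```cpp
#include <cassert>
#include <chrono>

using Micros = std::chrono::microseconds;

// Decides, on every control frame, whether it is time for a graphics update.
// (Hypothetical helper, not part of Ogre3D.)
class GraphicsLimiter {
public:
    explicit GraphicsLimiter(int graphicsHz) {
        assert(graphicsHz > 0);
        interval_ = Micros(1000000 / graphicsHz);
    }

    // Returns true when the next scheduled render time has been reached.
    bool isGraphicsFrame(Micros now) {
        if (now < nextRender_) return false;
        nextRender_ += interval_;  // keep a fixed cadence instead of drifting
        return true;
    }

private:
    Micros interval_{0};
    Micros nextRender_{0};
};

struct LoopStats {
    int physicsUpdates = 0;
    int graphicUpdates = 0;
};

// Simulated main loop: `step` is the assumed (constant) cost of one control frame.
LoopStats runSingleThreaded(Micros total, Micros step, int graphicsHz) {
    GraphicsLimiter limiter(graphicsHz);
    LoopStats stats;
    for (Micros now{0}; now < total; now += step) {
        ++stats.physicsUpdates;  // controls and physics run every frame
        if (limiter.isGraphicsFrame(now)) {
            // 1. copy positions and orientations into the scene graph
            // 2. render the current state (renderOneFrame in the real game)
            ++stats.graphicUpdates;
        }
    }
    return stats;
}
```

With a 1 ms control frame and a 60 Hz limit, one simulated second gives 1000 physics updates and 60 graphics updates.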
This solution could be enough in some cases, but we wanted more. We suspected that the CPU was waiting for the GPU to finish rendering before starting the next physics frame (and our results after implementation confirmed it).
The problem is even worse if you plan to use Vsync: the frame execution will stop in the graphics update until the next frame can be rendered, blocking the whole game when it could be updating control and physics.
Note that execution does not actually stop at the point shown on the diagram, but we think drawing it this way makes it easier to compare with the rest of the diagrams. In any case, the effective behaviour is roughly equivalent.
What are we looking for with multithreading?
This will happen with two processors or two cores, but even with one core there are several improvements: while the GPU is rendering, the CPU is sleeping and is therefore free for the control thread to use.
Vsync will work perfectly now, blocking just the graphics thread and leaving the control thread free to update the game logic as fast as possible.
So, what were our motivations for using multithreading?
- Graphics are the bottleneck
- Independent framerate between graphics and physics to make logic independent from graphics (and better yet: highest possible framerates for graphics and physics)
- Use of CPU while GPU is busy
- Use of more than one core on modern processors (just two with our current implementation)
If you have decided to give multithreading a try you will have to face some problems:
- Control thread can't change graphics
- Graphic calls must be done from the graphics thread
- Delay graphic function calls when call is forbidden
- Synchronization between threads
Think carefully about all this before going ahead: it can imply a lot of work. Are you sure the performance gain will pay off?
OK. Let's talk briefly about our implementation:
- Graphic Manager
We use a "GraphicManager" class as a wrapper around Ogre3D to hide Ogre3D calls from the rest of the game code. It is responsible for the renderOneFrame function call and, because of that, it is where the core of the multithreading implementation lives.
- If multithreading is disabled the graphic part will be created in the constructor of the GraphicManager class and the renderOneFrame function will be called when needed on update calls.
- If multithreading is enabled, the constructor of the GraphicManager class will start a thread that creates the graphic part (graphics must be created on the thread that will render them) and then loops forever, waiting for its turn to render by calling renderOneFrame. The GraphicManager update function will not call renderOneFrame itself; instead it notifies the render thread when the next render should start, and later receives a message from the render thread when the render has finished (several control frames later, hopefully).
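A minimal sketch of the multithreaded case, using only the C++ standard library. It is not our real GraphicManager: the class name GraphicManagerSketch, its member names and the 2 ms sleep that stands in for Ogre's renderOneFrame are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class GraphicManagerSketch {
public:
    // The constructor starts the render thread, as described above.
    GraphicManagerSketch() : renderThread_([this] { renderLoop(); }) {}

    ~GraphicManagerSketch() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        wake_.notify_all();
        renderThread_.join();  // never destroy while a render is running
    }

    // Control thread: ask for a new render. Returns false if one is still pending.
    bool requestRender() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (requested_ || rendering_) return false;
            requested_ = true;
        }
        wake_.notify_all();
        return true;
    }

    bool renderInProgress() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return requested_ || rendering_;
    }

    // Control thread: block until the render thread is idle (e.g. before shutdown).
    void waitForRender() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return !requested_ && !rendering_; });
    }

    int framesRendered() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return frames_;
    }

private:
    void renderLoop() {
        // The graphic part would be created here, on the render thread.
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return requested_ || quit_; });
            if (quit_) return;
            assert(!rendering_);
            requested_ = false;
            rendering_ = true;
            lock.unlock();     // the control thread keeps running meanwhile
            renderOneFrame();
            lock.lock();
            rendering_ = false;
            ++frames_;
            idle_.notify_all();  // the "render finished" message
        }
    }

    // Placeholder for the real renderOneFrame call.
    static void renderOneFrame() {
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
    }

    mutable std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable idle_;
    bool requested_ = false;
    bool rendering_ = false;
    bool quit_ = false;
    int frames_ = 0;
    std::thread renderThread_;  // declared last so the members above exist first
};
```

In this sketch the control loop would call requestRender() on a graphics frame and keep running control frames while renderInProgress() is true.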
- Object synchronization
Physics can be updated as fast as possible, but positions and orientations are not needed by the graphic part until a frame is going to be rendered. That is why the setPosition and setOrientation calls can be delayed until the graphic frame is reached. The same applies to scale and opacity. Remember that setting the position or orientation of any scene node is forbidden while there is a render in progress.
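One possible way to delay those calls is to remember only the latest value per node and write it out on the graphics frame. The sketch below assumes this approach; Vec3, FakeSceneNode and TransformSync are hypothetical stand-ins (a real version would hold Ogre scene nodes and also cache orientation, scale and opacity).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

struct Vec3 {
    float x = 0, y = 0, z = 0;
};

// Stand-in for a scene node; it only counts how often it is written to.
struct FakeSceneNode {
    Vec3 position;
    int writes = 0;
    void setPosition(const Vec3& p) {
        position = p;
        ++writes;
    }
};

class TransformSync {
public:
    // Control frame: remember the latest position, make no graphics call.
    void setPosition(FakeSceneNode* node, const Vec3& p) {
        assert(node != nullptr);
        pending_[node] = p;
    }

    // Graphics frame, with no render in progress: push the latest values.
    void flush() {
        for (auto& entry : pending_) {
            entry.first->setPosition(entry.second);
        }
        pending_.clear();
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::unordered_map<FakeSceneNode*, Vec3> pending_;
};
```

However many control frames move an object between two renders, the node is written only once, with the most recent position.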
- Delayed functions
There are some calls that will be "asynchronous" with the graphic part: they won't always be made on a graphics update, but they still need to call a graphics function. Calling some graphics functions is possible only if there is no render in progress, so you have two options: you can block the thread that calls the graphics function until the render has finished, or you can add the function call, with its parameters, to a queue of "pending function calls" and continue executing. The first solution is easy (plain thread synchronization) but breaks the multithreading we have built; if those function calls happen often, we end up with a multithreaded implementation that behaves like a single-threaded application. The second solution is not very complex: "store functions and their parameters and call them whenever possible in the same order the functions were queued". But you have to take into account that the function call can be delayed by several control frames, and "not calling" that function can have consequences you will have to foresee. For example, we delay the createEntity function. That means that when you create a new object in a control frame (imagine you fire a projectile), the graphic entity won't be created until a graphic frame is reached, but in the meantime the object can move, collide (projectile impact) or disappear (or explode); all those behaviours must make sense even without the entity being ready (think of material changes or fading).
Functions we are delaying include: entity creation, material functions, adding/removing child nodes and destroying billboardChains/ribbonTrails. There will be many more depending on what you use and how you use it. Oh, one more thing: we call every delayed function at the start of a graphic frame to get everything ready for the next update calls and to avoid more delayed function calls during that frame.
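The "pending function calls" queue can be sketched with std::function, as below. DelayedCallQueue and its member names are hypothetical, and the sketch assumes that flush() is only called at the start of a graphic frame, when no render is in progress.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

class DelayedCallQueue {
public:
    // Any thread: store the call together with its already-bound parameters.
    void enqueue(std::function<void()> call) {
        assert(call != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(call));
    }

    // Start of a graphic frame: run every pending call in the order it was queued.
    // Returns the number of calls executed.
    std::size_t flush() {
        std::queue<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, pending_);  // run the calls outside the lock
        }
        const std::size_t count = batch.size();
        while (!batch.empty()) {
            batch.front()();
            batch.pop();
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> pending_;
};
```

A delayed createEntity would then be queued as a lambda that captures its parameters, for example queue.enqueue([=] { /* create the entity here */ });, and the game logic has to cope with the entity not existing until the next flush.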
After our first implementation we faced some problems with messages from the operating system (Windows). We couldn't fix the problem ourselves, as we didn't know what was happening, so we went to the Ogre Forums (LINK). As Xavier pointed out, the problem was that messagePump was being called on the control thread instead of the graphics thread. However, after several tests we found that our game behaves better when messagePump is called from both the graphics thread AND the control thread, so it has been called from both threads since then without any problems (yet). We don't have much to explain here, but we welcome suggestions.
ALLOWED OR FORBIDDEN?
There are some things we know you shouldn't do while the renderOneFrame function is working. Of course there will be many more forbidden functions, and even more depending on how, or when, you call them; use this just as an example of the kind of things you should avoid:
- Destroying nodes or changing parent or child nodes results in a crash. It seems not to happen when creating nodes, but we wouldn't recommend creating a new node while there is a render in progress.
- Attach to / detach from root node. Don't do this.
- CreateBillboardChain/destroyBillboardChain, createRibbonTrail/destroyRibbonTrail. The same as for nodes.
- Changing position, orientation or scale of a scene node could crash or create other strange problems. We have no further information on that.
- Creating entities seems not to crash, but we had lots of trouble with creating/cloning materials for the entities we wanted to create. We decided to delay the complete entity creation, as delaying material creation was just as complex for us as delaying entity creation.
- Changing materials is not recommended. We are not sure if this is a direct cause for a crash, but there were some strange problems because of this.
Of course, sync with the render thread before destroying the graphic part of your game. You can destroy things from the control thread, but make sure there is no render in progress.
We hope this description helps you understand what we have done and why. Feel free to ask whatever you like. We will update the post with your suggestions and questions. You can comment on the Ogre3D forums (Ogre3d.org) or contact us by e-mail (firstname.lastname@example.org) if you prefer.