When I first set out with the Lidgren Network Framework and wrote the first lines of code for the multiplayer part of the game, I knew I was delving into something stranger and more complex than anything I had done before, even with the help of great technology.
After studying "1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond" over and over, it became clear that this would not be an easy task, but a rather daunting one. Even though technology has improved greatly since the time of Age of Empires, it was still a great struggle to single-handedly add reliable multiplayer to a singleplayer RTS.
What must be understood is that the logic behind multiplayer differs greatly in an RTS compared to first-person shooters, platformers, and other games where players control one or very few entities.
Instead of sending data about every object in the world, which is not feasible for games with hundreds or even thousands of interactive objects, only player input is sent. Each client then has to rely on every other client performing its calculations in exactly the same order and in exactly the same way, so that all of them perceive the same simulation. Now, this is not easily guaranteed.
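To make the idea concrete, here is a minimal sketch in Python of an input-only simulation. It is not the game's actual code: the `Command` and `Simulation` types, the 50-gold training cost, and the sort key are all assumptions made for illustration. The point it shows is that each client applies the same commands at the same step in the same order, no matter in which order the packets arrived.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Command:
    step: int        # simulation step at which the command takes effect
    player: int      # id of the issuing player
    action: str      # e.g. "Train" (hypothetical action name)
    args: tuple = () # integer arguments only, so commands sort deterministically


@dataclass
class Simulation:
    step: int = 0
    gold: dict = field(default_factory=lambda: {1: 100, 2: 100})
    units: list = field(default_factory=list)

    def apply(self, cmd):
        # Hypothetical rule: training a unit costs 50 gold.
        if cmd.action == "Train" and self.gold[cmd.player] >= 50:
            self.gold[cmd.player] -= 50
            self.units.append((cmd.player, cmd.args))

    def advance(self, commands):
        # Sort the commands due this step so every client applies them
        # in the same order, regardless of network arrival order.
        due = sorted(
            (c for c in commands if c.step == self.step),
            key=lambda c: (c.player, c.action, c.args),
        )
        for cmd in due:
            self.apply(cmd)
        self.step += 1


def run(commands, steps):
    """Run a fresh simulation for `steps` steps and return it."""
    sim = Simulation()
    for _ in range(steps):
        sim.advance(commands)
    return sim
```

Two clients that receive the same commands in a different order end up with identical `gold` and `units`. Anything that feeds the simulation without going through a command, such as an unsynced random number, breaks that guarantee.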
It was clear that something was wrong when, on one screen, player one was destroying the base of player two, while on the other, player two was fighting off the assault.
To debug multiplayer and find its issues, I made a checksum that was calculated every 100 frames from information about all active buildings and units in the game and then displayed on the screen. Using my desktop computer and laptop, I could now play multiplayer matches and notice when the game desynced.
However, as you might know, an RTS processes many actions every second, and it was not yet clear what caused the desync, only approximately when. The obvious next step was to start logging exactly what was happening. I easily integrated this with the replay-saving system already in place.
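A checksum of this kind could be sketched roughly as follows. This is an assumption-laden illustration, not the code from the game: the fields that go into the checksum (id, x, y, hit points), the use of CRC32, and the function names are all my own choices here. Only the 100-frame interval comes from the description above.

```python
import struct
import zlib

CHECKSUM_INTERVAL = 100  # the checksum is calculated every 100 frames


def state_checksum(entities):
    """CRC32 over (entity_id, x, y, hit_points) integer tuples.

    The choice of fields is hypothetical; use whatever state must match.
    """
    crc = 0
    # Sort by id so that container ordering cannot change the result.
    for entity_id, x, y, hit_points in sorted(entities):
        packed = struct.pack("<4i", entity_id, x, y, hit_points)
        crc = zlib.crc32(packed, crc)
    return crc


def maybe_checksum(frame, entities):
    """Return a checksum on checkpoint frames, otherwise None."""
    if frame % CHECKSUM_INTERVAL == 0:
        return state_checksum(entities)
    return None
```

Packing only integers with a fixed byte order keeps the result independent of the machine it runs on. If the simulation stored positions as floats, they would have to be converted to a fixed representation first.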
A log file could look something like this:
1321 (Train)(B 1)(U 1)
1348 (Order)(X 6459)(Y 5103)(X2 6458)(Y2 5112)(U 8)(B )(ATTACK False)
1392 Unit spawned at 6432 5032
1432 (Order)(X 6990)(Y 4953)(X2 6990)(Y2 4953)(U 8)(B )(ATTACK False)
1597 (PlaceBuilding)(X 107)(Y 77)(P 1)(U 8)(B 1)
1643 (Cancel)(B 1)(I 0)
1726 (Train)(B 1)(U 1)
1734 (Train)(B 1)(U 1)
1788 Unit spawned at 6432 5032
Saving a replay file on both machines and comparing them made it easier to discover when and why a desync had occurred. If the checksum at step 1600 was different but the one at step 1500 was the same, a good start would be to investigate what really happens when a building is placed and how it could have altered the game differently on different clients.
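The comparison can be automated. The sketch below assumes the checksums from each machine are available as a mapping from step number to checksum value, which is a hypothetical format, and that every log line starts with its step number as in the excerpt above. It finds the first checkpoint where the two machines disagree and lists the logged events in that window.

```python
def first_divergence(checksums_a, checksums_b):
    """Return (last_matching_step, first_differing_step), or None if all match."""
    last_good = None
    for step in sorted(set(checksums_a) & set(checksums_b)):
        if checksums_a[step] != checksums_b[step]:
            return last_good, step
        last_good = step
    return None


def events_between(log_lines, start, end):
    """Log lines whose leading step number lies in the window (start, end]."""
    selected = []
    for line in log_lines:
        step = int(line.split(maxsplit=1)[0])
        if (start is None or step > start) and step <= end:
            selected.append(line)
    return selected
```

With checksums that match at step 1500 and differ at step 1600, `events_between(log, 1500, 1600)` on the excerpt above returns only the `PlaceBuilding` line at step 1597, which is where the investigation would begin.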
Step by step
It was already obvious that it was not possible to play a whole match without players desyncing. Instead, now with debugging in place, I continued by trying single actions one by one until the game desynced. The first finding was that the game desynced some time after units were ordered to chop wood. A quick look at the related code turned up nothing, so I tried gathering from resources other than trees, which surprisingly caused no desync.
The only noticeable difference between trees and other resources was that trees shook with random strength and direction when gathered from, meaning that their position changed slightly. Over approximately 20 frames the position was moved back to the original spawn position.
If a neighboring tree was destroyed while this was happening, the game would eventually desync. This is because when a tree dies, a new interaction spot is created for the trees around it, and the position of the new spot is shifted slightly toward the position of its owner, in this case the surviving tree.
Now that the owner's position had been randomly altered, the interaction spot would differ between machines, because the random seed was not synced for this purely cosmetic operation. When a unit then went to gather from the tree at this newly spawned interaction spot, it would walk a different distance on different machines, which would eventually throw off the whole game.
If the gatherer turned in its resources just one frame later or sooner, perhaps a warrior could not be recruited because of insufficient resources. Later on, this warrior would not be there to kill an enemy catapult, which instead manages to get off that last shot to destroy a tower, and the simulation continues to drift apart until no resemblance between the two remains.
Once found, the issue was easily solved by giving each resource a center position and a separate position at which it is rendered. The shake effect would now only alter the rendered position, and all was at peace.
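In code, the split between simulation position and rendered position might look something like this sketch. The `Tree` class, the offset range, the `nudge` distance, and the `interaction_spot` helper are hypothetical names and values chosen for illustration. Only the 20-frame shake and the idea of a separate rendered position come from the fix described above.

```python
import random
from dataclasses import dataclass


@dataclass
class Tree:
    center: tuple                 # simulation position, never touched by effects
    shake_frames_left: int = 0
    render_offset: tuple = (0, 0) # cosmetic only

    def on_gathered(self, cosmetic_rng):
        # Random strength and direction from an unsynced, cosmetic-only RNG.
        self.render_offset = (
            cosmetic_rng.randint(-3, 3),
            cosmetic_rng.randint(-3, 3),
        )
        self.shake_frames_left = 20

    def update(self):
        # After roughly 20 frames the tree is drawn at its center again.
        if self.shake_frames_left > 0:
            self.shake_frames_left -= 1
            if self.shake_frames_left == 0:
                self.render_offset = (0, 0)

    @property
    def render_position(self):
        return (
            self.center[0] + self.render_offset[0],
            self.center[1] + self.render_offset[1],
        )


def interaction_spot(freed_position, owner, nudge=4):
    """Spot at the freed position, nudged toward the owner's center.

    Reads only `owner.center`, so the shake cannot influence the result.
    """
    def toward(a, b):
        return a + max(-nudge, min(nudge, b - a))

    return (
        toward(freed_position[0], owner.center[0]),
        toward(freed_position[1], owner.center[1]),
    )
```

Because gameplay code reads only `center`, two machines with different cosmetic random seeds still compute the same interaction spot, while each is free to draw the shake differently.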
One of many problems
The issue with trees was just one of many, and an example of what had to be handled when converting a singleplayer game mode to multiplayer. Among the others were optimization techniques that did not stay in sync, and different regional settings on different machines producing different results, which led to desyncs. Had I written about everything, this would have been a very long post. Expect more on this subject at a later point!