Second Antarean War (2aw) uses the lockstep model. This is an older model used in many RTS and early FPS games. Basic premise is that each client runs the game in parallel. The clients run the same commands so the game running on all the clients is the same. The server in a way does not run the game at all, all it does is act like a chat server sending messages from and to the clients although it has two other function - marking the end of a "turn" and making sure the game is in sync.
I opted out to use TCP for 2aw because I don't have to deal with UDP's loss and out of arriving order. 99% of the packets needed to arrive both in order and without loss for lockstep to work. TCP is only really slow for gaming if the Nagle's algorithm is not disabled. Nagle's algorithm is evil because it keeps a buffer and waits till it fills up - which makes very jerky real-time networking.
The simulation progresses in turns. The server decides when the turn ends which conveniently marks the end of the commands that will be executed that turn. These are not turns in the scene of a turn based game but more like steps or ticks.
The turns for 2aw are one second in length and are large for an RTS game. I did this in part to limit the effectiveness of bashing buttons to issue commands. Also i guess i can "simulate" more this way. At the end of each turn each client sends a MD5 hash of the current state to the server.
This is the second function the server performs - making sure all clients run in sync. When client start to send different hashes it marks the game "desynced" and collects some information about the state of sync and allows me to see what actually made it desync.
Test Driven Development helped much debugging sync as well as writing some great tools to dump and visualize desyncs. Here are some things I stumbled into: At first I was using python's hash function to derive some internal hashing in the game. That turned out to not work because some times the hashes are different between platforms. In fact some things hash is solely based on the memory address which breaks even on different runs of the program. After trying to come up with my own hashing methods (don't do this!) - I settled on a better approach - just dumping the client state into a buffer and computing the MD5 of it. That actually gives me 3 things at once! I get overly friendly representation of all objects as their __str__ methods are excellent and always relevant. I can dump the simulation state to a file and compare it to other simulations, with diff tools. And I get the MD5 - a nice hash of the thing.
As I develop 2aw desync are hard to debug. Because I target 2aw to run on Window, Linux and Mac different architectures and compilers contribute to some of the desyncs. The most common problem I run into is the float point errors. Different compiler optimize their float point paths differently so they produce slightly different results. I get around this by printing just the right amount of digits after the decimal point. Yet still some times a knife-edge calculation messes it up. A similar error for angles is that the angle is 0 on one client and 360 on another, just like some times you can get 0.00 and -0.00. I had to fix that too. One of my big worries is that the float points errors would be too much and I would have to implement the entire math stack as fixed floats (integers arithmetic masquerading as floats) - but so far its working.
The more subtle kind of errors you find when you iterate over the members of a hash map. Sense hashing on different platforms is different the order of iteration is different too. At first this might not sound like a big deal - but it is if you want all clients be 100% exact. Sooner or later you run into problem when this breaks and I had no clue why because the rift might happen couple of turns before and only manifest itself later. To fix this i resorted to sorting hash table and hash set iteration or do it differently some how.
Another interesting tidbit about lockstep is random numbers. Basically I use the pseudo random number generator with a seed like the Mersenne Twister. In theory each game should start with a different seed each time - but in practice it does not seem to matter so all games in 2aw start with the same seed. This is another reason ordering in hash tables and sets is important, so that each one gets the right random number.
Thats about it for 2aw lock step sync model. Do you have any questions or insights?