I have been reading Nate Silver's The Signal and the Noise, and in one chapter he discusses weather forecasting and chaos theory. Chaos theory describes how very small differences in the starting conditions of a dynamic system can lead to completely different final results.
He writes about Edward Lorenz, the man who coined the term “butterfly effect”. When Lorenz and his team were working on a weather forecasting program, they were getting wildly divergent results, even though they were running the exact same code on what they thought was the exact same data. According to Silver, sometimes the forecast would be clear skies over Kansas and sometimes thunderstorms.
They spent quite a while trying to figure out what the problem was. Eventually they tracked it down to the setting of barometric pressure in one area of the grid where the floating point number was being rounded differently. A difference of just 0.0002 in the barometric pressures caused huge changes in the final results.
Edward Lorenz: “Chaos: When the present determines the future, but the approximate present does not approximately determine the future.” - quoted from here
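To see that kind of sensitivity in a few lines, here is a minimal sketch using the logistic map, a textbook chaotic system. It is a stand-in I picked for illustration, not Lorenz's weather model, and the starting values and the parameter r = 4.0 are assumptions of mine; only the 0.0002 difference echoes the story above.

```python
# Logistic map in its chaotic regime (r = 4.0). This is a stand-in for a
# weather model, not Lorenz's actual equations.
def logistic_orbit(x0, steps, r=4.0):
    """Return the sequence x0, x1, ..., x_steps of the logistic map."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

a = logistic_orbit(0.3000, 50)
b = logistic_orbit(0.3002, 50)   # starting value differs by only 0.0002

# Gap between the two runs at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]
for step in (0, 5, 10, 20, 30):
    print(f"step {step:2d}: gap = {gaps[step]:.6f}")
```

The gap starts out tiny and, within a couple of dozen steps, the two runs have nothing to do with each other.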
A second thing that Silver writes about is a European weather model. In December 1999 they were trying to make a prediction of the storm Lothar for Germany and France. The model was completely deterministic. They ran many simulations, in some modifying the barometric pressure in Hanover slightly and in others making a tiny change in wind in Stuttgart. The results would sometimes show clear weather in Paris and in others a huge storm. The fifty different forecasts from this model are pictured below.
Now this discussion hit home for me because I've been working on Contraption Maker (CM) recently and had to deal with many of these same issues. Contraption Maker is a sandbox physics game in the same vein as The Incredible Machine and is currently available on Steam. I've been working on it with Spotkin, which is made up of my old Dynamix business partner Jeff Tunnell, his son Jonathon, and Keith Johnston. I was responsible for getting the physics right. To get a sense of the game, here is a user-created perpetual motion machine.
When you have a contraption made up of possibly hundreds of parts that are interacting with each other for hundreds or thousands of frames, the butterfly effect becomes very obvious. Move a tennis ball over by just 0.0001 units and it may bounce off a teeter-totter a fraction of a second later, then make something else bounce left instead of right, and divergence is off to the races.
But as long as the initial starting positions of all the parts were the same then the contraption should always run exactly the same. This is where the floating point problem came into play. Contraption Maker is cross platform. It runs on Windows machines, Macs, mobile devices (very soon), and who knows what future devices. Is the CPU within all these different devices going to calculate floating point results exactly the same? One small difference messes everything up.
After some research online, the answer I got was that if you set things up right it was maybe “yes” and maybe “no”. There were also indications that functions like sin, cos, sqrt, and others could be a problem.
I found this page, which summarizes a lot of what I found scattered across the Internet: Gafferongames.com
This wasn't the completely definitive answer that I was looking for, but I got the sense that if I set up the compiler settings correctly it would probably work. Probably... made me a little uneasy. So I wrote our own routines for sin, cos, etc so that I knew that they would give the same results no matter what computer/mobile-device CM was running on.
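For a rough idea of what a hand-rolled routine can look like, here is a minimal sketch of a sine built only from add, subtract, multiply, and divide in a fixed order. It is not the actual CM code; the name det_sin, the Taylor series, and the number of terms are all my assumptions for illustration. The idea is that the basic arithmetic operations are much more tightly specified than a platform's math library, so the same sequence of operations has a better chance of giving the same bits everywhere, assuming the compiler does not reorder or fuse them.

```python
import math  # used only to compare against the platform's own sin

TWO_PI = 6.283185307179586  # written as a literal so every build starts from the same bits

def det_sin(x):
    """Sine from a fixed sequence of +, -, *, / only (hypothetical helper)."""
    # Range-reduce to roughly [-pi, pi] so the series converges quickly.
    k = round(x / TWO_PI)
    x = x - k * TWO_PI
    term = x            # current series term, starts at x
    total = x
    x2 = x * x
    # Taylor series: x - x^3/3! + x^5/5! - ... up to the x^21 term.
    for n in range(1, 11):
        term = -term * x2 / ((2 * n) * (2 * n + 1))
        total += term
    return total

for v in (0.0, 1.0, -3.0, 10.0):
    print(v, det_sin(v), math.sin(v))
```

The values agree closely with the library sine, but the point is not accuracy. The point is that every step is spelled out, so nothing is left to whichever math library the platform happens to ship.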
At this point everything was going along fine. Contraptions were running exactly the same on Windows and Macs. Development was cruising along. Parts were being implemented. Floating point didn't seem to be causing a divergence to occur. Some minor changes were needed to handle adding new parts to an already existing Contraption so that all the parts were still processed in the exact same order. A few minor bumps that were easily solved. And then the copy/paste problem reared its head.
The problem was that there would be a group of parts that did something neat and then you'd want to copy and paste that whole group of parts to another area in the world. With floating point you could not guarantee that they would run exactly the same. Floating point has that weird thing – the point floats. You have more resolution close to zero than you do farther out. So a group of parts close to the origin is not going to get the same floating point results as the same group farther away from the origin.
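Here is a small sketch of that resolution problem. It assumes 32-bit floats, which is my assumption for the example and not a statement about what CM used, and the positions and step size are made up. The helper f32 rounds a Python float to the nearest 32-bit float so the effect is easy to see.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def moved_by(start, step):
    """Distance a part actually moves when `step` is added in 32-bit floats."""
    return f32(f32(start) + f32(step)) - f32(start)

near = moved_by(1.0, 0.1)           # part close to the origin
far = moved_by(1_000_000.0, 0.1)    # same part pasted far away

print("near the origin:", near)
print("far from origin:", far)
```

The part near the origin moves by very nearly 0.1. The pasted copy can only land on the coarser grid of values available out there, so it moves by a noticeably different amount, and from that frame on the two copies are no longer doing the same thing.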
Argh, sigh... At this point the simplest solution was to just convert everything to be an integer physics engine. And do the same with the trig routines. I had had to do the same for The Incredible Machine because floating point just wasn't fast enough back then. So that is what I painfully did. Here is a group of parts that have been copied and pasted and are running the same.
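To show why integers fix the copy/paste case, here is a minimal sketch of a bouncing ball stepped with integer math only. This is not the CM engine; SCALE, GRAVITY, the bounce rule, and the function names are all hypothetical values and helpers chosen for the example. Integer addition is exact everywhere in range, so moving the whole setup by an offset shifts every result by exactly that offset.

```python
SCALE = 1024     # hypothetical: 1 world unit = 1024 integer sub-units
GRAVITY = -10    # sub-units per frame^2 (made-up value)

def simulate(x, y, vx, vy, floor_y, frames):
    """Step a bouncing ball with integer math only; return its position each frame."""
    path = []
    for _ in range(frames):
        vy += GRAVITY
        x += vx
        y += vy
        if y < floor_y:
            # Hit the floor: reflect the position and lose a quarter of the speed.
            y = floor_y + (floor_y - y)
            vy = -(vy * 3) // 4
        path.append((x, y))
    return path

def relative(path, ox, oy):
    """Express a path relative to the offset it was pasted at."""
    return [(px - ox, py - oy) for px, py in path]

# Same ball and floor, once at the origin and once pasted far away.
origin_run = simulate(0, 5 * SCALE, 7, 0, 0, 500)
ox, oy = 1_000_000 * SCALE, 2_000_000 * SCALE
pasted_run = simulate(ox, oy + 5 * SCALE, 7, 0, oy, 500)

print(relative(pasted_run, ox, oy) == origin_run)  # True
```

Python integers never overflow, so this sketch sidesteps one thing a real fixed-width integer engine has to watch: keeping the world size and intermediate products within range.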
Keith set up an automated determinism check where we have generated hash values for a few hundred contraptions that are calculated after each one has run for a thousand frames or so. This way we can just run “testcompat” and it then runs each of those contraptions and then compares their hash values with the saved hash values to verify that everything is running the same on all different machines.
So far so good: physics are matching as we start building mobile versions on new devices, and the hash value checks have also caught a few times where code changes made earlier contraptions run differently. Our goal was to never have an update make a preexisting contraption give a different result when run. The hash check let us find and fix these problems before an update was released.