When you purchase a game or download a free mod from Desura (our digital distribution app), we want that download to arrive at full speed and with 100% reliability. On paper that sounds easy; in practice it is a challenge without Akamai or another high-priced CDN with many edge locations. To add to that challenge, we pursue the Google model of using commodity servers and accepting that they will fail, so our system needs to work around such failures automatically.
After significant work and two failed attempts, I believe we have finally built a robust system, and along the way we learnt that you should never rebuild the wheel. Here's what we did:
First, a custom protocol
In C++, we built an entirely custom transfer protocol from scratch to work with the Desura file format, which only downloads the parts of a file we need. The system works via a single-socket, command-and-receive approach: the client asks for a file, an offset and a download size, and the server reads the file and sends the data back to the client.
- PRO: Super fast, with barely any CPU overhead.
- CON: Worked fine internally when we only had the 4 team members using 6 servers. But as Desura started growing during the public beta, leaks started to form, sockets wouldn't time out, and many processes hung in a zombie state.
- CON: Socket code is very hard to debug, because on our test machines it always works; the issues only arise in production, with tons of clients connected.
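The command-and-receive exchange described above can be sketched roughly like this. This is a minimal Python illustration, not our actual C++ code; the header layout, helper names and file name are assumptions made for the example:

```python
import socket
import struct
import threading

# Hypothetical wire format: the client sends a fixed header of
# (name length, offset, size) followed by the file name; the server
# answers with exactly `size` bytes of that file read from `offset`.
HEADER = struct.Struct("!HQI")  # name length, offset, size

def recv_exact(conn, n):
    """Loop until exactly n bytes arrive -- recv() may return less."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf

def serve_one(conn, files):
    """Server side: answer a single command-and-receive request."""
    name_len, offset, size = HEADER.unpack(recv_exact(conn, HEADER.size))
    name = recv_exact(conn, name_len).decode()
    conn.sendall(files[name][offset:offset + size])
    conn.close()

def fetch(addr, name, offset, size):
    """Client side: request `size` bytes of `name` starting at `offset`."""
    with socket.create_connection(addr) as s:
        payload = name.encode()
        s.sendall(HEADER.pack(len(payload), offset, size) + payload)
        return recv_exact(s, size)

if __name__ == "__main__":
    files = {"patch.mcf": bytes(range(256))}
    srv = socket.create_server(("127.0.0.1", 0))
    threading.Thread(target=lambda: serve_one(srv.accept()[0], files),
                     daemon=True).start()
    print(fetch(srv.getsockname(), "patch.mcf", 16, 8))
```

Even this toy version needs a `recv_exact` loop, because `recv()` is allowed to return fewer bytes than asked for; it's exactly this class of edge case (partial reads, timeouts, half-closed sockets) that only shows up under real load.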
Next, we used a buzzword: node.js
So I started to investigate other options for the back-end server. Node.js had been released about two weeks earlier, and it seemed to be exactly the solution we were looking for: it provided the libraries and function calls we needed, and we just had to rebuild our custom transfer protocol on top of it. In production it seemed much more reliable and stable than the old server program, so we deployed it to all mirrors and called it a day.
- PRO: By building on a tested codebase rather than raw OS calls, it worked well, was easy to deploy, was portable, and didn't need compiling.
- CON: We were trying to make node.js do something it wasn't meant to do, and again memory leaks crept in once many users started connecting.
Third time around, we learned from our mistakes
Finally our server admin (Greg) suggested that we should just use an FTP server. After all, the File Transfer Protocol has been around for 40 years, and there are plenty of tried and tested server and client libraries to use. So we adapted our custom Desura file format to work over an FTP-like system, and since then we haven't had a problem.
- PRO: FTP is built to send files; it has been used and optimized for exactly this by millions of people. It is reliable, fast, scalable and easy!
- CON: None
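Notably, off-the-shelf FTP already supports the one thing our custom protocol existed for: starting a transfer partway through a file, via the standard REST command. A rough sketch of a ranged download using Python's standard ftplib; the host, credentials and path are placeholders, and this is an illustration rather than our production code:

```python
from ftplib import FTP

class RangeCollector:
    """Accumulates transfer callback chunks until `size` bytes arrive."""

    def __init__(self, size):
        self.size = size
        self.chunks = []
        self.got = 0

    def feed(self, data):
        self.chunks.append(data)
        self.got += len(data)
        if self.got >= self.size:
            raise EOFError  # enough data; unwind out of the transfer

    def result(self):
        return b"".join(self.chunks)[:self.size]

def fetch_range(host, user, password, path, offset, size):
    """Fetch `size` bytes of `path` starting at `offset` over plain FTP."""
    col = RangeCollector(size)
    ftp = FTP(host)
    ftp.login(user, password)
    try:
        # rest=offset makes ftplib issue the standard REST command,
        # so the server begins the RETR transfer at that byte offset.
        ftp.retrbinary(f"RETR {path}", col.feed, rest=offset)
    except EOFError:
        pass  # we deliberately stopped once we had enough bytes
    finally:
        ftp.close()
    return col.result()
```

All the hard parts of the first two attempts (connection handling, timeouts, resuming) are someone else's tried and tested code; only the range bookkeeping is ours.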
... and so this brings me to the point of this article. Never REBUILD THE WHEEL, especially when working with complex code (e.g. sockets) that needs to be 100% dependable from client to server. Building everything from scratch is appealing, as it gives you control and the ability to optimize, but unless you have unlimited time, when suitable libraries exist (especially tried and tested ones) - learn from our lesson and use them from the beginning.
Do you reinvent the wheel?