Let's start with some background on the file formats of the game. Sonic Generations stores its assets in big packfiles: bb.cpk (around 2 GB), bb2.cpk (another 2 GB), and bb3.cpk (60 MB or so). This is a common practice for game developers nowadays, and most of them implement some sort of separate folder whose files override the ones inside the packs in case they need to modify them. It's a common technique for delivering patches; in the case of Steam, only the modified chunks are sent, thanks to the infrastructure Valve has implemented.
Sonic Generations does not have any sort of file overriding method easily available. All the game's assets are in these big files. Luckily, reverse engineering these big packages for extracting/repacking wasn't a big hassle, given they are part of the Criware middleware, and the tools were already available. The custom Archive format for the packs that were inside these CPKs, ".AR", was already reverse engineered since Sonic Unleashed. Generations Archive Editor was made by MainMemory from Sonic Retro fairly quickly, so modifying the game's internal files was trivial with the right tools.
The biggest problem with this entire method was the repacking. Building 2 GB files every time you want to test a change in the game isn't efficient at all, so the iteration time needed to be reduced. Until a solution appeared later on, this was the only method of testing changes, and it slowed down any research a lot.
Back in late 2011, after I found that it was possible to modify this game thanks to Sonic Retro's resources, I wanted to experiment with the simple object layout format that was inside the game. Objects in Sonic Generations are just defined in a text file in XML, like this:
A single text file has multiple copies of these, one for every single gameplay object on the stage. All object types are pre-defined and hardcoded into the game. New type entries can't be added without modifying the game's code.
Since the task seemed fairly trivial, I started by building a program to parse these objects and visualize them in a level editor. The result was pretty quick, and it was easy to edit the objects of existing levels without much manual work. Making a simple level edit as a proof of concept was possible.
Attempting to import Windmill Isle Act 1
At the time I was developing this tool, TwilightZoney kept experimenting on his own with an interesting project. He was trying to import the first act of Windmill Isle from Sonic Unleashed into Sonic Generations, as the file formats were very similar between both games. He had already gotten into contact with different people about how these stages could be ripped into a regular 3D editor. At the time I'd had no idea the games were so similar and that such a thing was possible, so I offered help to see what I could do given my experience with the level layout format.
When I got the files to compare for myself, I noticed the AR packages were organized very differently between both games. Generations splits game assets and object layouts into separate packs. However, Unleashed had several of the assets needed for terrain in the file where the level layout would normally go. With some quick reorganization of the files going into the right places, in theory, Generations should be able to load the level without problems.
However, this is where the platform differences (PC vs 360) started to cause problems. The game uses a big AR file renamed to Stage.pfd for each level in the game. This packfile includes all of the 3D models of the terrain in the stage, compressed with a different algorithm on each platform to save space and HDD/disc reading time. These models were saved in small Terrain Groups, which the game uses for streaming different chunks of the terrain in real time as Sonic progresses through the level. This all matches Sonic Team's presentation of the Hedgehog Engine at GDC very well, and I recommend giving the Sonic Retro page on it a look; they preserved most of the information from the event.
The same PFD also includes more than the terrain groups. It holds the baked Global Illumination lightmaps/shadowmaps that the game streams in real time. These maps were also joined together to form what is commonly known as a Texture Atlas for efficiency. I assume this is why the folders themselves were called "gia-####", which would stand for Global Illumination Atlas.
The compression method between both games changed because of the platform (360 to PC); the Xbox 360 version of Sonic Generations actually uses the same compression as Unleashed. This is why the levels of Sonic Generations can be easily backported to Sonic Unleashed with just a few modifications:
The compression difference is that the Xbox 360 versions of both games use the proprietary Xbox 360 compression. Luckily a tool to decompress these files was already available, so it didn't take much work to extract their contents. Upon closer examination of the PC version's files, I found it was using common CAB compression, so the tools for solving this were already provided by Microsoft in Windows.
However, just changing the compression format is not enough to get the level to load in Sonic Generations. There's a secondary file to the PFD called PFI. (From this point on these files had to be cracked manually, so most of what I link to on the Retro Wiki are my own file specifications.) The PFI is just a little header file that stores the addresses of each Terrain Group and Global Illumination Atlas, so the game can quickly refer to them and load them in real time without having to load the entire pack into memory.
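To illustrate what keeping such a header in sync involves, here is a minimal sketch in C. It assumes the entries are stored in order inside the pack and that each one starts on an aligned boundary; the `PackEntry` struct, the `rebuild_offsets` function, and the alignment parameter are hypothetical names for illustration, not the actual PFI layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one table entry; not the real PFI layout. */
typedef struct {
    uint32_t offset; /* where the compressed entry starts inside the PFD */
    uint32_t size;   /* size of the compressed entry in bytes */
} PackEntry;

/* Recompute every offset from the current compressed sizes.
   Assumes entries are stored in order starting at data_start, and that
   each entry begins at a multiple of `alignment` bytes.
   Returns the position right after the last entry. */
static uint32_t rebuild_offsets(PackEntry *entries, size_t count,
                                uint32_t data_start, uint32_t alignment)
{
    assert(alignment != 0);
    uint32_t cursor = data_start;
    for (size_t i = 0; i < count; i++) {
        uint32_t remainder = cursor % alignment;
        if (remainder != 0) {
            cursor += alignment - remainder; /* skip padding bytes */
        }
        entries[i].offset = cursor;
        cursor += entries[i].size;
    }
    return cursor;
}
```

The point of the sketch is only that every offset after a resized entry shifts, so the whole table has to be regenerated rather than patched one entry at a time.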
Since compressing files with different algorithms tends to result in different sizes, these little header files needed to be fixed to match the new recompressed packfiles. Sure enough, with the file reorganization, the recompression, and the address fixing, the result was this:
Now there was another whole plethora of issues to fix...
No collision or animations available
Both Sonic Generations and Sonic Unleashed use the Havok Physics middleware to control the game's physics simulation and animation playback/blending. Havok provides software for common 3D editors, and it's a regular solution for most game developers these days. It also makes it possible to export physics and animations into very complex binary packages, which are optimized for the platform you're targeting. As a result, the files only work on the platform they're shipped for, mostly due to endianness differences.
In this specific case, the 360 files were Big Endian formatted, while the PC files were Little Endian. The PC version itself can't read these files by default, and the ability to read files from any platform other than Win32 was limited to the full licenses of Havok. Being unable to get ahold of the full commercial suite meant that the files would either have to be cracked (which was very difficult considering how complex the format is) or recreated completely. For the sake of getting this to work, TwilightZoney and Chimera offered their help in recreating all physics and animations of these levels if necessary. The result of that is this video:
The collision itself could easily be recreated in 3DS Max because the Havok plugins are completely free and available to anyone who needs them. Since TwilightZoney was already in contact with Link, who had developed a stage ripper for Sonic Unleashed, this meant that converting those 3D models to collision would be possible. However, these custom collisions never ended up in the finished mod.
A visual mess
The other glaring issue was that the materials themselves didn't look correct at all. At the moment the cause of this was unknown, so I started some ugly workarounds in an attempt to fix it. The trick to getting these to match the original look better was to simply rotate every single texture by 90 degrees. This was also done for the GI Atlas maps, and the atlasinfo files themselves were fixed to match these new coordinates.
The result still didn't look like it was supposed to, so this was never the proper solution. All textures were restored to their original form later in development.
3D model reformatting
The best way to find out why the levels didn't look like they should was probably to write a level importer myself. If I figured out how to properly import terrain and make it look like a regular Sonic Generations level, the root of the issue with the Unleashed levels would show up.
On this I got a LOT of help from darkspines35 and Link, since they had a lot of information on the model format already. However, a few aspects of it were incomplete. Most importantly, how the vertices themselves were formatted in the file seemed to be controlled by a variable that said how big they were in bytes, and this value was very different in lots of models. Making a parser for this involved a lot of specific cases depending on the vertex size, and it got a bit messy to handle.
Luckily, there was an undocumented part of the format that specified what the data in the chunks dedicated to the vertices meant. It was a binary table detailing the offsets of each element in the vertex. The elements were:
- UV Coordinates (Several channels)
- Bone Indices (Animated models)
- Bone Weights (Animated models)
- RGBA Vertex Color
With this new data it was possible to make an accurate parser for the models. It also meant that most of the model format was cracked, so I could make entirely new ones from scratch without having to modify actual files. It was just a matter of writing the software and finding an appropriate 3D Model format to use for exporting.
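As a rough illustration of why such a table removes the need for per-size special cases, here is a minimal sketch in C. The `VertexElement` struct, the `ElementKind` values, and `find_element` are hypothetical names, and the entry layout is an assumption made for the example rather than the real on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element kinds taken from the list above; the numeric values are made up. */
typedef enum {
    ELEM_UV,
    ELEM_BONE_INDICES,
    ELEM_BONE_WEIGHTS,
    ELEM_COLOR
} ElementKind;

/* Hypothetical table entry: which element lives where inside one vertex. */
typedef struct {
    ElementKind kind;
    uint32_t offset; /* byte offset from the start of the vertex */
    uint32_t size;   /* size of the element in bytes */
} VertexElement;

/* Return a pointer to the requested element inside `vertex`,
   or NULL if the table does not declare that element. */
static const uint8_t *find_element(const uint8_t *vertex, size_t vertex_size,
                                   const VertexElement *table, size_t count,
                                   ElementKind kind)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].kind == kind) {
            /* The declared element must fit inside the vertex. */
            assert((size_t)table[i].offset + table[i].size <= vertex_size);
            return vertex + table[i].offset;
        }
    }
    return NULL;
}
```

With a lookup like this, the parser asks the table where each element is instead of guessing the layout from the vertex size.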
The discovery of the table above also meant that I could see what was wrong with the data from the Unleashed 360 levels when compared to the Generations levels on PC. As expected, the format used for the Normal, Tangent, and Binormal vectors was completely different. The PC version used 3 separate floats (12 bytes) to represent a vector with 3 coordinates, like these ones. However, the 360 version used a single 4-byte value to represent each of these vectors, and there was no obvious way to convert them at first.
To figure out how to translate these 4-byte values into a normalized vector, I took advantage of the fact that there were other types of models with these kinds of values as well. The 360 version of Generations uses the same vertex format as Unleashed, and I knew what these values meant since the PC version used the exact same models, just formatted differently. Doing a cross-check between both files, and with lots of help from allegro.cc, I managed to get the conversion formula:
// "number" being the 4-byte value from the file, treated as a 32-bit integer.
// Each component is a sign bit (worth -1 when set) plus an 8-bit fraction out of 256.
normal_x = ((number & 0x00000400 ? -1 : 0) + (float)((number >> 2) & 0x0FF) / 256.0f);
normal_y = ((number & 0x00200000 ? -1 : 0) + (float)((number >> 13) & 0x0FF) / 256.0f);
normal_z = ((number & 0x80000000 ? -1 : 0) + (float)((number >> 23) & 0x0FF) / 256.0f);
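For readers who want to try the formula in isolation, it can be wrapped in a small self-contained C function. This is only a sketch: the `Vec3` struct and the name `decode_packed_vector` are made up for the example, and it assumes the 4-byte value has already been read into a host-order `uint32_t`.

```c
#include <stdint.h>

typedef struct {
    float x, y, z;
} Vec3;

/* Decode one packed 4-byte vector using the formula above.
   A set sign bit contributes -1; the 8 bits read below it are a
   fraction out of 256 that is added on top. */
static Vec3 decode_packed_vector(uint32_t number)
{
    Vec3 v;
    v.x = ((number & 0x00000400u) ? -1.0f : 0.0f)
        + (float)((number >> 2) & 0xFFu) / 256.0f;
    v.y = ((number & 0x00200000u) ? -1.0f : 0.0f)
        + (float)((number >> 13) & 0xFFu) / 256.0f;
    v.z = ((number & 0x80000000u) ? -1.0f : 0.0f)
        + (float)((number >> 23) & 0xFFu) / 256.0f;
    return v;
}
```

For example, a value of 0 decodes to (0, 0, 0), and a value with only bit 10 set decodes to an x component of -1.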
The other, simpler thing that was incorrect was the UV format. Generations on PC uses 2 floats (8 bytes in total) for each UV Channel, while Unleashed 360 used 2 half-floats (4 bytes in total). The parser for the latter format on the PC version of Generations was ported incorrectly, and it swapped the coordinates around when loaded. This was pretty easy to fix, and it ended up solving the problem in the game with the textures being rotated. It wasn't the file itself that was incorrect, but rather the parser in Generations PC for these formats was incomplete.
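The UV conversion can be sketched in C as well. This example assumes the half-floats follow the usual IEEE 754 16-bit layout (1 sign bit, 5 exponent bits, 10 mantissa bits) and that both values have already been read into host byte order; `half_to_float` and `convert_uv` are hypothetical helper names, not code from the game or from SonicGLvl.

```c
#include <stdint.h>
#include <string.h>

/* Expand one 16-bit half-float into a 32-bit float by rebuilding its bits. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits;

    if (exponent == 0) {
        if (mantissa == 0) {
            bits = sign; /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit leading bit appears. */
            uint32_t shift = 0;
            while ((mantissa & 0x400u) == 0) {
                mantissa <<= 1;
                shift++;
            }
            mantissa &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (mantissa << 13);
        }
    } else if (exponent == 31) {
        bits = sign | 0x7F800000u | (mantissa << 13); /* infinity or NaN */
    } else {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    }

    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}

/* Convert one UV channel from 2 half-floats (4 bytes) to 2 floats (8 bytes),
   keeping U first and V second. */
static void convert_uv(const uint16_t half_uv[2], float out_uv[2])
{
    out_uv[0] = half_to_float(half_uv[0]);
    out_uv[1] = half_to_float(half_uv[1]);
}
```

Keeping the two coordinates in the same order during conversion is what matters here, since a swapped pair is exactly the kind of error that shows up as rotated textures.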
Now that the proper way to parse the model format was figured out, it was time to correct the Unleashed stages. After a mass reformatting of all the terrain models into the Generations PC vertex format, the levels finally displayed correctly. The GI rendered properly too, and most of the materials also worked much better.
Layout editor to full level editor
Eventually, due to the need to fix these formats, SonicGLvl also got the ability to do custom terrain importing from scratch. The following formats needed to be cracked and generated:
- .instanceinfo: This format was already mostly cracked thanks to Link doing the stage ripper. Each model that appeared on the stage could be instanced with these files, which determined their position, rotation, and scale in the world. This prevented similar models from being repeated in memory, leading to more efficiency.
- .material: Already mostly cracked by other people, but the format became better documented. It handles the textures, shaders, and material parameters that a model can use. The Unleashed .material format was an older version that used 3 types of files instead, so all the outdated ones got converted to the newer Generations format.
- .terrain-block.tbst: It stores the bounding spheres of each terrain instance in the level. Useful for detecting when to load/display them with quick references to the right instance via indices.
- .terrain-group: Defined the models and instances inside a chunk of terrain. It also had a general bounding sphere of the entire chunk, to detect when to load/display the entire terrain group. Very useful for streaming and unloading the terrain as Sonic progresses through the level.
- .terrain: A global database of the bounding spheres of each terrain group, along with their names. Also used for quick lookups on whether to load/unload certain terrain groups.
- .light-list: Just a simple list of the names of the lights used on the level.
- .light: Type/Position/Color/Radius of a light. The directional sun light is the most common, and it's mandatory to have one per level.
- .gi-texture-group-info: Database of GI Quality Levels, names, and ranges to detect whether to load a GI Atlas map or not.
- How to render the Global Illumination itself in the right format. Eventually I found that the alpha channel of the textures basically functioned as a shadowmap, while the color channels worked as a regular lightmap. With these things in mind, it was pretty easy to build a pipeline to generate them in the right format.
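Several of the formats above revolve around bounding spheres used to decide when a piece of terrain should be loaded or displayed. The following C sketch shows one plausible form of that test; the `BoundingSphere` struct, the `sphere_in_range` function, and the `load_distance` parameter are assumptions made for illustration, since the actual check the game performs is not documented here.

```c
#include <assert.h>

/* Hypothetical bounding sphere, as stored per terrain group or instance. */
typedef struct {
    float x, y, z; /* center in world space */
    float radius;
} BoundingSphere;

/* Return 1 if the sphere comes within `load_distance` of the given point
   (for example the camera or player position), otherwise 0.
   Squared distances are compared to avoid a square root. */
static int sphere_in_range(const BoundingSphere *sphere,
                           float px, float py, float pz,
                           float load_distance)
{
    assert(load_distance >= 0.0f);
    float dx = sphere->x - px;
    float dy = sphere->y - py;
    float dz = sphere->z - pz;
    float reach = sphere->radius + load_distance;
    return (dx * dx + dy * dy + dz * dz) <= (reach * reach);
}
```

A test like this can be run first against the per-group spheres in .terrain and only afterwards against the per-instance spheres of the groups that passed, which would explain why both levels of the hierarchy store their own sphere.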
Once SonicGLvl became capable of generating all these formats on its own, making custom levels from scratch was possible:
With all these formats cracked and the model parser fixed, most of the work of porting an Unleashed level to Generations was done by SonicGLvl itself. The steps that still have to be done by hand are:
- Reorganize any terrain-related files (file types mentioned above) from the object layout file into the resources of the stage in the Packed folder.
- Fix the naming of the object layout files to match the Generations style (Base.set.xml would become setdata_base.set.xml). Merge the necessary object layout files as needed. This needed some special care, as there are numerous leftover XML files that were actually beta layouts. It's important to check in the main stage configuration file (Stage.xml) which object layout files are being loaded by the real stage.
- Repack the PFDs by decompressing the Xbox 360 compression and recompressing them to the CAB format. This can be easily automated with batch files, and the PFD can be resaved with the regular Generations Archive Editor. Do this for both Stage.pfd and Stage-Add.pfd (which comes with the Unleashed DLC) if available for high-quality lighting.
- Once everything's set into place, the game itself won't be able to load the stage just yet, but SonicGLvl can open it just like any regular Generations stage. There's no need to re-generate anything on the stage since the editor itself fixes all inconsistencies by just parsing the files into memory in one way, and outputting the files in a format compatible with Generations PC. A simple repack does the trick.
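The renaming step is easy to automate. The C sketch below generalizes the single example given above (Base.set.xml becoming setdata_base.set.xml) into "prefix with setdata_ and lowercase the name"; that generalization is an assumption, and `generations_layout_name` is a hypothetical helper, so other layout files should be checked by hand.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Build the Generations-style name for an Unleashed object layout file.
   Writes the result into `out` and returns 1, or returns 0 if the
   buffer is too small to hold it. */
static int generations_layout_name(const char *unleashed_name,
                                   char *out, size_t out_size)
{
    static const char prefix[] = "setdata_";
    size_t prefix_length = strlen(prefix);
    size_t needed = prefix_length + strlen(unleashed_name) + 1;
    if (needed > out_size) {
        return 0;
    }
    memcpy(out, prefix, prefix_length);
    size_t position = prefix_length;
    for (const char *p = unleashed_name; *p != '\0'; p++) {
        out[position++] = (char)tolower((unsigned char)*p);
    }
    out[position] = '\0';
    return 1;
}
```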
Korama from Sonic Retro made a very useful utility that was eventually used as the standard for modding anything in Sonic Generations. CPKREDIR was a new DLL that was able to intercept certain internal routines from Sonic Generations. In this case, what it did was intercept any requests to CPKs (The big packfiles mentioned at the start), and redirect them to open regular files that were available in a specified "mod" directory.
This allowed everyone to modify files in the game without having to repack the CPKs every time someone wanted to test a change. Reducing iteration time is vital to modding and reverse engineering, and this was a huge step towards being able to work properly. It also made distributing mods incredibly trivial, since you could upload the equivalent of a patch instead of very big CPKs.
CPKREDIR eventually got loads of new features like support for multiple mods, priority orders in loading, Steam savefile redirection, debugging file logs, and internal functions for dealing with the archive tree in memory to duplicate, move, and rename files. This tool is the common standard used in mods these days.
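The lookup order behind this kind of redirection can be sketched in a few lines of C. This is not how CPKREDIR is implemented (it hooks the game's internal routines, which is not shown here); the `Mod` struct, `resolve_file`, and the idea of each mod carrying a list of the files it provides are assumptions made purely to illustrate priority-ordered overrides.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one enabled mod. */
typedef struct {
    const char *directory;     /* folder holding the loose files */
    const char *const *files;  /* relative paths this mod provides */
    size_t file_count;
} Mod;

/* Walk the mods in priority order (index 0 wins) and, if one of them
   provides `requested`, write the redirected path into `out`.
   Returns 1 when redirected, 0 when the request should fall through
   to the original packfile (or when `out` is too small). */
static int resolve_file(const Mod *mods, size_t mod_count,
                        const char *requested, char *out, size_t out_size)
{
    for (size_t i = 0; i < mod_count; i++) {
        for (size_t j = 0; j < mods[i].file_count; j++) {
            if (strcmp(mods[i].files[j], requested) == 0) {
                int written = snprintf(out, out_size, "%s/%s",
                                       mods[i].directory, requested);
                return written >= 0 && (size_t)written < out_size;
            }
        }
    }
    return 0;
}
```

Because the first match wins, reordering the mod list is enough to change which mod's copy of a file the game ends up reading.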
Havok reverse engineering
The other piece of the puzzle that didn't fit quite right was the Havok middleware. Recreating all the collision and the animation was turning out to be a far harder task than it had seemed at the time, and there were a lot of inconsistencies because of that. Since the rigid bodies didn't have the same physics flags as in the original game, some slopes didn't attach Sonic correctly to them, others didn't have the right friction, stuff like stairs wasn't simple to do and needed to be replaced with better geometry... It was a hacky mess.
I had previously gotten in contact with someone who had attempted to convert the 360 HKX files to be compatible with PC by swapping the endianness in certain places, but without much success. Eventually I decided to look into it, and spent an entire weekend carefully reconstructing the format.
The Havok format is object-oriented in the same way an OOP language is. There are classes that hold defined values, and they can also inherit their parameters from other classes. These classes describe the attributes of anything used in the HKX binary file: there are objects for Rigid Bodies, Animations, Skeletons, etc. All of these definitions are already included in Havok itself by default, but for compatibility between Havok versions, it's possible to include an entire binary section in the HKX called Metadata. This metadata defines in detail what kind of attributes each class has, their inheritance, and the key names necessary for parsing. If it weren't for the metadata left in the files, the conversion would've been impossible.
Once I figured out the general format of this metadata and which addresses pointed to what, I could finally build an automatic process that swapped the endianness correctly for each object. There were about 33 variable types that needed to be parsed correctly; I worked out what they were by looking at the Havok SDK headers in the free version.
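The core of a metadata-driven conversion like this can be sketched in C as follows. The sketch assumes the metadata has already been turned into a flat list of fields per object, each with an offset and a size; the `Field` struct and `swap_fields` are hypothetical names and do not reflect the real HKX structures or the HKXConverter source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened field description derived from class metadata. */
typedef struct {
    uint32_t offset; /* byte offset of the field inside the object */
    uint32_t size;   /* field width in bytes: 1, 2, 4 or 8 */
} Field;

/* Reverse the byte order of every described field in place.
   Bytes that no field covers (padding, raw byte arrays) are left alone,
   which is why the whole object cannot simply be reversed blindly. */
static void swap_fields(uint8_t *object, size_t object_size,
                        const Field *fields, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert((size_t)fields[i].offset + fields[i].size <= object_size);
        uint8_t *first = object + fields[i].offset;
        uint8_t *last = first + fields[i].size;
        while (first + 1 < last) {
            last--;
            uint8_t temporary = *first;
            *first = *last;
            *last = temporary;
            first++;
        }
    }
}
```

Single-byte fields pass through unchanged, so the same routine can be fed every variable type as long as the field list states its width correctly.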
Once the tool was done swapping the endianness, I used the free Havok tool assetcc2.exe to update the converted files to the newer version of Havok that Sonic Generations uses (Havok-5.5.0-r1 to hk_2010.2.0-r1). After a few quirks in this conversion method got fixed here and there, it was able to import the exact Physics, Animations, and Skeletons straight from Sonic Unleashed:
This was pretty much the final piece of the puzzle in trying to get an accurate overall conversion of a level.
I've been told there's a more efficient method of handling this, with redirecting certain functions from the Sonic Generations .exe itself into opening the Havok 360 files and resaving them in the proper format. Since the version featured in Sonic Generations itself is the full commercial license of Havok, it has the functions for handling these conversions tucked away somewhere else; they're just not used in the regular game. If such a method is possible, then there's not much point to further developing my own converter anymore. Therefore, I'm releasing the source code and the compiled version of the HKX 360 to PC converter for files with metadata on them.
Download for HKXConverter. The source for it is in a messy state given its quick development, but it's fully functional.
I should mention that very early in development, JoeTE converted all of the level songs from Unleashed into the proper AAX format for the Generations levels. His work from back then is still featured in the final version of the mod. Music modding is a well-researched topic already with a set of tools available, which can be found here.
Proper organization for development
Now that most of the early research was done, there was finally a general idea of how far the game could be pushed, but the state of the mod itself was incredibly messy at the time. The level order was much different, there were some half-baked object layouts designed around our old custom collision, and some materials had been changed to accommodate the old bugs. More importantly, we didn't even know yet whether the thing would work on most hardware.
To solve this, the new goal was to start from scratch and make a polished demo of a simple stage, such as WI1, but with all the new knowledge and tools. After roughly a week of work, the first demo was ready for release to the public, which you can still download here. Changes to the object layout to get it to work are further explained in the second article.
The test went well on most hardware except for some material bugs that could be easily fixed, and some stability issues related to CPKREDIR. Korama eventually polished the program further, and added new features as necessary. With confirmation that doing this was possible, everything could finally get organized properly.
Good organization is important
Around this time, the Sonic Hacking Contest 2012 popped up, so we thought it would be a good idea to treat it as a deadline to develop an even better demonstration. Windmill Isle 1 was an easy act to properly port, but a full stage would be a real test of how much could be recreated. Having a proper time limit was also a good motivator to manage the work efficiently. At this point the bigger problems such as terrain or collision were pretty much fixed. What was needed was fine-tuning the object layouts as much as possible to fix any inconsistencies from one game to the other. The goal was for it to play like a legitimate stage, not just a straight port that ignored anything that didn't happen to be compatible. Like with WI1, all these changes are explained in the second article.
We set a pretty good workspace for doing this efficiently, and I recommend anyone collaborating as a team like this to do the following:
- The most obvious one is using an automatic file syncing service, with support for backups and automatic revisions. Common repository methods can work (git), but stuff like Dropbox and Google Drive also work very well without needing much technical knowledge. Don't even think of sending files manually through IM/e-mails constantly if you want to work efficiently.
- Organize your smaller packfiles (.AR in this case) to be unpacked in different folders. It's easier to keep up with modifying folders like these rather than syncing entire packfiles and not knowing what was even changed inside of it. When you need to test out the changes, just repack from these folders manually to ensure everything's up to date.
- Synchronize your workspace properly and let everyone else know when you're working on something. When dealing with files that can't be easily modified with partial changes, it's important to let others know whether or not they should modify them.
- Stay in contact with everyone who works on the project and make sure they're available at regular times. Someone stalling the project by disappearing for entire months without doing their job should probably be replaced instantly. The person managing the project should be constantly involved in what everyone's doing and make sure it all fits. Ordering people to do a task and checking back at a later date is not efficient.
- The pipeline for introducing assets into the game should be available to everyone on the team. Don't rely on one of your team members for testing changes in-game every time. Iteration times need to be reduced as much as possible.
The other important little thing that needed to be solved in time was that CPKREDIR itself still wasn't as user-friendly as I wanted it to be. It required manual INI file editing, it had a separate executable for patching the game to use the DLL, and it made turning mods on/off pretty annoying. To solve this, I took some quick inspiration from the Skyrim BSP Manager and coded a quick tool to manage mods visually. That tool is SonicGMI, and it got released with the Dragon Road demo for the Hacking Contest. The tool itself doesn't do much other than modifying the INI files CPKREDIR uses; it's just a more user-friendly approach.
Organize entire project and time
The Dragon Road demo was met with very positive feedback, and overall the project was much better organized than before. With the workspace set, we could keep using the same method for every level as long as we set a deadline each month. Sure enough, from July to February (7 months), we managed to finish all 7 of the remaining levels we set out to do at the quality level we wanted. Each month a single level was the goal, and nobody was supposed to work on any other stage until it was done. Some levels took more work than others, but overall the time limits helped to organize the project a lot. Keep in mind this time frame was designed around the fact that we could only dedicate our free time to this.
Next article: "Level development"