MothPriest: A Python Library for File Parsers

Motivation

If I had a nickel for every time I've had to write a binary file parser, I would have 5 nickels...

Anyway, writing binary file parsers can be a massive pain because you have to deal with byte representations of data and often complex table structures. You also have to deal with transformations like compression, different versions of the file format leading to branching possibilities, and checksums which must be set exactly right or things break.

Skyrim VR

There is an unbelievable quantity of mods for Bethesda's hit game, The Elder Scrolls V: Skyrim, many of which are of such magnitude and quality that the modified game is unrecognizable from the original. Newer versions of Skyrim allow the player to add up to 1024 mods at once, but the virtual reality release of Skyrim is strangely stuck at a cap of 256 despite its recency.

To get around these caps, people will often merge multiple mods into a single mod, freeing up N-1 slots with N being the number of mods in the merge. This is a perfectly fine solution, but I found out the hard way that it has one notable drawback: you cannot merge mods mid-playthrough because all of the references in your savefile will point to the old mods instead of the new merge. In essence, a merge is just a bunch of smaller mods in a trench coat, which makes it unrecognizable to the savefile.

Patching Savefiles

After reading various forum posts claiming that it was "impossible to merge mods mid-playthrough without losing data," I decided to take matters into my own hands. Maybe internet strangers couldn't accomplish this feat, but I knew that the act of merging mods does not actually result in a loss of information. Given that you know which mods were merged together, it is possible to update the content of the savefile so that all the references are mapped to the correct objects in the merged mod.

The only issue with this is that the ".ess" savefiles used in Skyrim are quite complex in subtle ways, and I was more than tired of manually writing file parsers by this point. This gave birth to MothPriest. Named after the (fictional) group of priests who are able to decipher the Elder Scrolls in the games, this library makes it easy to write complicated binary file parsers in Python.

MothPriest: An Overview

MothPriest uses a tree structure to represent binary file formats, where each internal node is some kind of collection, such as a table, and each leaf node is a specific record of some kind. Unlike many other file parsers, MothPriest's Parser objects directly store the records that they have parsed, and are capable of unparsing back into the original binary format with ease.

References

The most common paradigm in binary files is the combination of sizes and offsets. Under this model, files will contain a byte-encoded unsigned integer representing the size of a data table in bytes, and then often another unsigned integer representing the number of bytes from the start of the file (or some other reference point) until you get to the data table.

This is great from an efficiency standpoint because it means that if you only want to read one table, you can just read its size and offset, and then read that many bytes starting at the offset. A more naïve way of doing this would be to have tables take up as much space as needed and then have some kind of delimiter to demarcate where the next table begins. In that case you would have to keep reading through the file until you found the table you wanted, and then you would have to keep reading and checking until you found the start of the next table.
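To make the size-and-offset idea concrete, here is a minimal sketch in plain Python using only the standard library (no MothPriest involved). The toy layout, with two little-endian 32-bit integers followed by some unrelated bytes and then the table, is an assumption made up for illustration, as is the read_table helper.

```python
import io
import struct

def read_table(stream, header_pos):
    """Read one table given the position of its (size, offset) header entry."""
    stream.seek(header_pos)
    # Assumed layout: two little-endian unsigned 32-bit integers,
    # the table size first and then the table offset from the start of the file.
    size, offset = struct.unpack("<II", stream.read(8))
    stream.seek(offset)  # jump straight to the table, skipping everything else
    return stream.read(size)

# Build a toy file: an 8-byte header, 4 bytes of unrelated data, then the table.
table = b"hello table"
header = struct.pack("<II", len(table), 12)
toy_file = io.BytesIO(header + b"\x00" * 4 + table)

print(read_table(toy_file, 0))  # b'hello table'
```

Notice that the reader never looks at the 4 unrelated bytes. That is the whole appeal of the approach.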

MothPriest makes it incredibly easy to represent these referential structures. For a simplified example, imagine a file with only two items: a string_size and a string_value. Here's what the MothPriest parser might look like:

parser = BlockParser(
    "root",
    [
        IntegerParser("string_size", size=4, little_endian=True, signed=False),
        StringParser("string_value", size="string_size") # Note the reference back to string_size
    ]
)

Because every object has a unique ID (e.g. "string_value"), the StringParser object can just refer to "string_size" as its size parameter, and that value will be automatically populated at parse-time.

Even better, when you unparse the file back into its original format, MothPriest can also update the string_size value. So if the original file contained a 5-byte-long string, and you updated it to be 10 bytes long, MothPriest would write the number 10 instead of 5 when you unparse.
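For a sense of the bookkeeping this saves, here is a rough hand-written equivalent of that round trip in plain Python. It is only a sketch of the idea, not MothPriest's implementation: the function names are hypothetical, and UTF-8 is an assumed encoding.

```python
import struct

def parse_sized_string(data):
    """Read a 4-byte little-endian unsigned size, then that many bytes of text."""
    (size,) = struct.unpack_from("<I", data, 0)
    return data[4:4 + size].decode("utf-8")  # UTF-8 is an assumption

def unparse_sized_string(value):
    """Write the string back, recomputing the size field from the current value."""
    encoded = value.encode("utf-8")
    return struct.pack("<I", len(encoded)) + encoded

original = struct.pack("<I", 5) + b"hello"
text = parse_sized_string(original)        # the 5-byte string "hello"
patched = unparse_sized_string(text * 2)   # now a 10-byte string

# The size field was rewritten to match the new content.
print(struct.unpack_from("<I", patched, 0)[0])  # 10
```

Forgetting to recompute that size field is exactly the kind of mistake that is easy to make when every field is handled by hand.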

Types of Parsers

The most basic parsers in MothPriest are leaf nodes, like IntegerParser and StringParser. In order to capture structure, we need other parsers like the BlockParser, ReferenceCountParser, TransformationParser, and ReferenceMappedParser.

The BlockParser is relatively simple. It just contains a list of other parsers which it executes sequentially on the input as in the example above, only requiring that each child parser has a well-defined size.

The ReferenceCountParser takes a parser-generating function and a reference as input, and can parse an arbitrary number of elements. Here is an example of a parser that can parse an arbitrary number of length-determined strings:

def sized_string(idx):
    # Build a fresh size-prefixed string parser whose ID is idx
    return BlockParser(idx, [
        IntegerParser("string_size", size=4, little_endian=True, signed=False),
        StringParser("string_value", size="string_size")
    ])

parser = BlockParser(
    "root",
    [
        IntegerParser("string_count", size=4, little_endian=True, signed=False),
        ReferenceCountParser("strings", "string_count", sized_string),
    ]
)

This way we completely avoid having to write any loops in our parser, and again, MothPriest will update the string_count value when unparsing if objects have been added or removed.
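For comparison, this is roughly the loop you would otherwise write by hand for the same layout. It is a plain-Python sketch using only the standard library, with hypothetical function names and an assumed UTF-8 encoding. It is not how MothPriest does it internally.

```python
import struct

def parse_strings(data):
    """Parse a count-prefixed list of size-prefixed strings."""
    (count,) = struct.unpack_from("<I", data, 0)
    pos = 4
    strings = []
    for _ in range(count):
        (size,) = struct.unpack_from("<I", data, pos)
        pos += 4
        strings.append(data[pos:pos + size].decode("utf-8"))
        pos += size
    return strings

def unparse_strings(strings):
    """Serialize the list, recomputing the count and every size field."""
    out = [struct.pack("<I", len(strings))]
    for s in strings:
        encoded = s.encode("utf-8")
        out.append(struct.pack("<I", len(encoded)) + encoded)
    return b"".join(out)

blob = unparse_strings(["moth", "priest"])
items = parse_strings(blob)
items.append("scroll")
updated = unparse_strings(items)
print(struct.unpack_from("<I", updated, 0)[0])  # 3
```

Every offset and counter in that loop is a place for an off-by-one error to hide, which is the part the declarative version takes off your hands.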

The ReferenceMappedParser takes a reference and uses the retrieved value in a lookup dictionary for the correct parser to use. One use case for this occurs when file formats have different versions which require different parsers. In this case the reference would be to the parser responsible for the file version, and the dictionary would map versions to parser objects.
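The underlying pattern is a dictionary dispatch keyed on a value read earlier in the file. Here is a minimal plain-Python sketch of that pattern. The one-byte version field and both body layouts are invented for illustration, and none of these names come from MothPriest.

```python
import struct

def parse_body_v1(body):
    # Hypothetical version 1 layout: a single unsigned 16-bit value.
    (value,) = struct.unpack("<H", body[:2])
    return {"value": value}

def parse_body_v2(body):
    # Hypothetical version 2 layout: the value widened to 32 bits, plus a flag byte.
    value, flag = struct.unpack("<IB", body[:5])
    return {"value": value, "flag": bool(flag)}

# The lookup dictionary: version number -> parser for that version's body.
BODY_PARSERS = {1: parse_body_v1, 2: parse_body_v2}

def parse_file(data):
    version = data[0]  # assumed: the first byte holds the format version
    try:
        parse_body = BODY_PARSERS[version]
    except KeyError:
        raise ValueError(f"unsupported version: {version}") from None
    return {"version": version, **parse_body(data[1:])}

print(parse_file(bytes([1]) + struct.pack("<H", 7)))
print(parse_file(bytes([2]) + struct.pack("<IB", 7, 1)))
```

Supporting a new version then means adding one entry to the dictionary rather than another branch in the parsing code.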

Finally, the TransformationParser is a special BlockParser which applies arbitrary Python functions to the data before applying the child parsers. Naturally, it supports reverse transformations as well, meaning that it can undo whatever it did when unparsing. The primary use case for this is with various compression and decompression algorithms, but it also has another use I'm very proud of.

That use is as the parent of the BytesExpansionParser. This parser can take multiple bytes and split the underlying bits into multiple groups of arbitrary size before parsing. So if you need to parse 2 bytes where the first 2 bits are one value, the next 5 bits are another, and the remaining 9 bits are a third, you don't need to write a custom parser to split the values apart. You can just use the BytesExpansionParser and place 3 normal parsers inside of it! All of the extracted values are padded to the nearest byte, so an example of the above scenario might look like this: 1010000111111111 -> 00000010,00010000,0000000111111111.
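The bit-splitting itself is a reversible transformation, which is why it fits under the TransformationParser. Below is a plain-Python sketch of the forward and reverse steps for the 2/5/9 split described above. The function names, the big-endian bit order, and the sample input are assumptions for illustration, not MothPriest's actual code.

```python
def expand_bits(data, widths):
    """Split the bits of data into fields of the given widths, each padded to whole bytes."""
    total_bits = len(data) * 8
    if sum(widths) != total_bits:
        raise ValueError("field widths must cover every bit of the input")
    number = int.from_bytes(data, "big")
    fields = []
    remaining = total_bits
    for width in widths:
        remaining -= width
        # Shift the field down to the low bits, then mask off everything above it.
        value = (number >> remaining) & ((1 << width) - 1)
        fields.append(value.to_bytes((width + 7) // 8, "big"))
    return fields

def pack_bits(fields, widths):
    """Reverse of expand_bits: squeeze the padded fields back into the original bytes."""
    number = 0
    for field, width in zip(fields, widths):
        number = (number << width) | int.from_bytes(field, "big")
    return number.to_bytes(sum(widths) // 8, "big")

widths = [2, 5, 9]
raw = bytes([0b10110100, 0b00000101])
fields = expand_bits(raw, widths)
print(fields)                      # [b'\x02', b'\x1a', b'\x00\x05']
print(pack_bits(fields, widths) == raw)  # True
```

Once the fields are byte-aligned like this, ordinary integer parsers can read them without knowing anything about the original bit layout.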

There are more parsers in MothPriest than I can reasonably cover here, some of them more niche than others. I have used the vast majority of them in my other new tool, SkyrimSaveToolkit, if you want to see examples.

SkyrimSaveToolkit

SkyrimSaveToolkit, as its name suggests, is a toolkit for working with Skyrim savefiles. Currently it only performs two operations: general parsing/unparsing and merge patching. When you merge two mods using a common tool like ZMerge, part of the output is a file which maps object references from what they were in the original mods to what they will be in the merged mod.

So given this map file and the name of the merged mod file, SkyrimSaveToolkit can rewrite a savefile to reflect this merge. The result is a file indistinguishable from what you would have if the mods had always been merged. There are some limitations to this, but having tested it in both a toy scenario as well as a significantly more complicated one, I am confident that it works.

So the answer to the question of whether or not you can merge Skyrim mods mid-playthrough is now a resounding yes. Yes you can.