Sunday 10 April 2016

My Own Simple Version Control System

Why?



Well, it was Friday night. I was trying to revert my video game to a previous version. I opened up Git Bash, stashed my work, ran git add -A, and did a git reset --hard to the previous commit. But, lo and behold, git refused to change the repository in any way whatsoever, despite professing that it had succeeded.

"Odd," I thought. "Perhaps I should check this out on StackOverflow and see how others do this." I found a question similar to mine; unfortunately, every answer presented at least 3 different methods of doing the very same thing. Moreover, each answer had a comment section filled with people fuming about the shortcomings of the method that the answer used.

Now, I understand that I should've consulted the manual; StackOverflow isn't exactly known for providing robust solutions to highly specific problems. That being said, the fact that there isn't one agreed-upon way of reverting to a previous version (which is practically the foundation of a VCS) is rather frustrating. Don't get me wrong, git works very well on a larger scale; its complexity is simply a byproduct of the type of problem it is trying to solve. But for a lone user like me, who requires nothing but a simple means of keeping track of versions and switching between them without hassle, it is clearly too complex.

This is why I decided to create my own simple VCS. To be quite honest, I know there are other, simpler VCSs than git which I could use instead, but I was intrigued by the idea of writing one.

How?

At first, I presumed a simple program which copied the current directory and then stored a zipped version of it in another one would suffice. This would be quite trivial to implement and would serve my needs (sorta). There were, of course, some clear problems with this, most prominently: it wasted space. 

You see, I store most of my project files on Dropbox. Foolhardy, maybe, but I like having access to them regardless of where I am or what device I'm using. This is why a system like the one I proposed above simply couldn't work for me. Because it stored every file regardless of whether it had changed or not, my Dropbox folder (with a meager 2 GB of space) would be filled by just a couple dozen versions of my programs.

As such, I set out to build a system which would track changed files, and instead of storing each individual version as a compressed archive, the entire project repository and every version would be compressed as a whole into one file. This would, for the most part, result in a smaller repository than if each version were compressed separately.

Obviously, this is easier said than done. By far, the hardest problem to solve is that of tracking changes. My solution to this was as simple as I could possibly make it:

Each item in the project directory was stored in an ENTRY structure. Each entry stored its own path and last write time, and was tagged as either a FILE entry, a DIRECTORY entry, or a REFERENCE entry. The first simply stored the size and data of a file, the second stored a linked list of child entries, and the third stored the name of a version.
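
To make that concrete, here is a minimal sketch of what such an entry might look like in C, written as a tagged union. The field names and helper functions (entry_new_file, entry_add_child and so on) are hypothetical and not taken from the actual source; the timestamp is assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: names are illustrative, not the real program's. */
typedef enum { ENTRY_FILE, ENTRY_DIRECTORY, ENTRY_REFERENCE } EntryKind;

typedef struct Entry Entry;
struct Entry {
    EntryKind kind;
    char     *path;             /* path relative to the project root */
    uint64_t  last_write_time;  /* assumed to be a 64-bit timestamp */
    Entry    *next;             /* next sibling in the parent's child list */
    union {
        struct { size_t size; unsigned char *data; } file;
        struct { Entry *first_child; } directory;
        struct { char *version; } reference;  /* name of the owning version */
    } as;
};

/* Copy a string onto the heap (allocation failure just asserts in this sketch). */
static char *dup_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    assert(copy != NULL);
    memcpy(copy, s, n);
    return copy;
}

/* Fill in the fields every kind of entry shares. */
static Entry *entry_new(EntryKind kind, const char *path, uint64_t mtime) {
    Entry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->kind = kind;
    e->path = dup_string(path);
    e->last_write_time = mtime;
    e->next = NULL;
    return e;
}

Entry *entry_new_file(const char *path, uint64_t mtime,
                      const void *data, size_t size) {
    Entry *e = entry_new(ENTRY_FILE, path, mtime);
    e->as.file.size = size;
    e->as.file.data = malloc(size ? size : 1);
    assert(e->as.file.data != NULL);
    if (size) memcpy(e->as.file.data, data, size);
    return e;
}

Entry *entry_new_directory(const char *path, uint64_t mtime) {
    Entry *e = entry_new(ENTRY_DIRECTORY, path, mtime);
    e->as.directory.first_child = NULL;
    return e;
}

Entry *entry_new_reference(const char *path, uint64_t mtime,
                           const char *version) {
    Entry *e = entry_new(ENTRY_REFERENCE, path, mtime);
    e->as.reference.version = dup_string(version);
    return e;
}

/* Prepend a child to a directory's linked list of entries. */
void entry_add_child(Entry *dir, Entry *child) {
    assert(dir->kind == ENTRY_DIRECTORY);
    child->next = dir->as.directory.first_child;
    dir->as.directory.first_child = child;
}

/* Count the direct children of a directory entry. */
size_t entry_count_children(const Entry *dir) {
    size_t n = 0;
    assert(dir->kind == ENTRY_DIRECTORY);
    for (const Entry *c = dir->as.directory.first_child; c != NULL; c = c->next)
        n++;
    return n;
}
```

The union keeps each entry small: only the payload that matches the tag is ever read, and a reference costs little more than a path and a version name.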

Let's say you stored your first version named "first" (by the way, I made version names mandatory for now, and you can't have multiple versions with the same name). The system would create entries for each item it found, and store them all in a compressed file called .ctrl (I used miniz.c for compression). You then proceeded to work on the directory, changed some files, and stored another version named "added_player_controls". The only file changed between this version and "first" was "player.c". For every file which did not change, the system would simply create a reference with the same path and last write time, and have it refer to the version from which the file originated, in this case "first":
Forgive my crudely drawn diagram.
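
In code, the decision between storing a file and storing a reference could be sketched roughly like this. It is a simplified illustration, not the real implementation: it uses flat arrays instead of the linked entry tree, leaves out file data, and assumes a file counts as unchanged when its path and last write time match the previous version. The names Version, ScannedFile and store_version are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16

typedef enum { ENTRY_FILE, ENTRY_REFERENCE } EntryKind;

typedef struct {
    const char *path;
    uint64_t    last_write_time;
    EntryKind   kind;
    const char *origin;  /* REFERENCE only: version that owns the data */
} Entry;

typedef struct {
    const char *name;
    Entry       entries[MAX_ENTRIES];
    size_t      count;
} Version;

/* What a directory scan would report for one file (hypothetical). */
typedef struct {
    const char *path;
    uint64_t    last_write_time;
} ScannedFile;

static const Entry *find_entry(const Version *v, const char *path) {
    for (size_t i = 0; i < v->count; i++)
        if (strcmp(v->entries[i].path, path) == 0)
            return &v->entries[i];
    return NULL;
}

/* Build a new version from a scan, given the previous version (or NULL). */
void store_version(Version *out, const char *name, const Version *prev,
                   const ScannedFile *files, size_t n) {
    assert(n <= MAX_ENTRIES);
    out->name = name;
    out->count = n;
    for (size_t i = 0; i < n; i++) {
        Entry *e = &out->entries[i];
        e->path = files[i].path;
        e->last_write_time = files[i].last_write_time;
        e->kind = ENTRY_FILE;  /* new or changed: this version owns it */
        e->origin = NULL;

        const Entry *old = prev ? find_entry(prev, files[i].path) : NULL;
        if (old != NULL && old->last_write_time == files[i].last_write_time) {
            /* Unchanged: point at whichever version actually owns the data,
               so references never chain through other references. */
            e->kind = ENTRY_REFERENCE;
            e->origin = (old->kind == ENTRY_FILE) ? prev->name : old->origin;
        }
    }
}
```

Storing "first", then "added_player_controls" with only player.c touched, leaves every other entry in the second version as a reference to "first". A third version with no changes would reference "first" for those files and "added_player_controls" for player.c.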

Anyways, this worked well, especially with compression, meaning I could store A LOT of versions with many little changes without using too much space.

But then came the problem of removing referenced versions.

I dealt with this problem by doing the simplest thing I could yet again. When a version is removed, all later versions are scanned for references to files stored in that version. The next version which references such a file now owns the file (i.e., its entry is changed to a FILE entry), and all versions after that have their references pointed at the new owning version:

Again, my diagrams are horrendous, but I hope you get the idea. Now the version "added_player_controls" owns all the files it referenced previously, and "updated_loop" now references "added_player_controls" as opposed to "first".

User Interface

For now, the system is simply a command line application (only for Windows). You have access to the following commands:


Now, I know this doesn't provide any of the fancy branching, merging and forking capabilities of git; nor does it do any delta encoding (whereby only changed bytes are stored as opposed to entire files), but it's simple, fast, and it works.

And that's enough for me.

PS. If there was ever a feature I desperately needed, I could add it in quite simply too since the whole program spans about 1200 lines of C code.