GIT – under the hood
Most of us use Git for version control, and that is no surprise. It’s very useful, but it has one drawback: we can waste hours solving problems with it. There is also a good reason why Git-related questions rank so high on Stack Overflow: many people understand the idea of Git and version control, but not many know how it works. That’s why I’m going to unmask Git and show how it really works.
Types of files
We have to start with three main types of files in git:
- blob – the contents of your file, stored in a compressed binary form
- tree – a file with pointers to blobs and other trees, together with their names; it’s basically just a way for git to keep the structure of the files
- commit – a file with an author, a message and a pointer to a tree
Now you have probably noticed the word “pointer” in the definitions above. Well, the truth is that git is a key-value db, where the key is a SHA-1 hash and the value is some sort of file (tree, blob, etc.). The git documentation even says so.
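To make the key-value idea concrete, here is a minimal Python sketch of how the key of an object can be derived from its content. It assumes a repository using the default SHA-1 object format, and git_object_id is a name made up for this illustration, not something git provides.

```python
import hashlib


def git_object_id(obj_type: str, payload: bytes) -> str:
    """Return the SHA-1 key for an object of the given type (assumes SHA-1 repos)."""
    # The hash covers a small header ("<type> <size>" plus a NUL byte)
    # followed by the raw content of the object.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


# The same content always produces the same key, whatever the file is called.
print(git_object_id("blob", b"hello world\n"))
```

Running echo 'hello world' | git hash-object --stdin in a terminal should print the same value, which is a quick way to check the sketch against a real git.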
So what’s happening down there?
Let us create a simple file, for example file.txt with some basic message. If we call
foo@bar:~$ git add .
foo@bar:~$ ls .git/objects
we will notice that a new directory has been created in .git/objects.
It holds the blob for our file.txt. A blob is a binary file, so a simple cat will give us gibberish. Fortunately, there is the git cat-file command to read that blob. It only needs the SHA, which is the key in our key-value db. The full key is 40 hex characters long, but a unique prefix is enough, so we will use the first six characters, 51f466 (51 is the name of the new directory, taken from the first two characters of the SHA). And so we can use
foo@bar:~$ git cat-file -p 51f466
A file
which is the content of my file.txt. Neat. Now let’s do a commit.
foo@bar:~$ git commit -m "my commit"
Now we can see that git has created two more files
foo@bar:~$ ls .git/objects
51/ 94/ a5/ info/ pack/
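The two-character directory names in this listing come straight from the keys. As a rough Python sketch (my own helper names, not git’s code), a loose object can be written and read back like this. It assumes the SHA-1 object format and only covers loose objects, not the ones git later moves into pack/.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_loose_object(objects_dir: Path, obj_type: str, payload: bytes) -> str:
    """Store an object in the loose-object layout and return its SHA-1 key."""
    raw = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00" + payload
    sha = hashlib.sha1(raw).hexdigest()
    # First two hex characters name the directory, the other 38 name the file.
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    # The header and content are zlib-compressed, hence the gibberish from cat.
    path.write_bytes(zlib.compress(raw))
    return sha


def read_loose_object(objects_dir: Path, sha: str) -> Tuple[str, bytes]:
    """Read a loose object by its full 40-character key; return (type, content)."""
    raw = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    header, _, payload = raw.partition(b"\x00")
    obj_type, _size = header.decode("ascii").split(" ")
    return obj_type, payload


# Demo in a throwaway directory, so no real repository is touched.
with tempfile.TemporaryDirectory() as tmp:
    objects_dir = Path(tmp) / "objects"
    sha = write_loose_object(objects_dir, "blob", b"hello world\n")
    stored_dirs = sorted(p.name for p in objects_dir.iterdir())
    obj_type, payload = read_loose_object(objects_dir, sha)
    print(sha, stored_dirs, obj_type, payload)
```

Unlike git cat-file, this sketch needs the full key, because it builds the path directly instead of searching for a matching prefix.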
The types and contents of the two new files (94b532 and a511dd) are
foo@bar:~$ git cat-file -t 94b532
tree
foo@bar:~$ git cat-file -p 94b532
100644 blob 51f466f2e446ade0b0b2e5778ce3e0fa95e380e8 file.txt
foo@bar:~$ git cat-file -t a511dd
commit
foo@bar:~$ git cat-file -p a511dd
tree 94b532f5ca23b87e4163d3e10bd39bb153d1a246
author foo <firstname.lastname@example.org> 1579817567 +0100
committer foo <email@example.com> 1579817567 +0100

my commit
As we can see, the tree file tells us about the relation between file.txt and the blob with SHA 51f466. The commit, on the other hand, has a pointer to that tree. Now, to make one very important point, we will change the content of file.txt, create an additional file and commit our changes.
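Going back to the tree for a moment: its key is computed from its entries, so a tree really is nothing more than names pointing at other keys. Here is a small Python sketch under a few assumptions: SHA-1 object format, plain files only (directories need a different mode and sort order), and tree_object_id being a name invented for this example.

```python
import hashlib


def tree_object_id(entries):
    """Hash a tree built from (mode, name, sha_hex) entries for plain files."""
    body = b""
    # Entries are stored sorted by name; each one is "<mode> <name>", a NUL
    # byte, and then the 20 raw bytes of the SHA the name points to.
    for mode, name, sha_hex in sorted(entries, key=lambda entry: entry[1]):
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(sha_hex)
    header = f"tree {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# The single entry from the tree listing above.
print(tree_object_id([
    ("100644", "file.txt", "51f466f2e446ade0b0b2e5778ce3e0fa95e380e8"),
]))
```

Comparing the printed value with the tree SHA from the listing above (94b532...) is a simple way to check the sketch against a real repository. Note that the blob’s content is not part of the input at all, only its key is.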
Now, if we list our files we will have
24/ 3b/ 51/ 5a/ 6c/ 94/ a5/ info/ pack/
Four more files?! As we might suspect, there is a blob for the new file, a new tree and a new commit. So what about the last one? Well, let’s inspect those files closely
foo@bar:~$ git cat-file -t 24e7df
blob
foo@bar:~$ git cat-file -p 24e7df
A new file
foo@bar:~$ git cat-file -t 5a3194
blob
foo@bar:~$ git cat-file -p 5a3194
A file after changes
foo@bar:~$ git cat-file -t 6c22a9
tree
foo@bar:~$ git cat-file -p 6c22a9
100644 blob 5a3194d2db0000c18464869764e169276313f4bd file.txt
100644 blob 24e7dfa22219c967b5e1724d86404489cf93f7f0 file2.txt
foo@bar:~$ git cat-file -t 3b9a54
commit
foo@bar:~$ git cat-file -p 3b9a54
tree 6c22a914a98ffcc637bd49ced52b8646f15f364b
parent a511dd78ecefdde0e4aa15930c12cdfe35e6e4df
author foo <firstname.lastname@example.org> 1579818365 +0100
committer foo <email@example.com> 1579818365 +0100

second commit
Our file.txt has a new SHA (5a3194), which is mentioned in the tree, which in turn is mentioned in the commit. If we look closely, we will notice that we now have two blobs for file.txt: 51f466 with the old version and 5a3194 with the new one. That was the mysterious fourth file. The tree basically tells us “hey look, file.txt has this blob as its content” and the commit says “I want to use this tree”. And now we can see how it works! When we switch to another commit, we just change the tree we are using. We delete a file and want to undo it? Don’t worry, the tree has the SHA of the blob with that file’s content, so we can easily restore it! As a little bonus, in the content of our latest commit we can see a parent. It’s just the SHA of the earlier commit, since that was our starting position for this commit.
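The whole chain can be replayed with a plain Python dictionary. This is only a toy model of the object database built from the listings above: keys are shortened to the six-character prefixes used with cat-file (real keys have 40 characters), and read_file and history are hypothetical helpers, not git commands.

```python
# Toy object database: key -> (type, value), using data from the listings above.
OBJECTS = {
    "51f466": ("blob", "A file"),
    "5a3194": ("blob", "A file after changes"),
    "24e7df": ("blob", "A new file"),
    "94b532": ("tree", {"file.txt": "51f466"}),
    "6c22a9": ("tree", {"file.txt": "5a3194", "file2.txt": "24e7df"}),
    "a511dd": ("commit", {"tree": "94b532", "parent": None, "message": "my commit"}),
    "3b9a54": ("commit", {"tree": "6c22a9", "parent": "a511dd", "message": "second commit"}),
}


def read_file(commit_id, name):
    """Follow commit -> tree -> blob and return the file content."""
    _, commit = OBJECTS[commit_id]
    _, tree = OBJECTS[commit["tree"]]
    _, content = OBJECTS[tree[name]]
    return content


def history(commit_id):
    """Collect commit messages by following the parent pointers."""
    messages = []
    while commit_id is not None:
        _, commit = OBJECTS[commit_id]
        messages.append(commit["message"])
        commit_id = commit["parent"]
    return messages


print(read_file("3b9a54", "file.txt"))  # version from the second commit
print(read_file("a511dd", "file.txt"))  # old version, still stored
print(history("3b9a54"))
```

Nothing is overwritten in this model: the second commit only adds new keys, which is why the old version of file.txt can still be read through the first commit.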
After this short introduction to “GIT – under the hood” we can see that git isn’t that scary. It’s a simple key-value db with a few features on top. Of course, there are plenty more things to talk about (branches, for example), but all those advanced features are built on the basics described here. Now you know enough to explore them yourself, or you can just wait for another article about “GIT – under the hood”.