GIT – under the hood

Kamil Kurzyna

Most of us use git as our version control tool, and that is no surprise. It’s very useful, but it has one drawback: we can waste hours solving problems with it. There is also a very good reason why git-related questions are at the very top on Stack Overflow: many people understand the idea of git and version control, but not many know how it works. That’s why I’m going to unmask git and show how it really works.

Types of files

We have to start with the three main types of files in git (git itself calls them objects):

  • blob – the content of your file, stored without its name
  • tree – a file with named pointers to blobs and other trees; it’s basically how git keeps the directory structure of the files
  • commit – a file with a pointer to a tree, plus an author, a committer and a message

You have probably noticed the word pointer in the definitions above. The truth is that git is a key-value db, where the key is a SHA-1 hash and the value is some sort of file (tree, blob, etc.). The git documentation itself says so.
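To make the key-value idea concrete, here is a minimal Python sketch of how such a key can be derived. It relies on git hashing a short header (the type, the size and a NUL byte) followed by the raw content. The function name git_object_id and the in-memory dict named store are illustrative only and not part of git.

```python
import hashlib


def git_object_id(obj_type, content):
    """Return the SHA-1 key git would assign to an object of this type."""
    # Git hashes a small header ("<type> <size>\0") followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# A plain dict stands in for the key-value db (illustrative, not git's storage).
store = {}
for data in (b"A file\n", b"A file\n", b"A new file\n"):
    store[git_object_id("blob", data)] = data

# Identical content always maps to the same key, so only two entries exist.
print(len(store))
```

Because the key depends only on the content, adding the same file twice never creates a second entry.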

So what’s happening down there?

Let us create a simple file in a freshly initialized repository, for example file.txt with a short message in it. If we call

foo@bar:~$ git add .
foo@bar:~$ ls .git/objects

we will notice that a new directory has been created in .git/objects:

.git/objects/51/f466f2e446ade0b0b2e5778ce3e0fa95e380e8

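This is the blob file of our file.txt. A blob is stored compressed, so a simple cat will give us gibberish. Fortunately, there is the git cat-file command to read that blob file. It only needs the SHA, which is the key in our key-value db. The full key has 40 hexadecimal characters: the first two (51) are the directory name and the remaining 38 are the file name. Git also accepts any unambiguous prefix of the key, so the first six characters, 51f466, are enough here. And so we can use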

foo@bar:~$ git cat-file -p 51f466
A file

which is the content of my file.txt. Neat. Now let’s make a commit.

foo@bar:~$ git commit -m "my commit"

Now we can see that git has created two more files:

foo@bar:~$ ls .git/objects
51/   94/   a5/   info/ pack/
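The two-character directories in the listing above come from the start of each key. As a rough sketch, assuming the usual loose-object layout (zlib-compressed data stored under the first two hex characters of the key), the code below writes and reads one object in a temporary directory. The helper names write_loose_object and read_loose_object are made up for this example, and packed objects under pack/ are not handled.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir, obj_type, content):
    """Store content the way git stores a loose object and return its key."""
    raw = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    sha = hashlib.sha1(raw).hexdigest()
    # The first two hex characters name the directory, the other 38 the file.
    path = objects_dir / sha[:2] / sha[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))
    return sha


def read_loose_object(objects_dir, sha):
    """Return (type, content) for a full 40-character key."""
    raw = zlib.decompress((objects_dir / sha[:2] / sha[2:]).read_bytes())
    # The header ends at the first NUL byte; everything after it is content.
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    assert int(size) == len(content)
    return obj_type, content


# A throwaway directory stands in for .git/objects.
objects_dir = Path(tempfile.mkdtemp()) / "objects"
key = write_loose_object(objects_dir, "blob", b"A file\n")
print(read_loose_object(objects_dir, key))
```

This is roughly what git cat-file -t and git cat-file -p do for us: decompress the file, read the type from the header and print the rest.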

The types and contents of those files are:

foo@bar:~$ git cat-file -t 94b532
tree
foo@bar:~$ git cat-file -p 94b532
100644 blob 51f466f2e446ade0b0b2e5778ce3e0fa95e380e8    file.txt

and

foo@bar:~$ git cat-file -t a511dd
commit
foo@bar:~$ git cat-file -p a511dd
tree 94b532f5ca23b87e4163d3e10bd39bb153d1a246
author foo <foo@bar.gg> 1579817567 +0100
committer foo <foo@bar.gg> 1579817567 +0100

my commit

As we can see, the tree file records the relation between the name file.txt and the blob with SHA 51f466. The commit, on the other hand, has a pointer to that tree. Now, to make a very important point, we will change the content of file.txt, create an additional file and commit our changes.
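Before moving on, the tree and commit we have just inspected can be sketched in Python. This is a simplified illustration, not git's implementation: it only handles plain files in a single directory (git orders subdirectories slightly differently), and the helper names object_id, tree_body and commit_body are invented for the example. The blob key and author line are copied from the transcript above.

```python
import hashlib


def object_id(obj_type, body):
    """SHA-1 over the "<type> <size>\\0" header plus the object body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """entries: iterable of (mode, name, hex_sha) for files in one directory."""
    body = b""
    for mode, name, sha in sorted(entries, key=lambda e: e[1]):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(sha)
    return body


def commit_body(tree, author, timestamp, message, parent=None):
    """Build the text of a commit; the first commit has no parent line."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp}")
    lines.append(f"committer {author} {timestamp}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


blob = "51f466f2e446ade0b0b2e5778ce3e0fa95e380e8"  # blob key from the transcript
tree = object_id("tree", tree_body([("100644", "file.txt", blob)]))
first = commit_body(tree, "foo <foo@bar.gg>", "1579817567 +0100", "my commit")
print(first.decode("utf-8"))
```

The printed text has the same shape as the output of git cat-file -p for a commit: a tree line, the author and committer, an empty line and the message.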

Now, if we list .git/objects again, we will have

24/  3b/  51/  5a/  6c/  94/  a5/  info/  pack/

Four more files?! As we might suspect, there is a blob for the new file, a tree and a commit. So what about the last one? Well, let’s inspect those files closely:

foo@bar:~$ git cat-file -t 24e7df
blob
foo@bar:~$ git cat-file -p 24e7df
A new file

foo@bar:~$ git cat-file -t 5a3194
blob
foo@bar:~$ git cat-file -p 5a3194
A file after changes

foo@bar:~$ git cat-file -t 6c22a9
tree
foo@bar:~$ git cat-file -p 6c22a9
100644 blob 5a3194d2db0000c18464869764e169276313f4bd    file.txt
100644 blob 24e7dfa22219c967b5e1724d86404489cf93f7f0    file2.txt

foo@bar:~$ git cat-file -t 3b9a54
commit
foo@bar:~$ git cat-file -p 3b9a54
tree 6c22a914a98ffcc637bd49ced52b8646f15f364b
parent a511dd78ecefdde0e4aa15930c12cdfe35e6e4df
author foo <foo@bar.gg> 1579818365 +0100
committer foo <foo@bar.gg> 1579818365 +0100

second commit

Our file.txt has a new SHA (5a3194), which is referenced in the new tree, which in turn is referenced in the new commit. If we look closely, we will notice that we now have two blobs, holding the old and the new version of file.txt. The tree file basically tells us “hey, look, file.txt has this blob as its content” and the commit says “I want to use this tree”. And now we can see how it works! When we switch to another commit, we just switch the tree we are using. If we delete a file and want to get it back, don’t worry: the tree of an earlier commit still has the SHA of that file’s blob, so we can easily restore it. As a little bonus, in the content of our latest commit file we can see a parent. It’s just the SHA of the earlier commit file, since that was our starting position for this commit.
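The chain of pointers can be followed in a few lines of Python. The sketch below is a deliberately simplified in-memory model: the objects dict and the helpers read_file and history are illustrative, and the keys are the abbreviated hashes from the transcripts above, not something read from a real repository.

```python
# Simplified model of the object store after the second commit.
objects = {
    "51f466": ("blob", "A file"),
    "5a3194": ("blob", "A file after changes"),
    "24e7df": ("blob", "A new file"),
    "94b532": ("tree", {"file.txt": "51f466"}),
    "6c22a9": ("tree", {"file.txt": "5a3194", "file2.txt": "24e7df"}),
    "a511dd": ("commit", {"tree": "94b532", "parent": None}),
    "3b9a54": ("commit", {"tree": "6c22a9", "parent": "a511dd"}),
}


def read_file(commit_key, name):
    """Follow commit -> tree -> blob and return the content, or None."""
    _, commit = objects[commit_key]
    _, tree = objects[commit["tree"]]
    blob_key = tree.get(name)
    return objects[blob_key][1] if blob_key else None


def history(commit_key):
    """Yield commit keys from newest to oldest by following parent pointers."""
    while commit_key is not None:
        yield commit_key
        commit_key = objects[commit_key][1]["parent"]


for key in history("3b9a54"):
    print(key, "|", read_file(key, "file.txt"), "|", read_file(key, "file2.txt"))
```

Both versions of file.txt stay reachable: the old one through the first commit and its tree, the new one through the second. Nothing is overwritten, which is why restoring an older state is just a matter of picking a different commit.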

Summary

In this short introduction to “GIT – under the hood” we have seen that git isn’t that scary. It’s a simple key-value db with a few features on top. Of course, there is plenty more to talk about (branches, for example), but all those advanced features build on the basics described here. Now you know enough to explore them yourself, or you can just wait for another article about “GIT – under the hood”.

Peace!
