A Grip On Git
A Simple, Visual Git Tutorial
Reading time ~11 minutes · By Vincent Tunru (@VincentTunru)
Have you memorised a few Git commands, without actually understanding what's going on? Then you've come to the right place! This how-to will help you level up in Git, going from being able to use Git to properly understanding it, getting a better grip on what is arguably one of the most important tools in software development at the moment.
This interactive tutorial visualises what is happening when you're using Git. As you scroll down the page, you will be guided through the most important basic concepts of Git, while applying them to a visualised example repository.
Imagine we're starting a new project that we want to manage using Git. This could be any type of project, but for the sake of this tutorial, let's say we're writing a book. The first order of business is to make Git aware of it. Inside the folder that is to house our book, we initialise a new Git repository:
git init
Git has created a new hidden folder called .git. Everything that Git knows about our book will live in there, separate from our actual project.
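If you would like to see that hidden folder for yourself, here is a minimal sketch you could run in a terminal. The temporary folder stored in book is a made-up stand-in for the folder that houses our book, and the -q flag is only there to keep Git quiet.

```shell
set -e
book="$(mktemp -d)"       # hypothetical throwaway folder standing in for the book's folder
cd "$book"
git init -q               # create the repository; -q only silences the confirmation message
ls -A                     # -A includes hidden entries, so the .git folder is listed
git rev-parse --git-dir   # asks Git where it keeps its own data
```

Deleting the .git folder would throw away everything Git knows about the project while leaving the project files themselves untouched.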
Now let's say that we've already created two files: thanks.txt, in which we will keep track of who we need to include in the word of thanks, and outline.txt, which contains a general outline of what our book will be about.
Let's give Git an actual version to control (it is, after all, a version control system). You have to make a conscious choice about what each version (or in Git speak: commit) includes; for now, we only git add thanks.txt. This is called staging the file thanks.txt: marking it for inclusion in the next commit.
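Staging can be made visible with git status. The sketch below assumes a fresh throwaway repository and invents some file contents purely for illustration; only the file names come from our book.

```shell
set -e
book="$(mktemp -d)"                   # hypothetical throwaway folder for the book
cd "$book"
git init -q

echo "My editor" > thanks.txt         # made-up contents
echo "1. Introduction" > outline.txt  # made-up contents

git add thanks.txt                    # stage only thanks.txt
git status --short                    # "A" marks a staged new file, "??" an untracked one
staged="$(git diff --cached --name-only)"   # names of everything currently staged
echo "staged for the next commit: $staged"
```

Since outline.txt was never added, it shows up as untracked and would not be part of the next commit.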
This is what our Git history looks like after our first commit:
git commit
The circle represents this first commit — please disregard the labels next to it for now.
If, later on in our project, we go back to this commit, all we would be left with is thanks.txt as it is right now.
So why did we not include outline.txt in our first commit? A good rule of thumb is that a commit should only contain changes that can be undone together. For example, if we later want to overhaul the structure of our book, we might want to ditch the outline we just wrote without also throwing away our list of people to thank.
Therefore, let us tell Git about outline.txt in a separate commit.
We stage it using git add outline.txt, then create another commit:
git commit
As you can see, our new commit is connected to the previous one (its parent). You can think of a commit as containing a list of the changes needed to turn the files as they were at the previous commit into the way they are now.
Now consider the label reading HEAD next to our newly created commit. HEAD can be seen as a pointer: when a new commit is made, it will be a child of the commit HEAD is pointing to.
As an example, let's say we spelled someone's name wrong. After correcting it, we git add thanks.txt again, and create a new commit:
git commit
The new commit is based on the commit previously labeled HEAD, and the label is then updated to refer to the new commit as HEAD.
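The three commits so far, and the way HEAD moves along with them, can be replayed in a terminal. This is only a sketch: the file contents, commit messages and author identity are invented, and -m is used so that Git does not open an editor to ask for a message.

```shell
set -e
book="$(mktemp -d)"                          # hypothetical throwaway folder for the book
cd "$book"
git init -q
git config user.name "Example Author"        # made-up identity, needed to commit
git config user.email "author@example.com"

echo "My editer" > thanks.txt                # note the misspelling
git add thanks.txt
git commit -q -m "Add list of people to thank"
first="$(git rev-parse HEAD)"                # the commit HEAD points to right now

echo "1. Introduction" > outline.txt
git add outline.txt
git commit -q -m "Add outline"
second="$(git rev-parse HEAD)"

echo "My editor" > thanks.txt                # fix the misspelling
git add thanks.txt
git commit -q -m "Fix misspelled name"

parent="$(git rev-parse HEAD^)"              # HEAD^ names the parent of the commit at HEAD
git log --oneline                            # newest commit first
```

After the last commit, HEAD points to the spelling fix and its parent is the commit that added the outline.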
Now let's turn to the label dubbed main. This is called a branch. Like HEAD, branches point to a commit. Unlike HEAD, you can name them yourself. Furthermore, you can have multiple branches.
We can easily create a new branch. Let's call it myBranch:
git branch myBranch
As you can see, there are now two labels pointing to the HEAD commit: main and myBranch.
The current branch is still main, though. You can see what this means when we make some more changes and create another commit:
git commit
Whereas the main branch moved along with HEAD, myBranch is still pointing to the previous commit. We will see why this is useful in a bit.
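That a branch is just a label that stays put unless it is the current branch can be checked with git rev-parse, which prints the commit a name points to. A minimal sketch, assuming Git 2.28 or later (for git init -b, used here to make sure the first branch is called main) and made-up file contents and commit messages:

```shell
set -e
book="$(mktemp -d)"                          # hypothetical throwaway folder for the book
cd "$book"
git init -q -b main                          # -b names the initial branch
git config user.name "Example Author"        # made-up identity, needed to commit
git config user.email "author@example.com"

echo "My editor" > thanks.txt
git add thanks.txt
git commit -q -m "Add list of people to thank"

git branch myBranch                          # new label on the current commit
before="$(git rev-parse myBranch)"

echo "My publisher" >> thanks.txt
git add thanks.txt
git commit -q -m "Thank the publisher"       # main is current, so main moves

git branch --show-current                    # name of the current branch
git rev-parse main myBranch                  # two different commits now
```

Only the current branch follows HEAD; myBranch still names the commit it was created on.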
We can switch to myBranch, meaning roughly that we will make HEAD point to the same commit as myBranch does:
git switch myBranch
As mentioned, a new commit will be the child of the commit HEAD is currently pointing to.
git commit
You can now see why they were called branches: with myBranch pointing to a different commit than main, our simple timeline has suddenly evolved into a tree structure.
Since we are still on myBranch, new commits will move it along with HEAD:
git commit
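For those who want to see the tree without the visualisation, git log --graph draws it as text. The sketch below rebuilds the situation under a few assumptions: Git 2.28 or later, and invented file names, contents and commit messages (chapter1.txt is hypothetical).

```shell
set -e
book="$(mktemp -d)"                          # hypothetical throwaway folder for the book
cd "$book"
git init -q -b main
git config user.name "Example Author"        # made-up identity, needed to commit
git config user.email "author@example.com"

echo "My editor" > thanks.txt
git add thanks.txt
git commit -q -m "Add list of people to thank"
git branch myBranch

echo "My publisher" >> thanks.txt            # one more commit on main
git add thanks.txt
git commit -q -m "Thank the publisher"

git switch -q myBranch                       # HEAD now follows myBranch
echo "Once upon a time" > chapter1.txt
git add chapter1.txt
git commit -q -m "Start chapter 1"
echo "The end" >> chapter1.txt
git add chapter1.txt
git commit -q -m "Finish chapter 1"

fork="$(git merge-base main myBranch)"       # the commit where the two branches split
git log --oneline --graph --all              # text rendering of the tree
```

The fork point is the very first commit: main has one commit on top of it, myBranch has two.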
Branches are useful because they allow us to work on multiple things in parallel, without those things interfering with each other. For example, you could start work on a new chapter in a branch that only contains changes to that chapter. Now, when your editor is bugging you to finally submit a manuscript already, darnit! (hypothetically), you can simply switch back to the main branch, perhaps fix up a few typos, and then send it to your editor. All this without having to include a half-finished chapter that wasn't that interesting anyway.
Anyway, let's say we did finish that chapter. That's nice and all, but now it would be nice if we didn't have to redo the small changes we made earlier in the main branch.
Of course Git can help us here: we can merge the state of the files at main's commit with the state of the files in the current branch's latest commit:
git merge main
As you can see, Git has created a new merge commit for us in myBranch. As opposed to regular commits, merge commits have multiple parents. This means that a merge commit contains multiple sets of changes: for each parent, the changes that need to be applied to it to incorporate the changes from the other parent(s).
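The two parents of a merge commit can be inspected directly: HEAD^1 is the commit we were on, HEAD^2 the one we merged in. A sketch under the usual assumptions (Git 2.28 or later, made-up files and messages); --no-edit is added so that Git accepts its default merge message instead of opening an editor.

```shell
set -e
book="$(mktemp -d)"                          # hypothetical throwaway folder for the book
cd "$book"
git init -q -b main
git config user.name "Example Author"        # made-up identity, needed to commit
git config user.email "author@example.com"

echo "My editor" > thanks.txt
git add thanks.txt
git commit -q -m "Add list of people to thank"
git branch myBranch

echo "My publisher" >> thanks.txt            # small change on main
git add thanks.txt
git commit -q -m "Thank the publisher"

git switch -q myBranch
echo "Once upon a time" > chapter1.txt       # new chapter on myBranch
git add chapter1.txt
git commit -q -m "Write chapter 1"
chapter_tip="$(git rev-parse HEAD)"

git merge -q --no-edit main                  # creates the merge commit on myBranch
first_parent="$(git rev-parse HEAD^1)"       # where myBranch was before the merge
second_parent="$(git rev-parse HEAD^2)"      # the commit main points to
git log -1 --format='%h has parents: %p'
```

After the merge, myBranch contains both the new chapter and the change made on main, while main itself has not moved.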
Now is a good time to check whether both sets of changes work well together. In the case of our book, we could check that our edits to our earlier chapters didn't mess up the layout of our new one.
If everything looks as expected, myBranch now contains the latest complete version of our book. But now imagine that we had another branch with another half-finished chapter. If we were to merge myBranch into this other branch when that chapter is finished, and into another one for another chapter, and so on, then after a while we would lose track of which branch contains the latest version. It is therefore common practice to designate one branch as containing the latest complete version of a project. This branch is commonly called main, so let's stick to that here.
However, if we switch to that branch now:
git switch main
…the chapter we wrote in myBranch is gone. The reason for this is, of course, that while we have merged main into myBranch, we haven't merged myBranch into main yet. This might feel cumbersome, but by first copying everything into myBranch, we had the opportunity to check whether everything still looked fine and dandy after the merge. Since we now know that it did, we can merge it back into main without risking a book with a messed-up layout.
git merge myBranch
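Since main had no commits that myBranch lacked, Git does not actually need a new merge commit here: by default it simply moves main forward to myBranch's merge commit (a so-called fast-forward), after which both branches point to the same commit. If we had run git merge --no-ff myBranch instead, the merge would have created another commit, and main would then be ahead of myBranch, since it would include a commit that is not in myBranch (i.e. the new merge commit).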
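Merging back into main can be tried out as follows. One thing worth knowing: with Git's default settings, this particular merge is a fast-forward, meaning main is simply moved to the commit myBranch points to; a separate merge commit only appears when --no-ff is passed. The sketch assumes Git 2.28 or later and uses made-up files and messages.

```shell
set -e
book="$(mktemp -d)"                          # hypothetical throwaway folder for the book
cd "$book"
git init -q -b main
git config user.name "Example Author"        # made-up identity, needed to commit
git config user.email "author@example.com"

echo "My editor" > thanks.txt
git add thanks.txt
git commit -q -m "Add list of people to thank"
git branch myBranch
echo "My publisher" >> thanks.txt
git add thanks.txt
git commit -q -m "Thank the publisher"

git switch -q myBranch
echo "Once upon a time" > chapter1.txt
git add chapter1.txt
git commit -q -m "Write chapter 1"
git merge -q --no-edit main                  # bring main's changes into myBranch

git switch -q main
if [ -e chapter1.txt ]; then chapter_on_main=yes; else chapter_on_main=no; fi
echo "chapter on main before merging back: $chapter_on_main"

git merge -q myBranch                        # default settings: a fast-forward
git rev-parse main myBranch                  # both names now point to the same commit
```

Running the last merge as git merge --no-ff myBranch instead would record an extra merge commit on main.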
The features we've touched upon up to now are already incredibly powerful in allowing you to work on multiple versions of the same project. They enable you to confidently make sweeping changes to your project without fear of losing anything. After all, if you're unsatisfied with your changes, you can always start anew from a commit that does not include those changes.
However, Git has more tricks up its sleeve. A lot more, in fact, but for now we will only focus on how it enables effective collaboration.
As we have seen, you have your own repository, consisting of the project files and a tree representing their history. The same holds true for other team members: each has a separate repository, including the project files in a certain state. None of these is special; however, a team usually decides by convention to have a single central repository that contains the code that will be distributed to the user later on. Much like the main branch should have the latest version of those parts of the project you've completed, the central repository should include the latest completed work by every team member. Let's look at how we can get our completed work into the central repository.
Let's assume that our editor has initialised an empty Git repository at https://publisher.com/book.git. Our first step is to make our local repository aware of this other repository:
git remote add publisher https://publisher.com/book.git
From our local repository's point of view, the other repositories are called remotes. Each remote has a name; we called the one we just added publisher.
In this example, there's no code in the remote yet. Let's change that by pushing our main branch to publisher.
git push publisher main
Several things have now happened. First, all the commits needed to go from an empty repository to the state of our local repository at main have been sent to the publisher remote. Secondly, we have a local reference to the currently known state of the remote's main branch as publisher/main. Finally, Git can match our local main branch to publisher/main, roughly meaning that Git knows they are related; note that this last step only happens when pushing with the -u option (git push -u publisher main).
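Adding a remote and pushing to it can be rehearsed entirely on one machine. In the sketch below, a local bare repository (publisher.git, a made-up path) stands in for https://publisher.com/book.git, which we obviously cannot reach from here. The -u option is what asks Git to remember that main and publisher/main belong together. Git 2.28 or later is assumed.

```shell
set -e
work="$(mktemp -d)"                              # hypothetical scratch folder
git init -q --bare -b main "$work/publisher.git" # local stand-in for the publisher's repository

git init -q -b main "$work/book"
cd "$work/book"
git config user.name "Example Author"            # made-up identity, needed to commit
git config user.email "author@example.com"
echo "My editor" > thanks.txt
git add thanks.txt
git commit -q -m "Add list of people to thank"

git remote add publisher "$work/publisher.git"
git remote -v                                    # lists the remote and its location

git push -q -u publisher main                    # send commits and set up tracking
git rev-parse main publisher/main                # the same commit, twice
upstream="$(git rev-parse --abbrev-ref 'main@{upstream}')"
echo "main tracks: $upstream"
```

Without -u the commits are sent just the same, but asking for main@{upstream} would fail because no tracking relationship has been recorded.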
We can simply continue working on our book and create a new commit as usual:
git commit
As long as we do not push, our local changes will not be shared with others. While we were making our local changes, however, it is very well possible that our copy editor has proofread our first draft and submitted some corrections to the publisher repository.
To check whether this is the case, let's ask what has happened since the last time we contacted it (i.e. during our push):
git fetch publisher
And indeed, a new commit was added to publisher's main branch. As you can see, while we weren't paying attention, our version of the branch sneakily diverged from the one at our publisher! This means we would not have been able to push the new version of our branch to our remote. It can be solved by merging the remote's work back into our own, making the history of our remote branch part of our local branch's history.
Like with regular branches, we can merge the remote branch into our current local branch:
git merge publisher/main
Once again, we can see that a merge commit has been created.
(Note that the above process of a fetch followed by a merge is fairly common. So common, in fact, that there is a single command git pull that immediately executes both.)
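The whole round trip, including the rejected push, can be simulated locally. This is a sketch under several assumptions: a local bare repository stands in for the publisher's, a second clone (copy-editor, a made-up name) plays the copy editor, and all file contents and commit messages are invented. Git 2.28 or later is assumed.

```shell
set -e
work="$(mktemp -d)"                              # hypothetical scratch folder
git init -q --bare -b main "$work/publisher.git" # local stand-in for the publisher's repository

git init -q -b main "$work/book"
cd "$work/book"
git config user.name "Example Author"            # made-up identity, needed to commit
git config user.email "author@example.com"
echo "My editor" > thanks.txt
git add thanks.txt
git commit -q -m "Add list of people to thank"
git remote add publisher "$work/publisher.git"
git push -q publisher main

# The copy editor clones the publisher's repository and pushes a correction.
git clone -q "$work/publisher.git" "$work/copy-editor"
(
  cd "$work/copy-editor"
  git config user.name "Copy Editor"
  git config user.email "editor@example.com"
  echo "1. Introduction" > outline.txt
  git add outline.txt
  git commit -q -m "Proofread outline"
  git push -q origin main
)

# Meanwhile, we commit locally without knowing about that correction.
echo "My publisher" >> thanks.txt
git add thanks.txt
git commit -q -m "Thank the publisher"

# Pushing now fails, because the remote has a commit that we lack.
if git push -q publisher main 2>/dev/null; then push_rejected=no; else push_rejected=yes; fi
echo "push rejected: $push_rejected"

git fetch -q publisher                           # learn about the remote's new commit
git merge -q --no-edit publisher/main            # merge it into our local main
git push -q publisher main                       # now the push goes through
```

Replacing the fetch and merge lines with git pull --no-edit publisher main should have the same effect in this scenario.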
Since our local branch is now ahead of the remote's, we can push it back to the remote.
git push publisher main
Our copy editor can now fetch the latest version that includes both the corrections he made, and the changes we made.
That wraps up this Git tutorial! I hope it helped you grasp the fundamental concepts, so that you won't feel completely lost when you have to perform more advanced operations in Git. If you're interested, do check out the source code of this tutorial (naturally, in a Git repository), and let me know what you think.