Git and GitHub
0.1 Before starting:
Make sure you have git installed. See here.
Set up a GitHub account for use with Git if you haven't yet. That's more than just creating an account, so check that page even if you already have an account.
Log in at https://github.com.
0.2 What is it?
Git
Version control system for code, data, and text of your projects.
One project - one folder - one git repository. A git repository is your project as viewed by git; see below for more details.
All files you want git to track (save versions of) should be in one folder, referred to as the repository root.
Each time you want to save a new version of your project, you choose which files to save and then explicitly tell git to save a new version. Choosing files (or parts of files) is referred to as adding or staging. Saving is referred to as committing. The version itself (the state of tracked files) is referred to as a commit.
When you commit (save a new version of your project), git saves that new commit (version) only locally. To have an external copy of your commits, you'll need to upload them to an online server (we mostly use GitHub). Uploading is referred to as pushing, and the online version of your repository is referred to as a remote repository, or simply a remote.
GitHub
The server that we use to store and share our repositories.
Several people can work on the same project on their local machines at the same time.
Once they are ready to share, they push (upload) their local version to GitHub.
To update your repository folder with a new version from GitHub, you pull it.
If you want to download commits currently stored on GitHub but don't want to change your files yet, you can fetch the new commits.
Repository (repo)
A repository is a set of commits of your project.
Locally:
All the locally saved commits. They are stored in a hidden .git folder in the repository root. That folder itself is not versioned.
On GitHub:
All the commits pushed to GitHub.
Some of the commits might exist only locally or only on GitHub - that's ok.
Examples of things to version using Git
code (my_analysis.R, fix_everything.py)
data in text format (xx_xx_sparse_code.cha, anonymized_info.csv)
Things not to version
Non-text files (.wav, .mp3, etc.)
Files with private information
Command-line commands we'll need
1. Basics
Create a repo, add, commit, status, pull, push, diff, log, gitk
a. Create a repo locally, then push to GitHub
Locally:
Create the directory that you want to track and go into that directory
Create a file (how about starting with a README.md ?)
Indicate to git that you want to track this directory
Add a file to track
Create a first snapshot/first version of your repo
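The local steps above can be sketched as follows. This is a minimal sketch, not the only way to do it: the throwaway HOME, the demo identity, and the my-project directory name are assumptions added so that it runs anywhere without touching a real project.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master   # the branch name this tutorial uses

mkdir my-project                  # the directory you want to track
cd my-project
echo "# My project" > README.md   # a first file
git init                          # tell git to track this directory
git add README.md                 # add the file to track
git commit -m "Add README"        # first snapshot of the repo
git log --oneline                 # one commit so far
```

Run it as a script rather than pasting it into your everyday shell, since it redirects HOME.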
Remotely:
Go to GitHub https://github.com
Click on New
name: self explanatory
description: same here
public/private: whether people you haven't explicitly invited will be able to see your repo
README.md: short description
.gitignore: types of files you want git to ignore, i.e. never add (example: knitted files in R, .pyc in Python)
license: distribution restrictions
Back to the command line: tell git where to find the remote (git remote add), then push everything you have to that remote (git push)
Now check GitHub page again
Overview of what is available on GitHub
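Connecting the local repo to its remote and pushing could look like the sketch below. Since a script cannot create a repo on GitHub for you, a local bare repository (the hypothetical fake-github.git) stands in for the empty GitHub repo; with a real one, the URL would be the one shown on its GitHub page. The sandbox setup is likewise an assumption of the sketch.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q --bare fake-github.git   # stand-in for the empty repo created on GitHub
git init -q my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"

git remote add origin "$HOME/fake-github.git"   # where to look for the remote
git push -u origin master                       # send everything; -u remembers origin/master as upstream
git remote -v                                   # check what origin points to
```

This assumes the GitHub repo was created empty. If you let GitHub add a README, .gitignore, or license, the remote already has a commit and you would need to pull before pushing.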
b. Retrieve changes
Go to online version of your README.md
Modify the README.md, click on Commit changes
Back to the command line: retrieve that change with git pull
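Here is a hedged sketch of retrieving a change made online. A local bare repo (hypothetical fake-github.git) plays GitHub, and a second clone (hypothetical web-editor) plays the edit made in the GitHub web page; only the last two commands are what you would actually type.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

# Setup: a remote and a local repo that are in sync (git warns that the clone is empty; that is fine).
git init -q --bare fake-github.git
git clone -q "$HOME/fake-github.git" my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin master

# Simulate the edit made on the GitHub page.
git clone -q "$HOME/fake-github.git" "$HOME/web-editor"
cd "$HOME/web-editor"
echo "Edited online." >> README.md
git add README.md
git commit -q -m "Update README online"
git push -q origin master

# Back on the command line: retrieve the change.
cd "$HOME/my-project"
git pull
cat README.md    # now contains the line added online
```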
c. Push changes
Modify README.md locally, then review local changes.
Add those changes
Commit those changes
CHECK THAT NOBODY CHANGED SOMETHING IN THE MEANTIME and then push
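The push workflow above, as a sketch under the same assumptions as before: a local bare repo (hypothetical fake-github.git) stands in for GitHub and everything happens in a throwaway sandbox.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

# Setup: a remote and a local repo that are in sync.
git init -q --bare fake-github.git
git clone -q "$HOME/fake-github.git" my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin master

echo "More details." >> README.md    # modify README.md locally
git status                           # which files changed?
git diff                             # what exactly changed?
git add README.md                    # add those changes
git commit -m "Describe the project in more detail"
git pull                             # check that nobody changed something in the meantime
git push
```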
2. Intermediate
a. Amend
Amend your last commit (git commit --amend) if you have:
a typo in your commit message
forgotten to add a file before commit
some last minute changes that could be included in the same commit
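A small sketch of amending, covering two of the cases above at once (a typo in the message and a forgotten file). The file notes.txt and the commit messages are made up for the example.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "# My project" > README.md
echo "notes" > notes.txt                       # hypothetical file we forget to add
git add README.md
git commit -q -m "Add READNE"                  # typo in the message, and notes.txt is missing

git add notes.txt                              # stage what was forgotten
git commit --amend -m "Add README and notes"   # replace the last commit instead of adding a new one
git log --oneline --stat                       # still a single commit, now with both files
```

Amending replaces the last commit with a new one, so it is safest on commits you have not pushed yet.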
b. Merge
Whenever you use git pull, Git actually does two things: it translates that one command into 1. git fetch, which retrieves the remote changes, and 2. git merge origin/master, which merges those changes into your local version. Usually, Git is good at merging files, i.e. combining the changes made on both sides; however, when different users change the same line in the same file, poor Git does not know what to do, so it... complains. As anybody would do when several people ask conflicting things from them.
Remotely:
change one line in the README.md and click on Commit changes
Locally:
change that same line in the README.md to something different
add, commit, and pull
OH NO auto-merge failed
open your file in Atom (if that is not done already)
choose the version you want to keep
add, commit, pull and push!
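The whole conflict scenario can be replayed in a sandbox. In this sketch a local bare repo (hypothetical fake-github.git) plays GitHub, a second clone (hypothetical web-editor) plays the online edit, and the conflict is resolved by overwriting the file from the script instead of editing it in Atom. The pull.rebase setting is an assumption that makes git pull merge, as described above.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master
git config --global pull.rebase false    # pull = fetch + merge

# Setup: a remote and a local repo that are in sync.
git init -q --bare fake-github.git
git clone -q "$HOME/fake-github.git" my-project
cd my-project
echo "first line" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin master

# Remotely: change one line.
git clone -q "$HOME/fake-github.git" "$HOME/web-editor"
cd "$HOME/web-editor"
echo "first line, edited on GitHub" > README.md
git add README.md
git commit -q -m "Edit README online"
git push -q origin master

# Locally: change that same line to something different, then add, commit, and pull.
cd "$HOME/my-project"
echo "first line, edited locally" > README.md
git add README.md
git commit -q -m "Edit README locally"
git pull || echo "(auto-merge failed, as expected)"
cat README.md                            # both versions, separated by conflict markers

# Choose the version to keep (here the local one), leaving no markers behind.
echo "first line, edited locally" > README.md
git add README.md
git commit -q -m "Merge remote changes, keep local wording"
git pull                                 # nothing new this time
git push
```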
c. Reset
You want to unstage (revert the 'add' action) a file before committing:
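One way to unstage a file is git reset HEAD followed by the file name, as in the minimal sketch below (sandbox and file contents are assumptions of the example). The edit itself stays in the file; only the add is undone.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"

echo "draft" >> README.md
git add README.md           # staged by mistake
git reset HEAD README.md    # unstage: revert the 'add', keep the edit
git status --short          # prints " M README.md": modified, not staged
```

Recent versions of git also suggest git restore --staged README.md for the same purpose.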
You realize that all you have done lately does not make sense and you want to go back to that nice clean version that you had some time ago. Lucky you, it is still available (thank you Git):
on GitHub, on your repo page, click on x commits right below the description of your project
retrieve the key of the commit you want to go back to
go back!
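Going back could look like the sketch below, assuming git reset --hard is the command meant here (this section is about reset, but the exact command is an assumption). The key is read from git log instead of the GitHub page.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "clean version" > README.md
git add README.md
git commit -q -m "Nice clean version"
echo "changes that do not make sense" > README.md
git add README.md
git commit -q -m "Questionable changes"

git log --oneline                 # the key (hash) of each commit is in the first column
key="$(git rev-parse HEAD~1)"     # here: the commit just before the last one
git reset --hard "$key"           # go back: the branch and the files return to that commit
cat README.md
```

Be careful: git reset --hard also throws away any uncommitted changes, and the later commits no longer belong to the branch.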
d. Branches
You want to test something without modifying the pipeline that currently works, or (random example, out of my very own imagination) you would like to add annotids to every annotation in Seedlings but you need to try it out before actually doing it: branches are there for you.
Already existing branches
To look at the available branches that you have, you can use the command git branch. Of course right now, you should have only one: master. It is possible that the remote branches are not the same as the ones on your computer (because you have no use for them, for example); you can get a list of all of them using git branch -a, which will show your local master branch as well as the remote origin/master branch (origin is a nickname for the URL of the remote repo, the one you cloned from or added as a remote; you can check what the real value behind that name is by typing cat .git/config or git remote -v).
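A quick sketch of these listing commands. A local bare repo (hypothetical fake-github.git) stands in for GitHub so that there is a remote branch to show.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

# Setup: a remote and a local repo that are in sync.
git init -q --bare fake-github.git
git clone -q "$HOME/fake-github.git" my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin master

git branch        # local branches; the star marks the one you are on
git branch -a     # local branches plus remote ones (remotes/origin/master)
git remote -v     # the URL that origin stands for
```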
Creating branches
To create a new branch called new-branch, you can use git branch new-branch, and then use git branch to see what changed. A new branch called new-branch appeared! Ain't it magical? However, you can see that you are not working on that branch: the star that indicates where you are is still next to master. To switch to that new branch, you will have to use git checkout new-branch, and then use git branch to see what happened: once again, it's magical, the star changed places and is now next to new-branch. Et voilà! You can switch back to master (and check that you indeed made the switch) by using git checkout master and git branch. Play with it a few times, just to get used to it (how to get bored of that little star going from one branch to the other, I wonder).
Tip: to create AND switch to a new branch at the same time, you can use git checkout -b new-branch, which is just the two previous commands combined.
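The branch dance above, as a sketch you can replay in a sandbox. The name another-branch is made up to show the one-step variant.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"

git branch new-branch             # create the branch
git branch                        # the star is still next to master
git checkout new-branch           # switch to it
git branch                        # now the star is next to new-branch
git checkout master               # and back again

git checkout -b another-branch    # hypothetical second branch: create AND switch in one step
git branch
```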
Pushing a new branch
Now let's move back to new-branch for good and modify something in there. Let's add some text at the end of the README.md and push that new branch:
FATAL -- I mean what were you expecting, you're telling Git to push your commits to a branch that does not exist remotely yet. Fortunately, it tells you exactly what to do to fix this issue: git push --set-upstream origin new-branch
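A sketch of that first push of a new branch, with a local bare repo (hypothetical fake-github.git) standing in for GitHub. The plain git push is expected to fail; the echo after it is part of the sketch, not git's own message.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

# Setup: a remote and a local repo that are in sync.
git init -q --bare fake-github.git
git clone -q "$HOME/fake-github.git" my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin master

git checkout -q -b new-branch
echo "Some text at the end." >> README.md
git add README.md
git commit -q -m "Add text to README on new-branch"

git push || echo "(push refused: new-branch has no upstream yet)"
git push --set-upstream origin new-branch    # create the branch remotely and link the two
git branch -a                                # remotes/origin/new-branch now exists
```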
Branches and gitk
"I tried to see those branches using gitk
but the only one that appears is the one I am currently on!"
Well, first of all, it is a good thing to check gitk from time to time. Nicely done. Then, there is something you have to know: gitk is lazy (aren't we all) and will only display the minimal version of what you asked. To see all the branches, you have to specify that you want to see all the branches:
and there you go! Two branches just for you.
And yes, that star means "everything": it is a wildcard, a simpler cousin of what regular expressions do. I am not sure of how much you want to know about regular expressions, so for now, just accept that this is how it is, but if you are interested, it could be the topic of another blab core meeting.
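Since gitk needs a display, here is a terminal-only sketch of the same idea, using git log with --graph as a stand-in (an assumption of this example, not the command from the original text). Without --all you only see the history of the branch you are on; with --all you see every branch.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"
git checkout -q -b new-branch
echo "Some text." >> README.md
git add README.md
git commit -q -m "Add text on new-branch"
git checkout -q master

git log --oneline --graph --decorate          # only what is reachable from master
git log --oneline --graph --decorate --all    # every branch
# With a display available, gitk --all shows the same picture in a window.
```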
Checking out a commit
So now, we know how to start from where we are and create a new part of our project, independently of the working one. But maybe a few commits ago, there was a version of that one file that you would like to work with (maybe you wrote the results section of your paper based on that commit, and you need more information on something). So you would like all the files to be in the state they were in when you computed your results. Well, you can also check out a specific commit if you know its key:
Weird message: You are in 'detached HEAD' state. What is that?
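Checking out a commit by its key might look like this sketch. The file results.txt and its contents are made up, and the key is computed with git rev-parse where you would normally copy it from git log or GitHub.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "results v1" > results.txt    # hypothetical file standing in for your analysis output
git add results.txt
git commit -q -m "Results used in the paper"
echo "results v2" > results.txt
git add results.txt
git commit -q -m "Newer results"

key="$(git rev-parse HEAD~1)"      # the key of the older commit
git checkout "$key"                # prints the 'detached HEAD' message
cat results.txt                    # the files are back in the state of that commit
```

When you are done looking around, git checkout master brings you back to the latest version.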
e. Detached HEAD
OFF WITH THEIR HEADS
HEAD refers to the last commit you made in the branch you are in. It is a reference to where you are currently working. You can check where that is by typing
This will give you either the branch you are in (if you are in a branch) or the commit you are working on (if you checked out a commit). In a detached HEAD state, the commits you make do not belong to any branch: once you move away, nothing points to them anymore, and they will eventually be removed by the garbage collector. To visualize what this means, go to http://git-school.github.io/visualizing-git/#upstream-changes and type the following commands:
OH NO those commits that we just did on that detached head are now grey and in dotted lines. This means that eventually, they will be removed from the git history.
"But... but.. I want to keep them, I NEED them!"
No worries! You can turn them into a branch of their own in order to keep those changes. First, let's go back to that visualizing tool and undo that last checkout (note that undo only exists in that tool):
We are back in the state we were in, with our detached HEAD. Now let's make this detached HEAD into a branch and... well re-attach it:
The detached HEAD does not appear anymore, instead the value of HEAD is indicated. Now let's go back to master:
Tadaaah! Now those changes are tracked by the git system and do not disappear when you move to another branch. Your changes are safe!
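The same rescue can be sketched with real git instead of the visualizing tool. Reading .git/HEAD is one way (an assumption of this example) to see where HEAD points, and rescued-work is a made-up branch name.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "one" > README.md
git add README.md
git commit -q -m "First"
echo "two" > README.md
git add README.md
git commit -q -m "Second"

cat .git/HEAD                                # ref: refs/heads/master (on a branch)
git checkout -q "$(git rev-parse HEAD~1)"    # check out the first commit
cat .git/HEAD                                # a bare commit key: detached HEAD

echo "experiment" > README.md
git add README.md
git commit -q -m "Commit made on a detached HEAD"

git checkout -b rescued-work                 # a branch now points at that commit
cat .git/HEAD                                # ref: refs/heads/rescued-work
git checkout -q master                       # safe: the commit stays reachable through rescued-work
git log --oneline --all
```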
f. Stash
Ok, we are now back to the command line. You can do a quick git status to make sure that everything is in order, and then git branch to remind yourself of which branches you have. Let's go to master if you are not already there, and let's modify the README.md.
For some reason that only you know about, there is something you have to do in new-branch, so let's go there:
Ugh, error. You start to believe that Git just does not want you to be using it, but you're wrong: Git is there to HELP you and prevent you from doing something stupid, like changing branches while you have uncommitted modifications that the switch would overwrite, and it tells you so.
The Aborting is reassuring, it means that there was an error, but everything is back to the state you were in before doing anything. Now on GitHub, on the master branch, write a new line at the beginning of the README.md (so that it does not conflict later with your latest changes) and then in the command line:
Ugh, error again. But... it's very similar to the previous one:
You have two options: first, you can add and commit your changes. Easy. But if the changes you have made are bad ones, or unfinished ones, or changes that you don't need right now while you do need to pull or check out another branch, then you can stash your changes with git stash, that is, put them away for now while keeping them for later.
You can now safely pull or check out another branch.
"But I want to keep working on those unwanted modifications now that I have the latest version of the project!"
Well you can retrieve those using git stash pop.
If all goes well, your latest changes are back. The worst thing that can happen is a merging issue: when you use git stash pop, Git merges your stashed changes with the current version of each file, so any conflict will result in a merge to fix... which you now know how to do.
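A sketch of the checkout half of this scenario, entirely local (no GitHub needed): an uncommitted change on master blocks the switch to new-branch, git stash puts it away, and git stash pop brings it back. File contents are made up for the example.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "# My project" > README.md
git add README.md
git commit -q -m "Add README"
git checkout -q -b new-branch
echo "Text on new-branch." >> README.md
git add README.md
git commit -q -m "Edit README on new-branch"
git checkout -q master

echo "Unfinished idea." >> README.md    # uncommitted change on master
git checkout new-branch || echo "(checkout aborted: it would overwrite the local change)"

git stash                               # put the change away; the working tree is clean again
git checkout -q new-branch              # now switching works
git checkout -q master
git stash pop                           # bring the change back
git status --short                      # prints " M README.md"
```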
3. Advanced
a. Fork
b. Rebase
c. Cherry-pick
d. Pull requests
4. Other tools
We have seen gitk already, here are some other things you might want to check out:
Automation
Travis / CircleCI / Jenkins: performing checks each time someone pushes something on the repo and sending the output of those checks wherever you want (email, slack channel,...)
GitHub actions:
Arguably better because you don't need to acquaint yourself with another third-party service.
Less feature-rich than Travis or CircleCI.
Git GUIs
Using terminal/console/shell/cli is uber cool and all but also unintuitive and confusing, at least to me (to Zhenya). GUIs let you do most of the usual operations with a mouse click. They also provide more readable diffs because reading in a console is not fun. Because of that, you are more likely to figure out that something went wrong, to avoid committing what you don't want to commit, etc. Also, most of them have a pretty visualization of the repo history including all the branches. It makes it very easy, for example, to find where a given change was introduced. git blame and git bisect are probably more efficient, but again - less intuitive. And if a need arises, you can always use the console - GUIs and cli are not exclusive. Most of the GUIs will even have a button that will open a terminal already in the repo root.
If you dislike GUIs and prefer git cli - great! But in that case, please:
Do not use git add ., git commit -a, or git add -u (the last one is slightly better). This often introduces changes that are unwanted, unrelated to the change you are going to commit, or both. While deleting .DS_Store is simple enough, sometimes you will end up with changes and new files that came from you don't know where and can't be sure you can delete.
Instead, use git status and git diff to review changes before committing.
Split changes into multiple commits: add individual files, chunks, or even lines in each of them.
Write a message that will tell others and yourself what you did (in the first line) and also why (in the body).
In my (Zhenya's) opinion, these rules are much easier to follow when you are using GUIs, but to each their own.
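For those who stay on the command line, the rules above could translate into something like this sketch. The files and messages are made up: one change is ready to commit, one is not, and a stray .DS_Store is lying around.

```shell
# Sandbox (an assumption of this sketch): a throwaway HOME, so real settings and projects stay untouched.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"
cd "$HOME"
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
git config --global init.defaultBranch master

git init -q my-project
cd my-project
echo "# My project" > README.md
echo "x <- 1" > my_analysis.R
git add README.md my_analysis.R
git commit -q -m "Add README and analysis script"

echo "More documentation." >> README.md    # the change we want to commit now
echo "x <- 2" > my_analysis.R              # unrelated change, not ready yet
touch .DS_Store                            # clutter that git add . would pick up

git status --short                         # review: what changed?
git diff                                   # review: how?
git add README.md                          # stage only the file that belongs to this commit
git diff --cached                          # exactly what is about to be committed
git commit -q -m "Document the project in the README" \
              -m "The README did not say what the project is for."
git status --short                         # my_analysis.R and .DS_Store are still uncommitted
```

The first -m becomes the first line of the message (what you did) and the second one the body (why). To stage individual chunks of a file rather than the whole file, git add -p walks you through them interactively.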
Here are some popular GUIs:
GitHub Desktop:
No need to think about GitHub authentication - it just works which is great.
Too eager to stage (add) everything you changed - easy for unwanted changes to slip through.
GitKraken
Not free.
Probably great but I don't know because it is not free and I am cheap (Zhenya).
SourceTree (Zhenya-recommended)
Has split view staging which is awesome. It allows you to separately see staged and unstaged changes. When you click on files in the staged tab, you will see a diff between the index and HEAD - what would be committed right now; in the cli you would get that with git diff --cached. And when you click on a file in the unstaged tab, you will see the difference between the working tree and the index, i.e. the changes that are not staged yet - git diff in the cli.
Has a very intuitive visualization for the repo history with local and remote branches, tags, diffs between arbitrary commits, cherry-picking, resetting to an arbitrary commit, or checking it out, etc.
Quite buggy so at least once a week it will need to be completely restarted. Previously, it used to happen every day, so there is a lot of improvement here.
You will have to create an account on BitBucket to use SourceTree. Not explained, annoying, manipulative. Also, it takes around 2 minutes.
5. Useful links
Learn while visualizing what's happening: https://learngitbranching.js.org
Visualize the commands you are using: http://git-school.github.io/visualizing-git/#free
Other links that you may want to look at but not too much:
Travis CI: https://docs.travis-ci.com/user/for-beginners/ (CI = continuous integration)
GitKraken: https://www.gitkraken.com/git-client (mix of gitk and GitHub Desktop)
GitHub Desktop: https://desktop.github.com
Source Tree: https://www.sourcetreeapp.com/