Git and GitHub

0.1 Before starting:

  • Make sure you have git installed. See here.

  • Set up a GitHub account for use with Git if you haven't yet. That's more than just creating an account, so check that page even if you already have an account.

0.2 What is it?

Git

  • Version control system for code, data, and text of your projects.

  • One project - one folder - one git repository. A git repository is your project as viewed by git, see below for more details.

  • All files you want git to track (save versions of) should be in one folder referred to as repository root.

  • Each time you want to save a new version of your project, you choose which files to save and then explicitly tell git to save a new version. Choosing files (or parts of files) is referred to as adding or staging. Saving is referred to as committing. The version itself (the state of tracked files) is referred to as commit.

  • When you commit (save a new version of your project), git saves that new commit (version) only locally. To have an external copy of your commits, you'll need to upload them to an online server (we mostly use GitHub). Uploading is referred to as pushing. And the online version of your repository is referred to as a remote repository, or simply a remote.

GitHub

  • The server that we use to store and share our repositories.

  • Several people can work on the same project on their local machines at the same time.

  • Once they are ready to share, they push (upload) their local version to GitHub.

  • To update your repository folder with a new version from GitHub, you pull it.

  • If you want to download commits currently stored on GitHub but don't want to change your files yet, you can fetch the new commits.

Repository (repo)

A repository is a set of commits of your project.

  • Locally:

    • All the locally saved commits. They are stored in a hidden .git folder in the repository root. That folder itself is not versioned.

  • On GitHub:

    • All the commits pushed to GitHub.

Some of the commits might exist only locally or only on GitHub - that's ok.

Examples of things to version using Git

  • code (my_analysis.R, fix_everything.py)

  • data in text format (xx_xx_sparse_code.cha, anonymized_info.csv)

Things not to version

  • Non-text files (.wav, .mp3, etc.)

  • Files with private information

Command-line commands we'll need

cd directory/                   # go to a directory
cd ..                           # go to parent directory
ls                              # show what is in current directory
mkdir new_directory/            # create new directory
touch filename.ext              # create new file

1. Basics

Create a repo, add, commit, status, pull, push, diff, log, gitk

a. Create a repo locally, then push to GitHub

  • Locally:

    • Create the directory that you want to track and go into that directory

      mkdir git_test
      cd git_test
    • Create a file (how about starting with a README.md ?)

      touch README.md
    • Indicate to git that you want to track this directory

      git init
    • Add a file to track

      git add README.md
    • Create a first snapshot/first version of your repo

      git commit -m "first commit"
  • Remotely:

    • Go to GitHub https://github.com

    • Click on New

      • name: self explanatory

      • description: same here

      • public/private: will people you haven't explicitly invited be able to see your repo

      • README.md: short description

      • .gitignore: types of files you want git to ignore, i.e. never added (example: knitted files in R, .pyc in Python)

      • license: distribution restrictions

    • Back to command line, indicate where to look for remote and send all you have on that remote

      git remote add origin [url of your new git repo]
      git push -u origin master
    • Now check GitHub page again

    • Overview of what is available on GitHub
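The .gitignore option mentioned above can also be written by hand. Here is a minimal sketch, assuming a throwaway directory and two example patterns (*.pyc and .DS_Store); the file names are made up for illustration:

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d) && cd "$demo"
git init -q

# One pattern per line; these two patterns are only examples.
printf '%s\n' '*.pyc' '.DS_Store' > .gitignore

touch script.py script.pyc .DS_Store   # hypothetical files
git status --short                     # lists .gitignore and script.py only
git check-ignore -v script.pyc         # shows which pattern ignores this file
```

Ignored files never show up in git status, so they cannot be added by accident.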

b. Retrieve changes

  • Go to online version of your README.md

  • Modify the README.md, click on Commit changes

  • Back to command line

    git pull
    open -a atom README.md    # if this does not work, open Atom and open file from there

c. Push changes

  • Modify README.md locally, then review local changes.

    git diff
    git status
  • Add those changes

    git add README.md
    git diff
    git status
  • Commit those changes

    git commit -m "Changed README.md"
    git status
  • CHECK THAT NOBODY CHANGED SOMETHING IN THE MEANTIME and then push

    git pull # you are the only one working on this project so extremely unlikely
    git push
  • get a (visual?) history of the commits

    git log
    gitk

    Or use a GUI.
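One detail in the steps above is easy to miss: once a change is staged, plain git diff no longer shows it. A small sketch of the same sequence in a throwaway repository (the identity below is a placeholder, and nothing is pushed because there is no remote here):

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"

echo "line 1" > README.md
git add README.md
git commit -q -m "first commit"

echo "line 2" >> README.md
git diff --stat              # the unstaged change to README.md is listed
git add README.md
git diff --stat              # prints nothing: no unstaged changes are left
git diff --cached --stat     # the staged change, i.e. what the next commit will contain
git commit -q -m "Changed README.md"
git log --oneline            # two commits
```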

2. Intermediate

a. Amend

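If you have:

  • a typo in your commit message

  • forgotten to add a file before committing

  • some last-minute changes that could be included in the same commit

you can rewrite the last commit instead of creating a new one. For a forgotten file or extra changes, stage them with git add first, then:

    git commit --amend -m "new message"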
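For the forgotten-file case, the sequence could look like this. It is a sketch in a throwaway repository with made-up file names; --no-edit keeps the existing message instead of asking for a new one:

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"

echo "a" > a.txt
echo "b" > b.txt
git add a.txt
git commit -q -m "Add a and b"       # oops: b.txt was never staged

git add b.txt                        # stage the forgotten file
git commit -q --amend --no-edit      # fold it into the previous commit
git log --oneline                    # still a single commit
git show --stat --oneline HEAD       # now lists both a.txt and b.txt
```

Amending replaces the old commit with a new one, so it is safest on commits that have not been pushed yet.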

b. Merge

Whenever you use git pull, Git actually does two things. It translates that one command into (1) git fetch, which retrieves the remote changes, and (2) git merge origin/master, which merges those changes into your local version. Usually, Git is good at merging files, i.e. combining everyone's changes into one up-to-date version; however, when different users change the same line in the same file, poor Git does not know what to do, so it... complains. As anybody would do when several people ask conflicting things from them.
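To see the two steps separately, here is a rough sketch that uses a local bare repository as a stand-in for GitHub and a second clone as a stand-in for a colleague. All directory names and the identity are placeholders:

```shell
base=$(mktemp -d) && cd "$base"
git init -q --bare remote.git               # stand-in for GitHub

git init -q mine && cd mine
git symbolic-ref HEAD refs/heads/master     # name the first branch master, as in this tutorial
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"
echo "v1" > README.md
git add README.md
git commit -q -m "first commit"
git remote add origin "$base/remote.git"
git push -q -u origin master

# Simulate a change made on the remote by pushing from a second clone.
git clone -q -b master "$base/remote.git" "$base/other"
cd "$base/other"
git config user.name "Other User"
git config user.email "other@example.com"
echo "v2" >> README.md
git add README.md
git commit -q -m "remote change"
git push -q
cd "$base/mine"

git fetch -q origin                         # step 1: download new commits, files untouched
cat README.md                               # still only v1
git log --oneline master..origin/master     # the commit that was fetched but not merged yet
git merge -q origin/master                  # step 2: merge it into the current branch
cat README.md                               # now v1 and v2
```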

  • Remotely:

    • change one line in the README.md and click on Commit changes

  • Locally:

    • change that same line in the README.md to something different

    • add, commit, and pull

      git add README.md
      git commit -m "changed that line in the README.md"
      git pull
  • OH NO auto-merge failed

  • open your file in Atom (if that is not done already)

  • choose the version you want to keep

  • add, commit, pull and push!

    git add README.md
    git commit -m "merged because line x was different"
    git pull
    git push
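When the auto-merge fails, Git writes both versions into the file, separated by conflict markers. The sketch below reproduces that offline: a local branch (hypothetically named remote-side) plays the role of the change made on GitHub, and git merge plays the role of the merge step inside git pull:

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master     # name the first branch master, as in this tutorial
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"
echo "original line" > README.md
git add README.md
git commit -q -m "first commit"

git checkout -q -b remote-side              # stand-in for the version edited on GitHub
echo "remote wording" > README.md
git add README.md
git commit -q -m "remote change"

git checkout -q master
echo "local wording" > README.md
git add README.md
git commit -q -m "changed that line in the README.md"

git merge remote-side || echo "auto-merge failed, as expected"
cat README.md          # both versions, between <<<<<<<, ======= and >>>>>>> markers

echo "local wording" > README.md            # resolve: keep the local version, drop the markers
git add README.md
git commit -q -m "merged because line 1 was different"
git status --short                          # clean again
```

In a real conflict you would edit the file in your editor and delete the marker lines by hand rather than overwrite the file.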

c. Reset

You want to unstage (revert the 'add' action) a file before committing:

git reset HEAD [file to reset]

You realize that all you have done lately does not make sense and you want to go back to that nice clean version that you had some time ago. Lucky you, it is still available (thank you Git):

  • on GitHub, on your repo page, click on x commits right below the description of your project

  • retrieve the key of the commit you want to go back to

  • go back! (careful: --hard also throws away any uncommitted changes in your files)

    git reset --hard [commit key]
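Both uses of reset can be tried safely in a throwaway repository. In this sketch the commit key is read with git rev-parse instead of being copied from GitHub; git log --oneline shows the same keys locally:

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"

echo "good" > README.md
git add README.md
git commit -q -m "clean version"
good=$(git rev-parse --short HEAD)          # the commit key of the version to return to

echo "bad idea" >> README.md
git add README.md
git commit -q -m "experiment"

echo "more" >> README.md
git add README.md
git reset -q HEAD README.md                 # unstage: the edit stays in the file
git status --short                          # " M README.md" (modified, not staged)

git log --oneline                           # find commit keys without leaving the terminal
git reset -q --hard "$good"                 # drop the experiment and the uncommitted edit
cat README.md                               # good
```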

d. Branches

You want to test something without modifying the pipeline that currently works, or (random example, out of my very own imagination) you would like to add annotids to every annotation in Seedlings but you need to try it out before actually doing it: branches are there for you.

Already existing branches

To look at the available branches that you have, you can use the command

git branch

Of course right now, you should have only one: master. It is possible that the remote branches are not the same as the ones on your computer (because you have no use for some of them, for example); you can get a list of all of them using

git branch -a

which will show your local master branch as well as the remote origin/master branch (origin is the name under which Git stores the URL of the remote repo, i.e. the one you cloned from or added with git remote add; you can check what URL it stands for by typing cat .git/config or git remote -v).

Creating branches

To create a new branch called new-branch, you can use

git branch new-branch

and use git branch to see what changed. A new branch called new-branch appeared! Ain't it magical? However, you can see that you are not working on that branch: the star that indicates where you are is still next to master. To switch to that new branch, you will have to use the following:

git checkout new-branch

and use git branch to see what happened: once again, it's magical, the star changed places and is now next to new-branch. Et voilà! You can switch back to master (and check that you indeed made the switch) by using git checkout master and git branch. Play with it a few times, just to get used to it (how to get bored of that little star going from one branch to the other, I wonder).

Tip: to create AND switch to a new branch at the same time, you can use git checkout -b new-branch which is just the two previous commands concatenated.
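The whole dance with the little star, as a sketch in a throwaway repository (another-branch is a made-up name for the one-step variant):

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master     # name the first branch master, as in this tutorial
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "first commit"

git branch new-branch                # create the branch, but stay on master
git branch                           # star next to master
git checkout -q new-branch
git branch                           # star next to new-branch
git checkout -q master

git checkout -q -b another-branch    # create and switch in one step
git rev-parse --abbrev-ref HEAD      # prints the name of the current branch
```

Git 2.23 and later also provide git switch new-branch and git switch -c new-branch, which do the same switching without the other meanings of checkout.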

Pushing a new branch

Now let's move back to new-branch for good and modify something in there. Let's add some text at the end of the README.md and push that new branch:

git checkout new-branch
echo "adding this line at the end of the README" >> README.md
git add README.md
git commit -m "added a line at the end of the readme in new-b"
git push

FATAL -- I mean what were you expecting, you're telling Git to apply the difference you just made to something that does not exist remotely. Fortunately, it tells you exactly what to do to fix this issue:

git push --set-upstream origin new-branch
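Once the upstream is set, a plain git push works for that branch from then on. A sketch with a local bare repository standing in for GitHub (paths and identity are placeholders); -u is the short form of --set-upstream:

```shell
base=$(mktemp -d) && cd "$base"
git init -q --bare remote.git               # stand-in for GitHub

git init -q mine && cd mine
git symbolic-ref HEAD refs/heads/master     # name the first branch master, as in this tutorial
git config user.name "Demo User"            # placeholder identity
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "first commit"
git remote add origin "$base/remote.git"

git checkout -q -b new-branch
git push -q --set-upstream origin new-branch   # first push: tell Git where the branch goes
git branch -vv                                 # new-branch now shows [origin/new-branch]

echo "adding this line at the end of the README" >> README.md
git add README.md
git commit -q -m "added a line at the end of the readme in new-b"
git push -q                                    # no extra arguments needed anymore
```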

Branches and gitk

"I tried to see those branches using gitk but the only one that appears is the one I am currently on!"

Well, first of all, it is a good thing to check gitk from time to time. Nicely done. Then, there is something you have to know: gitk is lazy (aren't we all) and will only display the minimal version of what you asked. To see all the branches, you have to specify that you want to see all the branches:

gitk --branches=*

and there you go! Two branches just for you.

And yes, that star is a wildcard that means "match anything". Strictly speaking it is a glob pattern, the kind the shell uses for file names, and a close cousin of regular expressions. I am not sure how much you want to know about pattern matching, so for now, just accept that this is how it is, but if you are interested, it could be the topic of another blab core meeting.
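If gitk is not installed, or you would rather stay in the terminal, git log can draw a similar picture. A minimal sketch in a throwaway repository with two diverging branches:

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master     # name the first branch master, as in this tutorial
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"
echo "hello" > README.md
git add README.md
git commit -q -m "first commit"

git checkout -q -b new-branch
echo "branch work" >> README.md
git add README.md
git commit -q -m "work on new-branch"

git checkout -q master
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "work on master"

# --all includes every branch, not just the current one;
# --graph draws the branching, --decorate labels commits with branch names.
git log --oneline --graph --decorate --all
```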

Checking out a commit

So now, we know how to start from where we are and create a new part of our project, independently of the working one. But maybe a few commits ago, there was a version of that one file that you would like to work with (maybe you wrote the results section of your paper based on that commit, and you need more information on something). So you would like all the files to be in the state they were when you computed your results. Well you can also checkout a specific commit if you know its key:

git checkout [commit key]

Weird message: "You are in 'detached HEAD' state." What is that?

e. Detached HEAD

OFF WITH THEIR HEADS

HEAD refers to the last commit you made in the branch you are in. It is a reference to where you are currently working. You can check where that is by typing

cat .git/HEAD

This will give you either the branch you are on (if you have a branch checked out) or the hash of the commit you are on (if you checked out a commit). In a detached HEAD state, any commits you make are still saved, but they do not belong to any branch; once you switch away, nothing points to them anymore, and they will eventually be removed by Git's garbage collector. To visualize what this means, go to http://git-school.github.io/visualizing-git/#upstream-changes and type the following commands:

git commit
git commit
git checkout b80e     # checking out a commit => detached HEAD written at the top of the screen
git commit
git commit
git checkout master

OH NO those commits that we just did on that detached head are now grey and in dotted lines. This means that eventually, they will be removed from the git history.

"But... but.. I want to keep them, I NEED them!"

No worries! You can put them on a branch of their own in order to keep those changes. First, let's go back to that visualizing tool and undo that last checkout (note that undo only exists in that tool):

undo

We are back in the state we were in, with our detached HEAD. Now let's make this detached HEAD into a branch and... well re-attach it:

git checkout -b attached-head

The detached HEAD does not appear anymore, instead the value of HEAD is indicated. Now let's go back to master:

git checkout master

Tadaaah! Now those changes are tracked by the git system and do not disappear when you move to another branch. Your changes are safe!
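The same rescue works in a real repository, where there is no undo. A sketch in a throwaway directory, with cat .git/HEAD showing what HEAD contains at each stage (file names are made up):

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master     # name the first branch master, as in this tutorial
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)                 # key of the first commit
echo "two" >> file.txt
git add file.txt
git commit -q -m "second"

cat .git/HEAD                        # ref: refs/heads/master
git checkout -q "$first"             # check out a commit: detached HEAD
cat .git/HEAD                        # a bare commit hash instead of a branch name

echo "experiment" >> file.txt
git add file.txt
git commit -q -m "commit made on a detached HEAD"

git checkout -q -b attached-head     # give that commit a branch so it is kept
cat .git/HEAD                        # ref: refs/heads/attached-head
git checkout -q master
git branch                           # attached-head is listed next to master
```

If you already switched away without creating a branch, git reflog lists the commits HEAD recently pointed to, which is usually enough to find the key again.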

f. Stash

Ok, we are now back to command line. You can do a quick git status to make sure that everything is in order, and then git branch to remind yourself of which branches you have. Let's go to master if you are not already there, and let's modify the README.md.

git checkout master
echo "unwanted modification" >> README.md

For some reason that only you know about, there is something you have to do in new-branch, so let's go there:

git checkout new-branch

Ugh, error. You start to believe that Git just does not want you to be using it, but you're wrong: Git is there to HELP you and prevent you from doing something stupid, like changing branches while you have unsaved modifications, and it tells you so.

error: Your local changes to the following files would be overwritten by checkout:
    README.md
Please commit your changes or stash them before you switch branches.
Aborting

The Aborting is reassuring: it means that there was an error, but everything is back to the state you were in before doing anything. Now on GitHub, on the master branch, write a new line at the beginning of the README.md (so that it does not conflict later with your latest changes) and then in the command line:

git pull

Ugh, error again. But... it's very similar to the previous one:

error: Your local changes to the following files would be overwritten by merge:
    README.md
Please commit your changes or stash them before you merge.
Aborting

You have two options: first, you can add and commit your changes. Easy. But if the changes you have made are bad ones, or unfinished ones, or changes that you don't need right now but that block the pull or checkout you do need, then you can stash your changes, that is, put them away for now while keeping them for later.

git status    # modifications that you don't want
git stash
git status    # clean!

You can now safely pull or checkout

git pull
git status

"But I want to keep working on those unwanted modifications now that I have the latest version of the project!"

Well you can retrieve those using

git stash pop

If all goes well, your latest changes are back. The worst thing that can happen is a merging issue: when you use git stash pop, Git merges your stashed changes with the current version of each file, so any conflict will result in a merge to fix... which you now know how to do.
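The full stash round trip, as a sketch in a throwaway repository. There is no remote here, so the pull in the middle is only indicated by a comment:

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master     # name the first branch master, as in this tutorial
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"
echo "base" > README.md
git add README.md
git commit -q -m "first commit"

echo "unfinished idea" >> README.md
git status --short                   # " M README.md": an uncommitted modification
git stash                            # put the modification away
git status --short                   # prints nothing: clean
git stash list                       # one entry, stash@{0}

# ... this is where you would pull or check out another branch ...

git stash pop                        # bring the modification back and drop the stash entry
git status --short                   # " M README.md" again
```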

3. Advanced

a. Fork

b. Rebase

c. Cherry-pick

d. Pull requests

4. Other tools

We have seen gitk already, here are some other things you might want to check out:

Automation

  1. Travis / CircleCI / Jenkins: performing checks each time someone pushes something on the repo and sending the output of those checks wherever you want (email, slack channel,...)

  2. GitHub actions:

    1. Arguably better because you don't need to acquaint yourself with another third-party service.

    2. Less full-featured than Travis or CircleCI.

Git GUIs

Using terminal/console/shell/cli is uber cool and all but also unintuitive and confusing, at least to me (to Zhenya). GUIs let you do most of the usual operations you do via a mouse click. They also provide more readable diffs because reading in a console is not fun. Because of that, you are more likely to figure out that something went wrong, to avoid committing what you don't want to commit, etc. Also, most of them have pretty visualization of the repo history including all the branches. It makes it very easy, for example, to find where a given change was introduced. git blame and git bisect are probably more efficient, but again - less intuitive. And if a need arises, you can always use the console - GUIs and cli are not exclusive. Most of the GUIs will even have a button that will open a terminal already in the repo root.

If you dislike GUIs and prefer git cli - great! But in that case, please

  1. Do not use git add ., git commit -a, or git add -u (the last one is slightly better). This often introduces changes that are unwanted, unrelated to the change you are going to commit, or both. While deleting .DS_Store is simple enough, sometimes you will end up with changes and new files that came from you don't know where and can't be sure you can delete.

  2. Instead, use git status and git diff to review changes before committing.

  3. Split changes into multiple commits: add individual files, chunks, or even lines in each of them.

  4. Write a message that will tell others and yourself what you did (in the first line) and also why (in the body).

In my (Zhenya's) opinion, these rules are much easier to follow when you are using GUIs, but to each their own.
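For those staying in the cli, here is one way the rules above might look in practice. File names and messages are made up; a second -m adds the body of the message as its own paragraph:

```shell
demo=$(mktemp -d) && cd "$demo"
git init -q
git config user.name "Demo User"            # placeholder identity, this repo only
git config user.email "demo@example.com"

echo "x <- 1" > my_analysis.R               # two unrelated changes in the working folder
echo "meeting notes" > notes.md

git status --short                          # review what changed before staging anything
git add my_analysis.R                       # stage by name rather than with git add .
git diff --cached --stat                    # check exactly what is about to be committed
git commit -q -m "Add analysis script" -m "First step of the analysis pipeline."

git add notes.md                            # the unrelated change gets its own commit
git commit -q -m "Add meeting notes" -m "Kept separate from the code changes."

git log --format='%h %s%n    %b'            # subject line, then the body
```

To stage only some chunks of a file, git add -p walks through the changes interactively and asks about each one.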

Here are some popular GUIs:

  1. GitHub Desktop:

    1. No need to think about GitHub authentication - it just works which is great.

    2. Too eager to stage (add) everything you changed - easy for unwanted changes to slip through.

  2. GitKraken

    1. Not free.

    2. Probably great but I don't know because it is not free and I am cheap (Zhenya).

  3. SourceTree (Zhenya-recommended)

    1. Has split view staging which is awesome. It allows you to see staged and unstaged changes separately. When you click on a file in the staged tab, you will see a diff between the index and HEAD, that is, what would be committed right now; in the cli you would get that with git diff --cached. And when you click on a file in the unstaged tab, you will see the difference between the working tree and the index, that is, your changes excluding the already staged ones: git diff in the cli.

    2. Has a very intuitive visualization for the repo history with local and remote branches, tags, diffs between arbitrary commits, cherry-picking, resetting to an arbitrary commit, or checking it out, etc.

    3. Quite buggy so at least once a week it will need to be completely restarted. Previously, it used to happen every day, so there is a lot of improvement here.

    4. You will have to create an account on BitBucket to use SourceTree. Not explained, annoying, manipulative. Also, it takes around 2 minutes.

Other links that you may want to look at but not too much:
