# Git and GitHub

### 0.1 Before starting:

* Make sure you have Git installed (see the [installation instructions](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)).
* [Set up a GitHub account](/programming-info/computing-programming-guides/git-and-github/set-up-github.md) for use with Git if you haven't yet. That's more than just creating an account, so check that page even if you already have an account.
* Log in at <https://github.com>.

### 0.2 What is it?

#### Git

* Version control system for code, data, and text of your projects.
* One project - one folder - one ***git repository***. A git repository is your project as viewed by git; see [below](#repository-repo) for more details.
* All files you want git to ***track*** (save versions of) should be in one folder, referred to as the ***repository root***.
* Each time you want to save a new version of your project, you choose which files to save and then explicitly tell git to save a new version. Choosing files (or parts of files) is referred to as ***adding*** or ***staging***. Saving is referred to as ***committing***. The version itself (the state of tracked files) is referred to as a **commit**.
* When you commit (save a new version of your project), git saves that new **commit** (version) only locally. To have an external copy of your commits, you'll need to upload them to an online server (we mostly use [GitHub](#github)). Uploading is referred to as **pushing**. And the online version of your repository is referred to as a **remote repository**, or simply a **remote**.

#### GitHub

* The online service that we use to store and share our repositories.
* Several people can work on the same project on their local machines at the same time.
* Once they are ready to share, they ***push*** (upload) their local version to GitHub.
* To update your repository folder with a new version from GitHub, you ***pull*** it.
* If you want to download commits currently stored on GitHub but don't want to change your files yet, you can **fetch** the new commits.

#### Repository (repo)

A repository is a set of commits of your project.

* Locally:
  * All the locally saved commits. They are stored in a hidden `.git` folder in the repository root. That folder itself is not versioned.
* On GitHub:
  * All the commits pushed to GitHub.

Some of the commits might exist only locally or only on GitHub - that's ok.

#### Examples of things to version using Git

* code (`my_analysis.R`, `fix_everything.py`)
* data in text format (`xx_xx_sparse_code.cha`, `anonymized_info.csv`)

**Things not to version**

* Non-text files (.wav, .mp3, etc.)
* Files with private information
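
One common way to keep such files out of a repository is a `.gitignore` file in the repository root, listing patterns for files git should never add. Below is a minimal sketch, run in a throwaway folder. The patterns (`*.wav`, `*.mp3`, and a hypothetical `private/` folder) and the file names are only examples, not a lab standard. `git check-ignore -v` shows which rule ignores a given path.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository in a temporary folder
repo=$(mktemp -d)
cd "$repo"
git init -q

# Example rules: audio files and a hypothetical private/ folder
cat > .gitignore <<'EOF'
*.wav
*.mp3
private/
EOF

mkdir private
touch recording.wav notes.csv private/participants.csv

# Which rule (file:line:pattern) ignores each path
git check-ignore -v recording.wav private/participants.csv

# Only .gitignore and notes.csv are offered for adding
git status --short
```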

#### Command-line commands we'll need

```shell
cd directory/                   # go to a directory
cd ..                           # go to parent directory
ls                              # show what is in current directory
mkdir new_directory/            # create new directory
touch filename.ext              # create new file
```

### 1. Basics

*Create a repo, add, commit, status, pull, push, diff, log, gitk*

#### a. Create a repo locally, then push to GitHub

* Locally:
  * Create the directory that you want to track and go into that directory

    ```
    mkdir git_test
    cd git_test
    ```
  * Create a file (how about starting with a `README.md`?)

    ```
    touch README.md
    ```
  * Indicate to git that you want to track this directory

    ```
    git init
    ```
  * Add a file to track

    ```
    git add README.md
    ```
  * Create a first snapshot/first version of your repo

    ```
    git commit -m "first commit"
    ```
* Remotely:
  * Go to GitHub <https://github.com>
  * Click on `New`
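    * name: self-explanatory
    * description: same here
    * public/private: will people you haven't explicitly invited be able to see your repo?
    * README.md: short description
    * .gitignore: types of files you want git to ignore, i.e. never add (examples: knitted files in R, .pyc in Python)
    * license: distribution restrictions
    * Since your local repository already has its own first commit, leave README.md, .gitignore, and license unselected here (you can create them locally instead). Otherwise, the GitHub repository starts with a separate, unrelated history that does not combine cleanly with yours.
  * Back in the command line, tell git where the remote is and push everything you have to it:

    ```
    git remote add origin [url of your new git repo]
    git push -u origin master    # use main instead of master if that is what `git branch` shows
    ```
  * Now check the GitHub page again: your README.md should be there
  * Take a look around at what GitHub shows you (files, commit history, branches, ...)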
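
To rehearse these steps without creating anything on GitHub, you can use a local *bare* repository as a stand-in for the remote. A bare repository has no working folder, like the copy a server stores. Below is a minimal sketch, assuming Git 2.28 or newer (for `git init -b`). The folder names and the demo identity are made up. With a real GitHub repository, you would pass its URL to `git remote add` instead of a path.

```shell
#!/usr/bin/env bash
set -euo pipefail

base=$(mktemp -d)

# Stand-in for the empty repository you created on GitHub
git init -q --bare "$base/remote.git"

# Local steps: create, add, commit
mkdir "$base/git_test"
cd "$base/git_test"
git init -q -b master                     # -b needs Git 2.28+
git config user.name "Demo User"          # demo identity, stored in this repo only
git config user.email "demo@example.com"
touch README.md
git add README.md
git commit -q -m "first commit"

# Remote steps: register the remote under the name origin, then push
git remote add origin "$base/remote.git"  # with GitHub: the repository URL
git push -q -u origin master

# The remote's master now points to your commit
git ls-remote origin refs/heads/master
```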

#### b. Retrieve changes

* Go to the online version of your README.md on GitHub
* Edit it and click on `Commit changes`
* Back in the command line:

  ```
  git pull
  open -a atom README.md    # macOS with Atom installed; otherwise, open README.md in any text editor
  ```
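
`git pull` downloads and merges in one go. If you would rather look at the new commits before your files change, you can run `git fetch` first and merge afterwards. The sketch below does this with a local bare repository standing in for GitHub. A second clone (hypothetical folder `online`) plays the part of the online edit. It assumes Git 2.28+.

```shell
#!/usr/bin/env bash
set -euo pipefail

base=$(mktemp -d)
git init -q --bare -b master "$base/remote.git"   # stand-in for GitHub; -b needs Git 2.28+

# Your repository, already pushed once
git init -q -b master "$base/local"
cd "$base/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# git_test" > README.md
git add README.md
git commit -q -m "first commit"
git remote add origin "$base/remote.git"
git push -q -u origin master

# Hypothetical "online" clone: plays the part of editing README.md on GitHub
git clone -q "$base/remote.git" "$base/online"
cd "$base/online"
git config user.name "Other User"
git config user.email "other@example.com"
echo "Edited online" >> README.md
git add README.md
git commit -q -m "Update README.md"
git push -q

# Back in your repository: fetch downloads the commit but leaves your files alone
cd "$base/local"
git fetch -q
git log --oneline HEAD..origin/master   # commits on the remote that you don't have yet

# Merging is the second half of what git pull does
git merge -q origin/master
```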

#### c. Push changes

* Modify README.md locally, then review local changes.

  ```
  git diff
  git status
  ```
* Add those changes

  ```
  git add README.md
  git diff      # now empty: staged changes are shown by `git diff --staged`
  git status
  ```
* Commit those changes

  ```
  git commit -m "Changed README.md"
  git status
  ```
* CHECK THAT NOBODY CHANGED SOMETHING IN THE MEANTIME and then push

  ```
  git pull    # you are the only one working on this project, so new changes are extremely unlikely
  git push
  ```
* Get a history of the commits, as text (`git log`) or visually (`gitk`)

  ```
  git log
  gitk
  ```

  Or use a [GUI](#git-guis).
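
`git log` can also draw a compact text version of the history. The sketch below runs in a throwaway repository with two demo commits. `--oneline`, `--graph`, `--all`, and `-p` are standard `git log` options.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first line" > README.md
git add README.md
git commit -q -m "first commit"
echo "second line" >> README.md
git add README.md
git commit -q -m "Changed README.md"

# One line per commit, with a text graph of all branches
git log --oneline --graph --all

# Each commit's message and the changes it made to README.md
git log -p -- README.md
```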

### 2. Intermediate

#### a. Amend

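If you have:

* a typo in your commit message
* forgotten to add a file before committing
* some last-minute changes that could be included in the same commit

stage any missing files or changes first (`git add ...`), then amend the last commit:

```
git commit --amend -m "new message"
```

Only amend commits you have not pushed yet. Amending replaces the commit with a new one, so pushing it afterwards would require a force push.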
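
For example, if you forgot to stage a file, the whole fix might look like the sketch below. It runs in a throwaway repository, and the file names are hypothetical. Using `--no-edit` instead of `-m` keeps the previous message.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

touch analysis.R helpers.R             # hypothetical file names
git add analysis.R
git commit -q -m "Add analysis script" # oops: helpers.R was not staged

# Stage the forgotten file, then replace the last commit
git add helpers.R
git commit -q --amend -m "Add analysis script and helpers"
# (git commit --amend --no-edit would keep the old message instead)

# Still a single commit, now with both files
git log --oneline
git show --stat HEAD
```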

#### b. Merge

Whenever you use `git pull`, Git actually does two things. That one command is really `git fetch`, which retrieves the remote changes, followed by `git merge origin/master`, which merges them into your local version. Usually, Git is good at merging files, i.e. combining the changes made on both sides. However, when different users change the same line in the same file, poor Git does not know what to do, so it... complains. As anybody would do when several people ask conflicting things from them. (On recent Git versions, `git pull` may also stop and ask how to reconcile diverging histories. Running `git config pull.rebase false` once selects the merging behavior described here.)

* Remotely:
  * change one line in the README.md and click on `Commit changes`
* Locally:
  * change that same line in the README.md to something different
  * add, commit, and pull

    ```
    git add README.md
    git commit -m "changed that line in the README.md"
    git pull
    ```
* OH NO auto-merge failed
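* open your file in your text editor (e.g. Atom), if it is not open already
* choose the version you want to keep: edit the lines between the conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) so that only what you want remains, and delete the markers themselves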
* add, commit, pull and push!

  ```
  git add README.md
  git commit -m "merged because line x was different"
  git pull
  git push
  ```
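
You can reproduce this conflict entirely on your machine to practice. The sketch below assumes Git 2.28+. A local bare repository stands in for GitHub, and a second clone (hypothetical folder `online`) plays the part of the edit made on GitHub. `--no-rebase` tells `git pull` to merge, which recent Git versions may otherwise ask you to choose explicitly.

```shell
#!/usr/bin/env bash
set -euo pipefail

base=$(mktemp -d)
git init -q --bare -b master "$base/remote.git"   # stand-in for GitHub; -b needs Git 2.28+

# Your repository: README.md has one line, pushed once
git init -q -b master "$base/local"
cd "$base/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Project title" > README.md
git add README.md
git commit -q -m "first commit"
git remote add origin "$base/remote.git"
git push -q -u origin master

# "Remotely": a hypothetical second clone changes that line and pushes
git clone -q "$base/remote.git" "$base/online"
cd "$base/online"
git config user.name "Other User"
git config user.email "other@example.com"
echo "Project title (edited online)" > README.md
git add README.md
git commit -q -m "changed the title online"
git push -q

# "Locally": change the same line differently, add, commit, and pull
cd "$base/local"
echo "Project title (edited locally)" > README.md
git add README.md
git commit -q -m "changed that line in the README.md"
git pull --no-rebase || echo "auto-merge failed, as expected"
cat README.md                     # both versions, between conflict markers

# Keep the local version (which also removes the markers), then finish the merge
echo "Project title (edited locally)" > README.md
git add README.md
git commit -q -m "merged because line 1 was different"
git push -q
```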

#### c. Reset

You want to unstage (revert the 'add' action) a file before committing:

```
git reset HEAD [file to reset]
```
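
The sketch below unstages a file in a throwaway repository. The change stays in your file; it is just no longer staged. On Git 2.23 and newer, `git restore --staged README.md` does the same thing.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first line" > README.md
git add README.md
git commit -q -m "first commit"

# Change the file and stage it by mistake
echo "new line" >> README.md
git add README.md
git status --short          # "M " in the first column: staged

# Unstage it again; the edit itself stays in the file
git reset -q HEAD README.md
git status --short          # " M": modified but not staged
```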

You realize that all you have done lately does not make sense and you want to go back to that nice clean version that you had some time ago. Lucky you, it is still available (thank you Git):

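* on GitHub, on your repo page, click on the number of commits (`x commits`) to open the history, or run `git log --oneline` locally
* copy the key (hash) of the commit you want to go back to
* go back! Careful: `--hard` also throws away any uncommitted changes, so commit or stash anything you want to keep first.

  ```
  git reset --hard [commit key]
  ```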
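
The sketch below runs in a throwaway repository, where the "nice clean version" is the first of two commits. The file name and contents are made up. Locally, `git log --oneline` shows the same keys as the GitHub history page.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "clean version" > results.txt
git add results.txt
git commit -q -m "nice clean version"
clean=$(git rev-parse HEAD)          # the key you would copy from the history

echo "messy change" >> results.txt
git add results.txt
git commit -q -m "does not make sense"
echo "uncommitted mess" >> results.txt   # --hard discards this too

git log --oneline                    # lists both commits with their (short) keys
git reset -q --hard "$clean"
cat results.txt                      # back to "clean version"
```

If you reset too far, `git reflog` lists where HEAD has been recently, so the commits you just left behind can usually still be recovered.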

#### d. Branches

You want to test something without modifying the pipeline that currently works, or (random example, out of my very own imagination) you would like to add annotids to every annotation in Seedlings but you need to try it out before actually doing it: branches are there for you.

**Already existing branches**

To look at the available branches that you have, you can use the command

```
git branch
```

Of course, right now you should have only one: `master`. The remote may have branches that you don't have locally (because you have no use for them, for example). You can list all branches, local and remote, using

```
git branch -a
```

which will show your local `master` branch as well as the remote-tracking branch `origin/master` (listed as `remotes/origin/master`). `origin` is the name Git uses for the URL of the repository you cloned from (or added with `git remote add`). You can check its actual value by typing `git remote -v` or `cat .git/config`.

**Creating branches**

To create a new branch called `new-branch`, you can use

```
git branch new-branch
```

and use `git branch` to see what changed. A new branch called `new-branch` appeared! Ain't it magical? However, you can see that you are not working on that branch: the star that indicates where you are is still next to `master`. To switch to that new branch, you will have to use the following:

```
git checkout new-branch
```

and use `git branch` to see what happened. Once again, it's magical: the star changed places and is now next to `new-branch`. Et voilà! You can switch back to `master` (and check that you indeed made the switch) by using `git checkout master` and `git branch`. Play with it a few times, just to get used to it (how could anyone get bored of that little star hopping from one branch to the other?).

*Tip: to create AND switch to a new branch at the same time, you can use* `git checkout -b new-branch` *which is just the two previous commands concatenated.*
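
Here are the branch commands of this section, collected in one script you can run in a throwaway folder. A local bare repository stands in for GitHub, so that `git branch -a` has a remote branch to show. The script assumes Git 2.28+ for `git init -b` and 2.23+ for `git switch`, the newer command dedicated to switching branches.

```shell
#!/usr/bin/env bash
set -euo pipefail

base=$(mktemp -d)
git init -q --bare "$base/remote.git"     # stand-in for GitHub

git init -q -b master "$base/git_test"    # -b needs Git 2.28+
cd "$base/git_test"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# git_test" > README.md
git add README.md
git commit -q -m "first commit"
git remote add origin "$base/remote.git"
git push -q -u origin master

git branch                # * master
git branch -a             # also lists remotes/origin/master
git remote -v             # what the name origin stands for

git branch new-branch       # create...
git checkout -q new-branch  # ...then switch
git branch                  # the star is now next to new-branch
git checkout -q master

git checkout -q -b another-branch   # create and switch in one step
git switch -q master                # Git 2.23+: switch branches with git switch
```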

**Pushing a new branch**

Now let's move back to `new-branch` for good and modify something in there. Let's add some text at the end of the README.md and push that new branch:

```
git checkout new-branch
echo "adding this line at the end of the README" >> README.md
git add README.md
git commit -m "added a line at the end of the readme in new-b"
git push
```

FATAL. I mean, what were you expecting? You're asking Git to push a branch that does not exist remotely yet, and it does not know where to send it. Fortunately, it tells you exactly what to do to fix this issue:

```
git push --set-upstream origin new-branch
```
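
The sketch below runs the whole sequence with a local bare repository standing in for GitHub (assumes Git 2.28+). Once the upstream is set, a plain `git push` works on that branch. If you find the error tiresome, recent Git versions (2.37 and newer, to our knowledge) can set the upstream automatically with `git config --global push.autoSetupRemote true`.

```shell
#!/usr/bin/env bash
set -euo pipefail

base=$(mktemp -d)
git init -q --bare "$base/remote.git"     # stand-in for GitHub

git init -q -b master "$base/git_test"    # -b needs Git 2.28+
cd "$base/git_test"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# git_test" > README.md
git add README.md
git commit -q -m "first commit"
git remote add origin "$base/remote.git"
git push -q -u origin master

git checkout -q -b new-branch
echo "adding this line at the end of the README" >> README.md
git add README.md
git commit -q -m "added a line at the end of the readme in new-b"

# Refused: new-branch has no upstream yet (unless push.autoSetupRemote is enabled)
git push || echo "push refused: no upstream branch yet"

# Creates new-branch on the remote and links the two
git push -q --set-upstream origin new-branch
git branch -vv
```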

**Branches and gitk**

"I tried to see those branches using `gitk` but the only one that appears is the one I am currently on!"

Well, first of all, it is a good thing to check `gitk` from time to time. Nicely done. Then, there is something you have to know: `gitk` is lazy (aren't we all) and by default only displays the branch you are on. To see all the branches, you have to ask for them explicitly:

```
gitk --branches=*
```

and there you go! Two branches just for you.

*And yes, that star means "anything": it is a wildcard (the same kind you would use in* `ls *.csv`*), so* `--branches=*` *matches every branch name. Wildcards and their more powerful cousins, regular expressions, could be the topic of another blab core meeting. If you also want to see remote branches and tags,* `gitk --all` *shows every reference.*

**Checking out a commit**

So now, we know how to start from where we are and create a new part of our project, independently of the working one. But maybe a few commits ago, there was a version of that one file that you would like to work with (maybe you wrote the results section of your paper based on that commit, and you need more information on something). So you would like all the files to be in the state they were in when you computed your results. Well, you can also check out a specific commit if you know its key:

```
git checkout [commit key]
```

Weird message `You are in 'detached HEAD' state.`, what is that?

#### e. Detached HEAD

OFF WITH THEIR HEADS

`HEAD` is a reference to where you are currently working. Normally, it points to the branch you are on (and, through it, to that branch's latest commit); if you checked out a specific commit, it points to that commit. You can check where it points by typing

```
cat .git/HEAD
```

This will give you either the branch you are on (if you are on a branch) or the commit you are working on (if you checked out a commit). In a `detached HEAD` state, you can still commit, but your new commits do not belong to any branch. Once you check out something else, nothing points to them anymore, and Git's garbage collection will eventually delete them. To visualize what this means, go to <http://git-school.github.io/visualizing-git/#upstream-changes> and type the following commands:

```
git commit
git commit
git checkout b80e     # checking out a commit => detached HEAD written at the top of the screen
git commit
git commit
git checkout master
```

OH NO those commits that we just did on that detached head are now grey and in dotted lines. This means that eventually, they will be removed from the git history.

"But... but.. I want to keep them, I NEED them!"

No worries! You can make them into a branch of their own in order to keep those changes. First, let's go back to that visualizing tool and undo that last checkout (*note that* `undo` *only exists in that tool*):

```
undo
```

We are back where we were, with our detached HEAD. Now let's turn this detached HEAD into a branch and... well, re-attach it:

```
git checkout -b attached-head
```

The "detached HEAD" label does not appear anymore; instead, the value of HEAD is indicated. Now let's go back to `master`:

```
git checkout master
```

Tadaaah! Now those changes are tracked by the git system and do not disappear when you move to another branch. Your changes are safe!
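
The same rescue works in a real repository. The sketch below runs in a throwaway repository (assumes Git 2.28+ for `git init -b`; file and branch names are made up). It checks out an older commit, commits on top of it, then attaches a branch before leaving. `git switch -c attached-head` (Git 2.23+) is equivalent to `git checkout -b attached-head`.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q -b master          # -b needs Git 2.28+
git config user.name "Demo User"
git config user.email "demo@example.com"

for n in 1 2 3; do
  echo "version $n" > results.txt
  git add results.txt
  git commit -q -m "results v$n"
done

# Check out the first commit: HEAD now points to a commit, not a branch
git checkout -q HEAD~2
cat .git/HEAD                  # a commit hash instead of "ref: refs/heads/master"

# Commit on top of it (this commit belongs to no branch yet)
echo "rerun on old version" >> results.txt
git add results.txt
git commit -q -m "experiment on old results"

# Re-attach HEAD by creating a branch right here, then leave safely
git checkout -q -b attached-head
git checkout -q master
git log --oneline attached-head
```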

#### f. Stash

OK, we are now back to the command line. You can do a quick `git status` to make sure that everything is in order, and then `git branch` to remind yourself of which branches you have. Go to `master` if you are not already there, and modify the README.md.

```
git checkout master
echo "unwanted modification" >> README.md
```

For some reason that only you know about, there is something you have to do in `new-branch`, so let's go there:

```
git checkout new-branch
```

Ugh, error. You start to believe that Git just does not want you to be using it, but you're wrong: Git is there to HELP you and prevent you from doing something stupid, like changing branches while you have unsaved modifications, and it tells you so.

```
error: Your local changes to the following files would be overwritten by checkout:
    README.md
Please commit your changes or stash them before you switch branches.
Aborting
```

The `Aborting` is reassuring: it means that there was an error, but everything is back to the state you were in before running the command. Now, on GitHub, on the master branch, write a new line at the beginning of the README.md (so that it does not conflict later with your latest changes), and then in the command line:

```
git pull
```

Ugh, error again. But... it's very similar to the previous one:

```
error: Your local changes to the following files would be overwritten by merge:
    README.md
Please commit your changes or stash them before you merge.
Aborting
```

You have two options. First, you can add and commit your changes. Easy. But maybe the changes you have made are bad, unfinished, or simply not needed right now, while you do need to pull or check out another branch. In that case, you can `stash` your changes, that is, put them away for now while keeping them for later.

```
git status    # modifications that you don't want
git stash
git status    # clean!
```

You can now safely pull (or check out another branch):

```
git pull
git status
```

"But I want to keep working on those unwanted modifications now that I have the latest version of the project!"

Well you can retrieve those using

```
git stash pop
```

If all goes well, your latest changes are back. The worst thing that can happen is a merging issue: when you use `git stash pop`, Git merges your stashed changes with the current version of each file, so any conflict will result in a merge to fix... which you now know how to do.
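
Here is the whole stash round trip as a sketch you can run in a throwaway repository (assumes Git 2.28+ for `git init -b`). It recreates `new-branch` locally so that the checkout error from this section actually happens.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q -b master               # -b needs Git 2.28+
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# git_test" > README.md
git add README.md
git commit -q -m "first commit"

# new-branch adds a line, so README.md differs between the two branches
git checkout -q -b new-branch
echo "adding this line at the end of the README" >> README.md
git add README.md
git commit -q -m "added a line at the end of the readme in new-b"
git checkout -q master

# An uncommitted change on master blocks the switch...
echo "unwanted modification" >> README.md
git checkout new-branch || echo "checkout refused, as expected"

# ...so put it away, do what you need elsewhere, then bring it back
git stash
git stash list
git checkout -q new-branch
git checkout -q master
git stash pop
```

If `git stash pop` does run into a conflict, Git keeps the stashed changes in the stash list. Once you have resolved the conflict, remove the entry with `git stash drop`.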

### 3. Advanced

#### a. Fork

#### b. Rebase

#### c. Cherry-pick

#### d. Pull requests

### 4. Other tools

We have already seen `gitk`; here are some other things you might want to check out:

#### Automation

1. Travis / CircleCI / Jenkins: run checks each time someone pushes something to the repo and send the results wherever you want (email, Slack channel, ...)
2. GitHub Actions:
   1. Arguably better because you don't need to acquaint yourself with another third-party service.
   2. Less feature-rich than Travis or CircleCI.

#### Git GUIs

Using the terminal/console/shell/CLI is uber cool and all, but also unintuitive and confusing, at least to me (Zhenya). GUIs let you do most of the usual operations with a mouse click. They also provide more readable diffs, because reading diffs in a console is not fun. Because of that, you are more likely to notice that something went wrong, to avoid committing what you don't want to commit, etc.

Also, most of them have a pretty visualization of the repo history, including all the branches. This makes it very easy, for example, to find where a given change was introduced. `git blame` and `git bisect` are probably more efficient, but again, less intuitive. And if the need arises, you can always use the console: GUIs and the CLI are not mutually exclusive. Most GUIs even have a button that opens a terminal already in the repo root.

If you dislike GUIs and prefer git cli - great! But in that case, please

1. Do not use `git add .`, `git commit -a`, or `git add -u` (the last one is slightly better). These often introduce changes that are unwanted, unrelated to the change you are about to commit, or both. While deleting a stray `.DS_Store` is simple enough, sometimes you will end up with changes and new files that came from who knows where, and you can't be sure it is safe to delete them.
2. Instead, use `git status` and `git diff` to review changes before committing.
3. Split changes into multiple commits: stage individual files, or even individual chunks or lines within them (`git add -p` walks you through the chunks one by one).
4. Write a message that tells others and yourself what you did (in the first line) and why (in the body).

In my (Zhenya's) opinion, these rules are much easier to follow when you are using GUIs, but to each their own.

Here are some popular GUIs:

1. GitHub Desktop
   1. No need to think about GitHub authentication: it just works, which is great.
   2. Too eager to stage (add) everything you changed - easy for unwanted changes to slip through.
2. GitKraken
   1. Not free.
   2. Probably great but I don't know because it is not free and I am cheap (Zhenya).
3. SourceTree (Zhenya-recommended)
   1. Has split view staging, which is awesome. It allows you to see staged and unstaged changes separately. When you click on a file in the staged tab, you see the diff between the index and HEAD, i.e. what would be committed right now (`git diff --cached` in the CLI). When you click on a file in the unstaged tab, you see the diff between the working tree and the index, i.e. the changes that are not staged yet (`git diff` in the CLI).
   2. Has a very intuitive visualization for the repo history with local and remote branches, tags, diffs between arbitrary commits, cherry-picking, resetting to an arbitrary commit, or checking it out, etc.
   3. Quite buggy so at least once a week it will need to be completely restarted. Previously, it used to happen every day, so there is a lot of improvement here.
   4. You will have to create an account on BitBucket to use SourceTree. Not explained, annoying, manipulative. Also, it takes around 2 minutes.

### 5. Useful links

* Learn while visualizing what's happening: <https://learngitbranching.js.org>
* Visualize the commands you are using: <http://git-school.github.io/visualizing-git/#free>
* Git cheat-sheet: <https://github.github.com/training-kit/downloads/github-git-cheat-sheet/>

Other links that you may want to look at but not too much:

* Travis CI: <https://docs.travis-ci.com/user/for-beginners/> (CI = continuous integration)
* GitKraken: <https://www.gitkraken.com/git-client> (mix of `gitk` and GitHub Desktop)
* GitHub Desktop: <https://desktop.github.com>
* SourceTree: <https://www.sourcetreeapp.com/>


