Rmd-to-pdf GitHub workflow

Why?

This is a good way to ensure that your notebook doesn't rely on you being connected to the shared drive, on files that only exist on your computer, etc.

Prerequisite

You should be using renv in your repository. See "renv - reproducible R environments".

How

We will use GitHub Actions, a service provided by GitHub that can run a series of commands on GitHub servers for you. That sequence of commands is referred to as a workflow and is defined by a file you put into the .github/workflows/ folder in the root of your repo.

These workflows can be run when a certain event happens (e.g., a push is made to the repository) using triggers you define in the same file. We, however, won't be doing that and will run our workflow manually, either in the browser or from the command line.

The workflow we are going to use knits the R Markdown file you specify to a pdf document and saves it on the GitHub servers. In the context of GitHub Actions, this is referred to as uploading an artifact.

  1. (done once) Add a workflow file to your repository.

  2. Run the workflow.

  3. (if it fails) Read the log and debug.

  4. Download the output pdf file.

Workflow File

Here is a template of a workflow file you can use:

on:
  workflow_dispatch
  
# This workflow only runs when you manually trigger it. This can be done from
# the Actions tab on GitHub, which takes unnecessarily many steps. A quicker way
# is to use GitHub CLI (https://cli.github.com/):
#
# To run the workflow on the current branch (you can still use it if you don't
# use branching):
# gh workflow run rmd-to-pdf --ref `git branch --show-current`
#
# To see the results of the last 5 runs, use
# gh run list -w rmd-to-pdf -L 5
#
# Less useful, but still nice, is to delete all the runs from GitHub and start
# over:
# gh run list -w rmd-to-pdf --json databaseId -q '.[].databaseId' | xargs -n1 gh run delete


name: rmd-to-pdf

jobs:
  rmd-to-pdf:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
        with:
          fetch-depth: 1
          
      # - name: Clone a dataset to ~/BLAB_DATA
      #   uses: bergelsonlab/public-files/clone-blab-data@main
      #   with:
      #     repository: bergelsonlab/<repository-name>
      #     fetch-depth: 1
      
      - uses: bergelsonlab/public-files/knit-rmd-to-pdf@main
        with:
          rmd_path: <path-to/your/R-Markdown-notebook.Rmd>

  • Copy the template and save it as .github/workflows/rmd-to-pdf.yaml in your repository.

  • Change <path-to/your/R-Markdown-notebook.Rmd> to match your repository.

  • If you use one of our repo-based datasets, such as seedlings-nouns or vihi_annotations, uncomment the Clone a dataset to ~/BLAB_DATA step and change <repository-name> to the name of the repo you need. You can copy this step if you use multiple datasets.
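
The copying step can also be done from the command line. Below is a minimal sketch, assuming you saved the template somewhere on your computer and that you run the function from the root of your repository. The function name install_workflow and the location of the saved template are hypothetical, made up for this example.

```shell
# Copy a saved template to the place where GitHub Actions looks for workflows.
# $1: path to the template file you saved (hypothetical location, pick your own)
install_workflow() {
  template="$1"
  target=".github/workflows/rmd-to-pdf.yaml"

  # Create the workflows folder if it does not exist yet.
  mkdir -p .github/workflows
  cp "$template" "$target"

  # Warn if the placeholder path from the template has not been replaced yet.
  if grep -qF '<path-to/your/R-Markdown-notebook.Rmd>' "$target"; then
    echo "Remember to replace the rmd_path placeholder in $target" >&2
  fi
}
```

For example, install_workflow ~/Downloads/rmd-to-pdf.yaml. GitHub can only see the workflow after the file has been committed and pushed.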

Run the workflow

There are two ways:

In the browser

  • Open the repository GitHub page.

  • Click "Actions".

  • Choose rmd-to-pdf.

  • Find where it says "This workflow has a workflow_dispatch event trigger."

  • Click on "Run workflow".

On the command line

You will need to have the GitHub command line tools (GitHub CLI) installed. If you authenticated on GitHub using our Set up Git and GitHub instructions, you should already have them installed. If not, go to Set up Git and GitHub and find the instructions for installing them.

The command to run:

gh workflow run rmd-to-pdf --ref `git branch --show-current`
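
If you run this often, the command can be wrapped in a small shell function. This is only a sketch, assuming GitHub CLI is installed and authenticated and that you are inside the repository. The function name run_rmd_to_pdf is hypothetical. The one thing it adds is a check that you are actually on a branch, because `git branch --show-current` prints nothing in a detached HEAD state.

```shell
# Trigger the rmd-to-pdf workflow on the branch that is currently checked out.
run_rmd_to_pdf() {
  branch="$(git branch --show-current)"

  # An empty result means there is no current branch (e.g., detached HEAD).
  if [ -z "$branch" ]; then
    echo "Not on a branch; check out a branch first." >&2
    return 1
  fi

  gh workflow run rmd-to-pdf --ref "$branch"
}
```

After defining the function, call run_rmd_to_pdf. Running gh run list -w rmd-to-pdf -L 5 afterwards shows whether the run has started.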

Download the pdf

If everything goes well and the workflow runs successfully, the pdf file will be saved on GitHub servers. Here is how to access it:

  • Open the repository GitHub page.

  • Click "Actions".

  • Choose rmd-to-pdf in the left pane.

  • Click on the top rmd-to-pdf link in the table.

  • Scroll down to "Artifacts".

  • Click on the name of your R Markdown notebook to download an archive with the pdf.
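
The same download can probably be done from the command line with GitHub CLI. The sketch below assumes gh is installed and authenticated and that you are inside the repository. The function name download_latest_pdf and the default folder name rmd-to-pdf-output are hypothetical. It takes the most recent rmd-to-pdf run, whether or not that run succeeded, so a failed run may have nothing to download.

```shell
# Download the artifacts of the most recent rmd-to-pdf run.
# $1: folder to download into (optional; the default name is made up for this example)
download_latest_pdf() {
  out_dir="${1:-rmd-to-pdf-output}"

  # Ask GitHub for the ID of the latest run of the rmd-to-pdf workflow.
  run_id="$(gh run list -w rmd-to-pdf -L 1 --json databaseId -q '.[].databaseId')"
  if [ -z "$run_id" ]; then
    echo "No rmd-to-pdf runs found." >&2
    return 1
  fi

  # Without an artifact name, gh downloads every artifact of the run.
  gh run download "$run_id" -D "$out_dir"
}
```

Calling download_latest_pdf pdfs would place the downloaded artifact in a folder called pdfs.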

If the workflow fails

You will get an email telling you that a workflow run failed. Click the link in the email to see the log of the run. Find the part where the failure occurred and use the information there to fix the problem. If something is unclear, send the link to the log to the lab technician.
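
The log can also be read in the terminal instead of following the link in the email. A minimal sketch, assuming GitHub CLI is installed and authenticated and that the failed run is the most recent rmd-to-pdf run. The function name show_failed_log is hypothetical.

```shell
# Print the log of the failed steps of the most recent rmd-to-pdf run.
show_failed_log() {
  # Ask GitHub for the ID of the latest run of the rmd-to-pdf workflow.
  run_id="$(gh run list -w rmd-to-pdf -L 1 --json databaseId -q '.[].databaseId')"
  if [ -z "$run_id" ]; then
    echo "No rmd-to-pdf runs found." >&2
    return 1
  fi

  # --log-failed limits the output to the steps that failed.
  gh run view "$run_id" --log-failed
}
```

The output can be long, so piping it into a pager (show_failed_log | less) may make it easier to find the first error.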
