Rmd-to-pdf GitHub workflow
Why?
This is a good way to ensure that your notebook doesn't rely on you being connected to the shared drive, on files that only exist on your computer, etc.
Prerequisite
You should be using renv in your repository. See renv - reproducible R environments.
How
We will use GitHub Actions, a service provided by GitHub that can run a series of commands on GitHub servers for you. That sequence of commands is referred to as a workflow and is defined by a file you put into the .github/workflows/ folder in the root of your repo.
These workflows can be run when a certain event happens (e.g., a push is made to the repository) using triggers you define in the same file. We, however, won't be doing that and will run our workflow manually, either in the browser or from the command line.
The workflow we are going to use knits the R Markdown file you specify to a PDF document and saves it on the GitHub servers. In the context of GitHub Actions, this is referred to as uploading an artifact.
(done once) Add a workflow file to your repository.
Run the workflow.
(if it fails) Read the log and debug.
Download the output pdf file.
Workflow File
Here is a template of a workflow file you can use:
on:
  workflow_dispatch
# This workflow only runs when you manually trigger it. This can be done from
# the Actions tab on GitHub, which takes more steps than necessary. A quicker
# way is to use GitHub CLI (https://cli.github.com/):
#
# To run the workflow on the current branch (you can still use it if you don't
# use branching):
#     gh workflow run rmd-to-pdf --ref `git branch --show-current`
#
# To see the results of the last 5 runs, use
#     gh run list -w rmd-to-pdf -L 5
#
# Less useful, but still nice, is to delete all the runs from GitHub and start
# over:
#     gh run list -w rmd-to-pdf | awk '{print $1}' | xargs gh run delete
name: rmd-to-pdf
jobs:
  rmd-to-pdf:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
        with:
          fetch-depth: 1
      # - name: Clone a dataset to ~/BLAB_DATA
      #   uses: bergelsonlab/public-files/clone-blab-data@main
      #   with:
      #     repository: bergelsonlab/<repository-name>
      #     fetch-depth: 1
      - uses: bergelsonlab/public-files/knit-rmd-to-pdf@main
        with:
          rmd_path: <path-to/your/R-Markdown-notebook.Rmd>
Copy it and save it as .github/workflows/rmd-to-pdf.yaml in your repository.
Change <path-to/your/R-Markdown-notebook.Rmd> to match your repository.
If you use one of our repo-based datasets, such as seedlings-nouns or vihi_annotations, uncomment the "Clone a dataset to ~/BLAB_DATA" step and change <repository-name> to the name of the repo you need. You can copy this step if you use multiple datasets.
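Before running the workflow, it can help to confirm that the file is where GitHub expects it and that no template placeholder was left in. The following is only a sketch of such a check, not part of the lab's tooling: the function name check_workflow_file is hypothetical, and it assumes you run it from the root of your repository.

```shell
# Hypothetical helper (not part of the lab tooling): checks that the workflow
# file exists and that the template placeholders have been replaced.
check_workflow_file() {
  file="${1:-.github/workflows/rmd-to-pdf.yaml}"

  if [ ! -f "$file" ]; then
    echo "Missing workflow file: $file" >&2
    return 1
  fi

  # Look at non-comment lines only, so that a dataset step that is still
  # commented out does not raise a false alarm.
  if grep -v '^[[:space:]]*#' "$file" | grep -q -e '<path-to' -e '<repository-name>'; then
    echo "Placeholder still present in $file" >&2
    return 1
  fi

  echo "OK: $file"
}
```

Calling check_workflow_file with no arguments checks the default location; pass a path as the first argument to check a file somewhere else.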
Run the workflow
There are two ways:
In the browser
Open the repository GitHub page.
Click "Actions".
Choose rmd-to-pdf.
Find where it says "This workflow has a workflow_dispatch event trigger."
Click on "Run Workflow".
On the command line
You will need to have the GitHub command line tools (gh) installed. If you authenticated on GitHub using our Set up Git and GitHub instructions, you should already have them installed. If not, go to Set up Git and GitHub and find instructions for installing them.
The command to run:
gh workflow run rmd-to-pdf --ref `git branch --show-current`
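If you run the workflow often, the command above can be wrapped in a small shell function, together with a second one that shows the status of the most recent run. This is a minimal sketch, assuming gh is installed and authenticated and that the workflow is named rmd-to-pdf as in the template; the function names are hypothetical.

```shell
# Sketch: trigger the rmd-to-pdf workflow on the branch you are currently on.
run_rmd_to_pdf() {
  branch="$(git branch --show-current)"
  gh workflow run rmd-to-pdf --ref "$branch"
}

# Sketch: show the most recent run of the workflow and its status.
latest_rmd_to_pdf_run() {
  gh run list --workflow rmd-to-pdf --limit 1
}
```

After run_rmd_to_pdf, it may take a few seconds before the new run shows up in the output of latest_rmd_to_pdf_run.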
Download the pdf
If everything goes well and the workflow runs successfully, the PDF file will be saved on GitHub servers. Here is how to access it:
Open the repository GitHub page.
Click "Actions".
Choose rmd-to-pdf in the left pane.
Click on the top rmd-to-pdf link in the table.
Scroll down to "Artifacts".
Click on the name of your R Markdown notebook to download an archive with the pdf.
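The same download can probably be done from the command line as well. The sketch below assumes gh is installed and authenticated and that the most recent rmd-to-pdf run is the one you want; the function name download_latest_pdf and the default folder name rmd-to-pdf-output are hypothetical.

```shell
# Sketch: download the artifacts of the most recent rmd-to-pdf run.
download_latest_pdf() {
  out_dir="${1:-rmd-to-pdf-output}"   # hypothetical default folder name

  # Ask gh for the latest run as JSON and extract its ID with --jq.
  run_id="$(gh run list --workflow rmd-to-pdf --limit 1 \
    --json databaseId --jq '.[0].databaseId')"

  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "No rmd-to-pdf runs found" >&2
    return 1
  fi

  # gh extracts the artifact archive into the target folder.
  gh run download "$run_id" --dir "$out_dir"
}
```

Unlike the browser download, gh run download unpacks the archive for you, so the PDF should end up inside the target folder.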
If the workflow fails
You will get an email telling you that a workflow run failed. There will be a link - click on it to see the log of the run. Find the part where the failure occurred and try to use information there to fix the problem. If something is unclear, send the link to the log to the lab technician.
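If you prefer the terminal to the email link, the log of a failed run can likely be printed with gh too. This is a sketch under assumptions: gh is installed, authenticated, and recent enough to support the --status filter of gh run list; the function name show_failed_log is hypothetical.

```shell
# Sketch: print the log of the failed steps of an rmd-to-pdf run.
# With no argument, the most recent failed run is used.
show_failed_log() {
  run_id="${1:-}"

  if [ -z "$run_id" ]; then
    run_id="$(gh run list --workflow rmd-to-pdf --status failure --limit 1 \
      --json databaseId --jq '.[0].databaseId')"
  fi

  if [ -z "$run_id" ] || [ "$run_id" = "null" ]; then
    echo "No failed rmd-to-pdf runs found" >&2
    return 1
  fi

  # --log-failed limits the output to the steps that failed.
  gh run view "$run_id" --log-failed
}
```

The output is plain text, so it can be saved to a file and sent along with the link when you ask for help.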