RMarkdown ManyBabies Seminar
Last updated
There is a set of 18 criteria that data must satisfy to be considered anonymized by the DHHS, i.e., stripped of both PII and PHI. They are called the “DHHS safe harbor criteria” and are available here.
According to Mike Frank, once those criteria are satisfied, the data is no longer considered human subjects data, so you are free to make it as public as you want and the IRB has no say in it.
This is about the US only.
You can run RStudio online for free at https://posit.cloud. It has the same interface as the desktop version, and you can connect it to a GitHub repo too. I'm not sure how viable it is to work there every day (it may well be), but it is definitely great to have as a backup in case you can’t use your usual computer for some reason. It is also a good way to test that other people can knit your RMarkdown. (Automatic testing for knittability after pushing to GitHub is coming to BLab later this year.)
For many people, the most used panes in RStudio are the editor and the R console. By default, RStudio puts them one above the other, so you often have to resize them to give more space to one or the other.
Mike Frank recommended the following pane layout in RStudio (whether cloud or local):
Zhenya's favorite layout:
The main point of both layouts is to put the editor and the console side by side, so that each can take up the full height of the window. You can change the layout under Tools > Global Options > Pane Layout.
You might rely on other panes more often, so play around and see what works best for you. Or don't: the default is also just fine.
If your collaborators don't use RMarkdown, you'll need to give them your manuscript in the format that they are used to. Most commonly, it will be Word or Google Docs. So, you export to Word, possibly upload to Google Docs, and let the collaborators edit the document while you continue working on your RMarkdown file. There are a few problems you'll face with this approach:
It is time-consuming and very annoying to figure out what changes were made and then incorporate them into your RMarkdown. Putting the Google Doc into suggestion-only mode can help by letting you go through one change at a time, but it still isn't great.
People edit R-generated elements: tables, plots, and printed inline expressions. And even when they don't, you copy a paragraph with a new sentence back into RMarkdown and accidentally replace all your `r print_test_results(accuracy_comparison)` with the static numbers they printed, and then these numbers don't update.
And even if all your collaborators do use RMarkdown, it is just so much nicer to edit documents in Word or Google Docs, especially with collaboration features like suggestions and comments.
Solutions:
redoc
Mentioned in the seminar. It knits RMarkdown documents to a special Word file in which code chunks and, more importantly, inline code are highlighted so that no one touches them. You can then upload it to Google Docs (don’t convert it to the Google Docs format though!), edit, comment, download, and merge the changes (and comments!) back into the RMarkdown file. The main problem is that it hasn’t been updated in years :-(
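A minimal sketch of the redoc round trip follows. It assumes two things:

- The package can still be installed from its GitHub repository (noamross/redoc).
- The manuscript is called `manuscript.Rmd`, which is a hypothetical file name.

```r
# Install once (assumption: the package is still installable from GitHub)
# remotes::install_github("noamross/redoc")

library(rmarkdown)
library(redoc)

# Knit the (hypothetical) manuscript to a reversible Word file;
# code chunks and inline code are marked so collaborators leave them alone
render("manuscript.Rmd", output_format = redoc())

# ... upload manuscript.docx to Google Docs (without converting it),
# let collaborators edit and comment, then download the edited .docx ...

# Convert the edited Word file back to R Markdown, restoring the code
# (see ?redoc::dedoc for options controlling the output file)
dedoc("manuscript.docx")
```

The same output format can also be set in the YAML header (`output: redoc::redoc`) instead of being passed to `render()`.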
trackdown
Mentioned by Erin. It has a similar basic idea: highlight the stuff that shouldn’t be changed, work on that version of the document, and then merge the changes back. However, it keeps the file in the .Rmd format and automatically uploads it to and downloads it from Google Docs. So you are editing RMarkdown, not a different type of document, which keeps the conversion simpler. Also, the package is in active development. Cons: formatting (bold, italic, etc.) applied in Google Docs won’t make it back into the RMarkdown. That is not a huge problem, and one that might be solved in the future.
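A minimal sketch of the trackdown workflow follows. It assumes three things:

- The package is installed from CRAN.
- You have authorized it to access your Google account.
- The manuscript is called `manuscript.Rmd`, which is a hypothetical file name.

```r
# install.packages("trackdown")
library(trackdown)

# First upload: creates a Google Doc from the .Rmd for collaborators to edit.
# hide_code = TRUE replaces the YAML header and code chunks with placeholders
# so that collaborators cannot accidentally change them.
upload_file(file = "manuscript.Rmd", hide_code = TRUE)

# Pull collaborators' edits back into the local .Rmd
# (this replaces the local file with the Google Doc's content)
download_file(file = "manuscript.Rmd")

# After further local work, push the new version to the same Google Doc
# (this replaces the Google Doc's content with the local file)
update_file(file = "manuscript.Rmd", hide_code = TRUE)
```

Because `download_file()` and `update_file()` each overwrite one side with the other, the order matters. Pull collaborators' edits before making local changes, and push local changes before collaborators continue.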