The R User Conference 2016

June 27 - June 30 2016
Stanford University, Stanford, California



Using Git and GitHub with R, RStudio, and R Markdown

Jenny Bryan - University of British Columbia & rOpenSci

Post-tutorial notes

The materials used in the tutorial are available here.

Tutorial Description

Data analysts can use the Git version control system to manage a motley assortment of project files (e.g., data, code, reports) in a sane way. This has benefits for the solo analyst and, especially, for anyone who wants to communicate and collaborate with others. Git helps you organize your project over time and across different people and computers. Hosting services like GitHub, Bitbucket, and GitLab provide a home for your Git-based projects on the internet.

What's special about using R and Git(Hub)?

  • the active R package development community on GitHub
  • workflows for R scripts and R Markdown files that make it easy to share source and rendered results on GitHub
  • Git- and GitHub-related features of the RStudio IDE

Tutorial Outline

The tutorial will be structured as ~5 task-oriented units. Indicative topics:

  • The most difficult part: installation and configuration!
  • Creating a Git repository and connecting the local repo to a GitHub remote, for new and existing projects.
  • The intersection of GitHub and the R world: R packages developed on GitHub and how to make use of "issues"; the METACRAN read-only mirror of all of CRAN; R-specific searching tips.
  • How to propose a change or fix to someone else's project, i.e. "make a pull request".
  • Daily workflows and FAQ: How often should I commit? Which files should I commit? How do I change a commit or its message? How do groups of 1, 5, or 10 people structure their work with Git(Hub)? And more.
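
To give a flavor of the first two units, here is a minimal shell sketch of configuring Git, creating a repository, and pointing it at a GitHub remote. It is not taken from the tutorial materials: the project name `my-analysis`, the name and email, and the remote URL are all hypothetical placeholders, and the steps covered in the tutorial itself may differ.

```shell
# Work in a throwaway directory so nothing else on your machine is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Create a new repository for a hypothetical project called "my-analysis".
git init my-analysis
cd my-analysis

# Tell Git who you are (placeholder identity; --local limits it to this repo).
git config --local user.name "Jane Doe"
git config --local user.email "jane@example.com"

# Make a first commit so the repository has some history.
echo "# my-analysis" > README.md
git add README.md
git commit -m "Add README"

# Connect the local repo to a GitHub remote (hypothetical URL).
git remote add origin https://github.com/jane-doe/my-analysis.git
git remote -v

# Pushing is left commented out because it needs the remote repo to exist
# and your GitHub credentials to be set up:
# git push -u origin HEAD
```

In everyday use you would typically set your name and email once with `git config --global` rather than per repository; `--local` is used here only to keep the sketch from changing your own settings.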

This will be a hands-on tutorial, so bring your prepared laptop and pre-register a free GitHub account (see below).

What This Tutorial Is Not

This tutorial will teach novices about Git on a strict "need to know" basis. Git was built to manage development of the Linux kernel, which is probably very different from what you do. Most people need a small subset of Git's functionality and that will be our focus. If you want a full-blown exposition of Git as a directed acyclic graph or a treatise on the Git-Flow branching strategy, you will be sad.

Our target audience is someone who uses R to analyze data. While R package development with Git(Hub) is absolutely in scope, it's not an explicit focus or requirement.

We target GitHub - not Bitbucket or GitLab - for the sake of specificity. However, all the big-picture principles and even some mechanics will carry over to these alternative hosting platforms.

Background Knowledge

The tutorial is aimed at intermediate to advanced R users, who are comfortable writing R scripts and managing R projects. You should have a good grasp of files and directories and be generally knowledgeable about where things live on your computer.

Although we will show alternatives for most Git operations, we will inevitably spend some time in the shell and we assume some prior experience. For example, you should know how to open up a shell, navigate to a certain directory, and list the files there. You should be comfortable using shell commands to view/move/rename files and to work with your command history.
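
As a rough self-check of the shell skills assumed above, the following sketch should look familiar. It runs in a temporary directory, and the file and directory names are made up purely for illustration.

```shell
# Start in a scratch directory and confirm where you are.
workdir="$(mktemp -d)"
cd "$workdir"
pwd

# Create a directory and a small file, then list what is there.
mkdir reports
printf 'title: draft\n' > draft.Rmd
ls

# Move and rename the file in one step, then view its contents.
mv draft.Rmd reports/analysis.Rmd
ls reports
cat reports/analysis.Rmd
```

In an interactive session, the up arrow or the `history` command lets you recall earlier commands.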

R Markdown or RStudio will feature prominently in most of the units, so this tutorial will be most rewarding for people who already use these or are eager to try them out.

Preparation

Preparation instructions can be found here.

It is vital that you attempt to set up your system in advance. You cannot show up at 9am with no preparation and keep up! These are battle-tested instructions, so most people will succeed, but setup could easily take 1 to 2 hours. We believe in you! We will have TAs in the room starting at 8:15am and throughout the workshop.

Instructor Biography

Jenny Bryan (Twitter, GitHub) is a professor at the University of British Columbia. She's been using and teaching R (or S!) for 20 years, most recently in STAT 545 and Software Carpentry. Other aspects of her R life include work with rOpenSci, development of the googlesheets and gapminder packages, and being academic director for UBC's Master of Data Science program.

Dean Attali and Bernhard Konrad will be teaching assistants. They both have experience in teaching this material (and much more) in STAT 545 and Software Carpentry. Added bonus: they know how to use Windows.

