The R User Conference 2016

June 27 - June 30, 2016
Stanford University, Stanford, California



Regression Modeling Strategies and the R rms Package

Frank E. Harrell Jr. - Vanderbilt University

Post-tutorial notes

The materials used in the tutorial are available here.

Tutorial Description

The art of data analysis concerns using flexible statistical models, choosing tools wisely, avoiding overfitting, estimating quantities of interest, making statistical inferences and predictions, validating predictive accuracy, presenting complex models graphically, and many other important techniques. Regression models can be extended in a number of ways to meet many of the modern challenges in data analysis. Software that makes it easier to incorporate modern statistical methods and good statistical practice removes obstacles and leads to greater insights from data. The presenter has striven to bring modern regression, missing data imputation, data reduction, and bootstrap model validation techniques into everyday practice by writing Regression Modeling Strategies (Springer, 2015, 2nd edition) and the accompanying R package rms. Detailed information may be found at http://biostat.mc.vanderbilt.edu/rms.

The tutorial will cover two chapters of Regression Modeling Strategies, addressing general aspects of multivariable regression, relaxing linearity assumptions using restricted cubic splines, and multivariable modeling strategy, followed by a brief introduction to bootstrap model validation. The rms package will be introduced, and at least two detailed case studies using the package will be presented. The methods covered will apply to almost any regression model, including ordinary least squares, logistic regression models, ordinal regression, quantile regression, longitudinal data analysis, and survival models.
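
For readers who have not met restricted cubic splines before, the following is a brief sketch of the usual truncated power form with k knots t_1 < ... < t_k. It is the standard textbook formulation rather than a quotation from the tutorial materials, and the symbols (k, t_j, X_j, \beta_j) are chosen here for illustration. Software may rescale the nonlinear terms, for example by dividing by (t_k - t_1)^2, which does not change the fitted curve.

```latex
% Restricted cubic spline in X with k knots t_1 < t_2 < \dots < t_k.
% (u)_+ denotes u if u > 0 and 0 otherwise.
f(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{k-1} X_{k-1},
\qquad X_1 = X,

% Nonlinear basis terms, for j = 1, \dots, k-2:
X_{j+1} = (X - t_j)_+^3
        - (X - t_{k-1})_+^3 \, \frac{t_k - t_j}{t_k - t_{k-1}}
        + (X - t_k)_+^3 \, \frac{t_{k-1} - t_j}{t_k - t_{k-1}}
```

The weights on the last two truncated terms are chosen so that the cubic and quadratic parts cancel for X > t_k, and every truncated term is zero for X < t_1, so the function is linear in both tails. This leaves k - 1 regression coefficients besides the intercept, compared with k + 3 for an unrestricted cubic spline on the same knots.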

Tutorial Outline

  1. Splines for Estimating Shape of Regression Function and Determining Predictor Transformations
    1. Cubic Spline Functions
    2. Restricted Cubic Splines
    3. Choosing Number and Position of Knots
  2. Multivariable Modeling Strategy
    1. Why and How To Pre-specify Model Complexity
    2. Problems Caused by Ordinary Stepwise Variable Selection
    3. Some Useful Modeling Strategies for
      1. Prediction
      2. Estimation
      3. Hypothesis Testing
  3. Model Validation
  4. Graphical Methods for Interpreting Complex Regression Fits
  5. Overview of R rms package
  6. Detailed case study: Parametric survival modeling for time to event data

Background Knowledge

Attendees should have good proficiency in ordinary multiple regression modeling and basic proficiency with R.

Course Notes

Course notes are available at http://biostat.mc.vanderbilt.edu/tmp/course.pdf.

