Introducing olsrr

Tools for linear regression.

I am pleased to announce the olsrr package, a set of tools for improved output from linear regression models, designed with beginner and intermediate R users in mind. The package includes:

  • comprehensive regression output
  • variable selection procedures
  • heteroskedasticity tests, collinearity diagnostics and measures of influence
  • various plots and underlying data

If you know how to build models using lm(), you will find olsrr very useful. Most of its functions take an object of class lm as input, so you only need to build a model using lm() and pass it to the functions in olsrr. Once you are more comfortable with R, you can move on to the more intuitive approach offered by tidymodels and similar packages, which provide flexibility that olsrr does not.

Installation

# Install release version from CRAN
install.packages("olsrr")

# Install development version from GitHub
# install.packages("devtools")
devtools::install_github("rsquaredacademy/olsrr")

Shiny App

olsrr includes a shiny app which can be launched using

ols_launch_app()

or try the live version here.

Read on to learn more about the features of olsrr, or see the olsrr website for detailed documentation on using the package.

Regression Output

model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_regress(model)
##                         Model Summary                          
## --------------------------------------------------------------
## R                       0.914       RMSE                2.622 
## R-Squared               0.835       Coef. Var          13.051 
## Adj. R-Squared          0.811       MSE                 6.875 
## Pred R-Squared          0.771       MAE                 1.858 
## --------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                ANOVA                                 
## --------------------------------------------------------------------
##                 Sum of                                              
##                Squares        DF    Mean Square      F         Sig. 
## --------------------------------------------------------------------
## Regression     940.412         4        235.103    34.195    0.0000 
## Residual       185.635        27          6.875                     
## Total         1126.047        31                                    
## --------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    27.330         8.639                  3.164    0.004     9.604    45.055 
##        disp     0.003         0.011        0.055     0.248    0.806    -0.019     0.025 
##          hp    -0.019         0.016       -0.212    -1.196    0.242    -0.051     0.013 
##          wt    -4.609         1.266       -0.748    -3.641    0.001    -7.206    -2.012 
##        qsec     0.544         0.466        0.161     1.166    0.254    -0.413     1.501 
## ----------------------------------------------------------------------------------------

When the model contains interaction terms, the predictors are centered and scaled before the standardized betas are computed. ols_regress() detects interaction terms automatically, but if you created the interaction as a new variable instead of specifying it inline in the formula, you can indicate its presence by setting iterm to TRUE.
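For a model without interaction terms, the Std. Beta column above can be reproduced in base R. The following is a minimal sketch and not olsrr's internal code: it assumes the usual definition (slope multiplied by the standard deviation of the predictor and divided by the standard deviation of the response) and compares it with a refit on centered and scaled data. The names std_beta and scaled_fit are illustrative only.

```r
# Sketch: standardized betas for a model WITHOUT interaction terms.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

b <- coef(model)[-1]                    # slopes, intercept dropped
sd_x <- sapply(mtcars[names(b)], sd)    # standard deviation of each predictor
std_beta <- b * sd_x / sd(mtcars$mpg)   # slope * sd(x) / sd(y)

# Same quantity obtained by refitting on centered and scaled variables
scaled <- as.data.frame(scale(mtcars[c("mpg", names(b))]))
scaled_fit <- lm(mpg ~ ., data = scaled)

round(std_beta, 3)
round(coef(scaled_fit)[-1], 3)
```

The two vectors agree because centering and scaling every variable turns each slope into its standardized form. The sketch covers only the case without interaction terms.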

Residual Diagnostics

olsrr offers tools for detecting violation of standard regression assumptions:

  • Residual QQ plot
  • Residual normality test
  • Residual vs Fitted plot
  • Residual histogram

ols_plot_resid_qq(model)

See Residual Diagnostics for more details.
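To make clear what these diagnostics examine, here is a minimal base R sketch of a residual normality test and a residual QQ plot. It uses only functions from the stats and graphics packages and is not how olsrr is implemented. The choice of the Shapiro-Wilk test is an assumption made for illustration.

```r
# Sketch: residual normality checks with base R only.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
res <- residuals(model)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
sw$p.value

# Base graphics version of a residual QQ plot
qqnorm(res)
qqline(res)
```

A small p-value would suggest that the residuals depart from normality. Points that stray far from the reference line in the QQ plot tell the same story visually.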

Heteroskedasticity

olsrr provides the following four tests for detecting heteroskedasticity:

  • Bartlett Test
  • Breusch Pagan Test
  • Score Test
  • F Test

ols_test_breusch_pagan(model)
## 
##  Breusch Pagan Test for Heteroskedasticity
##  -----------------------------------------
##  Ho: the variance is constant            
##  Ha: the variance is not constant        
## 
##              Data               
##  -------------------------------
##  Response : mpg 
##  Variables: fitted values of mpg 
## 
##         Test Summary         
##  ----------------------------
##  DF            =    1 
##  Chi2          =    0.5884673 
##  Prob > Chi2   =    0.4430124

See Heteroskedasticity for more details.
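The idea behind the test can be sketched in base R. This is a hedged illustration of the classic Breusch Pagan score test using the fitted values as the only auxiliary regressor (hence one degree of freedom, as in the output above). Whether olsrr computes the statistic in exactly this way is an assumption, and the variable names are illustrative.

```r
# Sketch: classic Breusch Pagan test on the fitted values, base R only.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

e2 <- residuals(model)^2
n <- length(e2)
sigma2 <- sum(e2) / n            # maximum likelihood estimate of the error variance
g <- e2 / sigma2                 # scaled squared residuals

# Auxiliary regression of the scaled squared residuals on the fitted values
aux <- lm(g ~ fitted(model))
ess <- sum((fitted(aux) - mean(g))^2)   # explained sum of squares

chi2 <- ess / 2                  # test statistic, 1 degree of freedom
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)
c(chi2 = chi2, p_value = p_value)
```

A large p-value means there is no evidence against constant variance, which is the conclusion the olsrr output above points to as well.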

Collinearity Diagnostics

olsrr reports VIF, tolerance and condition indices to detect collinearity, and provides plots for assessing model fit and the contributions of variables.

ols_coll_diag(model)
## Tolerance and Variance Inflation Factor
## ---------------------------------------
## # A tibble: 4 x 3
##   Variables Tolerance   VIF
##   <chr>         <dbl> <dbl>
## 1 disp          0.125  7.99
## 2 hp            0.194  5.17
## 3 wt            0.145  6.92
## 4 qsec          0.319  3.13
## 
## 
## Eigenvalue and Condition Index
## ------------------------------
##    Eigenvalue Condition Index   intercept        disp          hp
## 1 4.721487187        1.000000 0.000123237 0.001132468 0.001413094
## 2 0.216562203        4.669260 0.002617424 0.036811051 0.027751289
## 3 0.050416837        9.677242 0.001656551 0.120881424 0.392366164
## 4 0.010104757       21.616057 0.025805998 0.777260487 0.059594623
## 5 0.001429017       57.480524 0.969796790 0.063914571 0.518874831
##             wt         qsec
## 1 0.0005253393 0.0001277169
## 2 0.0002096014 0.0046789491
## 3 0.0377028008 0.0001952599
## 4 0.7017528428 0.0024577686
## 5 0.2598094157 0.9925403056

See Collinearity Diagnostics for more details.
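The tolerance and VIF columns follow from a simple definition: regress each predictor on the remaining predictors, take the R-squared of that auxiliary regression, and compute 1 / (1 - R-squared). A minimal base R sketch of that definition follows. It is not olsrr's implementation, and the names predictors, vif and tolerance are illustrative.

```r
# Sketch: variance inflation factors from auxiliary regressions, base R only.
predictors <- c("disp", "hp", "wt", "qsec")

vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Regress predictor v on all the other predictors
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

tolerance <- 1 / vif
round(cbind(Tolerance = tolerance, VIF = vif), 3)
```

Tolerance is the reciprocal of VIF, so a predictor that is almost fully explained by the others has a tolerance near zero and a large VIF.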

Measures of Influence

olsrr offers the following tools to detect influential observations:

  • Cook’s D Bar Plot
  • Cook’s D Chart
  • DFBETAs Panel
  • DFFITs Plot
  • Studentized Residual Plot
  • Standardized Residual Chart
  • Studentized Residuals vs Leverage Plot
  • Deleted Studentized Residual vs Fitted Values Plot
  • Hadi Plot
  • Potential Residual Plot

ols_plot_resid_lev(model)

See Measures of Influence for more details.
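The quantities behind several of these plots are available in base R. The sketch below collects Cook's distance, leverage and deleted studentized residuals in one data frame and flags observations with a large Cook's distance. The 4 / n cutoff is a common rule of thumb and an assumption here, not necessarily the threshold olsrr draws on its charts.

```r
# Sketch: influence measures with base R only.
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
n <- nobs(model)

infl <- data.frame(
  cooks_d  = cooks.distance(model),   # Cook's distance
  leverage = hatvalues(model),        # diagonal of the hat matrix
  rstudent = rstudent(model)          # deleted studentized residuals
)

threshold <- 4 / n                    # rule-of-thumb cutoff (assumption)
infl[infl$cooks_d > threshold, ]
```

The leverages always sum to the number of estimated coefficients, which is five for this model.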

Variable Selection

olsrr offers several variable selection procedures: all possible regression, best subset regression, stepwise regression, stepwise forward regression and stepwise backward regression.

model <- lm(y ~ ., data = stepdata)
ols_step_both_aic(model)
## Stepwise Selection Method 
## -------------------------
## 
## Candidate Terms: 
## 
## 1 . x1 
## 2 . x2 
## 3 . x3 
## 4 . x4 
## 5 . x5 
## 6 . x6 
## 
## 
## Variables Entered/Removed: 
## 
## - x6 added 
## - x1 added 
## - x3 added 
## - x2 added 
## - x6 removed 
## - x4 added 
## 
## No more variables to be added or removed.
## 
## 
##                                   Stepwise Summary                                  
## ----------------------------------------------------------------------------------
## Variable     Method        AIC         RSS        Sum Sq       R-Sq      Adj. R-Sq 
## ----------------------------------------------------------------------------------
## x6          addition    33473.297    6241.497    13986.736    0.69145      0.69143 
## x1          addition    32931.758    6074.156    14154.076    0.69972      0.69969 
## x3          addition    31912.722    5771.842    14456.391    0.71466      0.71462 
## x2          addition    29304.296    5065.587    15162.646    0.74958      0.74953 
## x6          removal     29302.317    5065.592    15162.641    0.74958      0.74954 
## x4          addition    29300.814    5064.705    15163.528    0.74962      0.74957 
## ----------------------------------------------------------------------------------

See Variable Selection for more details.
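For comparison, AIC-based stepwise selection in both directions can also be sketched with step() from the stats package. This example uses mtcars rather than stepdata so that it runs without olsrr installed. It is an illustration of the technique and is not expected to reproduce the table above. The names null, full and selected are illustrative.

```r
# Sketch: AIC-based stepwise selection in both directions, base R only.
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Start from the intercept-only model; terms may be added or removed at each step
selected <- step(
  null,
  scope = list(lower = formula(null), upper = formula(full)),
  direction = "both",
  trace = 0            # suppress the step-by-step log
)

formula(selected)
AIC(selected)
```

Setting trace to 1 prints each candidate move and its AIC, which is roughly the information that the Variables Entered/Removed listing summarizes.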

Learning More

The olsrr website includes comprehensive documentation on using the package, including the following articles that cover various aspects of using olsrr:

  • Variable Selection - Different variable selection procedures such as all possible regression, best subset regression, stepwise regression, stepwise forward regression and stepwise backward regression.

  • Residual Diagnostics - Includes plots to examine residuals to validate OLS assumptions.

  • Heteroskedasticity - Tests for heteroskedasticity include the Bartlett test, Breusch Pagan test, score test and F test.

  • Collinearity Diagnostics - VIF, tolerance and condition indices to detect collinearity, and plots for assessing model fit and the contributions of variables.

  • Measures of Influence - Includes 10 different plots to detect and identify influential observations.

Feedback

olsrr has been on CRAN for more than a year while we were fixing bugs and making the API stable. All feedback is welcome. Issues (bugs and feature requests) can be posted to the GitHub tracker. For help with code or other related questions, feel free to reach me at hebbali.aravind@gmail.com.