Chapter 24 Linear models

Linear models is a general term for models with a single univariate response (dependent variable - \(y\) in the formula below), which we want to describe using one or several predictors (independent variables - \(x\) in the formula below).

\[ y = a + b \cdot x + e \] Here the informative parameter is the slope (\(b\)) which indicates the relation between \(x\) and \(y\). (\(e\) is the missfit / residuals of the model).

We use tidyverse coding as this makes life much easier. As a tidyverse add on, we use broom for the linear models, broom.mixed and lme4 for the linear mixed models.

library(ggplot2) # lets plot dadta
library(tidyverse)
library(broom)
library(broom.mixed)
library(lme4)
library(ggpubr) # lets add stats to the plots
library(knitr) # lets make nice tables

The data is already imported, and formated (see Getting_data_in.pdf for details). We simply load this file.

library(data4consumerscience)
data(pasta)

24.1 Example

As response variable, the amount of Consumption of Pasta with mushrooms and use the likert scale I like the taste of pasta with mushrooms! as predictor. We use ONLY Day 1 results.

First a plot:

pasta %>% 
  filter(str_detect(StationName,'mush')) %>% 
  filter(Day==1) %>% 
  ggplot(data = ., aes(I_like_taste_of_pasta_with_mushrooms,Consumption)) + 
  geom_point() + 
  stat_smooth(method = lm, se = F)

It seems as there is something. So lets build a linear model on this

# subset the data
x <- pasta %>% 
  filter(str_detect(StationName,'mush')) %>% 
  filter(Day==1) 

mdl <- lm(data = x, Consumption~I_like_taste_of_pasta_with_mushrooms) 
mdl  
## 
## Call:
## lm(formula = Consumption ~ I_like_taste_of_pasta_with_mushrooms, 
##     data = x)
## 
## Coefficients:
##                          (Intercept)  I_like_taste_of_pasta_with_mushrooms  
##                              -186.28                                 50.21
summary(mdl)
## 
## Call:
## lm(formula = Consumption ~ I_like_taste_of_pasta_with_mushrooms, 
##     data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -113.19  -68.98   12.81   35.02  170.81 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)                           -186.28     195.41  -0.953    0.358
## I_like_taste_of_pasta_with_mushrooms    50.21      30.35   1.655    0.122
## 
## Residual standard error: 83.66 on 13 degrees of freedom
## Multiple R-squared:  0.174,  Adjusted R-squared:  0.1104 
## F-statistic: 2.738 on 1 and 13 DF,  p-value: 0.1219

The the slope indicates that by increasing liking by one unit the consumption increase is \(50.2 \pm 30.4\), however, this apparent effect is not statistically significant (\(p = 0.12\)).

24.2 Run a bunch of models at once

We want to model consumption of both pasta with mushrooms andd legumes, and look at all the likert scales questions as predictors. Further we want to do this for both days.

First we create a new long format data frame

pastalong <- pasta %>% 
  gather(question,answ,I_like_taste_of_pasta_with_legumes:Pasta_with_mushrooms_is_visually_appealing) %>% 
  mutate(answnum = factor(answ,labels = c('Disagree','More or less disagree','Neither agree nor disagree','More or less agree','Agree','Strongly agree')) %>% as.numeric())

24.2.1 A plot

pastalong %>% 
  filter(!is.na(StationName )) %>% 
  mutate(question2 = question %>% substr(1,34)) %>% # The label is to long, so lets just represent the first 30 letters.
  ggplot(data = .,  aes(answnum,Consumption, color = factor(Day))) +
  geom_point() + 
  stat_smooth(se = F, method = lm) + 
  stat_cor() +
  facet_grid(question2 ~ StationName) + 
  theme_bw() + theme(legend.position = 'bottom')

This we similary can run as several linear models.

tb <- pastalong %>% 
  filter(!is.na(StationName )) %>% 
  group_by(StationName,question,Day) %>% 
  do(lm(data = ., Consumption~answnum) %>% tidy(conf.int = T))

tb %>% 
  filter(term=='answnum') %>% 
  dplyr::select(-statistic) %>% 
kable(x = .,caption = 'All linear models', digits = 2, format = 'simple')
Table 24.1: All linear models
StationName question Day term estimate std.error p.value conf.low conf.high
Pasta with legumes I_like_taste_of_pasta_with_legumes 1 answnum 28.79 13.48 0.05 -0.33 57.91
Pasta with legumes I_like_taste_of_pasta_with_legumes 2 answnum 39.55 14.94 0.02 7.27 71.83
Pasta with legumes I_like_taste_of_pasta_with_mushrooms 1 answnum 16.63 30.22 0.59 -48.66 81.92
Pasta with legumes I_like_taste_of_pasta_with_mushrooms 2 answnum 55.27 36.68 0.16 -23.97 134.52
Pasta with legumes Pasta_with_legumes_is_visually_appealing 1 answnum 24.01 12.43 0.08 -2.84 50.85
Pasta with legumes Pasta_with_legumes_is_visually_appealing 2 answnum 15.07 20.01 0.46 -28.16 58.30
Pasta with legumes Pasta_with_mushrooms_is_visually_appealing 1 answnum 11.05 21.12 0.61 -34.57 56.67
Pasta with legumes Pasta_with_mushrooms_is_visually_appealing 2 answnum -3.17 31.07 0.92 -70.28 63.95
Pasta with mushroom I_like_taste_of_pasta_with_legumes 1 answnum 21.16 16.07 0.21 -13.57 55.89
Pasta with mushroom I_like_taste_of_pasta_with_legumes 2 answnum -10.36 12.41 0.42 -37.16 16.44
Pasta with mushroom I_like_taste_of_pasta_with_mushrooms 1 answnum 50.21 30.35 0.12 -15.35 115.77
Pasta with mushroom I_like_taste_of_pasta_with_mushrooms 2 answnum 11.36 27.13 0.68 -47.25 69.98
Pasta with mushroom Pasta_with_legumes_is_visually_appealing 1 answnum 23.94 13.89 0.11 -6.07 53.95
Pasta with mushroom Pasta_with_legumes_is_visually_appealing 2 answnum -9.25 13.81 0.51 -39.08 20.58
Pasta with mushroom Pasta_with_mushrooms_is_visually_appealing 1 answnum 21.77 22.51 0.35 -26.86 70.40
Pasta with mushroom Pasta_with_mushrooms_is_visually_appealing 2 answnum -12.67 21.06 0.56 -58.17 32.83
.. A plot of these res ults for a quick interpretation.
tb %>% 
  filter(term=='answnum') %>% 
  ggplot(data = ., aes(question,estimate,ymin = conf.low, ymax = conf.high, color = factor(Day))) + 
  geom_errorbar(width = 0.1, position = position_dodge()) +geom_point()+ 
  geom_hline(yintercept = 0) + 
  coord_flip() +facet_grid(~StationName) + 
  theme(legend.position = 'bottom')

Seems as some of the legume consumptions there is a significant association with likert scales. Not as strong for consumption of mushrooms.