Chapter 9 Liking scores

In this chapter we analyse consumer liking scores: how to plot them, and how to test whether liking differs between products, and whether consumer background variables matter, using mixed models.

Liking scores are typically collected on a 1-5, 1-7 or 1-9 point scale. [more Bom?]

Most often we want to know whether the liking differs significantly (not just by chance) between the samples. We might also want to know whether other factors influence the liking, e.g. household income. Finally, we want to know what the actual differences are.

9.1 Plotting liking scores

For the beer data we have the liking scores in long format. We need to import the data:

library(readxl)
beerliking <- read_excel('DatasetRbook.xlsx',sheet = 'BeerLiking')

Before modelling, it is useful to look at the mean and spread of the liking per beer, and at the distribution of the raw scores for each beer. A mean alone cannot show whether the scores for a beer are skewed.
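A sketch of how this could be done with dplyr and ggplot2, assuming beerliking contains the columns Beer and Liking:

```r
library(dplyr)
library(ggplot2)

# Mean liking and spread per beer
beerliking %>%
  group_by(Beer) %>%
  summarise(mean = mean(Liking), sd = sd(Liking), n = n())

# Distribution of the raw scores, one histogram panel per beer;
# this reveals skewness that the mean table hides
ggplot(beerliking, aes(x = Liking)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~Beer)
```
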

9.2 Simple mixed models

Mixed models are used when there are repeated measures in the response, here because each person tastes more than one product.

A mixed model contains both fixed and random effects. Fixed effects are effects that we anticipate have the same direction, e.g. mutual differences between products; they would typically be the same from one experiment to another, as the products are unchanging entities. Random effects are effects that we cannot predict, e.g. mutual differences between consumers, which may differ from one experiment to another as consumers are affected by various emotional, environmental, physiological or other influences in their lives. Here, Beer is a fixed effect and the consumer is a random effect.

library(lmerTest)
## Loading required package: lme4
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Attaching package: 'lme4'
## The following object is masked from 'package:RVAideMemoire':
## 
##     dummy
## 
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
## 
##     lmer
## The following object is masked from 'package:stats':
## 
##     step
mdl <- lmer(data = beerliking, Liking ~ Beer + (1|Consumer.ID)) 
anova(mdl)
## Type III Analysis of Variance Table with Satterthwaite's method
##      Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
## Beer 110.52  22.105     5 763.12  8.3054 1.175e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that lme4 itself does not report p-values for the fixed effects. The package lmerTest, developed by Per Brockhoff and co-workers for sensory data, adds them, and using anova() rather than summary() gives one overall p-value per variable.

To explain the model: take the dataset called beerliking and fit a model where Beer is the fixed effect and the consumer is the random effect for the response variable Liking, using the function lmer, and save the output as mdl. The anova() function then provides the p-value(s). Remember that you choose the name of the model object (here mdl) yourself.

The p-value for Beer (1.2e-07) is far below 0.05, so at least two of the beers are scored significantly differently for liking. The ANOVA table does not tell us which beers differ; for that we need post hoc comparisons.

9.2.1 Post hoc comparisons

To calculate pairwise comparisons between, e.g., samples, and to obtain a letter-based representation of the result, you need the package multcomp.

library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
cld(glht(mdl, linfct = mcp(Beer = "Tukey")))
##     Brown Ale      NY Lager    Porse Bock Ravnsborg Red    River Beer 
##           "c"          "ab"           "a"          "bc"           "a" 
##     Wheat IPA 
##           "a"

The glht() function sets up all pairwise (Tukey) comparisons between the beers based on the model mdl, and cld() summarises them as a compact letter display. Samples sharing a letter are not significantly different: Porse Bock, River Beer and Wheat IPA (all "a") are liked at the same level, while Brown Ale ("c") is scored significantly differently from all three of them, and Ravnsborg Red ("bc") cannot be distinguished from Brown Ale. You can only use this function on factors.
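The letter display tells you which beers differ, but not by how much. To get the estimated sizes of the pairwise differences as well, you can summarise the same glht object directly; a sketch, reusing mdl from above:

```r
# Estimated pairwise differences between the beers
ph <- glht(mdl, linfct = mcp(Beer = "Tukey"))
summary(ph)   # each difference with its Tukey-adjusted p-value
confint(ph)   # simultaneous 95% confidence intervals for the differences
```
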

For a continuous explanatory variable there are no levels to compare, so a letter display makes no sense; instead you interpret its estimated coefficient (the slope) directly from summary().
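As an illustration, here is a minimal sketch with a simulated continuous predictor (Income is hypothetical and not a column in beerliking): instead of letters, you read a single estimated slope.

```r
library(lmerTest)

set.seed(123)
# Hypothetical data: 50 consumers each rating 2 products
sim <- data.frame(
  Consumer.ID = factor(rep(1:50, each = 2)),
  Income      = rep(round(runif(50, 20, 80)), each = 2),  # hypothetical covariate
  Liking      = sample(1:7, 100, replace = TRUE)
)
m <- lmer(Liking ~ Income + (1 | Consumer.ID), data = sim)
summary(m)  # the Income row is a slope: the expected change in liking
            # when Income increases by one unit
```
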

9.3 Multiway mixed models


In multiway models we start with the largest possible model, i.e. one including all relevant explanatory variables (including the product variable). We then eliminate non-significant variables manually, one by one (backwards stepwise elimination), so that the final model contains only the significant variables.

In the beer dataset it would be interesting to test whether the liking is affected by the gender and age of the consumer.

library(lmerTest)
mdl.2 <- lmer(data = beerliking, Liking ~ Beer + Gender + Age + (1|Consumer.ID)) 
summary(mdl.2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Liking ~ Beer + Gender + Age + (1 | Consumer.ID)
##    Data: beerliking
## 
## REML criterion at convergence: 3600.9
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.41056 -0.72755  0.08595  0.74181  2.07687 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  Consumer.ID (Intercept) 0.3144   0.5607  
##  Residual                2.6621   1.6316  
## Number of obs: 920, groups:  Consumer.ID, 155
## 
## Fixed effects:
##                   Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)         4.5904     0.1954 357.9021  23.487  < 2e-16 ***
## BeerNY Lager       -0.6437     0.1869 763.4365  -3.443 0.000606 ***
## BeerPorse Bock     -0.8393     0.1866 762.7991  -4.498 7.94e-06 ***
## BeerRavnsborg Red  -0.2354     0.1866 762.7991  -1.261 0.207542    
## BeerRiver Beer     -0.8378     0.1867 764.6555  -4.488 8.30e-06 ***
## BeerWheat IPA      -0.9433     0.1869 763.5511  -5.046 5.63e-07 ***
## GenderM             0.1070     0.1448 148.5136   0.739 0.461235    
## Age26 - 35          0.3992     0.2329 149.8802   1.714 0.088585 .  
## Age36 - 45          0.4326     0.2521 146.4248   1.716 0.088203 .  
## Age46 - 55          0.5538     0.2145 149.6847   2.582 0.010785 *  
## Age56 - 65          0.2041     0.2040 147.5124   1.001 0.318700    
## Age66 - 75          0.2173     0.2730 146.4260   0.796 0.427381    
## AgeOver 75          0.2224     0.5953 173.5838   0.374 0.709194    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation matrix not shown by default, as p = 13 > 12.
## Use print(x, correlation=TRUE)  or
##     vcov(x)        if you need it

In the output, each fixed-effect estimate is the difference from the reference level of that factor. All beers except Ravnsborg Red are scored significantly lower than the reference beer, Brown Ale. GenderM is not significant (p = 0.46), so men and women do not differ in overall liking, and only the 46-55 age group differs significantly from the reference age group. To get one overall p-value per variable, use anova(mdl.2) instead of summary(mdl.2).

It could also be interesting to test whether the liking of specific samples is affected by the age of the consumer, so we include the interaction between Beer and Age:

mdl.3 <- lmer(data = beerliking, Liking ~ Beer*Age + (1|Consumer.ID)) 
anova(mdl.3)

Remember that you choose the name of the model yourself. In the model formula:

"+" between variables defines (additive) main effects.
":" between variables defines an interaction.
"*" between variables defines both main effects and their interaction.

For model selection you start by removing the interaction with the highest p-value and then refit the model. You cannot remove a main effect while an interaction that includes it is still significant.
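If you prefer not to do the elimination by hand, lmerTest also provides automated backward elimination via its step() method (the function that masks stats::step in the loading messages earlier). A sketch, applied to mdl.3 from above:

```r
# Automated backward elimination of random and fixed effects
st <- step(mdl.3)
st                       # elimination tables: which terms were dropped and why
final <- get_model(st)   # extract the reduced, final model
anova(final)
```
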

[MISSING: interpretation of the output]