Chapter 17 TFIH Exercises

Below you will find exercises relevant for the data analysis for the project work in the Thematic course in Food Innovation and Health. Exercises are meant as a guide on what you can do with your own data obtained through the practical work on the projects.

Before you start to work on the exercises, you need to familiarize yourself with the knowledge in the introduction chapters of this book as well as the chapters written specifically of for this course.

17.1 Exercise 1: Descriptive statistics and plots

From this exercise you should be able to describe the products in terms of common, rare and differential attributes, as well as the hedonic ratings.

17.1.1 CATA counts

Create data set with CATA counts per product and make a table thereof

Plot the CATA counts such that product differences are emphasized.

Which characteristics is most pronounced overall? And where are the largest product differences observed?

library(data4consumerscience)
data("beercata")

17.1.2 Hedonics

Plot hedonic ratings (beerliking$Liking) per product in histograms to see the distribution

Calculate descriptive statistics for hedonic rating for each product.

Which beer is in general more likable? Is the disagreement in liking similar across beers?

data("beerliking")

17.2 Exercise 2: Consumer background and PCA

From this exercise you should be able to describe who your consumers are.

17.2.1 Demographics

Age, Gender, Income and Household size is classical demographic characteristics. As the evaluation of the products is done using a specific group of people, the generalizability of the results naturally condition of the characteristics of this population.

Construct the distributions of the demographic characteristics. You can use the table() function, or maybe do it all at once using the tableone package. For inspiration, modify the code below to serve your needs.

Is there any demographic bias in this dataset?

library(data4consumerscience)
library(tableone)
data("beerdemo")

CreateTableOne(vars = c('Age','Gender'),data = beerdemo)

17.2.2 PCA on Interests

Calculate a PCA model including the Variables 7 ( Interest in food ) to 12 ( Future interest in beer ). Remember to standardize/scale the variables

mdlPCA <- prcomp(beerdemo[,7:12],scale. = T)

Plot the scores and loadings in a biplot and look for groupings of the consumers in the scores.

Group and color according to the background information not used in the model (Gender, Age,..)

library(ggbiplot)
ggbiplot(mdlPCA, groups = beerdemo$Gender, ellipse = T)

Describe what you find: For instance, is the loading making sense?

17.3 Exercise 3: PCA on CATA counts

From this exercise you should be able to describe your samples (beers) from the summed CATA counts (see the Chapter on “PCA on CATA data”), and which attributes that occur together, as well as which attribute patterns that discriminate products.

Calculate a PCA model including all Variables and all Objects.

beercatasum <- beercata %>% 
  pivot_longer(names_to = 'attrib',values_to = 'val', cols = S_Flowers:S_Vinous) %>% 
  group_by(Beer,attrib) %>%
  summarize(n = sum(val)) %>% 
  pivot_wider(names_from = attrib, values_from = n)
PCAmdl <- prcomp(beercatasum[,-1], scale. = T)

Plot the scores and describe the groupings of the samples.

Plot the loadings and describe the correlations between the variables.

Both of these plots are present in a biplot, but you may want to play around with the settings (such as labels.size, varname.size, varname.adjust, …) to get all information out.

ggbiplot(PCAmdl, labels = beercatasum$Beer)

Use this biplot to find out which samples are described by which CATA attributes. as well as similarity of products.

17.4 Exercise 4: Cochran’s Q test on CATA binary data

In Exercise 1 the attributes characteristic for certain products, were summarized using descriptive stats and plots. Now we want to add statistical inference onto these observations.

To get started, choose a few attributes. For each attribute 1) Perform Cochran’s Q test for overall difference, and if interesting enough 2) perform a Post hoc test for product segmentation.

To get all attributes analyzed collectively, use the tidyverse code in the chapter For all attributes in one run (nice to know).

Which descriptors are most different between products? Are the overall most pronounced descriptors also the ones most different?

Try this procedure out on a different dataset, e.g. the tempeto data.

data("tempetofermentation")
library(RVAideMemoire)
tempetofermentation$Product2 <- factor(tempetofermentation$Product) # needs to be a factor for getting labels on results.
m <- cochran.qtest(Sour ~ Product2 | Assessor,
                   data = tempetofermentation)

17.5 Exercise 5: Hedonic ratings and consumer characteristics

From this exercise you should be able to describe the liking of the beer samples and glimpse into consumer characteristics related to liking

17.5.1 PCA on joint data

Calculate a PCA model including liking for all beers.

beerliking_wider <- beerliking %>% 
  pivot_wider(names_from = Beer,values_from = Liking) %>% # make into wide format
  drop_na()

PCAliking <- prcomp(beerliking_wider[,13:18], scale. = T)

Plot a biplot or loading plot, and use the loadings and describe the correlations between the variables (liking of beers in this case).

ggbiplot(PCAliking)

Plot the scores and describe the groupings of the samples by colouring the score plot according to the consumer background variables.

ggbiplot(PCAliking,groups = beerliking_wider$Age, ellipse = T)

Any trends? For instance, how is liking related to the individual consumer diversity of beer ( Beer types/month)?

17.5.2 All demographics

… Some code to get all 7-scale demo information plots. You may want to export and view in a pdf viewer for zooming etc.

gall <- cbind(PCAliking$x[,1:2], beerliking_wider) %>% 
  pivot_longer(names_to = 'var',values_to = 'val',cols = `Interest in food`:`Future interest in beer`) %>% 
  ggplot(data = ., aes(PC1,PC2, color = factor(val))) + 
  geom_point() + 
  stat_ellipse() + 
  facet_wrap(~var)
ggsave(filename = 'anicebigfigure.pdf',gall, height = 20, width = 20)

17.6 Exercise 6: PCA on CATA counts and hedonic ratings

From this exercise you should be able to conclude what attributes that overall drives the liking of products (beers).

For each beer, the collated CATA counts and the averaged liking constitutes the multivariate response matrix.

likingsum <- beerliking %>%
  group_by(Beer) %>% 
  summarize(Liking = mean(Liking, na.rm = T))

catasum <- beercata %>% 
  pivot_longer(names_to = 'attrib',values_to = 'val', cols = S_Flowers:S_Vinous) %>% 
  group_by(Beer,attrib) %>%
  summarize(n = sum(val, na.rm = T)) %>% 
  pivot_wider(names_from = attrib, values_from = n)

Join the two.

cata_like <- catasum %>% 
  left_join(likingsum, by = 'Beer')

Do a PCA

PCAcata_like <- prcomp(cata_like[,-1],scale. = T)
ggbiplot(PCAcata_like, labels = cata_like$Beer)

What charaterizes River Beer ? Which descriptors are in general associated with liking? (and dis-liking?)

17.7 Exercise 7: Mixed modelling on hedonic ratings

In this exercise we want to model Liking as a response. We want to use the beer type as the main predictor, but take into account that the responses is repeated within consumers. The same judges have evaluated all beers. This is a mixed model

Start with a plot

ggplot(data = beerliking, aes(Beer, Liking)) + 
  geom_boxplot()

library(lme4)
library(lmerTest)
lik_mdl <- lmer(data = beerliking, Liking ~ Beer + (1|Consumer.ID))

Evaluate this model by anova() and summary(), and interpret the result. How much within individual consumer variation is there compared to between consumer?

You should get that the beers are not equally likable. But are there maybe some which are more similar? Perform a all-pairs post hoc test and try to group beers based on liking.

library(multcomp)
summary(glht(lik_mdl, linfct = mcp(Beer = "Tukey")), 
        test = adjusted("fdr"))

Try to add consumer demographics and interest to the model to evaluate general consumer patterns as well as interactions with beer type.

Try to answer questions such as:

  • Are there any significant product differences for the liking? If so, what does the post hoc test tell us? How does this fit with what you have done in the PCA exercises.
  • Is the liking in general affected by the age, gender, household size or beer knowledge? What is the effect? Try to think of a plot that can show the significant differences.
  • Do men and women score the samples significantly different in liking? Calculate the sample/gender differences in averages.