SIMPLE PARAMETRIC TESTS

Erik Kusch

erik.kusch@i-solution.de

Section for Ecoinformatics & Biodiversity

Center for Biodiversity and Dynamics in a Changing World (BIOCHANGE)

Aarhus University

Aarhus University Biostatistics - Why? What? How? 1 / 27

1 Background

2 Analyses

t-Test (unpaired)

t-Test (paired)

Analysis of Variance (ANOVA)

One-Way ANOVA

Two-Way ANOVA

ANCOVA

3 Our Data

Choice Of Variables

Research Questions


Background

Introduction

Parametric tests are those statistical approaches which rely on assumptions about the parameters which define a population.

Prominent parametric tests include:

Pearson correlation (Seminar 9 - Correlation Tests)

t-Test

Analysis Of Variance (ANOVA)

Linear regression

Multivariate extensions of parametric methods

...


Background

Terminology

A reminder about the distinction between parametric and non-parametric tests (taken from Seminar 6):

Non-Parametric Tests

Less restrictive

Make little to no assumptions

Often a black box

Require more data

Parametric Tests

More restrictive

Make strict assumptions

Easy to interpret

Require less data

→ Parametric tests are numerous!
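As a rough illustration of the contrast above, the following sketch applies one parametric test (t.test()) and one non-parametric counterpart (wilcox.test()) to the same simulated data. The seed, the group names, and the choice of the Wilcoxon rank-sum test as the counterpart are assumptions made for this example only.

```r
# Assumption: simulated data and seed are illustrative, not part of the seminar data
set.seed(42)
group_a <- rnorm(10, mean = 5, sd = 1)
group_b <- rnorm(10, mean = 10, sd = 1)

# Parametric: assumes normally distributed values within each group
parametric <- t.test(group_a, group_b)

# Non-parametric: works on ranks and makes no normality assumption
non_parametric <- wilcox.test(group_a, group_b)

# Compare the p-values of both approaches
c(t_test = parametric$p.value, wilcoxon = non_parametric$p.value)
```

With groups this clearly separated, both approaches should point to the same conclusion; the trade-offs listed above matter most when data are scarce or assumptions are doubtful.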


Analyses t-Test (unpaired)

Purpose And Assumptions

t-Test (unpaired)

t.test(..., paired = FALSE) in base R

Purpose:

To identify whether groups of variable values are different from one another.

Null hypothesis: There is no difference in the characteristics of the response variable values between the classes of the predictor variable.

Assumptions:

Predictor variable is binary

Response variable is metric and normally distributed within each group

Variable values are independent (not paired)

→ Test whether the variances of the response variable values in the groups are equal (var.test()) and adjust the t.test() argument var.equal accordingly.
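The variance check described above could be wired into the test call roughly as follows. This is a minimal sketch on simulated data; the object names and the 5% threshold for the F test are assumptions, not part of the seminar material.

```r
# Assumption: simulated data, seed, and the 0.05 threshold are illustrative choices
set.seed(42)
response <- c(rnorm(10, 5, 1), rnorm(10, 10, 1))
groups <- factor(rep(c("A", "B"), each = 10))

# F test comparing the variances of the two groups
variance_check <- var.test(response ~ groups)
equal_variances <- variance_check$p.value > 0.05

# Student's t-test if variances look equal, Welch's t-test otherwise
result <- t.test(response ~ groups, var.equal = equal_variances)
result$method
```

Leaving var.equal at its default (FALSE) always yields the Welch test, which is the more cautious choice when the variance check is inconclusive.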


Analyses t-Test (unpaired)

Minimal Working Example

Let’s feed our t.test(..., paired = FALSE) function data that hold two groups with clearly differing means:

data <- c(rnorm(10, 5, 1), rnorm(10, 10, 1))

factors <- as.factor(rep(c("A", "B"), each = 10))

t.test(data ~ factors) # unpaired is the default; recent R versions reject 'paired' in the formula interface

## Welch Two Sample t-test

## data: data by factors

## t = -12, df = 14, p-value = 1e-08

## alternative hypothesis: true difference in means is not equal to 0

## 95 percent confidence interval:

## -6.4 -4.4

## sample estimates:

## mean in group A mean in group B

## 4.4 9.8

The output above tells us that the means of our two groups are significantly different.
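The numbers in such a printout can also be pulled out of the result programmatically, since t.test() returns a list of class "htest". The sketch below assumes a fixed seed for reproducibility, so its values will not match the printout above.

```r
# Assumption: seed added for reproducibility; values differ from the slide output
set.seed(42)
data <- c(rnorm(10, 5, 1), rnorm(10, 10, 1))
factors <- as.factor(rep(c("A", "B"), each = 10))

result <- t.test(data ~ factors)

result$estimate   # estimated mean of each group
result$conf.int   # 95% confidence interval of the difference in means (A - B)
result$p.value    # p-value of the test

# Difference in group means (group A minus group B)
mean_difference <- unname(result$estimate[1] - result$estimate[2])
mean_difference
```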


Analyses t-Test (paired)

Purpose And Assumptions

t-Test (paired)

t.test(..., paired = TRUE) in base R

Purpose:

To identify whether groups of variable values are different from one another.

Null hypothesis: There is no difference in the characteristics of the response variable values between the classes of the predictor variable.

Assumptions:

Predictor variable is binary

Response variable is metric

Difference of response variable pairs is normally distributed

Variable values are dependent (paired)

→ Test whether the differences between the paired response variable values are normally distributed (shapiro.test()); the t.test() argument var.equal is not used when paired = TRUE.
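The normality assumption for the paired test concerns the pairwise differences, not the raw values. A minimal sketch of that check, using hypothetical before/after measurements simulated for this example:

```r
# Assumption: 'before' and 'after' are hypothetical paired measurements
set.seed(42)
before <- rnorm(10, 5, 1)
after <- before + rnorm(10, 5, 1)

# The paired t-test assumes these differences are normally distributed
differences <- after - before
normality <- shapiro.test(differences)
normality$p.value

# Paired test on the two connected measurement series
result <- t.test(after, before, paired = TRUE)
result$p.value
```

A small Shapiro-Wilk p-value would suggest that the differences deviate from normality, in which case a non-parametric alternative may be more appropriate.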


Analyses t-Test (paired)

Minimal Working Example

Let’s feed our t.test(..., paired = TRUE) function data that hold two connected groups with clearly differing means:

data <- c(rnorm(10, 5, 1), rnorm(10, 10, 1))

factors <- as.factor(rep(c("A", "B"), each = 10))

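t.test(data[factors == "A"], data[factors == "B"], paired = TRUE) # x/y interface; recent R versions reject 'paired' in the formula interface

## Paired t-test

## data: data[factors == "A"] and data[factors == "B"]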

## t = -10, df = 9, p-value = 3e-06

## alternative hypothesis: true difference in means is not equal to 0

## 95 percent confidence interval:

## -5.7 -3.7

## sample estimates:

## mean of the differences

## -4.7

The output above tells us that the means of our two connected groups are significantly different.
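One way to see what the paired test does: it is equivalent to a one-sample t-test on the pairwise differences against a mean of 0. The sketch below checks this on simulated data; the seed and object names are assumptions for illustration.

```r
# Assumption: simulated data and seed are illustrative
set.seed(42)
group_a <- rnorm(10, 5, 1)
group_b <- rnorm(10, 10, 1)

# Paired t-test on the two connected groups
paired_result <- t.test(group_a, group_b, paired = TRUE)

# One-sample t-test on the pairwise differences
one_sample_result <- t.test(group_a - group_b, mu = 0)

# Both approaches yield the same test statistic and p-value
all.equal(unname(paired_result$statistic), unname(one_sample_result$statistic))
all.equal(paired_result$p.value, one_sample_result$p.value)
```

This also explains why the degrees of freedom equal the number of pairs minus one rather than depending on both group sizes.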


Analyses Analysis of Variance (ANOVA)

Introduction to ANOVA

ANOVAs are used to test whether there is a difference between groups of

variable values.

There are multiple versions of ANOVAs:

One-way ANOVA (one predictor variable)

Two-Way ANOVA (multiple predictor variables)

MANOVA (multivariate ANOVA/multiple response variables)

ANCOVA (categorical and continuous predictor variables)

MANCOVA (multivariate ANCOVA)
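In R, these versions differ mainly in the model formula. The following sketch shows how each could be specified with the crabs data set from the MASS package; the particular variable combinations for the multivariate models are assumptions picked only to illustrate the syntax.

```r
library(MASS)  # provides the crabs data set
data(crabs)

# One-way ANOVA: one categorical predictor
one_way <- lm(BD ~ sex, data = crabs)

# Two-way ANOVA: two categorical predictors and their interaction
two_way <- lm(BD ~ sex * sp, data = crabs)

# ANCOVA: categorical predictor plus continuous covariate
ancova <- lm(CL ~ sp * CW, data = crabs)

# MANOVA: several response variables (assumed combination for illustration)
manova_fit <- manova(cbind(BD, CL) ~ sp, data = crabs)

# MANCOVA: several response variables plus a continuous covariate
mancova_fit <- manova(cbind(BD, CL) ~ sp + CW, data = crabs)

summary(manova_fit)
```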


Analyses Analysis of Variance (ANOVA)

Data for ANOVA

We will use the crabs data set from the MASS package

library(MASS)

data(crabs)

head(crabs)

## sp sex index FL RW CL CW BD

## 1 B M 1 8.1 6.7 16 19 7.0

## 2 B M 2 8.8 7.7 18 21 7.4

## 3 B M 3 9.2 7.8 19 22 7.7

## 4 B M 4 9.6 7.9 20 23 8.2

## 5 B M 5 9.8 8.0 20 23 8.2

## 6 B M 6 10.8 9.0 23 26 9.8


Analyses One-Way ANOVA

Purpose And Assumptions

One-Way ANOVA

anova() in base R

Purpose:

To explain the variance of a continuous response variable in relation to one predictor variable.

Null hypothesis: Mean response variable values are equal between levels of the predictor variable.

Assumptions:

Predictor variable is categorical

Response variable is metric

Response variable residuals are normally distributed

Variances of populations/samples are equal (homogeneity)

Variable values are independent (not paired)

→ Test whether residuals are normally distributed with shapiro.test() in base R, test for homogeneity with leveneTest() in the car package.


Analyses One-Way ANOVA

Minimal Working Example - Assumptions

Let’s test whether the body depth (BD) of crabs varies when grouped by sex:

OneWay <- with(crabs, lm(BD ~ sex)) # MODEL

plot(OneWay, 2)# Normality

[Figure: Normal Q-Q plot for lm(BD ~ sex); x-axis: Theoretical Quantiles, y-axis: Standardized residuals; observation 200 is labelled.]

shapiro.test(residuals(OneWay))

## Shapiro-Wilk normality test

## data: residuals(OneWay)

## W = 1, p-value = 0.2

plot(OneWay, 3)# Homogeneity

[Figure: Scale-Location plot for lm(BD ~ sex); x-axis: Fitted values, y-axis: Standardized residuals; observation 200 is labelled.]

library("car")

leveneTest(BD ~ sex, data = crabs)

## Levene's Test for Homogeneity of Variance (center = median)

## Df F value Pr(>F)

## group 1 0.36 0.55

## 198

All good on the assumption check!


Analyses One-Way ANOVA

Minimal Working Example - Analysis

Now let’s run the analysis:

anova(OneWay)

## Analysis of Variance Table

## Response: BD

## Df Sum Sq Mean Sq F value Pr(>F)

## sex 1 19 18.8 1.61 0.21

## Residuals 198 2315 11.7

As we can see, sex does not make for a statistically significant predictor of crab body depth.
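With only two groups, a one-way ANOVA and an unpaired t-test with equal variances are the same test: the F value equals the squared t statistic. A short sketch to check this on the crabs data (object names other than OneWay are assumptions):

```r
library(MASS)  # provides the crabs data set
data(crabs)

OneWay <- lm(BD ~ sex, data = crabs)

# F value of the sex term from the ANOVA table
f_value <- anova(OneWay)[["F value"]][1]

# t statistic of the equal-variance (Student's) t-test on the same grouping
t_value <- unname(t.test(BD ~ sex, data = crabs, var.equal = TRUE)$statistic)

# For two groups, F equals t squared
all.equal(f_value, t_value^2)
```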


Analyses One-Way ANOVA

Minimal Working Example - Interpretation

Let’s interpret the result anyways:

summary(OneWay)

## Call:

## lm(formula = BD ~ sex)

## Residuals:

## Min 1Q Median 3Q Max

## -7.624 -2.449 0.076 2.463 7.376

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 13.724 0.342 40.13 <2e-16 ***

## sexM 0.613 0.484 1.27 0.21

## ---

## Signif. codes:

## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Residual standard error: 3.4 on 198 degrees of freedom

## Multiple R-squared: 0.00805, Adjusted R-squared: 0.00304

## F-statistic: 1.61 on 1 and 198 DF, p-value: 0.206

Female crabs are estimated to have a body depth of 13.72mm (Intercept) with males being 0.61mm bigger, on average.

While we can be certain of the female estimate, we cannot say the same about the difference to males.
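The reading of the coefficients above can be checked against the raw group means: the intercept is the mean of the reference level (females), and adding the sexM coefficient gives the male mean. A minimal sketch, with helper object names assumed for illustration:

```r
library(MASS)  # provides the crabs data set
data(crabs)

OneWay <- lm(BD ~ sex, data = crabs)
coefs <- coef(OneWay)

# Raw mean body depth per sex
group_means <- tapply(crabs$BD, crabs$sex, mean)

# Means reconstructed from the model coefficients
female_mean <- coefs[["(Intercept)"]]
male_mean <- coefs[["(Intercept)"]] + coefs[["sexM"]]

# Both comparisons should return TRUE
all.equal(female_mean, group_means[["F"]])
all.equal(male_mean, group_means[["M"]])
```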


Analyses Two-Way ANOVA

Purpose And Assumptions

Two-Way ANOVA

anova() in base R

Purpose:

To explain the variance of a continuous response variable in relation to multiple predictor variables.

Null hypothesis: Mean response variable values are equal between levels of the predictor variables.

Assumptions:

Predictor variables are categorical

Response variable is metric

Response variable residuals are normally distributed

Variances of populations/samples are equal (homogeneity)

Variable values are independent (not paired)

→ Test whether residuals are normally distributed with shapiro.test() in base R, test for homogeneity with leveneTest() in the car package.


Analyses Two-Way ANOVA

Minimal Working Example - Assumptions

Let’s test whether the body depth (BD) of crabs varies when grouped by sex and species as well as their interaction:

TwoWay <- with(crabs, lm(BD ~ sex * sp))

plot(TwoWay, 2)# Normality

[Figure: Normal Q-Q plot for lm(BD ~ sex * sp); x-axis: Theoretical Quantiles, y-axis: Standardized residuals; observation 101 is labelled.]

shapiro.test(residuals(TwoWay))

## Shapiro-Wilk normality test

## data: residuals(TwoWay)

## W = 1, p-value = 0.2

plot(TwoWay, 3)# Homogeneity

[Figure: Scale-Location plot for lm(BD ~ sex * sp); x-axis: Fitted values, y-axis: Standardized residuals; observation 101 is labelled.]

library("car")

leveneTest(BD ~ sex * sp, data = crabs)

## Levene's Test for Homogeneity of Variance (center = median)

## Df F value Pr(>F)

## group 3 2.02 0.11

## 196

All good on the assumption check!


Analyses Two-Way ANOVA

Minimal Working Example - Analysis

Now let’s run the analysis:

anova(TwoWay)

## Analysis of Variance Table

## Response: BD

## Df Sum Sq Mean Sq F value Pr(>F)

## sex 1 19 19 1.99 0.160

## sp 1 419 419 44.31 2.8e-10 ***

## sex:sp 1 42 42 4.48 0.035 *

## Residuals 196 1854 9

## ---

## Signif. codes:

## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output above tells us that species and the interaction effect of sex and

species are meaningful for understanding body depth of crabs.


Analyses Two-Way ANOVA

Minimal Working Example - Interpretation

summary(TwoWay)

## Call:

## lm(formula = BD ~ sex * sp)

## Residuals:

## Min 1Q Median 3Q Max

## -7.924 -2.224 0.059 2.250 6.650

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 11.816 0.435 27.17 < 2e-16 ***

## sexM 1.534 0.615 2.49 0.013 *

## spO 3.816 0.615 6.20 3.2e-09 ***

## sexM:spO -1.842 0.870 -2.12 0.035 *

## ---

## Signif. codes:

## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Residual standard error: 3.1 on 196 degrees of freedom

## Multiple R-squared: 0.206, Adjusted R-squared: 0.194

## F-statistic: 16.9 on 3 and 196 DF, p-value: 8.13e-10

Female crabs of species B are estimated to have a body depth of 11.82mm (Intercept), with males of species B being 1.53mm bigger, on average.

Female crabs of species O are estimated to have a body depth 3.82mm bigger than their female species B counterparts.

The interaction estimate of -1.84mm means that the male-female difference in body depth is 1.84mm smaller in species O than in species B.

All estimates are statistically significant.
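To see how the coefficients combine, the estimated body depth for each sex and species combination can be obtained with predict(). The sketch below assumes helper names (cells, male_O) for illustration; for males of species O, the prediction is the sum of all four coefficients.

```r
library(MASS)  # provides the crabs data set
data(crabs)

TwoWay <- lm(BD ~ sex * sp, data = crabs)

# All four combinations of sex and species
cells <- expand.grid(sex = levels(crabs$sex), sp = levels(crabs$sp))

# Estimated body depth for each combination
cells$predicted_BD <- unname(predict(TwoWay, newdata = cells))
cells

# Males of species O: intercept + sexM + spO + sexM:spO
male_O <- cells$predicted_BD[cells$sex == "M" & cells$sp == "O"]
all.equal(male_O, sum(coef(TwoWay)))
```

Because the model includes the interaction, these predictions coincide with the raw mean body depth of each of the four groups.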


Analyses ANCOVA

Purpose And Assumptions

ANCOVA

anova() in base R

Purpose:

To explain the variance of a continuous response variable in relation to mixed (continuous and categorical) predictor variables.

Null hypothesis: Adjusted means of response variable values are equal between levels of the predictor variables.

Assumptions:

Predictor variables are categorical or continuous

Response variable is metric

Response variable residuals are normally distributed

Variances of populations/samples are equal (homogeneity)

Variable values are independent (not paired)

Relationship between the response and covariate is linear.

→ Test whether residuals are normally distributed with shapiro.test() in base R, test for homogeneity with leveneTest() in the car package.
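The linearity assumption is not covered by the two tests named above. As a rough first indicator (an assumption of this sketch, not a formal test), the Pearson correlation between response and covariate can be computed within each group, here for carapace length (CL) and carapace width (CW) in the crabs data set from the MASS package:

```r
library(MASS)  # provides the crabs data set
data(crabs)

# Pearson correlation between response (CL) and covariate (CW) per species
correlations <- sapply(
  split(crabs, crabs$sp),
  function(d) cor(d$CL, d$CW)
)
correlations
```

Correlations close to 1 or -1 are compatible with a linear relationship, but a scatter plot of the response against the covariate remains the more informative check.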


Analyses ANCOVA

Minimal Working Example - Assumptions

Let’s test whether the carapace length (CL) of crabs varies when grouped by species, with carapace width (CW) as a covariate:

Ancova <- with(crabs, lm(CL ~ sp * CW))

plot(Ancova, 2)# Normality

[Figure: Normal Q-Q plot for lm(CL ~ sp * CW); x-axis: Theoretical Quantiles, y-axis: Standardized residuals; observations 188 and 145 are labelled.]

shapiro.test(residuals(Ancova))

## Shapiro-Wilk normality test

## data: residuals(Ancova)

## W = 1, p-value = 0.2

plot(Ancova, 3)# Homogeneity

[Figure: Scale-Location plot for lm(CL ~ sp * CW); x-axis: Fitted values, y-axis: Standardized residuals; observations 188 and 145 are labelled.]

library("car")

leveneTest(CL ~ sp, data = crabs)

## Levene's Test for Homogeneity of Variance (center = median)

## Df F value Pr(>F)

## group 1 0.1 0.75

## 198

Assumptions are met!


Analyses ANCOVA

Minimal Working Example - Analysis

Now let’s run the analysis:

anova(Ancova)

## Analysis of Variance Table

## Response: CL

## Df Sum Sq Mean Sq F value Pr(>F)

## sp 1 838 838 3868.20 <2e-16 ***

## CW 1 9203 9203 42460.12 <2e-16 ***

## sp:CW 1 1 1 4.29 0.04 *

## Residuals 196 42 0

## ---

## Signif. codes:

## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output above tells us that all of our model terms (species, carapace width, and their interaction) are significant.


Analyses ANCOVA

Minimal Working Example - Interpretation

summary(Ancova)

## Call:

## lm(formula = CL ~ sp * CW)

## Residuals:

## Min 1Q Median 3Q Max

## -1.4634 -0.2611 -0.0041 0.2907 1.1861

## Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -0.36442 0.21170 -1.72 0.087 .

## spO 0.44111 0.32079 1.38 0.171

## CW 0.87630 0.00595 147.31 <2e-16 ***

## spO:CW 0.01781 0.00860 2.07 0.040 *

## ---

## Signif. codes:

## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Residual standard error: 0.47 on 196 degrees of freedom

## Multiple R-squared: 0.996, Adjusted R-squared: 0.996

## F-statistic: 1.54e+04 on 3 and 196 DF, p-value: <2e-16

Crabs of species B have an estimated carapace length of -0.36mm at a (hypothetical) carapace width of 0mm (Intercept), with members of species O being 0.44mm longer, on average, at 0mm carapace width.

For each additional mm in carapace width, carapace length in species B increases by 0.88mm.

For each additional mm in carapace width, carapace length in species O increases by 0.02mm more than in species B (0.89mm in total).

Only the carapace width slope and its interaction with species are statistically significant at the 0.05 level; the intercept and the species difference at 0mm carapace width are not.
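The species-specific slopes can be derived directly from the coefficients: the CW coefficient is the slope for the reference species B, and adding the spO:CW interaction gives the slope for species O. The sketch below uses assumed helper names (slope_B, slope_O) and cross-checks the result against a regression fitted to species O alone.

```r
library(MASS)  # provides the crabs data set
data(crabs)

Ancova <- lm(CL ~ sp * CW, data = crabs)
coefs <- coef(Ancova)

# Slope of carapace length on carapace width for each species
slope_B <- coefs[["CW"]]
slope_O <- coefs[["CW"]] + coefs[["spO:CW"]]
c(B = slope_B, O = slope_O)

# Cross-check: separate regression for species O only
separate_O <- lm(CL ~ CW, data = subset(crabs, sp == "O"))
all.equal(slope_O, coef(separate_O)[["CW"]])
```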


Our Data Choice Of Variables

Variables We Can Use

Response variables (metric)

Weight

Height

Wing Chord

Nesting Height

Number of Eggs

Egg Weight

Predictor variables (categorical)

Sex (binary)

Climate (binary)

Climate (3 levels - Continental,

Semi-Coastal, Coastal)

Home Range (3 levels - Small,

Medium, Large)

Site Index (11 levels)

Predator Presence/Type (3 levels -

Avian vs. Non-Avian vs. None)


Our Data Research Questions

Research Questions And Hypotheses

So which of our major research questions (seminar 6) can we answer?

unpaired t-Test

Climate Warming/Extremes: Does sparrow

morphology change depend on climate?

Sexual Dimorphism: Does sparrow morphology

change depend on Sex?

Use the 1 - Sparrow_Data_READY.rds data set for these analyses.

paired t-Test (suppose a resettling program)

Climate Warming/Extremes: Does sparrow

morphology change depend on climate?

Use the 2b - Sparrow_ResettledSIUK_READY.rds data set for these

analyses.

One-Way ANOVA

Climate Warming/Extremes: Does sparrow

morphology depend on climate?

Predation: Does nesting height depend on

predator characteristics?

Two-Way ANOVA

Sexual Dimorphism: Does sparrow morphology

depend on population status and sex?

ANCOVA

Climate Warming/Extremes: Do sparrow

characteristics depend on climate and latitude?

Use the 1 - Sparrow_Data_READY.rds data set for these analyses.

Remember to diligently check assumptions!
