METRIC TESTS (TWO-SAMPLE SITUATIONS)
Erik Kusch
erik.kusch@i-solution.de
Section for Ecoinformatics & Biodiversity
Center for Biodiversity and Dynamics in a Changing World (BIOCHANGE)
Aarhus University
Aarhus University Biostatistics - Why? What? How? 1 / 15
1 Background
2 Analyses
Mann-Whitney U Test / Wilcoxon Rank-Sum Test
Wilcoxon Signed Rank Test
3 Our Data
Choice Of Variables
Methods
Research Questions
Background
Introduction
Metric tests are used to compare parameters of metric/ordinal variable values
among groups/individuals.
Prominent metric tests for two-sample situations include:
Mann-Whitney U Test
Wilcoxon Signed Rank Test
t Test (dealt with in seminar 12)
...
Some of these tests rely on the assumption of independence:
The assumption of independence is a crucial prerequisite to many
statistical procedures!
Independence
Theory:
Even the smallest dependence in your
data can turn into heavily biased
results (which may be undetectable).
A dependence is a connection
between/within the data.
The assumption of independence
relies on the absence of any
connection in your data that has not
been accounted for in your approach
(accounting for it is difficult).
Independent data:
Between Groups
Groups of data records should be
pulled from different individuals.
Within Groups
Data values within the same group are
not to influence one another.
Within Individuals
Data values recorded for one
individual should not influence each
other. This is often an issue with
repeated measurement approaches.
Fixing this after data collection is almost impossible!
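To make the "Within Individuals" issue more tangible, here is a minimal simulation sketch. It is not part of the original seminar material, and all settings (5 individuals per group, 4 repeated measurements each, the standard deviations, 500 runs) are assumptions chosen for illustration. Both groups are simulated without any true difference, so every "significant" result is a false positive.

```r
# Sketch: ignoring dependence within individuals can inflate false positives.
# All numbers below are illustrative assumptions, not values from the seminar.
set.seed(42)
n_ind <- 5    # individuals per group
n_rep <- 4    # repeated measurements per individual
n_sim <- 500  # number of simulated data sets

simulate_group <- function() {
  # Every group has the same true mean (10), so H0 is true by construction
  ind_mean <- rnorm(n_ind, mean = 10, sd = 3)  # differences among individuals
  values <- rnorm(n_ind * n_rep, mean = rep(ind_mean, each = n_rep), sd = 1)
  data.frame(id = rep(seq_len(n_ind), each = n_rep), value = values)
}

p_values <- replicate(n_sim, {
  g1 <- simulate_group()
  g2 <- simulate_group()
  c(
    # Treats all 20 values per group as if they were independent
    pseudo = wilcox.test(g1$value, g2$value)$p.value,
    # Uses one value (the mean) per individual instead
    per_individual = wilcox.test(
      as.numeric(tapply(g1$value, g1$id, mean)),
      as.numeric(tapply(g2$value, g2$id, mean))
    )$p.value
  )
})

# Share of runs with p < 0.05 although no true difference exists
false_positive_rate <- rowMeans(p_values < 0.05)
print(false_positive_rate)
```

Under these assumptions, the analysis that treats repeated measurements as independent should flag a difference far more often than the one that uses a single value per individual.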
Analyses Mann-Whitney U Test / Wilcoxon Rank-Sum Test
Purpose And Assumptions
Mann-Whitney U Test
wilcox.test(..., paired = FALSE) in base R
Purpose:
To identify whether groups of variable values are different from
one another.
H0: There is no difference in characteristics of the response
variable values in dependence of the classes of the predictor
variable.
Assumptions:
Predictor variable is binary
Response variable is ordinal or metric
Variable values are independent (not paired)
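The assumptions above can be checked in code before running the test. The following is a minimal sketch on simulated data; the data frame Example_df and its columns Sex and Weight are hypothetical stand-ins, not the seminar data set. It also uses the formula interface (response ~ predictor) of wilcox.test(), which runs the unpaired test by default.

```r
# Hypothetical example data: 'Sex' and 'Weight' are assumed column names
set.seed(42)
Example_df <- data.frame(
  Sex = factor(rep(c("Male", "Female"), each = 10)),
  Weight = c(rnorm(10, mean = 30, sd = 2), rnorm(10, mean = 28, sd = 2))
)

# Assumption: predictor variable is binary (exactly two classes)
stopifnot(nlevels(Example_df$Sex) == 2)
# Assumption: response variable is metric (here: stored as numeric)
stopifnot(is.numeric(Example_df$Weight))

# Formula interface: response ~ predictor (unpaired by default)
MWU_result <- wilcox.test(Weight ~ Sex, data = Example_df)
MWU_result
```

Independence itself cannot be checked this way; it has to follow from how the data were collected.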
Minimal Working Example
Let’s use the wilcox.test(..., paired = FALSE) function to test
whether the medians of an unnamed variable of two unconnected populations
(a and b) with 10 individuals each are truly different:
set.seed(42)
a <- rnorm(n = 10, mean = 10, sd = 3)
b <- rnorm(n = 10, mean = 5, sd = 3)
wilcox.test(a, b, paired = FALSE)
##
## Wilcoxon rank sum test
##
## data: a and b
## W = 92, p-value = 7e-04
## alternative hypothesis: true location shift is not equal to 0
The medians are significantly different (p = 7.25 × 10^-4). Keep in mind that the
populations do not have to be of the same size for this!
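To illustrate the point about sample sizes, here is a small sketch with 10 and 15 values. The second sample, b_long, is a hypothetical addition and not part of the example above. The sketch also recomputes the W statistic by hand: rank all values together, sum the ranks of the first sample, and subtract the smallest possible rank sum.

```r
set.seed(42)
a <- rnorm(n = 10, mean = 10, sd = 3)       # 10 individuals
b_long <- rnorm(n = 15, mean = 5, sd = 3)   # 15 individuals (hypothetical sample)

# The unpaired test accepts samples of different sizes
Unequal_result <- wilcox.test(a, b_long, paired = FALSE)
Unequal_result

# W by hand: rank sum of 'a' in the pooled data minus its minimum possible value
pooled_ranks <- rank(c(a, b_long))
n_a <- length(a)
W_manual <- sum(pooled_ranks[seq_along(a)]) - n_a * (n_a + 1) / 2

# Compare the manual value with the one reported by wilcox.test()
all.equal(unname(Unequal_result$statistic), W_manual)
```

W can range from 0 to length(a) * length(b_long); values close to either end indicate that one sample tends to hold the larger values.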
Analyses Wilcoxon Signed Rank Test
Purpose And Assumptions
Wilcoxon Signed Rank Test
wilcox.test(..., paired = TRUE) in base R
Purpose:
To identify whether groups of variable values in a repeated
sampling set-up are different from one another.
H0: There is no difference in characteristics of the response
variable values in dependence of the classes of the predictor
variable.
Assumptions:
Predictor variable is binary
Response variable is ordinal or metric
Variable values are dependent (paired)
Minimal Working Example
Let’s use the wilcox.test(..., paired = TRUE) function to test whether
the medians of an unnamed variable of two connected samples (a and b) with
10 individuals each (i.e. one re-sampled population) are truly different:
set.seed(42)
a <- rnorm(n = 10, mean = 10, sd = 3)
b <- rnorm(n = 10, mean = 5, sd = 3)
wilcox.test(a, b, paired = TRUE)
##
## Wilcoxon signed rank test
##
## data: a and b
## V = 52, p-value = 0.01
## alternative hypothesis: true location shift is not equal to 0
The medians are significantly different (p = 0.01). Keep in mind that the
samples have to be of the same size for this (i.e. there is one data record in
b that corresponds to one data record in a)!
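Why do the samples need the same size? The paired test works on the differences between corresponding records. The sketch below reuses a and b from the example and shows that the paired test matches a one-sample test on a - b, and that V is the sum of the ranks of the absolute differences for the positive differences. The object names differences, paired_result, diff_result and V_manual are introduced here for illustration only.

```r
set.seed(42)
a <- rnorm(n = 10, mean = 10, sd = 3)
b <- rnorm(n = 10, mean = 5, sd = 3)

# Paired data: record i in 'a' belongs to record i in 'b'
stopifnot(length(a) == length(b))
differences <- a - b

paired_result <- wilcox.test(a, b, paired = TRUE)
diff_result <- wilcox.test(differences)  # one-sample test on the differences

# V by hand: rank the absolute differences, sum the ranks of positive differences
V_manual <- sum(rank(abs(differences))[differences > 0])

# Both comparisons return TRUE
all.equal(unname(paired_result$statistic), V_manual)
all.equal(paired_result$p.value, diff_result$p.value)
```

Because only the differences enter the test, shuffling the order of b would change the result, which is not the case for the unpaired test.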
Our Data Choice Of Variables
Variables We Can Use
Response variables (metric/ordinal)
Weight
Height
Wing Chord
Nesting Height
Number of Eggs
Egg Weight
Home Range
Predictor variables (binary)
Population Status (Introduced vs. Native)
Sex (Male vs. Female)
Nesting Site (Tree vs. Shrub)
Predator Presence (Yes vs. No)
Predator Type (Avian vs. Non-Avian)
Climate (Continental vs. Coastal)
Our Data Methods
The with() function I
The with() function can be used to make your code:
- easier to write and read
- more accessible
You might hear someone refer to it as soft attach because it works a lot like
the attach() function in R but causes none of its problems.
You use with() to refer to data contained within a data object inside R:
with(data.object,
  some.function(object.within.data.object)  # placeholder for any R expression
)
The with() function II
WithFrame <- data.frame(First = 1:10, Second = 11:20)
WithFrame$First
## [1] 1 2 3 4 5 6 7 8 9 10
WithFrame$Second
## [1] 11 12 13 14 15 16 17 18 19 20
Now let’s try two operations:
WithFrame$First + WithFrame$Second
## [1] 12 14 16 18 20 22 24 26 28 30
with(WithFrame, First + Second)
## [1] 12 14 16 18 20 22 24 26 28 30
The results are the same!
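The same idea carries over to the tests from this seminar. Below is a minimal sketch on simulated data; the data frame Sparrow_example and its columns Sex and Weight are hypothetical and only stand in for a real data set.

```r
# Hypothetical example data: 'Sex' and 'Weight' are assumed column names
set.seed(42)
Sparrow_example <- data.frame(
  Sex = rep(c("Male", "Female"), each = 10),
  Weight = c(rnorm(10, mean = 30, sd = 2), rnorm(10, mean = 28, sd = 2))
)

# With with(): column names can be used directly
with_result <- with(
  Sparrow_example,
  wilcox.test(Weight[Sex == "Male"], Weight[Sex == "Female"], paired = FALSE)
)

# Without with(): every column needs the data frame name and $
dollar_result <- wilcox.test(
  Sparrow_example$Weight[Sparrow_example$Sex == "Male"],
  Sparrow_example$Weight[Sparrow_example$Sex == "Female"],
  paired = FALSE
)

# Both calls run the same test on the same values
identical(with_result$p.value, dollar_result$p.value)
```

One visible difference is the "data:" line of the printed output, which shows the expressions exactly as they were typed in each call.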
Our Data Research Questions
Research Questions And Hypotheses
So which of our major research questions (seminar 6) can we answer?
Mann-Whitney U Test
Climate Warming/Extremes: Does sparrow morphology depend on climate?
Predation: Does nesting height depend on predator characteristics?
Competition: Does home range depend on climate?
Sexual Dimorphism: Does sparrow morphology depend on sex?
Use the 1 - Sparrow_Data_READY.rds data set for these analyses.
Wilcoxon Signed Rank Test (suppose a resettling program)
Climate Warming/Extremes: Does the change in sparrow morphology depend on climate?
Predation: Does nesting height depend on predator characteristics?
Competition: Does home range depend on climate?
Use the 2b - Sparrow_ResettledSIUK_READY.rds data set for these analyses.
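A possible starting point for these analyses is sketched below. Only the file name comes from the slide. The helper run_mwu() is hypothetical, and the column names in the commented example call are assumptions that need to be checked against the actual data set (for instance with str()).

```r
# File name as given on the slide; adjust the path to your own folder structure
data_file <- "1 - Sparrow_Data_READY.rds"

if (file.exists(data_file)) {
  Sparrow_df <- readRDS(data_file)
  str(Sparrow_df)  # inspect which column names the data set actually uses
} else {
  message("Data file not found in the working directory: ", data_file)
}

# Hypothetical helper: Mann-Whitney U Test for one response/predictor pair
run_mwu <- function(data, response, predictor) {
  # Split the response values by the classes of the predictor
  groups <- split(data[[response]], droplevels(factor(data[[predictor]])))
  stopifnot(length(groups) == 2)  # predictor variable must be binary
  wilcox.test(groups[[1]], groups[[2]], paired = FALSE)
}

# Example call, assuming columns named "Weight" and "Sex" exist:
# run_mwu(Sparrow_df, response = "Weight", predictor = "Sex")
```

For the Wilcoxon Signed Rank Test, the two samples would instead have to be matched record by record before calling wilcox.test(..., paired = TRUE).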