METRIC TESTS (TWO-SAMPLE SITUATIONS)

Erik Kusch

erik.kusch@i-solution.de

Section for Ecoinformatics & Biodiversity

Center for Biodiversity and Dynamics in a Changing World (BIOCHANGE)

Aarhus University

Aarhus University Biostatistics - Why? What? How? 1 / 15

1 Background

2 Analyses

Mann-Whitney U Test / Wilcoxon Rank-Sum Test

Wilcoxon Signed Rank Test

3 Our Data

Choice Of Variables

Methods

Research Questions


Background

Introduction

Metric tests are used to compare parameters of metric/ordinal variable values

among groups/individuals.

Prominent metric tests for two-sample situations include:

Mann-Whitney U Test

Wilcoxon Signed Rank Test

t Test (dealt with in seminar 12)

...

Some of these tests rely on the assumption of independence:

The assumption of independence is a crucial prerequisite to many

statistical procedures!


Independence

Theory:

Even the smallest dependence in your data can turn into heavily biased results (which may be undetectable).

A dependence is a connection between/within the data.

The assumption of independence relies on the absence of any connection in your data that has not been accounted for in your approach (accounting for it is difficult).

Independent data:

Between Groups

Groups of data records should be

pulled from different individuals.

Within Groups

Data values within the same group are not to influence one another.

Within Individuals

Data values recorded for one individual should not influence each other. This is often an issue with repeated measurement approaches.

→ Fixing this after data collection is almost impossible!


Analyses Mann-Whitney U Test / Wilcoxon Rank-Sum Test

Purpose And Assumptions

Mann-Whitney U Test

wilcox.test(..., paired = FALSE) in base R

Purpose:

To identify whether groups of variable values are different from one another.

Null hypothesis: There is no difference in the characteristics of the response variable values between the classes of the predictor variable.

Assumptions:

Predictor variable is binary

Response variable is ordinal or metric

Variable values are independent (not paired)
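The assumptions above map directly onto a data frame with one binary predictor column and one metric response column. The following is a minimal sketch using the formula interface of wilcox.test(); the data frame example_df and its column names response and group are hypothetical and simulated purely for illustration.

```r
# Hypothetical data: a metric response and a binary predictor (simulated)
set.seed(42)
example_df <- data.frame(
  response = c(rnorm(n = 10, mean = 10, sd = 3),
               rnorm(n = 10, mean = 5, sd = 3)),
  group = factor(rep(c("a", "b"), each = 10))
)

# Check the assumption that the predictor is binary
stopifnot(nlevels(example_df$group) == 2)

# Formula interface: response ~ binary predictor
# (the test is unpaired by default, so no paired argument is needed)
mwu_result <- wilcox.test(response ~ group, data = example_df)
mwu_result$p.value
```

The formula interface is convenient when the data are stored in long format, i.e. one row per observation with the group membership in its own column.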


Minimal Working Example

Let’s use the wilcox.test(..., paired = FALSE) function to test whether the medians of an unnamed variable in two unconnected populations (a and b) with 10 individuals each are truly different:

set.seed(42)

a <- rnorm(n = 10, mean = 10, sd = 3)

b <- rnorm(n = 10, mean = 5, sd = 3)

wilcox.test(a, b, paired = FALSE)

## Wilcoxon rank sum test

## data: a and b

## W = 92, p-value = 7e-04

## alternative hypothesis: true location shift is not equal to 0

The medians are significantly different (p = 7.25 × 10^-4). Keep in mind that the populations do not have to be of the same size for this!
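To see where the reported W comes from, the statistic can be recomputed by hand. This is a sketch of the rank-sum calculation for the same simulated data, assuming there are no ties (which holds for continuous simulated values); the object names joint_ranks, W_manual and W_builtin are introduced here for illustration only.

```r
set.seed(42)
a <- rnorm(n = 10, mean = 10, sd = 3)
b <- rnorm(n = 10, mean = 5, sd = 3)

# Rank all values jointly, then sum the ranks that belong to a
joint_ranks <- rank(c(a, b))
rank_sum_a <- sum(joint_ranks[seq_along(a)])

# W is the rank sum of a minus its smallest possible value n_a * (n_a + 1) / 2
W_manual <- rank_sum_a - length(a) * (length(a) + 1) / 2

# Without ties, W also equals the number of (a, b) pairs in which a is larger
W_pairs <- sum(outer(a, b, ">"))

W_builtin <- unname(wilcox.test(a, b, paired = FALSE)$statistic)
all.equal(W_manual, W_builtin)

# Unequal group sizes are fine for the unpaired test
wilcox.test(a, b[1:7], paired = FALSE)$p.value
```

Because only ranks enter the calculation, the test is insensitive to the exact magnitude of extreme values.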


Analyses Wilcoxon Signed Rank Test

Purpose And Assumptions

Wilcoxon Signed Rank Test

wilcox.test(..., paired = TRUE) in base R

Purpose:

To identify whether groups of variable values in a repeated sampling set-up are different from one another.

Null hypothesis: There is no difference in the characteristics of the response variable values between the classes of the predictor variable.

Assumptions:

Predictor variable is binary

Response variable is ordinal or metric

Variable values are dependent (paired)


Minimal Working Example

Let’s use the wilcox.test(..., paired = TRUE) function to test whether the medians of an unnamed variable of two connected samples (a and b) with 10 individuals each (i.e. one re-sampled population) are truly different:

set.seed(42)

a <- rnorm(n = 10, mean = 10, sd = 3)

b <- rnorm(n = 10, mean = 5, sd = 3)

wilcox.test(a, b, paired = TRUE)

## Wilcoxon signed rank test

## data: a and b

## V = 52, p-value = 0.01

## alternative hypothesis: true location shift is not equal to 0

The medians are significantly different (p = 0.01). Keep in mind that the samples have to be of the same size for this (i.e. there is one data record in b that corresponds to one data record in a)!
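The paired test works on the within-pair differences, which is why every record in a needs a partner in b. The sketch below recomputes V by hand for the same simulated data, assuming no ties and no zero differences; the object names d, V_manual and V_builtin are introduced here for illustration only.

```r
set.seed(42)
a <- rnorm(n = 10, mean = 10, sd = 3)
b <- rnorm(n = 10, mean = 5, sd = 3)

# Pairing requires one record in b for every record in a
stopifnot(length(a) == length(b))

# Within-pair differences
d <- a - b

# Rank the absolute differences and sum the ranks of the positive differences
abs_ranks <- rank(abs(d))
V_manual <- sum(abs_ranks[d > 0])

V_builtin <- unname(wilcox.test(a, b, paired = TRUE)$statistic)
all.equal(V_manual, V_builtin)

# The paired test is equivalent to a one-sample test on the differences
p_paired <- wilcox.test(a, b, paired = TRUE)$p.value
p_one_sample <- wilcox.test(d)$p.value
all.equal(p_paired, p_one_sample)
```

Running the same data through both tests gives different p-values, so choosing paired = TRUE or paired = FALSE should follow from the sampling design and not from the result.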


Our Data Choice Of Variables

Variables We Can Use

Response variables (metric/ordinal)

Weight

Height

Wing Chord

Nesting Height

Number of Eggs

Egg Weight

Home Range

Predictor variables (binary)

Population Status (Introduced vs. Native)

Sex (Male vs. Female)

Nesting Site (Tree vs. Shrub)

Predator Presence (Yes vs. No)

Predator Type (Avian vs. Non-Avian)

Climate (Continental vs. Coastal)


Our Data Methods

The with() function I

The with() function can be used to make your code:

- easier to write and read

- more accessible

You might hear someone refer to it as a soft attach because it works a lot like the attach() function in R, but only for the duration of a single expression, so it avoids the problems attach() can cause (e.g. masked objects).

You use with() to refer to data contained within a data object inside R:

with(data.object,

expression(reference.to.object.within.data.object)

)


The with() function II

WithFrame <- data.frame(First = 1:10, Second = 11:20)

WithFrame$First

## [1] 1 2 3 4 5 6 7 8 9 10

WithFrame$Second

## [1] 11 12 13 14 15 16 17 18 19 20

Now let’s try two operations:

WithFrame$First + WithFrame$Second

## [1] 12 14 16 18 20 22 24 26 28 30

with(WithFrame, First + Second)

## [1] 12 14 16 18 20 22 24 26 28 30

The results are the same!
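The same idea carries over to the tests from this seminar. As a minimal sketch, the following compares the $ notation with with() when subsetting a response by a binary predictor for wilcox.test(); the data frame bird_df and its columns Weight and Sex are hypothetical, simulated stand-ins and not the actual column names of the sparrow data sets.

```r
# Hypothetical data frame (simulated; not the sparrow data)
set.seed(42)
bird_df <- data.frame(
  Weight = c(rnorm(n = 10, mean = 10, sd = 3),
             rnorm(n = 10, mean = 5, sd = 3)),
  Sex = rep(c("Male", "Female"), each = 10)
)

# Using $ notation: the data frame name has to be repeated four times
p_dollar <- wilcox.test(
  bird_df$Weight[bird_df$Sex == "Male"],
  bird_df$Weight[bird_df$Sex == "Female"],
  paired = FALSE
)$p.value

# Using with(): column names are looked up inside bird_df
p_with <- with(
  bird_df,
  wilcox.test(Weight[Sex == "Male"], Weight[Sex == "Female"], paired = FALSE)
)$p.value

identical(p_dollar, p_with)
```

Both calls run exactly the same test; with() only shortens how the columns are referenced.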


Our Data Research Questions

Research Questions And Hypotheses

So which of our major research questions (seminar 6) can we answer?

Mann-Whitney U Test

Climate Warming/Extremes: Does sparrow morphology depend on climate?

Predation: Does nesting height depend on predator characteristics?

Competition: Does home range depend on climate?

Sexual Dimorphism: Does sparrow morphology depend on sex?

Use the 1 - Sparrow_Data_READY.rds data set for these analyses.

Wilcoxon Signed Rank Test (suppose a resettling program)

Climate Warming/Extremes: Does sparrow morphology change depend on climate?

Predation: Does nesting height depend on predator characteristics?

Competition: Does home range depend on climate?

Use the 2b - Sparrow_ResettledSIUK_READY.rds data set for these analyses.
