Calculate Tukey HSD by RStudio
I want to perform Tukey HSD and other tests using RStudio. In this article, I will describe how to use R for Tukey HSD and other tests as a memo.
Install packages
In this article, we will use multcomp
and car
for testing and tidyverse
for graph creation. To use them as libraries, first download the packages to RStudio with the following code:
install.packages("multcomp")
install.packages("car")
install.packages("tidyverse")
In my environment, I encountered an error downloading tidyverse
because the R version of RStudio was old. If tidyverse
cannot be downloaded properly, consider updating R.
TukeyHSD
Before performing Tukey HSD, we performed Shapiro's test and Levene's test to confirm whether the data to be used for testing follows a normal distribution and has homoscedasticity, respectively.
Input data format
The data used is in the form of a csv file as shown in the following table. Column names can be anything.
color | values |
---|---|
red | 0.237 |
red | 0.230 |
red | 0.156 |
blue | 0.990 |
blue | 0.967 |
blue | 1.146 |
blue | 1.2005 |
yellow | 0.15093 |
yellow | 0.0866 |
yellow | 0.1337 |
green | 0.686 |
green | 0.557 |
green | 0.486 |
green | 0.5866 |
Load libraries
library(multcomp)
library(car)
library(tidyverse)
Read the csv file
Read the csv file into R. Also, change the column names of the csv data.
file_path <- "C:/Users/hoge/R/your_data.csv"
data <- read.csv(file_path, header = TRUE, col.names = c("treatment", "values"))
Type conversion
Convert the type of the treatment column to a factor type so that it can be used for testing.
data$treatment <- factor(data$treatment)
Attach to make it easier to specify items
attach(data)
Testing
Normality test
tapply(values, treatment, shapiro.test)
Output
$blue
Shapiro-Wilk normality test
data: X[[i]]
W = 0.87592, p-value = 0.3215
$green
Shapiro-Wilk normality test
data: X[[i]]
W = 0.98214, p-value = 0.9144
$red
Shapiro-Wilk normality test
data: X[[i]]
W = 0.81429, p-value = 0.1491
$yellow
Shapiro-Wilk normality test
data: X[[i]]
W = 0.93295, p-value = 0.4998
Homoscedasticity test
leveneTest(values, treatment, center = mean)
Output
Levene's Test for Homogeneity of Variance (center = mean)
Df F value Pr(>F)
group 3 3.7002 0.05027 .
10
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ANOVA
anova(lm(values ~ treatment))
Output
Analysis of Variance Table
Response: values
Df Sum Sq Mean Sq F value Pr(>F)
treatment 3 2.00564 0.66855 100.44 9.133e-08 ***
Residuals 10 0.06656 0.00666
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Tukey HSD
TukeyHSD(aov(values ~ treatment))
Output
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = values ~ treatment)
$treatment
diff lwr upr p adj
green-blue -0.49697500 -0.6734702 -0.3204798 0.0000304
red-blue -0.86820833 -1.0588449 -0.6775717 0.0000004
yellow-blue -0.95213167 -1.1427683 -0.7614951 0.0000002
red-green -0.37123333 -0.5618699 -0.1805967 0.0006723
yellow-green -0.45515667 -0.6457933 -0.2645201 0.0001270
yellow-red -0.08392333 -0.2877224 0.1198758 0.6061560
Create alphabet from significant difference
Adjust the significance level with level
(in this case, 5%). decreasing = T
labels the alphabet so that the process with the highest value is a.
result <- aov(values ~ treatment, data = data)
tukey <- glht(result, linfct = mcp(treatment = "Tukey"))
cld(tukey, level = 0.05, decreasing = T)
Output
blue green red yellow
"c" "b" "a" "a"
Creating a bar graph
Let's also create a bar graph.
Store the alphabet created based on Tukey's multiple comparison results in a table data
cld_result <- cld(tukey, decreasing = F)
cld_table <- as.data.frame(cld_result[["mcletters"]][["Letters"]])
colnames(cld_table) <- "alphabet"
cld_letters <- as_tibble(rownames_to_column(cld_table, "treatment"))
Calculate the mean and standard deviation and join with the alphabet table
data <- data %>%
group_by(treatment) %>%
summarise(mean = mean(values), sd = sd(values)) %>%
full_join(cld_letters, by = "treatment")
Create a bar graph
ggplot(data, aes(x = treatment, y = mean)) +
geom_bar(stat = "identity", fill = "steelblue") +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
geom_text(aes(label = alphabet), vjust = -1, hjust = -1, size = 6) +
labs(x = "Treatment", y = "Value", title = "Show Significant Difference")
Output
All code
library(multcomp)
library(car)
library(tidyverse)
# Specify file path
file_path <- "C:/Users/hoge/R/your_data.csv"
# Read CSV file
data <- read.csv(file_path, header = TRUE, col.names = c("treatment", "values"))
data$treatment <- factor(data$treatment)
attach(data)
# Test for normality
tapply(values, treatment, shapiro.test)
# Test for homogeneity of variance
leveneTest(values, treatment, center = mean)
# Perform ANOVA analysis
anova(lm(values ~ treatment))
# Perform Tukey HSD test
TukeyHSD(aov(values ~ treatment))
# Create alphabet based on Tukey HSD results
result <- aov(values ~ treatment, data = data)
tukey <- glht(result, linfct = mcp(treatment = "Tukey"))
cld_result <- cld(tukey, level=0.05,decreasing = T)
cld_table <- as.data.frame(cld_result[["mcletters"]][["Letters"]])
colnames(cld_table) <- "alphabet"
cld_letters <- as_tibble(rownames_to_column(cld_table, "treatment"))
# Combine alphabet table with mean and standard deviation
data <- data %>%
group_by(treatment) %>%
summarise(mean = mean(values), sd = sd(values)) %>%
full_join(cld_letters, by = "treatment")
# Create bar graph
ggplot(data, aes(x = treatment, y = mean)) +
geom_bar(stat = "identity", fill = "steelblue") +
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
geom_text(aes(label = alphabet), vjust = -1, hjust = -1, size = 6) +
labs(x = "Treatment", y = "Value", title = "Show Significant Difference")