Performing Tukey HSD and Creating Bar Graphs in RStudio

Last updated at 2025-06-04Posted at 2023-03-13

Calculate Tukey HSD by RStudio

I want to perform Tukey HSD and other tests using RStudio. In this article, I will describe how to use R for Tukey HSD and other tests as a memo.

Install packages

In this article, we will use multcomp and car for testing and tidyverse for graph creation. To use them as libraries, first download the packages to RStudio with the following code:

install.packages("multcomp")
install.packages("car")
install.packages("tidyverse")

In my environment, I encountered an error downloading tidyverse because the R version of RStudio was old. If tidyverse cannot be downloaded properly, consider updating R.

TukeyHSD

Before performing Tukey HSD, we performed Shapiro's test and Levene's test to confirm whether the data to be used for testing follows a normal distribution and has homoscedasticity, respectively.

Input data format

The data used is in the form of a csv file as shown in the following table. Column names can be anything.

color	values
red	0.237
red	0.230
red	0.156
blue	0.990
blue	0.967
blue	1.146
blue	1.2005
yellow	0.15093
yellow	0.0866
yellow	0.1337
green	0.686
green	0.557
green	0.486
green	0.5866

Load libraries

library(multcomp)
library(car) 
library(tidyverse)

Read the csv file

Read the csv file into R. Also, change the column names of the csv data.

file_path <- "C:/Users/hoge/R/your_data.csv"
data <- read.csv(file_path, header = TRUE, col.names = c("treatment", "values"))

Type conversion

Convert the type of the treatment column to a factor type so that it can be used for testing.

data$treatment <- factor(data$treatment)

Attach to make it easier to specify items

attach(data)

Testing

Normality test

tapply(values, treatment, shapiro.test)

Output

$blue

	Shapiro-Wilk normality test

data:  X[[i]]
W = 0.87592, p-value = 0.3215


$green

	Shapiro-Wilk normality test

data:  X[[i]]
W = 0.98214, p-value = 0.9144


$red

	Shapiro-Wilk normality test

data:  X[[i]]
W = 0.81429, p-value = 0.1491


$yellow

	Shapiro-Wilk normality test

data:  X[[i]]
W = 0.93295, p-value = 0.4998

Homoscedasticity test

leveneTest(values, treatment, center = mean)

Output

Levene's Test for Homogeneity of Variance (center = mean)
      Df F value  Pr(>F)  
group  3  3.7002 0.05027 .
      10                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

ANOVA

anova(lm(values ~ treatment))

Output

Analysis of Variance Table

Response: values
          Df  Sum Sq Mean Sq F value    Pr(>F)    
treatment  3 2.00564 0.66855  100.44 9.133e-08 ***
Residuals 10 0.06656 0.00666                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Tukey HSD

TukeyHSD(aov(values ~ treatment))

Output

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = values ~ treatment)

$treatment
                    diff        lwr        upr     p adj
green-blue   -0.49697500 -0.6734702 -0.3204798 0.0000304
red-blue     -0.86820833 -1.0588449 -0.6775717 0.0000004
yellow-blue  -0.95213167 -1.1427683 -0.7614951 0.0000002
red-green    -0.37123333 -0.5618699 -0.1805967 0.0006723
yellow-green -0.45515667 -0.6457933 -0.2645201 0.0001270
yellow-red   -0.08392333 -0.2877224  0.1198758 0.6061560

Create alphabet from significant difference

Adjust the significance level with level (in this case, 5%). decreasing = T labels the alphabet so that the process with the highest value is a.

result <- aov(values ~ treatment, data = data)
tukey <- glht(result, linfct = mcp(treatment = "Tukey"))
cld(tukey, level = 0.05, decreasing = T)

Output

  blue  green    red yellow 
   "c"    "b"    "a"    "a"

Creating a bar graph

Let's also create a bar graph.

Store the alphabet created based on Tukey's multiple comparison results in a table data

cld_result <- cld(tukey, decreasing = F)
cld_table <- as.data.frame(cld_result[["mcletters"]][["Letters"]])
colnames(cld_table) <- "alphabet"
cld_letters <- as_tibble(rownames_to_column(cld_table, "treatment"))

Calculate the mean and standard deviation and join with the alphabet table

data <- data %>% 
  group_by(treatment) %>% 
  summarise(mean = mean(values), sd = sd(values)) %>% 
  full_join(cld_letters, by = "treatment")

Create a bar graph

ggplot(data, aes(x = treatment, y = mean)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
  geom_text(aes(label = alphabet), vjust = -1, hjust = -1, size = 6) +
  labs(x = "Treatment", y = "Value", title = "Show Significant Difference")

Output

All code

library(multcomp)
library(car) 
library(tidyverse)
# Specify file path
file_path <- "C:/Users/hoge/R/your_data.csv"

# Read CSV file
data <- read.csv(file_path, header = TRUE, col.names = c("treatment", "values"))
data$treatment <- factor(data$treatment)
attach(data)

# Test for normality
tapply(values, treatment, shapiro.test)
# Test for homogeneity of variance
leveneTest(values, treatment, center = mean)
# Perform ANOVA analysis
anova(lm(values ~ treatment))
# Perform Tukey HSD test
TukeyHSD(aov(values ~ treatment))

# Create alphabet based on Tukey HSD results
result <- aov(values ~ treatment, data = data)
tukey <- glht(result, linfct = mcp(treatment = "Tukey"))
cld_result <- cld(tukey, level=0.05,decreasing = T)
cld_table <- as.data.frame(cld_result[["mcletters"]][["Letters"]])
colnames(cld_table) <- "alphabet"
cld_letters <- as_tibble(rownames_to_column(cld_table, "treatment"))

# Combine alphabet table with mean and standard deviation
data <- data %>% 
  group_by(treatment) %>% 
  summarise(mean = mean(values), sd = sd(values)) %>% 
  full_join(cld_letters, by = "treatment")

# Create bar graph
ggplot(data, aes(x = treatment, y = mean)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
  geom_text(aes(label = alphabet), vjust = -1, hjust = -1, size = 6) +
  labs(x = "Treatment", y = "Value", title = "Show Significant Difference")

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up