In this article, I check the correlation between sales and the number of visiting customers.
※ If you already have your own data, you can skip section 1.
Contents
1. Creation of simulation data
2. Visualization
3. Correlation
1. Creation of simulation data
I use the data used here, and add a second variable so that the relationship between two variables can be examined. (Of course, you can check the relationship among more than two variables with almost the same code.)
> head(Data)
time sales
1 2020-03-01 7
2 2020-03-02 4
3 2020-03-03 17
4 2020-03-04 2
5 2020-03-05 9
6 2020-03-06 9
> Data$number.of.customers <- rnbinom(nrow(Data), size = 0.8, mu = 7)
> head(Data)
time sales number.of.customers
1 2020-03-01 7 1
2 2020-03-02 4 18
3 2020-03-03 17 2
4 2020-03-04 2 1
5 2020-03-05 9 2
6 2020-03-06 9 43
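For readers who do not have the original data, the following is a minimal sketch of how a comparable data frame could be built. The sales column here is a hypothetical stand-in drawn from a Poisson distribution (the 30 days, `lambda = 8`, and the seed are assumptions, not values from the original data). The customers column uses the same `rnbinom()` call as above. With `size = 0.8` and `mu = 7`, the negative binomial has variance `mu + mu^2 / size`, which is why occasional large values such as 18 or 43 appear.

```r
# Hypothetical stand-in for the original data set: 30 days of daily sales.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
days <- seq(as.Date("2020-03-01"), by = "day", length.out = 30)
Data <- data.frame(time = days, sales = rpois(length(days), lambda = 8))

# Same call as in the article: mean 7, dispersion (size) 0.8.
Data$number.of.customers <- rnbinom(nrow(Data), size = 0.8, mu = 7)

# Theoretical variance of this negative binomial: mu + mu^2 / size
theoretical_var <- 7 + 7^2 / 0.8

head(Data)
```

Because the customers are drawn independently of sales, there is no true relationship between the two columns in this simulation.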
2. Visualization
> library(ggplot2)
> Data$time <- as.POSIXct(Data$time)
> ggplot(data=Data, aes(x=time))+
+ geom_line(aes(y=scale(sales), colour="black"), size=0.9, show.legend = TRUE)+
+ geom_line(aes(y=scale(number.of.customers), colour="blue"), size=0.9, show.legend = TRUE)+
+ labs(title="Comparison")+
+ ylab("sales/number.of.customers")+
+ scale_x_datetime(date_labels="%m/%d")+
+ scale_colour_manual(name='Legend',guide='legend' ,
+ values = c("black"="black", "blue"= "blue"),
+ labels=c('sales', 'number.of.customers'))
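The plot draws `scale(sales)` and `scale(number.of.customers)` rather than the raw values, so both lines share one axis even though their ranges differ. As a small sketch of what `scale()` does with its default arguments, the snippet below standardizes the six sales values printed earlier by hand and compares the result with `scale()`. The variable names `x`, `z_manual` and `z_scale` are only illustrative.

```r
# First six sales values from head(Data) above.
x <- c(7, 4, 17, 2, 9, 9)

# Standardize by hand: subtract the mean, divide by the sample standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.numeric() drops it to a plain vector.
z_scale <- as.numeric(scale(x))

# Check that both approaches agree up to floating-point tolerance.
all.equal(z_manual, z_scale)
```

After standardization each series has mean 0 and standard deviation 1, so the y axis is in units of standard deviations, not in sales or customer counts.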
3. Correlation
> library(corrplot)
> corrplot.mixed(corr=cor(Data[,c("sales","number.of.customers")]), upper="ellipse")
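The correlation matrix passed to `corrplot.mixed()` comes from `cor()`, which computes the Pearson coefficient by default. As a hedged sketch of how the number behind the plot could be inspected directly, the snippet below uses only the six rows printed by `head(Data)`. That is far too few for any real conclusion, so treat it as an illustration of the calls and not as a result.

```r
# The six rows shown by head(Data) above.
sales <- c(7, 4, 17, 2, 9, 9)
customers <- c(1, 18, 2, 1, 2, 43)

# Pearson correlation (the default method of cor()).
r <- cor(sales, customers)

# Rank-based alternative, less sensitive to outliers such as 43.
rho <- cor(sales, customers, method = "spearman")

# Test of the null hypothesis that the true Pearson correlation is zero.
test <- cor.test(sales, customers)

r
rho
test$p.value
```

Since the customer counts were simulated independently of sales, any nonzero sample correlation here is sampling noise. With real data, `cor.test()` gives a rough check of whether the observed correlation is distinguishable from zero.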