# Creating Corrplots with the WCD data with outliers, then creating Corrplots with the WCD data with capped outliers - checking the difference and reporting the correlation to the client for the capped-outliers data set.
> library(corrplot)
> M <- cor(WCD)
> corrplot(M, method = "color", type = "upper")
> corrplot(M, method = "number", type = "upper")
> cor(Delicassen, Fresh)
[1] 0.24469
> cor(Delicassen, Milk)
[1] 0.4063683
> cor(Delicassen, Grocery)
[1] 0.2054965
> ## - Created a Data.Frame (xx) to Exclude Channel and Region...
> xx <- data.frame(WCD$Fresh, WCD$Milk, WCD$Grocery, WCD$Frozen, WCD$Detergents_Paper, WCD$Delicassen)
> View(xx)
> M <- cor(xx)
> corrplot(M, method = "color", type = "upper")
> corrplot(M, method = "number", type = "upper")
> yy <- data.frame(I_W$WCD.Fresh, I_W$WCD.Milk, I_W$WCD.Grocery, I_W$WCD.Frozen, I_W$WCD.Detergents_Paper, I_W$WCD.Delicassen)
> N <- cor(yy)
> corrplot(N, method = "color", type = "upper")
> corrplot(N, method = "number", type = "upper")
# As seen above -
# - For Data.Frame (WCD) - ###
> cor(Delicassen, Milk)
[1] 0.4063683
> cor(Delicassen, Fresh)
[1] 0.24469
> cor(Delicassen, Grocery)
[1] 0.2054965
# - For Data.Frame (WCD) - ### High Correlation also as seen in the Corrplot above
> cor(Detergents_Paper, Grocery)
[1] 0.9246407
> ## - We now measure the same variables for Correlation after having Imputed the OUTLIERS.
The ggplot below shows visually the high correlation between Detergents_Paper and Grocery.
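The ggplot call itself is not part of the transcript. A minimal sketch of how such a scatter plot could be drawn is given here, assuming the `ggplot2` package is installed. The data frame `toy` is a synthetic stand-in invented for illustration (it is not the WCD data); with the real data one would pass `WCD` instead.

```r
library(ggplot2)

# Synthetic stand-in for the two WCD columns (NOT the real data).
set.seed(42)
toy <- data.frame(Grocery = runif(100, min = 500, max = 30000))
toy$Detergents_Paper <- 0.4 * toy$Grocery + rnorm(100, sd = 1500)

# Scatter plot with a fitted straight line to show the linear relationship.
p <- ggplot(toy, aes(x = Grocery, y = Detergents_Paper)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Detergents_Paper vs Grocery")
print(p)
```

A tight cloud of points around the fitted line is the visual counterpart of a correlation close to 1.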
# - For Data.Frame (I_W) which is the DataSet with Capped / Imputed Outliers - ###
> cor(I_W$WCD.Delicassen, I_W$WCD.Milk)
[1] 0.2176773
> cor(I_W$WCD.Delicassen, I_W$WCD.Fresh)
[1] 0.1177586
> cor(I_W$WCD.Delicassen, I_W$WCD.Grocery)
[1] 0.1295903
# - For Data.Frame (I_W) - ### High Correlation also as seen in the Corrplot above
> cor(I_W$WCD.Detergents_Paper, I_W$WCD.Grocery)
[1] 0.6856996
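The transcript does not show how I_W was built from WCD. As a rough sketch only, one common way to cap outliers is to clamp each column to its 1.5 * IQR fences and then compare the two correlation matrices by subtraction. The capping rule, the helper name `cap_outliers`, and the synthetic data frame `toy` are all assumptions made for illustration, not the method or data used above.

```r
# Hypothetical helper: clamp values outside the 1.5 * IQR fences.
# ASSUMED rule; the source does not state how I_W was actually produced.
cap_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  pmin(pmax(x, q[1] - fence), q[2] + fence)
}

# Small synthetic stand-in for two WCD columns (NOT the real data),
# with one extreme row appended to act as an outlier.
set.seed(1)
grocery <- c(rnorm(50, mean = 8000, sd = 2000), 90000)
detergents <- c(0.4 * grocery[1:50] + rnorm(50, sd = 800), 40000)
toy <- data.frame(Grocery = grocery, Detergents_Paper = detergents)

# Apply the cap column by column.
capped <- as.data.frame(lapply(toy, cap_outliers))

M_toy <- cor(toy)        # correlations with outliers
N_toy <- cor(capped)     # correlations after capping
round(M_toy - N_toy, 3)  # element-wise difference between the two matrices
```

With the real data, the same subtraction applied to `M` and `N` from the transcript would give the per-pair change in correlation in one table, instead of calling `cor()` pair by pair.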
Thus, if the Wholesaler wants to increase the sales of Grocery in a region, they should also focus on Detergents_Paper, and vice versa, as the two show high correlation in both the original WCD data (0.92) and the imputed I_W data (0.69).