R Code and Graphs

> summary(RED.Wine.14MARCH)

 fixed.acidity   volatile.acidity  citric.acid    residual.sugar     chlorides     

 Min.   : 4.60   Min.   :0.1200   Min.   :0.000   Min.   : 0.900   Min.   :0.01200 

 1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090   1st Qu.: 1.900   1st Qu.:0.07000 

 Median : 7.90   Median :0.5200   Median :0.260   Median : 2.200   Median :0.07900 

 Mean   : 8.32   Mean   :0.5278   Mean   :0.271   Mean   : 2.539   Mean   :0.08747 

 3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420   3rd Qu.: 2.600   3rd Qu.:0.09000 

 Max.   :15.90   Max.   :1.5800   Max.   :1.000   Max.   :15.500   Max.   :0.61100 

 free.sulfur.dioxide total.sulfur.dioxide    density             pH      

 Min.   : 1.00       Min.   :  6.00       Min.   :0.9901   Min.   :2.740 

 1st Qu.: 7.00       1st Qu.: 22.00       1st Qu.:0.9956   1st Qu.:3.210 

 Median :14.00       Median : 38.00       Median :0.9968   Median :3.310 

 Mean   :15.87       Mean   : 46.47       Mean   :0.9967   Mean   :3.311 

 3rd Qu.:21.00       3rd Qu.: 62.00       3rd Qu.:0.9978   3rd Qu.:3.400 

 Max.   :72.00       Max.   :289.00       Max.   :1.0037   Max.   :4.010 

   sulphates         alcohol         quality    

 Min.   :0.3300   Min.   : 8.40   Min.   :3.000 

 1st Qu.:0.5500   1st Qu.: 9.50   1st Qu.:5.000 

 Median :0.6200   Median :10.20   Median :6.000 

 Mean   :0.6581   Mean   :10.42   Mean   :5.636 

 3rd Qu.:0.7300   3rd Qu.:11.10   3rd Qu.:6.000 

 Max.   :2.0000   Max.   :14.90   Max.   :8.000 

>


> RED_WINE_VARS <-c("fixed.acidity","volatile.acidity","citric.acid","residual.sugar","chlorides","free.sulfur.dioxide","total.sulfur.dioxide","density","pH","sulphates","alcohol","quality")

>

>

> my_RW_stats <- function(x, na.omit = FALSE) {

+     # Drop missing values first when requested

+     if (na.omit) x <- x[!is.na(x)]

+     m <- mean(x)

+     med <- median(x)

+     n <- length(x)

+     s <- sd(x)

+     v <- var(x)

+     quant <- quantile(x)

+     # Moment-based skewness and excess kurtosis

+     skew <- sum((x - m)^3 / s^3) / n

+     kurt <- sum((x - m)^4 / s^4) / n - 3

+     return(c(n = n, Min = min(x), Max = max(x), Mean = m, Median = med, Stddev = s, Var = v, Quantile = quant, Skew = skew, Kurtosis = kurt))

+ }

> sapply(RED.Wine.14MARCH[RED_WINE_VARS],my_RW_stats)

              fixed.acidity volatile.acidity   citric.acid residual.sugar    chlorides

n              1599.0000000     1.599000e+03 1599.00000000    1599.000000 1.599000e+03

Min               4.6000000     1.200000e-01    0.00000000       0.900000 1.200000e-02

Max              15.9000000     1.580000e+00    1.00000000      15.500000 6.110000e-01

Mean              8.3196373     5.278205e-01    0.27097561       2.538806 8.746654e-02

Median            7.9000000     5.200000e-01    0.26000000       2.200000 7.900000e-02

Stddev            1.7410963     1.790597e-01    0.19480114       1.409928 4.706530e-02

Var               3.0314164     3.206238e-02    0.03794748       1.987897 2.215143e-03

Quantile.0%       4.6000000     1.200000e-01    0.00000000       0.900000 1.200000e-02

Quantile.25%      7.1000000     3.900000e-01    0.09000000       1.900000 7.000000e-02

Quantile.50%      7.9000000     5.200000e-01    0.26000000       2.200000 7.900000e-02

Quantile.75%      9.2000000     6.400000e-01    0.42000000       2.600000 9.000000e-02

Quantile.100%    15.9000000     1.580000e+00    1.00000000      15.500000 6.110000e-01

Skew              0.9809084     6.703331e-01    0.31774029       4.532140 5.669694e+00

Kurtosis          1.1196987     1.212689e+00   -0.79304553      28.485020 4.152596e+01

              free.sulfur.dioxide total.sulfur.dioxide      density           pH

n                     1599.000000          1599.000000 1.599000e+03 1.599000e+03

Min                      1.000000             6.000000 9.900700e-01 2.740000e+00

Max                     72.000000           289.000000 1.003690e+00 4.010000e+00

Mean                    15.874922            46.467792 9.967467e-01 3.311113e+00

Median                  14.000000            38.000000 9.967500e-01 3.310000e+00

Stddev                  10.460157            32.895324 1.887334e-03 1.543865e-01

Var                    109.414884          1082.102373 3.562029e-06 2.383518e-02

Quantile.0%              1.000000             6.000000 9.900700e-01 2.740000e+00

Quantile.25%             7.000000            22.000000 9.956000e-01 3.210000e+00

Quantile.50%            14.000000            38.000000 9.967500e-01 3.310000e+00

Quantile.75%            21.000000            62.000000 9.978350e-01 3.400000e+00

Quantile.100%           72.000000           289.000000 1.003690e+00 4.010000e+00

Skew                     1.248222             1.512689 7.115397e-02 1.933203e-01

Kurtosis                 2.007221             3.785676 9.225000e-01 7.959191e-01

                 sulphates      alcohol      quality

n             1.599000e+03 1599.0000000 1599.0000000

Min           3.300000e-01    8.4000000    3.0000000

Max           2.000000e+00   14.9000000    8.0000000

Mean          6.581488e-01   10.4229831    5.6360225

Median        6.200000e-01   10.2000000    6.0000000

Stddev        1.695070e-01    1.0656676    0.8075694

Var           2.873262e-02    1.1356474    0.6521684

Quantile.0%   3.300000e-01    8.4000000    3.0000000

Quantile.25%  5.500000e-01    9.5000000    5.0000000

Quantile.50%  6.200000e-01   10.2000000    6.0000000

Quantile.75%  7.300000e-01   11.1000000    6.0000000

Quantile.100% 2.000000e+00   14.9000000    8.0000000

Skew          2.424118e+00    0.8592144    0.2173931

Kurtosis      1.166153e+01    0.1916586    0.2879148

>

>
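The Skew and Kurtosis rows above are moment-based: each sums standardized deviations raised to the third or fourth power and divides by n, with sd() supplying the sample (n - 1) standard deviation and 3 subtracted to give excess kurtosis. The sketch below repeats those formulas on a small made-up vector to show what the na.omit flag is meant to do. The name moment_stats and the toy values are assumptions for illustration only and are not part of the wine analysis.

```r
# Hypothetical helper mirroring the moment formulas used in the transcript.
moment_stats <- function(x, na.omit = FALSE) {
  if (na.omit) x <- x[!is.na(x)]   # drop missing values only when asked
  n <- length(x)
  m <- mean(x)
  s <- sd(x)                       # sample standard deviation (n - 1 denominator)
  c(n = n,
    Mean = m,
    Skew = sum((x - m)^3 / s^3) / n,
    Kurtosis = sum((x - m)^4 / s^4) / n - 3)
}

toy <- c(1, 2, 2, 3, 10, NA)       # made-up values, not from the wine data

with_na    <- moment_stats(toy)                   # NA propagates through mean() and sd()
without_na <- moment_stats(toy, na.omit = TRUE)   # computed on the 5 observed values

print(with_na)
print(without_na)
```

The wine data has no missing values, so the flag makes no difference to the table above. On data with gaps, leaving na.omit at FALSE turns every statistic except n into NA.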

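The describe() function, part of the Hmisc [Harrell Miscellaneous] package, gives the summary statistics shown below.

For each variable, describe() reports: n = sample size, missing = number of missing values (if any), unique = number of distinct values,

Info = a relative information measure (close to 1 for a variable with few ties, lower when many values repeat), Mean = variable mean, and the quantiles .05 to .95. Variables with only a few distinct values, such as quality, get a frequency table instead of quantiles. [Further info - http://www.inside-r.org/packages/cran/Hmisc/docs/describe]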
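Most of the columns describe() prints can be approximated in base R, which helps when checking what each one means. The sketch below is only an approximation under stated assumptions: describe_sketch is a hypothetical name, the toy vector is made up, the Info column is not reproduced, and quantile() is used with its default settings, which may not match Hmisc's quantile definition exactly.

```r
# Hypothetical helper: a base-R approximation of a few describe() columns.
describe_sketch <- function(x) {
  probs <- c(.05, .10, .25, .50, .75, .90, .95)
  observed <- x[!is.na(x)]                        # summaries use non-missing values only
  q <- quantile(observed, probs = probs, names = FALSE)
  names(q) <- sub("^0", "", sprintf("%.2f", probs))   # label as .05, .10, ...
  c(n = length(observed),
    missing = sum(is.na(x)),
    unique = length(unique(observed)),
    Mean = mean(observed),
    q)
}

toy <- c(7.4, 7.8, 7.8, 11.2, NA, 7.4, 7.9)       # made-up values for illustration
result <- describe_sketch(toy)
print(round(result, 3))
```

Applied to a whole data frame, something like sapply(df, describe_sketch) would give one column of summaries per variable, in the same way the custom statistics function is applied earlier.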

> library("Hmisc", lib.loc="~/R/win-library/3.1")

Loading required package: grid

Loading required package: lattice

Loading required package: survival

Loading required package: Formula

Loading required package: ggplot2

 

Attaching package: ‘Hmisc’

 

The following objects are masked from ‘package:base’:

 

    format.pval, round.POSIXt, trunc.POSIXt, units

 

> describe(RED.Wine.14MARCH)

RED.Wine.14MARCH

 

 12  Variables      1599  Observations

---------------------------------------------------------------------------------------

fixed.acidity

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0      96       1    8.32     6.1     6.5     7.1     7.9     9.2

    .90     .95

   10.7    11.8

 

lowest :  4.6  4.7  4.9  5.0  5.1, highest: 14.3 15.0 15.5 15.6 15.9

---------------------------------------------------------------------------------------

volatile.acidity

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0     143       1  0.5278   0.270   0.310   0.390   0.520   0.640

    .90     .95

  0.745   0.840

 

lowest : 0.120 0.160 0.180 0.190 0.200, highest: 1.180 1.185 1.240 1.330 1.580

---------------------------------------------------------------------------------------

citric.acid

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0      80       1   0.271   0.000   0.010   0.090   0.260   0.420

    .90     .95

  0.522   0.600

 

lowest : 0.00 0.01 0.02 0.03 0.04, highest: 0.75 0.76 0.78 0.79 1.00

---------------------------------------------------------------------------------------

residual.sugar

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0      91       1   2.539    1.59    1.70    1.90    2.20    2.60

    .90     .95

   3.60    5.10

 

lowest :  0.9  1.2  1.3  1.4  1.5, highest: 13.4 13.8 13.9 15.4 15.5

---------------------------------------------------------------------------------------

chlorides

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0     153       1 0.08747  0.0540  0.0600  0.0700  0.0790  0.0900

    .90     .95

 0.1090  0.1261

 

lowest : 0.012 0.034 0.038 0.039 0.041, highest: 0.422 0.464 0.467 0.610 0.611

---------------------------------------------------------------------------------------

free.sulfur.dioxide

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0      60       1   15.87       4       5       7      14      21

    .90     .95

     31      35

 

lowest :  1  2  3  4  5, highest: 55 57 66 68 72

---------------------------------------------------------------------------------------

total.sulfur.dioxide

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0     144       1   46.47    11.0    14.0    22.0    38.0    62.0

    .90     .95

   93.2   112.1

 

lowest :   6   7   8   9  10, highest: 155 160 165 278 289

---------------------------------------------------------------------------------------

density

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0     436       1  0.9967  0.9936  0.9946  0.9956  0.9968  0.9978

    .90     .95

 0.9991  1.0000

 

lowest : 0.9901 0.9902 0.9906 0.9908 0.9908

highest: 1.0026 1.0029 1.0031 1.0032 1.0037

---------------------------------------------------------------------------------------

pH

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0      89       1   3.311    3.06    3.12    3.21    3.31    3.40

    .90     .95

   3.51    3.57

 

lowest : 2.74 2.86 2.87 2.88 2.89, highest: 3.75 3.78 3.85 3.90 4.01

---------------------------------------------------------------------------------------

sulphates

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0      96       1  0.6581    0.47    0.50    0.55    0.62    0.73

    .90     .95

   0.85    0.93

 

lowest : 0.33 0.37 0.39 0.40 0.42, highest: 1.61 1.62 1.95 1.98 2.00

---------------------------------------------------------------------------------------

alcohol

      n missing  unique    Info    Mean     .05     .10     .25     .50     .75

   1599       0      65       1   10.42     9.2     9.3     9.5    10.2    11.1

    .90     .95

   12.0    12.5

 

lowest :  8.40  8.50  8.70  8.80  9.00, highest: 13.50 13.57 13.60 14.00 14.90

---------------------------------------------------------------------------------------

quality

      n missing  unique    Info    Mean

   1599       0       6    0.86   5.636

 

           3  4   5   6   7  8

Frequency 10 53 681 638 199 18

%          1  3  43  40  12  1

---------------------------------------------------------------------------------------

>
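Because quality takes only six distinct values, describe() prints a frequency table rather than quantiles. The sketch below rebuilds that table in base R from the counts printed above, using table() and prop.table(). The variable names are assumptions for illustration, and the vector is reconstructed from the counts rather than read from the original file.

```r
# Counts per quality score, copied from the describe() output above.
counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Rebuild the quality column (row order does not matter for these summaries).
quality <- rep(as.integer(names(counts)), times = counts)

freq <- table(quality)                  # frequency of each score
pct  <- round(100 * prop.table(freq))   # share of each score, in whole percent

print(rbind(Frequency = freq, `%` = pct))
print(mean(quality))
```

The rounded percentages reproduce the % row above, and the mean of the rebuilt vector agrees with the reported 5.636.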