Statistics - Part 2:
Descriptive Statistics
This is the second in a series of three articles that address
the underlying principles to help you analyze your data without having
to be a statistician.
The first step in any data analysis strategy is to calculate
summary measures to get a general feel for the data. Summary measures for
a data set are often referred to as descriptive statistics. Descriptive
statistics fall into three main categories:
- Measures of position (or central tendency)
- Measures of variability
- Measures of skewness
They can be useful for beginning data analysis, for comparing multiple
data sets, and for reporting final results of a survey.
Measures of Position
Measures of position (or central tendency) describe where the data are
concentrated.
Mean:
The mean is simply the arithmetic average of the data: the sum of the
observations divided by their number. The mean provides a quick way of
describing your data and is probably the most widely used measure of
central tendency. However, the mean is strongly influenced by outliers.
For example, consider the following data set: 1, 1, 2, 4, 5, 5, 6, 6, 7, 150
The mean of this data set is 18.7, yet nine of the ten observations lie
below it, because the single large final observation (150) pulls the mean upward.
Consequently, the mean is not always the best measure of central tendency.
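A minimal Python sketch of the example above, for readers who want to check the arithmetic. The helper name mean is our own illustration (Python's standard statistics.mean does the same job); the sketch computes the mean, counts the observations below it, and recomputes the mean without the outlier.

```python
# Data set from the article: nine small values plus one extreme value (150).
data = [1, 1, 2, 4, 5, 5, 6, 6, 7, 150]


def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)


avg = mean(data)
# Count how many observations fall strictly below the mean.
below = sum(1 for x in data if x < avg)

print(f"mean = {avg:.1f}; {below} of {len(data)} observations are below it")
# → mean = 18.7; 9 of 10 observations are below it
print(f"mean without 150 = {mean(data[:-1]):.1f}")  # → mean without 150 = 4.1
```

Dropping the single value 150 moves the mean from 18.7 to about 4.1, which is the sensitivity to outliers the text describes.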
Median:
The median is the middle observation in a data set. That is, 50% of
the observations are above the median and 50% are below it (for
sets with an even number of observations, the median is the average of
the two middle observations). The median is often used when a data set
is not symmetrical or when there are outlying observations. For example,
median income is generally reported rather than mean income, because
a small number of very high incomes would pull the mean well above what
most people earn.
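The even-count rule is easy to get wrong, so here is a small Python sketch of it. The function name median is illustrative (the standard library's statistics.median behaves the same way). It is applied to the same ten-value data set used for the mean.

```python
data = [1, 1, 2, 4, 5, 5, 6, 6, 7, 150]


def median(values):
    """Middle observation of the sorted data; for an even count,
    the average of the two middle observations."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]              # odd count: single middle value
    return (s[mid - 1] + s[mid]) / 2  # even count: average the middle pair


print(median(data))  # → 5.0
```

With ten observations the two middle values are both 5, so the median is 5, far closer to the bulk of the data than the mean of 18.7.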
Mode:
The mode is the value around which the greatest number of observations
are concentrated, or quite simply the most common observation. The mode
is often used with nominal data (categories with no natural order), but
it is not the preferred measure for other types of data.
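A short Python sketch of finding the mode by counting occurrences. The survey answers below are hypothetical nominal data invented for illustration, and the helper modes is our own; it returns every tied value because a data set can have more than one mode (the standard library offers statistics.multimode for the same purpose).

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often (there may be ties)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]


# Hypothetical nominal data: travel mode reported by survey respondents.
answers = ["bus", "car", "car", "bike", "car", "bus"]
print(modes(answers))  # → ['car']

# The article's numeric data set has three values tied for most common.
data = [1, 1, 2, 4, 5, 5, 6, 6, 7, 150]
print(modes(data))  # → [1, 5, 6]
```

The numeric result shows one reason the mode is rarely preferred for non-nominal data: here it does not single out one typical value.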
Measures of Variability
While measures of position describe where the data points are concentrated,
measures of variability describe the dispersion (or spread) of the data
set.
Range:
The range is the difference between the largest and the smallest observations
in the data set. However, this is a limited measure because it depends
on only two of the observations. Using the earlier data set again, the
range is 149 (150 - 1), but that says nothing about the fact that nine of
the ten observations are concentrated between 1 and 7, at the low end of
the scale. Another limitation of the range is that it is affected by the
number of observations in the data set. Generally, the more observations
there are, the wider the range tends to be, because a larger data set is
more likely to include extreme values. One everyday use of the range is
in newspaper stock market summaries, which give the day's high and low values.
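A Python sketch of the range calculation, using the same data set. The helper is named value_range (our own name) to avoid shadowing Python's built-in range. Comparing the result with and without the outlier shows how much a single extreme value controls this measure.

```python
data = [1, 1, 2, 4, 5, 5, 6, 6, 7, 150]


def value_range(values):
    """Difference between the largest and smallest observations."""
    return max(values) - min(values)


print(value_range(data))       # → 149
print(value_range(data[:-1]))  # → 6 (same data without the value 150)
```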
Variance:
Unlike the range, the variance takes every data point into account: it is
computed from the squared deviations of the observations from the mean.
If all the observations are the same, the variance is zero. The more spread
out the observations are, the larger the variance.
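No variance formula is given here, so this Python sketch assumes the two usual conventions: dividing the sum of squared deviations by n - 1 (sample variance, the default below) or by n (population variance). The function and its sample parameter are illustrative, not from the article.

```python
data = [1, 1, 2, 4, 5, 5, 6, 6, 7, 150]


def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1
    (sample=True, sample variance) or by n (sample=False, population
    variance)."""
    n = len(values)
    m = sum(values) / n
    squared_devs = sum((x - m) ** 2 for x in values)
    return squared_devs / (n - 1 if sample else n)


print(variance([4, 4, 4, 4]))                   # → 0.0 (identical observations)
print(variance([1, 2, 3]), variance([0, 2, 4]))  # → 1.0 4.0 (more spread, larger)
print(round(variance(data), 1), round(variance(data, sample=False), 1))
# → 2132.9 1919.6
```

The n - 1 divisor is the common choice when the data are a sample drawn from a larger population; Python's statistics module provides variance and pvariance for the two cases.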
Standard Deviation:
Standard deviation is the positive square root of the variance and
is the most common measure of variability. Because it is expressed in the
same units as the data, it indicates directly how closely the observations
cluster around the mean. The larger the standard
deviation, the more variation there is in the data set.
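A Python sketch of the standard deviation as the square root of the variance. Like the variance sketch, it assumes the sample (n - 1) form; the function name std_dev is illustrative, and statistics.stdev and statistics.pstdev are the standard-library equivalents for the sample and population forms.

```python
import math

data = [1, 1, 2, 4, 5, 5, 6, 6, 7, 150]


def std_dev(values):
    """Sample standard deviation: positive square root of the sample
    variance (sum of squared deviations divided by n - 1)."""
    n = len(values)
    m = sum(values) / n
    var = sum((x - m) ** 2 for x in values) / (n - 1)
    return math.sqrt(var)


print(f"{std_dev(data):.1f}")       # → 46.2
print(f"{std_dev(data[:-1]):.1f}")  # much smaller once 150 is removed
```

Because the result is in the same units as the observations, a standard deviation of about 46 next to a mean of 18.7 immediately signals a widely spread data set.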
Measures of Skewness
Measures of position and variability tell us where the data are located
and how dispersed they are. Measures of skewness describe the shape of the
distribution, in particular whether the data are symmetrically distributed.
Most people are familiar with the distribution referred to as the normal,
or bell-shaped, curve. Many of the statistics we use assume the data
are distributed normally. Unfortunately, this is not always the case.
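No particular skewness statistic is defined here, so this Python sketch assumes one common choice, the moment coefficient of skewness: the third central moment divided by the second central moment raised to the power 1.5. It is zero for symmetric data, positive for a long right tail, and negative for a long left tail. The function name skewness is illustrative.

```python
data = [1, 1, 2, 4, 5, 5, 6, 6, 7, 150]


def skewness(values):
    """Moment coefficient of skewness (population form): m3 / m2 ** 1.5.
    Undefined (division by zero) when all values are equal."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))  # → 0.0 (symmetric)
print(skewness(data) > 0)         # → True (long right tail from 150)
```

The positive result for the article's data set agrees with the earlier observation that its mean (18.7) sits well above its median (5).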
Copyright © 1995-2007, Pearson
Education, Inc. or its affiliates. All rights reserved.
This document may not be photocopied, reproduced, translated,
or converted to any electronic or machine readable form in whole or in
part without prior written approval. If portions of this document are
quoted in scholarly research, credit must be attributed to Pearson Education,
Inc.