Statistics - Part 3:
Advanced Statistics
This is the third in a series of three articles on the underlying
principles that help you analyze your data without having to be a
statistician.
The first step in any data analysis strategy is to determine
what you want to know, or your purpose in analyzing the data. Ideally, you
should have determined this before collecting your data, but all too frequently
this is not the case. Many of the commonly used statistical tests can be
classified into one of three categories:
- Description
- Comparison
- Association
Analyses for Description
The purpose of descriptive statistics is to describe the data. The type
of data will determine which descriptive statistic is appropriate. Specifically,
one can only calculate a mean with interval or ratio data, whereas a mode
can be calculated with nominal, ordinal, interval or ratio data.
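As a small illustration of how the type of data constrains the statistic, the sketch below uses Python's standard statistics module. The spending figures and sport labels are made up for this example and are not from the article.

```python
from statistics import mean, mode

# Hypothetical ratio data: monthly spend in dollars for six customers.
monthly_spend = [20, 35, 35, 40, 50, 60]

# Hypothetical nominal data: each customer's preferred winter sport.
preferred_sport = ["ski", "snowboard", "ski", "ski", "snowboard", "skate"]

# A mean is meaningful for interval or ratio data.
average_spend = mean(monthly_spend)

# A mode can be found for any type of data, including nominal labels.
most_common_spend = mode(monthly_spend)
most_common_sport = mode(preferred_sport)

print("mean spend:", average_spend)
print("modal spend:", most_common_spend)
print("modal sport:", most_common_sport)
```

Asking for the mean of the sport labels would raise an error, which mirrors the point above: averaging only makes sense when the values are numbers on an interval or ratio scale.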
Analyses for Comparison
A common goal in conducting research is to determine if differences exist
between two or more groups. For example, we may be interested in determining
if people who defect to another service provider are different from those
who choose to remain. While the most common examples of this type of analysis
focus on differences of means and variances, it is important to note that
we can analyze many types of differences, including differences in
correlation coefficients, proportions and percentages. The statistic used
is determined by the type
of data you have.
Nominal Data: Chi-Square
The chi-square test is used to determine if a relationship between variables
exists by comparing expected and observed cell frequencies. Specifically,
a chi-square test examines the observed frequencies in each category and
compares them to what would be expected by chance, that is, if there were
no relationship between the variables.
Example:
A researcher is interested in examining whether people's preferences for
winter sports are related to their preferences in automobile manufacturers.
The researcher would gather data from a number of people who prefer different
winter sports and automobile manufacturers and examine the relationship
using a chi-square table that may resemble the following:
Auto Preference | Snow-board | Downhill Ski | X-Country Ski
X               | 10         | 25           | 15
Y               | 30         | 10           | 15
Z               | 5          | 10           | 15
Using the standard formula for a chi-square test, we determine that
the observed frequencies are indeed different from what we would expect
if there were no relationship between winter sport and automobile preference
(roughly 17, 18 and 10 per sport for automobile preferences X, Y and Z,
respectively, based on the row and column totals). Keep in mind
that this only indicates that a relationship exists; it does not tell us that
one factor causes another.
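The calculation behind this example can be sketched in a few lines of plain Python. The counts come from the table above. The 0.05 significance level and the critical value taken from a standard chi-square table are assumptions added for illustration.

```python
# Observed counts from the table above: rows are auto preferences X, Y, Z;
# columns are snowboard, downhill ski, cross-country ski.
observed = [
    [10, 25, 15],
    [30, 10, 15],
    [5, 10, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell if the two preferences were unrelated.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumed threshold: critical value from a standard chi-square table
# for 4 degrees of freedom at the 0.05 significance level.
CRITICAL_VALUE = 9.488

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
print("relationship detected:", chi_square > CRITICAL_VALUE)
```

With these counts the statistic comes out near 23.8, well above the tabled threshold, which agrees with the conclusion that the two preferences are related.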
Interval Data: t-Test
One of the most common statistical tests to use for comparisons with
interval data is the t-test. The t-test compares the means of two groups
and determines whether the difference between them is large enough,
relative to the variability in the data, to be statistically significant.
Example:
A researcher is interested in assessing whether or not there are any
differences in the on-the-job performance of employees who receive training
on-site compared to employees who travel off-site for training. The researcher
would collect the appropriate performance data for the two groups, subject
the data to a t-test and interpret the results (the value of the t statistic
is compared to a table indicating whether or not the observed means are
statistically different). If the results indicate that there is no difference
in the performance of the two groups of employees and the data have been
collected with the appropriate amount of experimental controls, the researcher
may be inclined to conclude that the two locations for training are equivalent
and make a recommendation about consolidating the training.
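To make the steps in this example concrete, here is a minimal sketch of an independent-samples t-test using only the Python standard library. The performance ratings are invented for illustration, and the sketch assumes equal variances in the two groups (the pooled-variance form of the test) and a 0.05 significance level.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical performance ratings; these numbers are made up for illustration.
on_site = [78, 82, 85, 74, 80, 79, 83, 77]
off_site = [80, 76, 84, 79, 81, 75, 82, 78]

n1, n2 = len(on_site), len(off_site)

# Pooled variance assumes both groups share the same population variance.
pooled_variance = (
    (n1 - 1) * variance(on_site) + (n2 - 1) * variance(off_site)
) / (n1 + n2 - 2)

standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
t_statistic = (mean(on_site) - mean(off_site)) / standard_error
degrees_of_freedom = n1 + n2 - 2

# Assumed threshold: two-tailed critical value from a standard t table
# for 14 degrees of freedom at the 0.05 significance level.
CRITICAL_VALUE = 2.145

print(f"t = {t_statistic:.3f}, df = {degrees_of_freedom}")
print("means differ significantly:", abs(t_statistic) > CRITICAL_VALUE)
```

For this made-up data the t statistic is far below the tabled value, so the two groups would not be judged statistically different, which is the scenario the example describes.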
Interval Data: One-Way ANOVA
While the t-test is useful for testing differences between two groups,
frequently we are interested in more than two groups. In those cases,
we often rely on the Analysis of Variance (ANOVA) to tell us whether those
groups are different on some variable of interest. For example, if the
training example from above included a third group (i.e., a combination
of on- and off-site training) it would require use of the ANOVA instead
of the t-test.
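A one-way ANOVA for the three training groups can be sketched as follows. The scores are hypothetical, and the critical value is an assumed threshold read from a standard F table at the 0.05 level.

```python
from statistics import mean

# Hypothetical performance ratings for three training groups.
groups = {
    "on_site": [78, 82, 85, 74, 81],
    "off_site": [80, 76, 84, 79, 81],
    "both": [88, 90, 85, 91, 86],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# Between-group sum of squares: how far each group mean sits from the grand mean.
ss_between = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2 for scores in groups.values()
)

# Within-group sum of squares: spread of scores around their own group mean.
ss_within = sum(
    (score - mean(scores)) ** 2 for scores in groups.values() for score in scores
)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

# F is the ratio of between-group to within-group variance estimates.
f_statistic = (ss_between / df_between) / (ss_within / df_within)

# Assumed threshold: critical value from a standard F table
# for (2, 12) degrees of freedom at the 0.05 significance level.
CRITICAL_VALUE = 3.89

print(f"F = {f_statistic:.2f}, df = ({df_between}, {df_within})")
print("at least one group differs:", f_statistic > CRITICAL_VALUE)
```

A significant F only says that at least one group mean differs from the others; it does not say which one.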
Interval Data: Factorial ANOVA
Frequently we are interested in understanding the effects of varying
levels of two or more variables on a third variable. In such a case,
we are unable to use the One-way ANOVA because it is limited to comparisons
of the effects of one variable on another. Essentially, a factorial ANOVA
analyzes the impact of both the variables independently as well as jointly
to determine how they affect another variable of interest.
Example:
Continuing with our training example, another factor important in determining
job performance might be the type of job performed. Specifically, both
variables (or factors) might be important in determining job performance,
with certain types of jobs responding better to on-site training and
others responding better to off-site training. In order to examine whether
or not this is true, the data should be analyzed using a factorial ANOVA.
In this case we might select the three types of training (on-site, off-site,
both) and two job levels (e.g., entry-level, mid-level) and examine their
effects on job performance both alone and in combination. These two factors
yield the following hypothetical results:
Training
Job Level | On-Site  | Off-Site | Both
Entry     | Hi Sat.  | Low Sat. | Low Sat.
Mid       | Low Sat. | Hi Sat.  | Low Sat.
As the table indicates, the effectiveness of the type of training depends
on the level of the job. This interaction effect (depicted in the following
graph) means, quite simply, that the two factors "interact" to
determine the effect on the variable of interest. A common way of interpreting
interaction is to use the phrase "it depends." Specifically,
if someone were to ask about the relationship between training location
and performance, you would have to say, "it depends upon job level." Essentially,
an interaction is an indication that an observed relationship is conditional,
or depends on the values of another variable.
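The sketch below works through a factorial (two-way) ANOVA for a balanced design in plain Python. The scores are hypothetical and were chosen only to mimic the pattern in the table above, with three employees per cell. The variable and function names are illustrative.

```python
from statistics import mean

# Hypothetical performance scores, three employees per cell.
# Keys are (job level, training type).
cells = {
    ("entry", "on_site"): [85, 88, 82],
    ("entry", "off_site"): [70, 72, 68],
    ("entry", "both"): [71, 69, 70],
    ("mid", "on_site"): [69, 72, 69],
    ("mid", "off_site"): [86, 84, 85],
    ("mid", "both"): [70, 73, 67],
}

levels = sorted({job for job, _ in cells})
trainings = sorted({training for _, training in cells})
n_per_cell = 3  # balanced design: every cell has the same number of scores

grand_mean = mean([s for scores in cells.values() for s in scores])


def marginal_mean(position, value):
    """Mean of all scores whose key has `value` at `position` (0 = job, 1 = training)."""
    return mean(
        [s for key, scores in cells.items() if key[position] == value for s in scores]
    )


# Main effects: how far each marginal mean sits from the grand mean.
ss_job = sum(
    n_per_cell * len(trainings) * (marginal_mean(0, job) - grand_mean) ** 2
    for job in levels
)
ss_training = sum(
    n_per_cell * len(levels) * (marginal_mean(1, t) - grand_mean) ** 2
    for t in trainings
)

# Interaction: what remains of each cell mean after removing both main effects.
ss_interaction = sum(
    n_per_cell
    * (mean(scores) - marginal_mean(0, job) - marginal_mean(1, t) + grand_mean) ** 2
    for (job, t), scores in cells.items()
)

# Error: spread of scores around their own cell mean.
ss_error = sum((s - mean(scores)) ** 2 for scores in cells.values() for s in scores)

df_job = len(levels) - 1
df_training = len(trainings) - 1
df_interaction = df_job * df_training
df_error = len(cells) * (n_per_cell - 1)

ms_error = ss_error / df_error
f_job = (ss_job / df_job) / ms_error
f_training = (ss_training / df_training) / ms_error
f_interaction = (ss_interaction / df_interaction) / ms_error

print(f"job level:   F = {f_job:.2f}")
print(f"training:    F = {f_training:.2f}")
print(f"interaction: F = {f_interaction:.2f}")
```

In this made-up data the job-level main effect is zero while the interaction F is large. That is the "it depends" pattern in numeric form: neither job level performs better overall, but which training works best changes with the job level.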
Copyright © 1995-2007, Pearson
Education, Inc. or its affiliates. All rights reserved.
This document may not be photocopied, reproduced, translated,
or converted to any electronic or machine readable form in whole or in
part without prior written approval. If portions of this document are
quoted in scholarly research, credit must be attributed to Pearson Education,
Inc.