Professor: Howard B. Lee
Lecture Notes
Week 1: Introduction, Chapter 5

Lecture 1
Review of material from previous statistics courses:
A measure of central tendency (an "average") describes the middle portion of a distribution.
What is the difference between ...
Which one are they referring to?
Mean and median require numerical (quantitative) data.
Mode can be used with numerical or nonnumerical (qualitative) data.
*The measure of central tendency that we usually refer to in class (and in statistics in general) is the arithmetic mean, or simply "the mean".
The mean is subject to distortion by outlying values or "outliers".
Ex. For the values 1, 2, 3, 4, and 10,000, finding the arithmetic mean will illustrate the effect of the outlier value of 10,000.
1 + 2 + 3 + 4 + 10,000 = 10,010
10,010 / 5 = 2,002
The mean is pulled towards the outlier.
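The arithmetic above can be checked with a short Python sketch. It uses only the standard-library `statistics` module; the variable names are chosen here for illustration and are not part of the lecture.

```python
from statistics import mean, median

# The five values from the example; 10,000 is the outlier.
values = [1, 2, 3, 4, 10_000]

avg = mean(values)    # sum of the values divided by how many there are
mid = median(values)  # middle value once the data are sorted

print("mean:", avg)
print("median:", mid)
```

The mean lands far from four of the five values, while the median stays among them.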
*When outliers are present, the median is a better measure of a central point. For the values above, the median is 3.

Where is the outlier? To the right when it is a positively skewed distribution.
The long tail tells you where the skew is.
A tail pointing to the left is a negatively skewed distribution.
A tail pointing to the right is a positively skewed distribution.
What is the mode in 1, 2, 3, 4, 10,000?
It doesn't exist, because no value occurs more than once! (This is the problem with using the mode.)
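A minimal sketch of finding the mode, assuming the convention used in these notes that a data set has no mode when no value repeats. The helper `modes` is hypothetical and written for illustration only.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s), or [] if no value repeats.

    Assumption: 'no mode' means every value occurs exactly once,
    which is the convention these notes use.
    """
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value is unique, so there is no mode
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 3, 4, 10_000]))
print(modes([0, 0, 0, 0, 15]))
```

Returning a list also covers data with more than one mode, which a single return value could not express.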
Measures of central tendency can be manipulated to describe the distribution of data in particularly pointed ways.
For example, the LAUSD union and the LAUSD management can use the same data in a given distribution to their own advantage by choosing different measures of central tendency. The union claims that the average salary of teachers in LAUSD is lower than the national average by using the mode. The management claims that the average salary of LAUSD teachers is higher than the national average by using the mean.
Both sides are telling the truth. How is this possible? They are using different measures of central tendency.
Both sides should be using the median.
What is the average eye color?
For example: blue, brown, brown, green.
The average eye color is brown.
How did we compute this? By using the mode.
Who is the most popular candidate?
For example:
| Candidates | # Votes |
|---|---|
| A | 36 |
| B | 52 |
| C | 18 |
The most popular candidate is B.
How did we compute this? By using the mode.
Frequency (categorical) data is often used in politics.
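Both examples above find the mode of categorical data. A short sketch in Python, using the eye colors and vote counts from the notes (the variable names are illustrative):

```python
from collections import Counter

# Raw observations: count them first, then take the most frequent category.
eye_colors = ["blue", "brown", "brown", "green"]
modal_color = Counter(eye_colors).most_common(1)[0][0]

# Data already tallied as frequencies: the mode is the category
# with the largest count.
votes = {"A": 36, "B": 52, "C": 18}
winner = max(votes, key=votes.get)

print("modal eye color:", modal_color)
print("most popular candidate:", winner)
```

No arithmetic is done on the categories themselves, which is why the mode works for qualitative data where the mean and median do not.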
How are the following three sets of data different?
set # 1 : 1,2,3,4,5 mean = 15/5 = 3
set # 2 : 0,0,0,0,15 mean = 15/5 = 3
set # 3 : 3,3,3,3,3 mean = 15/5 = 3
Set # 1 has no mode. Set # 2 has a mode of 0. Set # 3 has a mode of 3. The three sets have the same mean, but the distribution of data in them is very different.
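To see the comparison side by side, the sketch below computes the mean and a frequency count for each set. The dictionary names are illustrative, not from the lecture.

```python
from collections import Counter
from statistics import mean

data_sets = {
    "set 1": [1, 2, 3, 4, 5],
    "set 2": [0, 0, 0, 0, 15],
    "set 3": [3, 3, 3, 3, 3],
}

# Same mean for every set...
means = {name: mean(values) for name, values in data_sets.items()}

# ...but very different frequency distributions.
freqs = {name: dict(Counter(values)) for name, values in data_sets.items()}

for name in data_sets:
    print(name, "mean =", means[name], "frequencies =", freqs[name])
```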
Measures of Variability
How do the scores differ from one another with respect to the average value?
| Measures of Central Tendency | Associated Measure of Variability |
|---|---|
| Mean | Standard Deviation |
| Median | Semi-interquartile Range |
| Mode | Index of Dispersion |
*For this course, 90% of the time we will be looking at the mean and standard deviation.
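A rough sketch of two of the measures of variability in the table, applied to the three sets with mean 3. Two assumptions are made here that the notes do not specify: the standard deviation is the population version (`pstdev`, dividing by N), and the quartiles follow the default method of Python's `statistics.quantiles`. Other textbooks and software use slightly different quartile rules, so the semi-interquartile values may differ from hand calculations.

```python
from statistics import pstdev, quantiles

def semi_interquartile_range(data):
    """Half the distance between the first and third quartiles.

    Assumption: quartiles come from statistics.quantiles with its
    default method; other conventions give slightly different values.
    """
    q1, _, q3 = quantiles(data, n=4)
    return (q3 - q1) / 2

data_sets = {
    "set 1": [1, 2, 3, 4, 5],
    "set 2": [0, 0, 0, 0, 15],
    "set 3": [3, 3, 3, 3, 3],
}

for name, values in data_sets.items():
    print(name,
          "standard deviation =", round(pstdev(values), 3),
          "semi-interquartile range =", semi_interquartile_range(values))
```

The three sets share a mean of 3, yet their variability differs: set 3 has none at all, and set 2 has the most.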
In the book entitled "Contact: The First Four Minutes", about first impressions, the author states that the first impression sets the pattern for whatever follows. It is extremely important to be accurate the first time!
Lecture 2
