Lesson 1: The Course, Central Tendency
1.Studying statistics
2.Typical requirement in grad school for education
• Also studied as undergrad subject in many disciplines, such as psychology, nursing
• Growing presence in high school and sometimes, elementary school
3.Disliked and feared subject
• I am surprised at what students don't know after finishing a stat course
• Often forgotten after grad studies end
• Often not used much after the course ends
4.One doctoral student reported that this course's information on Deming (Lesson 5) was the most helpful and useful thing learned
5.Statistics is usually taught as an analytic tool, but
• Many researchers in social and people-oriented subjects are using qualitative methods more and more
• Interviews and observations may be as good as or better than statistics
6.Statistics can be a delight
• As useful as literacy
• Created by many non-mathematicians
• Describes all phenomena in the world
• Useful in understanding life
7.The term "Statistics," meaning numerical facts about the state, seems to have first appeared in German about 1770
• In this course, we will use the term as the name of our subject
• Also for a figure calculated from a sample, as opposed to a "parameter," which describes a whole population
• The name "stochastics" is growing in popularity.
8.I get questions sometimes about this course
• Is it both descriptive and inferential? Yes!
• Nearly all statistics courses are, these days
9.Structure of course
10.Fifteen tapes of about 60 minutes each
• Introductory topics #1-7
• Three basic stat techniques #8-13
• Lesson on other areas of stat #14
• Review #15
11.Lesson topics
1.Measures of central tendency
2.Measures of variation
3.Charts and graphs
4.Basic concepts of probability
5.Concepts of process control
6.Studying a single variable
7.Using a computer in statistics
8.Contingency tables I
9.Contingency tables II
10.Analysis of variance (ANOVA) I
11.ANOVA II
12.Correlation & regression I
13.Correlation & regression II
14.Other topics: non-parametric, Bayesian, multivariate
15.Review
12.Typical lesson contents
• Intro
• Main information
• Application and difficulties
• May recommend exercises
• Summary
13.It often helps your learning if you collect meaningful data on a phenomenon you know well
14.A sample set of data: 13, 11, 5, 11, 9, 15, 7, 9
• Data comes in some order
• The time or temporal order is often indicative
• The time order is often lost or ignored by social sciences
• Deming's length of a table
15.Central tendency
• First step in usual data analysis
• "Descriptive" & "inferential"
• First describe, then infer
16.Problem: how to capture the set
• Could memorize
• Could sample
• Calculate "center" of numerical data
• Then measure variation from center
17.Notation for a mean (the arithmetic mean)
• Sum of observations divided by number of observations
• Well-known concept, often covered in about grade 4 or 5
• Can be written in symbols
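The symbols themselves did not survive in these notes. One common way to write the arithmetic mean is sketched below, assuming the usual textbook notation (n for the number of observations, x_i for the i-th observation, x-bar for the mean; none of these names come from the notes). The second line works it through for the sample data set in item 14.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

\bar{x} = \frac{13 + 11 + 5 + 11 + 9 + 15 + 7 + 9}{8} = \frac{80}{8} = 10
```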

18.Central tendency = a kind of zero point or target
"What the process is trying to achieve"
19.Meaning of the mean
• The fair share = the result of EQUAL contributions to the total. [Question: How equal were the actual contributions?]
• The result of division: the mass DIVIDED equally among the contributors
20.Note: the mass may be small so the mean can be very small
21.For example, our town could have an average of 0.25 auto accidents per day (about one accident every four days)
22.Most of the time, in education, we are counting, not measuring. Thus, we usually deal in integers or simple fractions
23.The mean is
• The most sensitive of the main measures
• The best for mathematical theory
• Therefore the oldest, most used measure
24.The mean is the closest of the main measures to all scores, using one definition of "close"
• Unusual definition of "closeness"
• Closeness is measured by distance squared: the mean gives the smallest possible sum of squared distances from the scores
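The squared-distance idea can be checked numerically. The sketch below uses the sample data set from item 14; the helper name and the grid of candidate centers are illustrative choices, not part of the lesson.

```python
# Sample data set from item 14 of these notes.
data = [13, 11, 5, 11, 9, 15, 7, 9]


def sum_squared_distance(center, scores):
    """Total squared distance between one candidate center and every score."""
    return sum((x - center) ** 2 for x in scores)


mean = sum(data) / len(data)  # 80 / 8 = 10.0

# Candidate centers from 5.0 to 15.0 in steps of 0.5 (an arbitrary grid).
candidates = [c / 2 for c in range(10, 31)]
best = min(candidates, key=lambda c: sum_squared_distance(c, data))

print(mean, best)                        # both are 10.0
print(sum_squared_distance(mean, data))  # 72.0
print(sum_squared_distance(9, data))     # larger than 72
```

Any center other than the mean gives a larger total, which is the sense in which the mean is "closest" to all the scores.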
25.Means are nearly always abstract concepts.
26.If I drive 50 miles in an hour, my average was 50 mph. But I may have driven at other speeds for nearly all of the time, going either faster or slower than 50.
27.If five students answer an average of 43 questions correctly, maybe no one actually answered 43 questions correctly. Some could have answered more than that and others less.
28.Meaning of median - the middle score when all are in order by size
If there is an even number of data points, take the mean of the two middle scores
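A minimal sketch of that rule, applied to the sample data set from item 14. The function name is an illustrative choice.

```python
def median(scores):
    """Middle score after sorting; mean of the two middle scores if the count is even."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [13, 11, 5, 11, 9, 15, 7, 9]
print(sorted(data))       # [5, 7, 9, 9, 11, 11, 13, 15]
print(median(data))       # 10.0, the mean of the two middle scores 9 and 11
print(median([3, 1, 2]))  # 2
```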
29.The median is
• Sometimes most descriptive
• Midway in sensitivity and typicality
30.The median is closest to all scores using a different definition
"Closeness" = normal meaning: difference between score and central figure, regardless of sign or direction
31.The median is
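• Less rich in theoretical possibilities
• Use is on the increase in modern data analysis research
• Approximately 50% of scores fall above the median and 50% below; the mean splits the scores this way only when it lies close to the median, as in roughly symmetric data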
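The ordinary definition of closeness described for the median (difference between score and center, ignoring sign) can be checked on the sample data set from item 14. This is a small sketch; the helper name is an illustrative choice.

```python
data = [13, 11, 5, 11, 9, 15, 7, 9]  # sorted: 5 7 9 9 11 11 13 15, median 10


def total_absolute_distance(center, scores):
    """Sum of the differences between each score and the center, ignoring sign."""
    return sum(abs(x - center) for x in scores)


for center in (8, 9, 10, 11, 12):
    print(center, total_absolute_distance(center, data))
# 8 -> 24, 9 -> 20, 10 -> 20, 11 -> 20, 12 -> 24
```

With an even number of scores, every center between the two middle scores (here 9 and 11) ties for the smallest total. The median, 10, is the conventional choice within that range.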
32.Median is the figure that separates the data into two groups.
• It is the 50th percentile
• We could calculate other "percentiles" (hundredths of the data) or "quantiles" (points marking off various fractions of the data)
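As a sketch of quantiles in practice, Python's standard `statistics.quantiles` function returns the cut points that divide a data set into equal-sized parts. Different software packages interpolate between scores in slightly different ways, so the outer cut points may not match another program exactly.

```python
import statistics

data = [13, 11, 5, 11, 9, 15, 7, 9]  # sample data set from item 14

# n=4 asks for the three cut points that split the data into quarters.
quartiles = statistics.quantiles(data, n=4)
print(quartiles)

# The middle cut point is the 50th percentile, which is the median.
print(quartiles[1], statistics.median(data))
```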
33.Meaning of mode
• Score that occurred most often
• "Modal" or typical person scores here
• A set can have two or more modes
• If all the scores are unique, we say there is no mode
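A minimal sketch of finding the mode or modes, following the convention above that a set of all-unique scores has no mode. The function name is an illustrative choice.

```python
from collections import Counter


def modes(scores):
    """Return every score tied for the highest count, or [] if all scores are unique."""
    counts = Counter(scores)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # every score occurs exactly once: no mode
    return [score for score, count in counts.items() if count == top]


print(modes([13, 11, 5, 11, 9, 15, 7, 9]))  # [11, 9]: the sample data has two modes
print(modes([1, 2, 3]))                     # []: no mode
print(modes([4, 4, 6]))                     # [4]
```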
34.Large sets expected to be uni-modal
• Multiple modes can be an important sign
• Has there been a mixture of two different sub-groups?
35.The mode's location determines the "symmetry" of the set of scores (when graphed)
36.The mode is the least sensitive of the three main measures
37.All three types of central tendency measures can be misleading
• Nearly all questions of "typical-ness" are affected by the nature of the group being studied.
38.Where a few values in a set are much greater or smaller, no measure of central tendency can reflect that fact by itself.
39.When two or more distinct groups are mixed together in one "population," any measure of central tendency may need to be augmented with information on the variation found in the group. That is the subject of our next lesson.
40.Markov inequality for the mean
• Not very accurate or useful
• For non-negative scores, the maximum portion that can be K or more times the mean = 1/K
41.Example: say the average height for men is 6 feet. What portion of the population of men is more than 3 means (3 x 6 = 18 feet) tall?
42.Answer: Since we are using 3 means, K = 3. So 1/3, or about 33%, is the maximum possible fraction of such a population that could be 18 feet tall or taller.
43.Actually, we know that no known men have ever been that tall. That is why I say that the inequality is not very accurate or useful for practical purposes.
44.Most of the time, it helps to gather some information about the variance around the mean, rather than use such crude mathematical limits.
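To see how loose the Markov limit is, the sketch below compares it with the observed fraction in the sample data set from item 14. It assumes non-negative scores, which the inequality requires; the function names are illustrative choices.

```python
def markov_bound(k):
    """Upper limit on the fraction of non-negative scores that are >= k times the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k)


def fraction_at_least(scores, k):
    """Observed fraction of scores that are at least k times the mean."""
    mean = sum(scores) / len(scores)
    return sum(1 for x in scores if x >= k * mean) / len(scores)


data = [13, 11, 5, 11, 9, 15, 7, 9]  # mean is 10
for k in (1.5, 2, 3):
    print(k, fraction_at_least(data, k), round(markov_bound(k), 3))
```

For K = 1.5 the limit is about 0.667, but only one score in eight (0.125) actually reaches 15. For K = 2 and K = 3 no scores qualify at all.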
45.Effect size
• Difference between grand mean and group mean
• Important concept
• Summarizes complicated experiments
46.For example, if an equal number of boys and girls have means of 10 and 12 respectively, the grand mean is 11. The effect of being a boy is -1 and the effect of being a girl is +1.
47.The mean is also called the "expectation." The effect size is what we expect to be the result of being in a particular group.
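The boys and girls example can be sketched as follows. The individual scores are hypothetical, chosen only so that the group means come out to 10 and 12 with equal group sizes.

```python
def mean(scores):
    return sum(scores) / len(scores)


# Hypothetical scores: equal-sized groups with means 10 and 12.
groups = {
    "boys": [9, 10, 11],
    "girls": [11, 12, 13],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)  # 66 / 6 = 11.0

# Effect of a group = group mean minus grand mean.
effects = {name: mean(scores) - grand_mean for name, scores in groups.items()}

print(grand_mean)  # 11.0
print(effects)     # {'boys': -1.0, 'girls': 1.0}
```

When the groups are the same size, the effects sum to zero.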
48.Availability of the 3 measures
• All 3 possible with numerical data
• Median & mode with orderable data
• Mode with purely nominal or type data
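A small sketch of which measures are available for which kind of data. All of the data values and category names below are hypothetical.

```python
from collections import Counter

# Hypothetical nominal data: categories with no natural order. Only the mode applies.
eye_colors = ["brown", "blue", "brown", "green", "brown"]
mode_color = Counter(eye_colors).most_common(1)[0][0]

# Hypothetical orderable data: categories with an order but no arithmetic.
# The median and the mode apply; a mean would require numbers.
order = ["poor", "fair", "good", "excellent"]
ratings = ["good", "fair", "excellent", "good", "poor"]
ranked = sorted(ratings, key=order.index)
median_rating = ranked[len(ranked) // 2]  # odd count, so there is a single middle value
mode_rating = Counter(ratings).most_common(1)[0][0]

print(mode_color)     # brown
print(median_rating)  # good
print(mode_rating)    # good
```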
49.The three measures' relations
• Can be identical
• Median is the mid-point of the data
• Mean most affected by extremes
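Both points can be seen with the sample data set from item 14: its mean and median are identical, and adding one extreme score (a hypothetical value of 100) moves the mean far more than the median.

```python
import statistics

data = [13, 11, 5, 11, 9, 15, 7, 9]
with_extreme = data + [100]  # one hypothetical extreme score added

mean_before = statistics.fmean(data)           # 80 / 8
median_before = statistics.median(data)        # mean of 9 and 11
mean_after = statistics.fmean(with_extreme)    # 180 / 9
median_after = statistics.median(with_extreme)

print(mean_before, median_before)  # mean 10, median 10
print(mean_after, median_after)    # mean 20, median 11
```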
50.Other possibilities
• Geometric mean: the mass is created by multiplying the scores together instead of adding them. Then the mass is cut apart by taking the Nth root instead of dividing by N.
• Harmonic mean: the reciprocals of the scores are averaged. Then take the reciprocal of that average.
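Both recipes can be sketched directly and compared with the functions in Python's standard `statistics` module. The two scores used here are hypothetical.

```python
import math
import statistics

scores = [2, 8]  # hypothetical scores
n = len(scores)

# Geometric mean: multiply the scores, then take the Nth root.
geometric = math.prod(scores) ** (1 / n)         # (2 * 8) ** (1/2) = 4

# Harmonic mean: average the reciprocals, then take the reciprocal.
harmonic = 1 / (sum(1 / x for x in scores) / n)  # 1 / ((0.5 + 0.125) / 2) = 3.2

arithmetic = sum(scores) / n                     # 5.0, for comparison

print(geometric, statistics.geometric_mean(scores))
print(harmonic, statistics.harmonic_mean(scores))
print(arithmetic)
```

For these scores the three means come out in the order harmonic (3.2), geometric (4), arithmetic (5).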
51.There is a book on 70 measures of central tendency
52.After this first lesson, we will more or less ignore the median and the mode for the rest of the course.
53.Recommended exercise 1
• Find a variable of interest
• Collect some data on that variable
• Calculate the mean, median and mode
• Note differences in the three measures
54.Recommended exercise 2 - Take a look at "How to Lie with Statistics" by Darrell Huff
55.Recommended exercise 3
• Look through the World Almanac or other books of numerical information
• Calculate a few averages of interest:
Average some figures listed
Average over time: given a figure for a year, what was the average per month, day or hour?
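Averaging over time is a matter of dividing the yearly figure by the number of months, days or hours. The sketch below uses a made-up yearly total of 8,760 and assumes a 365-day year; the function name is an illustrative choice.

```python
def average_per_period(yearly_total, days_in_year=365):
    """Spread a yearly total evenly over months, days and hours."""
    return {
        "per_month": yearly_total / 12,
        "per_day": yearly_total / days_in_year,
        "per_hour": yearly_total / (days_in_year * 24),
    }


# Hypothetical figure: 8,760 events reported for one year.
rates = average_per_period(8760)
print(rates)  # {'per_month': 730.0, 'per_day': 24.0, 'per_hour': 1.0}
```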
56.Summary
• All three measures of central tendency can be helpful but much traditional statistics instruction in this country focuses on the mean.
• Very different configurations of scores can produce the same mean.
57.The mean is the figure that would have produced the same total from equal contributions, if all had contributed equally.
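A short sketch of both summary points: three very different sets of scores share the same total and therefore the same mean. The first set is the sample data from item 14; the other two are hypothetical.

```python
sets = {
    "sample": [13, 11, 5, 11, 9, 15, 7, 9],       # data set from item 14
    "all equal": [10] * 8,                        # everyone contributes equally
    "two clumps": [0, 0, 0, 0, 20, 20, 20, 20],   # two very different sub-groups
}

totals = {name: sum(scores) for name, scores in sets.items()}
means = {name: sum(scores) / len(scores) for name, scores in sets.items()}

for name, scores in sets.items():
    print(name, totals[name], means[name], min(scores), max(scores))
```

Each set totals 80 and has a mean of 10, but only in the "all equal" set did every score actually equal the mean.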