Making Inferences from a Sample
Making inferences from a sample, or statistical inference, is the process of using data analysis to infer properties of a population, for example by testing hypotheses and making estimates. It assumes that the observed data set is sampled from a larger population.
Population – All members of the studied group
Sample – A portion of the studied group, used to represent the entire population
Random – Every member of the studied group has an equal chance of being selected
Census – Every member of the studied group is included
Bias – Occurs when the sample does not adequately represent the population
Error – Degree to which the results from the sample differ from the actual results for the population
Outlier – A value that is much larger or much smaller than most of the other values in a data set
Mode – Most commonly occurring value of a data set
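Median – Middle value of a data set once the values are put in order (the average of the two middle values when there is an even number of values)
Mean – Average value of a data set: the sum of the values divided by the number of values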
Range – Distance between the least and greatest values of a data set
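The four measures defined above can be written as short functions. This is a minimal Python sketch; the function names and the small data set are hypothetical, chosen only to illustrate the definitions. It also shows two details the one-line definitions are easy to miss: the median requires ordering the values first, and a data set can have more than one mode.

```python
from collections import Counter


def mode(values):
    """Return every value that occurs most often (a data set can have more than one mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def median(values):
    """Middle value after sorting; with an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def value_range(values):
    """Distance between the greatest and least values (named to avoid shadowing Python's range)."""
    return max(values) - min(values)


# Hypothetical data set, used only to illustrate the definitions.
data = [7, 9, 9, 10, 12]
print(mode(data))         # [9]
print(median(data))       # 9
print(mean(data))         # 9.4
print(value_range(data))  # 5
```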
Determine Appropriate Sampling
The purpose of a sample is to gather information about a population. It can become very costly (time, money, effort) to study every member of a population, especially if the population has many members or if they are difficult to study. A sample (a smaller portion) of the population can be studied instead, but the savings in cost may come with a decrease in the accuracy of the results. Larger samples (relative to the population) increase the certainty that the results truly represent the population, because they reduce the effect of outliers on the overall data.
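The claim that larger samples give more trustworthy results can be checked with a quick simulation. This is a rough sketch under stated assumptions: the population of 550 shoe sizes is randomly generated and hypothetical, and `average_error` is an illustrative helper, not a standard function. For each sample size it draws many simple random samples and reports how far, on average, the sample mean lands from the true population mean; the error should shrink as the sample grows.

```python
import random


def mean(values):
    return sum(values) / len(values)


def average_error(population, sample_size, trials, rng):
    """Average distance between a random sample's mean and the true population mean."""
    true_mean = mean(population)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # simple random sample, no repeats
        total += abs(mean(sample) - true_mean)
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical population: 550 shoe sizes drawn at random from 5 to 14 in half sizes.
sizes = [x / 2 for x in range(10, 29)]
population = [rng.choice(sizes) for _ in range(550)]

for n in (5, 25, 100, 300):
    err = average_error(population, n, 1000, rng)
    print(f"sample size {n:3d}: average error {err:.3f}")
```

The simulation only shows the general trend; it does not model any particular real school.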
Random sampling is commonly recommended for statistical purposes. However, most samples are not truly random, because some members of a population are typically easier to study than others. Common sampling techniques include cluster (members are assigned to groups, and then one or more entire groups are selected to represent the whole population), stratified (members are assigned to groups, and then a specific number or percentage is selected from each group), systematic (a rule determines the sample group, such as selecting every nth member), and convenience (the easiest-to-reach members are selected).
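The sampling techniques above (plus simple random sampling) can be sketched as small Python functions. This is an illustrative sketch, not a standard library API: the roster, its grade-level and homeroom groupings, and every function name are hypothetical. Grade levels serve as the strata and homerooms as the clusters.

```python
import random
from operator import itemgetter

# Hypothetical roster of 550 students: (name, grade level 9-12, homeroom 0-21).
roster = [(f"student{i:03d}", 9 + i % 4, i % 22) for i in range(550)]
grade = itemgetter(1)
homeroom = itemgetter(2)


def simple_random(members, k, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(members, k)


def systematic(members, n):
    """Apply a rule: take every nth member in list order (nth, 2nth, 3nth, ...)."""
    return members[n - 1::n]


def stratified(members, key, per_group, rng):
    """Assign members to groups, then randomly select `per_group` members from each group."""
    groups = {}
    for m in members:
        groups.setdefault(key(m), []).append(m)
    chosen = []
    for label in sorted(groups):
        chosen.extend(rng.sample(groups[label], per_group))
    return chosen


def cluster(members, key, n_groups, rng):
    """Assign members to groups, then randomly select whole groups and keep everyone in them."""
    labels = sorted({key(m) for m in members})
    picked = set(rng.sample(labels, n_groups))
    return [m for m in members if key(m) in picked]


def convenience(members, k):
    """Take whoever is easiest to reach; here, simply the first k members in the list."""
    return members[:k]


rng = random.Random(42)  # fixed seed so the example is repeatable
print(len(simple_random(roster, 28, rng)))     # 28
print(len(systematic(roster, 20)))             # 27
print(len(stratified(roster, grade, 7, rng)))  # 28 (7 from each of the 4 grades)
print(len(cluster(roster, homeroom, 1, rng)))  # 25 (one whole homeroom)
print(len(convenience(roster, 28)))            # 28
```

Only `simple_random` gives every member an equal chance; the convenience version here just takes the front of the list, which stands in for "whoever is easiest to reach".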
Ex: Jacob’s high school has 300 males and 250 females. Jacob wants to determine the average shoe size in his high school for a statistics project. Which description of the population is best?
a. High school students
b. Elementary school students
c. Students at Jacob’s high school
d. Male students
Correct Answer: C
Ex: Jacob’s teacher said his sample should include about 25-30 students. Which sample group is best?
a. Members of Jacob’s high school football team
b. Every 5th high school student as they enter Jacob’s school
c. Jacob’s high school girls’ volleyball team
d. The students in Jacob’s 2nd period class
Correct Answer: B
Ex: Explain a potential problem with selecting every 5th student as Jacob’s sample.
-Not every student has a chance of being selected (the 4 students who enter between each selected student have no chance)
-Jacob could get a sample that is not representative (too many males or females, too many freshmen, etc.)
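Both problems can be seen in a tiny example. The arrival pattern below is made up for illustration: if students happen to arrive in a repeating pattern (F, F, M, F, M), then every 5th student is male, so the sample is entirely male even though the line is 40% male. The last lines list the positions that can never be chosen once the "every 5th" rule is fixed.

```python
# Hypothetical arrival line: the pattern F, F, M, F, M repeats, so every
# 5th arrival happens to be male. (Made-up pattern for illustration.)
pattern = ["F", "F", "M", "F", "M"]
line = pattern * 6  # 30 students arrive

every_fifth = line[4::5]  # the 5th, 10th, 15th, ... student through the door
print(every_fifth)                                # ['M', 'M', 'M', 'M', 'M', 'M']
print(line.count("M") / len(line))                # 0.4 of the line is male
print(every_fifth.count("M") / len(every_fifth))  # 1.0 of the sample is male

# Positions (counting from 1) that can never be chosen under the rule.
never_chosen = [i + 1 for i in range(len(line)) if (i + 1) % 5 != 0]
print(never_chosen[:8])                           # [1, 2, 3, 4, 6, 7, 8, 9]
```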
Apply Measures of a Sample to a Population
As students enter the building, Jacob asks every 5th high school student for their shoe size. He records the responses in a table:
Ex: Put the responses in order from smallest to largest.
Ex: Determine the mode, median, mean, and range.
Ex: Jacob uses his data to make a statement about the population. Which statement is best? Which statement is worst?
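Python's standard `statistics` module can carry out these two steps. Jacob's table is not reproduced here, so the values below are made up purely to show the procedure; they are not his recorded responses. `statistics.multimode` (Python 3.8 or later) returns every most-common value, and `statistics.median` averages the two middle values when the count is even. The range has no dedicated function, so it is computed as the greatest value minus the least.

```python
import statistics

# Made-up shoe sizes for illustration only; these are NOT Jacob's recorded responses.
responses = [9, 11.5, 8, 10, 12, 9, 7.5, 10.5, 9, 13]

ordered = sorted(responses)            # smallest to largest
modes = statistics.multimode(ordered)  # list of the most common value(s)
middle = statistics.median(ordered)    # averages the two middle values when the count is even
average = statistics.mean(ordered)
spread = max(ordered) - min(ordered)   # range: greatest minus least

print(ordered)  # [7.5, 8, 9, 9, 9, 10, 10.5, 11.5, 12, 13]
print(modes, middle, average, spread)  # [9] 9.5 9.95 5.5
```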
a. No high school student has a size 6 shoe.
b. Most students have a size bigger than 10.
c. The average shoe size of the population is between 10 and 11.
d. Females have bigger shoe sizes than males.
Choice C is the best statement and Choice D is the worst.
A – This statement is consistent with the sample data, since no student in the sample has a size 6 shoe. However, the sample includes sizes both above and below 6, which suggests that a larger sample would likely include that size.
B – This statement is not supported by the sample data (11 values were larger than 10, and 13 values were not). However, the split is close enough that a larger sample could support this statement.
C – This statement is best because it is supported by the sample, and it is unlikely that a larger sample would shift the average significantly.
D – This statement is worst because no data was collected about gender, so the sample cannot support any statement comparing males and females.
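Each of the first three statements can be checked against a sample with a line or two of code; statement D cannot, because there is no gender data to check. The values below are made up for illustration (they are not Jacob's table), so the results will not necessarily match the conclusions above; the point is the checking procedure.

```python
# Made-up sample for illustration only; NOT Jacob's recorded responses.
responses = [9, 11.5, 8, 10, 12, 9, 7.5, 10.5, 9, 13]

# Statement A: "No high school student has a size 6 shoe."
has_size_6 = 6 in responses
sizes_below_6 = any(s < 6 for s in responses)
sizes_above_6 = any(s > 6 for s in responses)

# Statement B: "Most students have a size bigger than 10."
bigger = sum(1 for s in responses if s > 10)
not_bigger = len(responses) - bigger

# Statement C: "The average shoe size of the population is between 10 and 11."
sample_mean = sum(responses) / len(responses)

print(has_size_6, sizes_below_6, sizes_above_6)  # False False True
print(bigger, not_bigger, bigger > not_bigger)   # 4 6 False
print(sample_mean, 10 <= sample_mean <= 11)      # 9.95 False

# Statement D cannot be checked at all: the data has no gender information.
```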