Research Methods:
Populations and Samples
Background Definitions
-
universe of discourse - a collection of actual or hypothetical items
-
defined by some common characteristic
-
to which the question/concern, and the answer, are to apply
-
these items are called units
-
aggregate of all the units is termed the population
-
e.g., human population of a country, county, etc.; all elementary schools
in a district; all Concorde take-offs from a certain airport in a study
of aircraft noise
-
sometimes populations can be enumerated (i.e., individual units
can be identified and listed)
-
often not feasible, perhaps because a human aggregate changes too rapidly
over time (e.g., homeless of a large city)
-
a population frequently may be conceptually infinite in size
-
because examining every unit of a population is often impossible or inefficient,
we consider sampling
-
definition: the selection of some units of a population to represent the
whole aggregate
-
the key is to make sure the sample is as typical as possible in relation to
the objectives of the study
Advantages of Sampling
-
if population is well mixed, then it need not be enumerated
-
e.g., blood sample from arm
-
but taking a cup of grain from the hopper of a combine harvester to measure
protein content for the whole field is debatable
-
quality control in industrial production usually demands a sampling approach
(e.g., checking lifetimes of batteries would leave them unusable)
-
sampling is generally cheaper
-
funds can then be used for (1) better methods of data collection and validation,
(2) more experienced staff, etc.
-
results: (1) quality of information collected is higher, and (2) it is possible
to obtain more information per unit
-
it may be possible to collect more information per person because people are
more cooperative when they feel themselves specially chosen
-
the data are generally available for analysis more rapidly, so the analysis
can be completed more quickly and the findings are less likely to be out of date
Problems of Sampling
-
main disadvantage: researcher must tolerate a greater degree of uncertainty
-
causes of this uncertainty are (1) the natural variation due to
chance differences among members of the population, and (2) an inadequate
definition of the population or shortcomings in the method of choosing
the sample
-
errors of the second kind give rise to a component of uncertainty called bias
-
these errors include: (1) subjective choice of a sample (e.g., sexual or
racial predisposition), (2) sampling from an inadequate list (e.g., a phone
book doesn't reflect the whole population), (3) incomplete response
in postal surveys (e.g., respondents may be more interested in the objectives
of the study than non-respondents), (4) substitution in the sample (e.g.,
interviewer may take next-door household whenever there is no reply; as a
result, households that are usually occupied will be over-represented), and
(5) "edge effects" (e.g., student attendees at the start of a term aren't
typical of the whole year)
-
bias from these sources can be reduced by
-
ensuring that the required population is indeed the one sampled
-
choosing a random sample; i.e., one in which every member of the
population has an equal chance of being included in the
sample
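A minimal sketch of the difference between chance variation and bias, using an entirely hypothetical population: each unit has a numeric value (mean 100), and units with higher values are assumed to be more likely to appear on an incomplete list such as a phone book. The function name, the listing probabilities (0.9 and 0.3), and the sizes N and n are illustrative assumptions, not figures from these notes.

```python
import random
import statistics


def compare_sampling(seed=0, N=10_000, n=500):
    """Return (population mean, random-sample mean, list-sample mean)."""
    rng = random.Random(seed)

    # Hypothetical population of N units, values roughly normal around 100.
    population = [rng.gauss(100, 20) for _ in range(N)]

    # Assumed inadequate list: units above 100 are listed with probability
    # 0.9, the rest with probability 0.3, so the list over-represents them.
    listed = [x for x in population
              if rng.random() < (0.9 if x > 100 else 0.3)]

    # Every unit of the population has an equal chance of inclusion here.
    random_sample = rng.sample(population, n)

    # Chosen at random too, but only from the incomplete list.
    list_sample = rng.sample(listed, n)

    return (statistics.mean(population),
            statistics.mean(random_sample),
            statistics.mean(list_sample))


if __name__ == "__main__":
    true_mean, srs_mean, list_mean = compare_sampling()
    print(f"population mean:      {true_mean:.1f}")
    print(f"random sample mean:   {srs_mean:.1f}")
    print(f"sample from the list: {list_mean:.1f}")
```

Under these assumptions the random sample differs from the population mean only by chance variation, which shrinks as n grows, while the sample drawn from the list stays too high however large n is made.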
Choosing a Random Sample
-
to select a random sample of size n, it is necessary to
-
list all units of the population, numbered 1 to N
-
have a mechanism for selecting n different numbers from the range 1 to
N
-
methods of achieving a random sample include
-
drawing from a hat
-
spinning a roulette wheel with N slots
-
tables of previously generated pseudo-random digits
-
the use of a random numbers table involves
-
a fixed starting point
-
reading down or across
-
continuing until the required sample size is reached
-
ways of achieving a random starting point include
-
close eyes and mark the page with a pin
-
if the same table is always used, carry on from the point where the last use
ended
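The selection procedure above can be sketched in code. This is an illustration under stated assumptions, not a prescribed method: the function names are hypothetical, the random numbers table is assumed to be supplied as a plain string of digits, and the digits are read in groups as wide as the number N. Groups that fall outside 1 to N, or that repeat a number already chosen, are skipped.

```python
import random


def simple_random_sample(N, n, rng=random):
    """Pick n different unit numbers from 1..N, each equally likely."""
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    return rng.sample(range(1, N + 1), n)


def sample_from_digit_table(digits, N, n, start=0):
    """Read a string of random digits from position `start`.

    Returns the chosen unit numbers and the position where reading
    stopped, so that the next use can carry on from that point.
    """
    width = len(str(N))          # digits needed to write the number N
    chosen = []
    pos = start
    while len(chosen) < n and pos + width <= len(digits):
        number = int(digits[pos:pos + width])
        # Keep only valid unit numbers that have not been chosen yet.
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
        pos += width
    if len(chosen) < n:
        raise ValueError("table exhausted before the sample was complete")
    return chosen, pos


if __name__ == "__main__":
    # Made-up digits standing in for one row of a printed table.
    table = "0347437386369647366146986371"
    units, next_start = sample_from_digit_table(table, N=50, n=4)
    print(units, next_start)
    print(simple_random_sample(N=50, n=4))
```

With the made-up digits shown, the groups 03, 47, 43, 73, 86, 36 are read in turn; 73 and 86 exceed N = 50 and are skipped, giving units 3, 47, 43 and 36, with the next reading starting at position 12.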