Monday, 16 December 2013

Sampling and sampling distributions

Source: Petrie & Sabin, Medical Statistics at a Glance

Why do we sample?
In statistics, a population represents the entire group of individuals in whom we are interested. Generally it is costly and labour-intensive to study the entire population and, in some cases, may be impossible because the population may be hypothetical (e.g. patients who may receive a treatment in the future). Therefore we collect data on a sample of individuals who we believe are representative of this population (i.e. they have similar characteristics to the individuals in the population), and use them to draw conclusions (i.e. make inferences) about the population. When we take a sample of the population, we have to recognize that the information in the sample may not fully reflect what is true in the population. We have introduced sampling error by studying only some of the population. In this chapter we show how to use theoretical probability distributions (Chapters 7 and 8) to quantify this error.

Obtaining a representative sample
Ideally, we aim for a random sample. A list of all individuals from the population is drawn up (the sampling frame), and individuals are selected randomly from this list, i.e. every possible sample of a given size in the population has an equal probability of being chosen. Sometimes, we may have difficulty in constructing this list or the costs involved may be prohibitive, and then we take a convenience sample. For example, when studying patients with a particular clinical condition, we may choose a single hospital, and investigate some or all of the patients with the condition in that hospital. Very occasionally, non-random schemes, such as quota sampling or systematic sampling, may be used. Although the statistical tests described in this book assume that individuals are selected for the sample randomly, the methods are generally reasonable as long as the sample is representative of the population.
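As a minimal sketch of drawing a random sample from a sampling frame, the Python snippet below uses the standard library's `random.sample`, which selects without replacement so that every subset of the requested size is equally likely. The frame of 1000 patient ID numbers, the sample size of 20 and the function name `simple_random_sample` are assumptions made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct individuals from the sampling frame, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: ID numbers of 1000 patients
frame = list(range(1, 1001))

sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

A convenience sample, by contrast, would correspond to taking whichever IDs happen to be easiest to reach (say, the first 20 in the list), which gives no such guarantee of equal selection probability.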

Point estimates
We are often interested in the value of a parameter in the population (Chapter 7), e.g. a mean or a proportion. Parameters are usually denoted by letters of the Greek alphabet. For example, we usually refer to the population mean as μ and the population standard deviation as σ. We estimate the value of the parameter using the data collected from the sample. This estimate is referred to as the sample statistic and is a point estimate of the parameter (i.e. it takes a single value) as opposed to an interval estimate (Chapter 11) which takes a range of values.

Sampling variation
If we take repeated samples of the same size from a population, it is unlikely that the estimates of the population parameter would be exactly the same in each sample. However, our estimates should all be close to the true value of the parameter in the population, and the estimates themselves should be similar to each other. By quantifying the variability of these estimates, we obtain information on the precision of our estimate and can thereby assess the sampling error. In reality, we usually only take one sample from the population.
However, we still make use of our knowledge of the theoretical distribution of sample estimates to draw inferences about the population parameter.
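Sampling variation can be illustrated with a small simulation. The sketch below builds a hypothetical population (10,000 simulated blood pressure values, an assumption for illustration only), draws five samples of the same size and prints the mean of each. The estimates differ from sample to sample but should cluster around the population mean.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 systolic blood pressure values (mmHg)
population = [rng.gauss(120, 15) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Five repeated samples of the same size, each giving its own point estimate
n = 50
estimates = [statistics.fmean(rng.sample(population, n)) for _ in range(5)]

print(f"population mean: {true_mean:.1f}")
for i, est in enumerate(estimates, start=1):
    print(f"sample {i}: mean = {est:.1f}")
```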

Sampling distribution of the mean
Suppose we are interested in estimating the population mean; we could take many repeated samples of size n from the population, and estimate the mean in each sample. A histogram of the estimates of these means would show their distribution (Fig. 10.1); this is the sampling distribution of the mean. We can show that:
• If the sample size is reasonably large, the estimates of the mean follow a Normal distribution, whatever the distribution of the original data in the population (this comes from a theorem known as the Central Limit Theorem).
• If the sample size is small, the estimates of the mean follow a Normal distribution provided the data in the population follow a Normal distribution.
• The mean of the estimates is an unbiased estimate of the true mean in the population, i.e. the mean of the estimates equals the true population mean.
• The variability of the distribution is measured by the standard deviation of the estimates; this is known as the standard error of the mean (often denoted by SEM). If we know the population standard deviation (σ), then the standard error of the mean is given by:
$$\text{SEM} = \frac{\sigma}{\sqrt{n}}$$
When we only have one sample, as is customary, our best estimate of the population mean is the sample mean, and because we rarely know the standard deviation in the population, we estimate the standard error of the mean by:
$$\text{SEM} = \frac{s}{\sqrt{n}}$$
where s is the standard deviation of the observations in the sample (Chapter 6). The SEM provides a measure of the precision of our estimate.
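The properties listed above can be checked by simulation. The following sketch assumes a deliberately skewed hypothetical population (exponential with mean 10), takes 2,000 repeated samples of size 40, and compares the standard deviation of the resulting sample means with the population standard deviation divided by the square root of n. The population, sample size and number of repeats are all illustrative choices, not values from the text.

```python
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical skewed population: exponential with mean 10
population = [rng.expovariate(1 / 10) for _ in range(50_000)]
mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population standard deviation

n = 40           # size of each sample
repeats = 2_000  # number of repeated samples

# Mean of each repeated sample: these values form the sampling distribution
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

mean_of_means = statistics.fmean(sample_means)
observed_se = statistics.stdev(sample_means)  # SD of the estimates
theoretical_se = sigma / math.sqrt(n)         # sigma / sqrt(n)

print(f"population mean      : {mu:.2f}")
print(f"mean of sample means : {mean_of_means:.2f}")
print(f"SD of sample means   : {observed_se:.2f}")
print(f"sigma / sqrt(n)      : {theoretical_se:.2f}")
```

Plotting `sample_means` as a histogram should give a roughly symmetric, bell-shaped picture even though the population itself is strongly skewed.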

Interpreting standard errors
• A large standard error indicates that the estimate is imprecise.
• A small standard error indicates that the estimate is precise.
The standard error is reduced, i.e. we obtain a more precise estimate, if:
• the size of the sample is increased (Fig. 10.1);
• the data are less variable.
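A quick worked example, using made-up numbers and the single-sample estimate of the standard error of the mean (the sample standard deviation s divided by the square root of n), shows both effects. With s = 12 assumed purely for illustration, quadrupling the sample size halves the standard error, and so does halving the variability at a fixed sample size:

```latex
\text{SEM}_{s=12,\,n=36} = \frac{12}{\sqrt{36}} = 2, \qquad
\text{SEM}_{s=12,\,n=144} = \frac{12}{\sqrt{144}} = 1, \qquad
\text{SEM}_{s=6,\,n=36} = \frac{6}{\sqrt{36}} = 1
```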

SD or SEM?
Although these two parameters seem to be similar, they are used for different purposes. The standard deviation describes the variation in the data values and should be quoted if you wish to illustrate variability in the data. In contrast, the standard error describes the precision of the sample mean, and should be quoted if you are interested in the mean of a set of data values.
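To make the distinction concrete, the sketch below computes both quantities for one small sample. The ten haemoglobin values are invented for illustration; `statistics.stdev` uses the n - 1 denominator, and the standard error is taken as the sample standard deviation divided by the square root of n.

```python
import math
import statistics

# Hypothetical sample: haemoglobin values (g/dl) from 10 patients
sample = [11.2, 12.5, 13.1, 10.8, 12.0, 13.4, 11.9, 12.7, 12.2, 11.6]

n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)  # sample standard deviation s (n - 1 denominator)
sem = sd / math.sqrt(n)        # estimated standard error of the mean, s / sqrt(n)

print(f"n = {n}, mean = {mean:.2f}")
print(f"SD  = {sd:.2f}  (spread of the individual values)")
print(f"SEM = {sem:.2f}  (precision of the sample mean)")
```

The SD would stay at roughly the same level if more patients were added, whereas the SEM would tend to shrink as n grows.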

Sampling distribution of the proportion
We may be interested in the proportion of individuals in a population who possess some characteristic. Having taken a sample of size n from the population, our best estimate, p, of the population proportion, π, is given by:
$$p = \frac{r}{n}$$
where r is the number of individuals in the sample with the characteristic. If we were to take repeated samples of size n from our population and plot the estimates of the proportion as a histogram, the resulting sampling distribution of the proportion would approximate a Normal distribution with mean value, π. The standard deviation of this distribution of estimated proportions is the standard error of the proportion. When we take only a single sample, it is estimated by:
$$\text{SE}(p) = \sqrt{\frac{p(1 - p)}{n}}$$
This provides a measure of the precision of our estimate of π; a small standard error indicates a precise estimate.
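A short sketch of the calculation for a single sample follows. It assumes the usual estimate of the standard error of a proportion, the square root of p(1 - p)/n, where p is r divided by n. The function name `proportion_and_se` and the counts (30 of 120 patients) are hypothetical.

```python
import math

def proportion_and_se(r, n):
    """Return the sample proportion p = r/n and its estimated standard error."""
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need n > 0 and 0 <= r <= n")
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical sample: 30 of 120 patients have the characteristic
p, se = proportion_and_se(30, 120)
print(f"p = {p:.3f}, SE(p) = {se:.3f}")
```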
