# Random sampling versus randomisation

I recently examined an MPH thesis in which the student stated that “the intervention and control were assigned using a random sampling technique.” I have noticed in the past that students mix up random sampling and randomisation, so in this article I explain both concepts together.

## Random sampling

*Sampling* is the process of selecting a number of people from a larger population. Some populations include only small numbers of people and thus all people in the population can be included (this is called a census). Often, however, research focuses on such a large population that, for practical reasons, it is only possible to include some of its members in the study. We then have to draw a *sample* from the total population.

The study population has to be clearly defined (for example, by age and sex); otherwise we cannot draw a sample from it.

If researchers want to draw conclusions that are valid for the whole study population, they should draw the sample in such a way that it is representative of that population. This is especially important for descriptive and cross-sectional studies, which aim to describe characteristics of a population; it is often less critical for case-control studies, cohort studies and RCTs, where the focus is on comparing groups. A representative sample has all the important characteristics of the population from which it is drawn.

The figure below illustrates a hypothetical population composed of three different types of people (○ ● □). A representative sample is a miniature of the population: it contains the different types of people in the same proportions as the population. An unrepresentative (or biased) sample does not contain the different types of people in the correct proportions, which leads to wrong conclusions about the state of the population.

*Source: Polgar S. Introduction to Research in the Health Sciences (6th edition). Churchill Livingstone; 2013.*

For example, if you intend to interview 100 mothers in order to obtain a complete picture of the immunisation practices in uMgungundlovu District you would have to select these from a representative sample of villages. It would be unwise to select them from only one or two villages as this might give you the wrong (biased) picture. It would also be unwise to only interview mothers who attend the post-natal clinic, as those who do not attend this clinic are probably the ones not having their children immunised.

So how do you get a representative sample? *Probability sampling* involves using random selection procedures to ensure that each person in the sample is chosen on the basis of chance. All people in the study population should have an equal chance of being included in the sample.

Probability sampling requires a list of all people in the population. This list is called the *sampling frame* (e.g. a class list, or a map of a village showing all the households).

There are different methods for probability sampling:

*Simple random sampling*: The simplest form of probability sampling. Random numbers can be selected manually, as in the example below, or generated by computer. For example, a simple random sample of 50 babies is to be selected from a hospital ward of 250 babies. Using a list of all 250 babies, each baby is given a number (1 to 250), and these numbers are written on small pieces of paper. All 250 papers are put in a box, and the box is shaken. Then 50 papers are taken out of the box and the numbers are recorded. The babies with these numbers constitute the sample.

*Systematic sampling*: Usually less time-consuming and easier to perform than simple random sampling. The complete population need not be known before you start to select the sample (e.g. if you interview every 5th person who comes into the waiting area of the clinic, people can continue to arrive).
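The two procedures above can be sketched in Python, with the computer's random number generator standing in for the box of papers. This is a minimal illustration, not a prescribed tool: the function names `simple_random_sample` and `systematic_sample` are hypothetical, and the sampling frame is assumed to be the numbered list of 250 babies from the example. The systematic version works on a complete list; for a stream such as clinic arrivals you would instead take every k-th arrival after a random start.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance of selection."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    return sorted(rng.sample(frame, n))


def systematic_sample(frame, n, start=None, seed=None):
    """Take every k-th unit on the list, beginning at a random start between 1 and k."""
    k = len(frame) // n  # sampling interval, e.g. 250 // 50 = 5
    if start is None:
        start = random.Random(seed).randint(1, k)  # like blindly picking one of k papers
    # 1-based positions start, start + k, start + 2k, ... (n of them)
    return [frame[start - 1 + i * k] for i in range(n)]


# Hypothetical sampling frame: babies numbered 1 to 250
ward = list(range(1, 251))

srs = simple_random_sample(ward, 50, seed=1)
print(len(srs), len(set(srs)))  # 50 50 (no baby is picked twice)

systematic = systematic_sample(ward, 50, start=3)
print(systematic[:4], systematic[-1])  # [3, 8, 13, 18] 248
```

With a fixed start of 3 the systematic function reproduces the sequence 3, 8, 13, 18, and so on, ending at baby 248; leaving `start` unset lets the program pick the random start itself.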

For example, a systematic sample of 50 babies is to be selected from a hospital ward of 250 babies. The sampling interval is 250 (study population) / 50 (sample size) = 5. The number of the first baby to be included in the sample is chosen randomly, for example by blindly picking one of 5 pieces of paper numbered 1 to 5. If number 3 is picked, then every fifth baby on the list is included in the sample, starting with baby number 3, until 50 babies are selected: the numbers selected would be 3, 8, 13, 18, and so on.

*Stratified sampling*: Simple random and systematic sampling have the disadvantage that small groups in which the researcher is interested may hardly appear in the sample. If it is important that the sample includes representative study units from small groups with specific characteristics (for example, babies from wealthy households), then the sampling frame must be divided into groups (strata) according to these characteristics. Random or systematic samples of a pre-determined size are then drawn from each group (stratum).

For example, a stratified sample of 50 babies is to be selected from a hospital ward of 250 babies, of whom 20% come from wealthy households and 80% from poor households. It is suspected that the health of babies from wealthy households differs from (is better than) that of babies from poor households. It is therefore decided to do the sampling (either simple random or systematic) separately for the wealthy and the poor babies. Of the 50 wealthy babies, 10 are sampled, and of the 200 poor babies, 40 are sampled, so both strata are sampled in the same proportion (20%). The total sample is 50.
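Stratified sampling with the same sampling fraction in every stratum (proportional allocation, as in the example) can be sketched as follows. The function name is hypothetical, and the frame assumes, purely for illustration, that babies 1 to 50 come from wealthy households and babies 51 to 250 from poor households. Within each stratum a simple random sample is drawn; a systematic sample would serve equally well.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Proportional allocation: each stratum contributes n * (stratum size / total)."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Stratum sample size; rounding can make the total differ slightly from n
        # when the proportions do not divide evenly (here they do: 10 and 40).
        n_stratum = round(n * len(units) / total)
        # Simple random sample within the stratum
        sample[name] = sorted(rng.sample(units, n_stratum))
    return sample


# Hypothetical frame: 50 babies from wealthy and 200 from poor households
strata = {
    "wealthy": list(range(1, 51)),
    "poor": list(range(51, 251)),
}

result = stratified_sample(strata, 50, seed=2)
print({name: len(units) for name, units in result.items()})  # {'wealthy': 10, 'poor': 40}
```

If the researcher wanted more wealthy babies than their 20% share, the per-stratum sizes could instead be fixed in advance; proportional allocation is simply the case shown in the example.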

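Note that, instead of sampling individuals, you can take a simple random sample or a systematic sample of clusters of people (e.g. villages or hospital wards), as is done, for example, when selecting clusters for a cluster randomized controlled trial. This is called *cluster sampling*; the sampling frame is then a list of clusters rather than of individuals.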
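One-stage cluster sampling, in which whole clusters are drawn at random and everyone in a chosen cluster is included, can be sketched like this. The ward layout (10 hypothetical wards of 25 babies each) and the function name are assumptions made for illustration. The random draw is made from the list of clusters, not from the list of individuals.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include every person in them."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(clusters), n_clusters))  # simple random sample of clusters
    people = [person for c in chosen for person in clusters[c]]
    return chosen, people


# Hypothetical frame of clusters: 10 wards (W01 to W10), 25 babies each, IDs like "W03-B07"
wards = {
    f"W{w:02d}": [f"W{w:02d}-B{b:02d}" for b in range(1, 26)]
    for w in range(1, 11)
}

chosen, babies = cluster_sample(wards, 2, seed=3)
print(chosen)
print(len(babies))  # 50: two wards of 25 babies each
```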

## Randomisation

A Randomized Controlled Trial (RCT) is an epidemiological experiment to study a new preventive intervention (e.g. a vaccine or behaviour change intervention) or therapeutic regimen. RCTs are meant to assess the value of new preventive strategies or therapies in order to make efficient use of health-care resources.

An RCT is a prospective study designed to establish a causal relationship between an exposure and an outcome. The researcher begins with individuals who are free of the outcome of interest. The study population is screened for eligibility: all subjects in a trial must meet the specified inclusion criteria for the condition under investigation, and other criteria are usually specified to ensure a reasonably similar group of subjects. Those who are free of the outcome of interest and fulfil the inclusion criteria are invited to participate.

Then *randomisation* takes place: to ensure that the groups being compared (treatment and control group) are equivalent at the start of the investigation, participants are allocated to them randomly, i.e. by chance. Randomisation tends to distribute confounding variables, both known and unknown, equally between the treatment and control groups, so that their effect on the comparison is removed; this balance holds on average and becomes more reliable as the trial gets larger. Randomisation also prevents selection bias, because participants, clinicians and researchers cannot use their subjective judgement to decide which group a participant is placed in. Assuming that the randomisation worked, the researcher can attribute any differences in outcomes at the end of the study to the intervention being tested. RCTs are therefore considered the strongest design for establishing a causal relationship between an exposure and an outcome, and hence the gold standard.

Randomisation requires a method to generate a random allocation sequence, such as a table of random numbers or a computer-generated random sequence. Alternation, ID numbers, date of birth and day of the week are not adequate methods, because they are not truly random and the next allocation can be predicted.
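A computer-generated allocation sequence can be sketched in Python. This version shuffles a list containing equal numbers of "intervention" and "control" labels, which guarantees a 1:1 split; it is one option among several (allocating each participant by an independent coin toss is also random but can give unequal group sizes). The function name, the sample size and the simulated age variable are hypothetical. The last lines compare the mean age in the two arms, to show that a baseline characteristic tends to be similar in both groups after randomisation, with any difference due to chance.

```python
import random


def allocation_sequence(n_participants, seed=None):
    """Shuffle a balanced list of labels to get a random 1:1 allocation sequence."""
    if n_participants % 2:
        raise ValueError("a 1:1 allocation needs an even number of participants")
    arms = ["intervention", "control"] * (n_participants // 2)
    random.Random(seed).shuffle(arms)  # random order, unlike alternation, which is predictable
    return arms


sequence = allocation_sequence(100, seed=4)
print(sequence[:6])  # allocations for the first six participants enrolled
print(sequence.count("intervention"), sequence.count("control"))  # 50 50

# Hypothetical baseline characteristic (age in years) of the 100 participants
rng = random.Random(5)
ages = [rng.randint(18, 60) for _ in range(100)]
for arm in ("intervention", "control"):
    arm_ages = [age for age, group in zip(ages, sequence) if group == arm]
    print(arm, round(sum(arm_ages) / len(arm_ages), 1))  # mean age per arm
```

Note that the list is generated before recruitment and applied in enrolment order, so nobody chooses which participant gets which label.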

In conclusion, random sampling is a way to obtain a smaller, representative group from a larger population, while randomisation is a way to allocate individuals to the intervention or the control group by chance (so that both groups are comparable).