Methods
Cross-tabulation in cohort and case-control studies
Cohort and case-control studies
The previous article discussed measures of disease frequency, including the cumulative incidence (1), and noted that this measure can be calculated from, for example, a cohort study.
Cohort studies begin with a group of people (a cohort) free of disease, who are classified into subgroups according to exposure to a potential cause of disease. Variables of interest are specified and measured and the whole cohort is followed up to see how the subsequent development of new cases of the disease (or other outcome) differs between the groups with and without exposure.
For example, if you suspect a causal relationship between the use of a certain water source and the incidence of diarrhoea among children under five in a village with several water sources, you could conduct a cohort study. You select a group of children under five and classify them as either using the suspected water source or using other water sources. After a fixed period, for example two weeks, you check whether the children have had diarrhoea. You can then calculate the proportion of children who developed diarrhoea among those using the suspected water source and among those using other sources (the cumulative incidence of diarrhoea in each group).
The same problem could also be studied in a case-control study. A case-control study begins with the selection of cases (people with a disease), which should represent all the cases from a specified population. The controls/references (people without the disease) should represent people who would have been designated study cases if they had developed the disease (population at risk). Then the exposure status is determined for both cases and controls and the occurrence of the possible cause of the disease could then be calculated for both the cases and controls. To come back to the example, you may compare children who present themselves at a health centre with diarrhoea (cases) during a particular period of time with children presenting themselves with other complaints of roughly the same severity, for example acute respiratory infections (controls) during the same time. Then you determine which source of drinking water they had used and see what proportion of cases and controls were exposed to the suspected water source.
Cross-tabulation in cohort studies
Assume you have just conducted the cohort study described above. How do you actually do the cross-tabulation in a statistical program (e.g. SPSS) to calculate the cumulative incidence in both groups?
It is best to always put the outcome variable (disease yes/no) in the columns and the exposure variable in the rows. In other words, put the dependent variable, that is, the variable that describes the problem under study, in the columns, and the independent variable, that is, the variable that describes the factor assumed to cause the problem, in the rows. In the example, the variable diarrhoea (yes/no) should be in the columns and the variable water source (suspected/other) in the rows. SPSS puts the lowest value of a variable in the first column or row. So, to get those with diarrhoea in the first column, you should code ‘diarrhoea’ as 1 and ‘no diarrhoea’ as 2. The same is true for the exposure variable, so code the ‘suspected water source’ as 1 and the ‘other water source’ as 2.
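The coding rule above can be illustrated outside SPSS with a minimal Python sketch. The records below are made up purely for illustration, and the helper name cross_tabulate is hypothetical; the sketch only mimics the "lowest code first" ordering described in the text and is not how SPSS is implemented.

```python
from collections import Counter

# Hypothetical individual records: (water_source, diarrhoea), both coded 1/2.
# water_source: 1 = suspected source, 2 = other source
# diarrhoea:    1 = diarrhoea,        2 = no diarrhoea
records = [(1, 1), (1, 2), (1, 2), (2, 1), (2, 2), (2, 2), (2, 2), (1, 1)]

# Count how often each (exposure, outcome) combination occurs.
counts = Counter(records)


def cross_tabulate(counts, row_codes=(1, 2), col_codes=(1, 2)):
    """Exposure in rows, outcome in columns, lowest code first in both."""
    return [[counts[(r, c)] for c in sorted(col_codes)] for r in sorted(row_codes)]


table = cross_tabulate(counts)
(a, b), (c, d) = table  # cell letters as used in the article

print(f"{'':<18}{'diarrhoea':>10}{'no diarrhoea':>14}")
for label, row in zip(("suspected source", "other source"), table):
    print(f"{label:<18}{row[0]:>10}{row[1]:>14}")
```

Because the exposed and the diseased both carry code 1, cell a (exposed with diarrhoea) ends up in the top-left corner, which is the layout the formulas in this article assume.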
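You will then be able to calculate the cumulative incidence (the risk of developing the disease) among those with the exposure, a / (a + b), and among those without the exposure, c / (c + d) (Table 1). Here a and b are the numbers of exposed people with and without the disease, and c and d are the numbers of unexposed people with and without the disease.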
Table 1. Cross-tabulation in a cohort study.
In the case of the diarrhoea study (Table 2), you could calculate the cumulative incidence of diarrhoea among those exposed to the suspected water source, which would be (78 / 1 500 =) 5.2%, and among those exposed to other water sources, which would be (50 / 1 000 =) 5.0%. SPSS can give you these percentages immediately (in cells ‘a’ and ‘c’ respectively) when you ask it to display row percentages in the Cells option.
Table 2. Results of a cohort study on diarrhoea incidence among children under five using different water sources.
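As a check on the arithmetic, the row percentages can be reproduced with a short Python sketch. It assumes group sizes of 1 500 exposed and 1 000 unexposed children, which is what the quoted percentages of 5.2% and 5.0% imply; the function name cumulative_incidence is chosen here for illustration.

```python
def cumulative_incidence(cases, group_size):
    """Proportion of a group that developed the disease during follow-up."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return cases / group_size


# Counts taken from the text; b and d are derived from the group totals.
a, exposed_total = 78, 1500      # diarrhoea cases among suspected-source users
c, unexposed_total = 50, 1000    # diarrhoea cases among other-source users
b = exposed_total - a            # exposed children without diarrhoea
d = unexposed_total - c          # unexposed children without diarrhoea

ci_exposed = cumulative_incidence(a, a + b)      # a / (a + b)
ci_unexposed = cumulative_incidence(c, c + d)    # c / (c + d)

print(f"Suspected source: {ci_exposed:.1%}")   # 5.2%
print(f"Other sources:    {ci_unexposed:.1%}")  # 5.0%
```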
Cross-tabulation in case-control studies
If you used the case-control design for the diarrhoea study, the actual cross-tabulation is quite similar. You put the disease status (case/control) in the columns and the exposure variable in the rows. Code the cases as 1 and the controls as 2. Be aware that row percentages have no meaning in terms of occurrence of disease in case-control studies. This is because in a case-control study the researcher determines how many cases and how many controls are included. The ratio between the number of controls and cases (e.g. 2 : 1 or 4 : 1) influences the row percentages, so the cumulative incidence cannot be calculated. Instead, you can ask to display column percentages. These give you the proportion of those exposed to the suspected water source among the cases (in cell ‘a’) and among the controls (in cell ‘b’). Table 3 displays the cross-tabulation in a case-control study.
Table 3. Cross-tabulation in a case-control study.
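Why row percentages are meaningless here can be seen in a small Python sketch. All counts below are hypothetical and chosen only to illustrate the point: the same 40 cases are tabulated once with one control per case and once with four controls per case, while the exposure distribution among controls (25% exposed) is kept the same.

```python
def percentages(a, b, c, d):
    """Return (row % for the exposed row, column % cases, column % controls).

    a, b = exposed cases, exposed controls
    c, d = unexposed cases, unexposed controls
    """
    row_pct_exposed = a / (a + b)    # share of cases within the exposed row
    col_pct_cases = a / (a + c)      # share exposed among the cases
    col_pct_controls = b / (b + d)   # share exposed among the controls
    return row_pct_exposed, col_pct_cases, col_pct_controls


# Hypothetical study: 40 cases, 30 of them exposed.
one_to_one = percentages(a=30, b=10, c=10, d=30)      # 40 controls
four_to_one = percentages(a=30, b=40, c=10, d=120)    # 160 controls

for name, (row_pct, cases_pct, controls_pct) in (
    ("1 control per case", one_to_one),
    ("4 controls per case", four_to_one),
):
    print(f"{name}: row {row_pct:.0%}, "
          f"cases exposed {cases_pct:.0%}, controls exposed {controls_pct:.0%}")
```

In this sketch the row percentage drops from 75% to about 43% purely because more controls were sampled, whereas the column percentages stay at 75% and 25%.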
Table 4 gives a numerical example for the diarrhoea study with the case-control design. Using the data provided, (28 / 44 =) 64% of the cases were exposed to the suspected water source, while this was (29 / 237 =) 12% of the controls.
Table 4. Results of a case-control study on diarrhoea incidence among children under five using different water sources.
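The column percentages quoted for this example can be reproduced with a minimal Python sketch. The exposed counts and group totals come from the text; the unexposed counts are derived from them, and the function name exposure_proportion is chosen here for illustration.

```python
def exposure_proportion(exposed, group_size):
    """Proportion of a group (cases or controls) exposed to the suspected cause."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return exposed / group_size


# Counts from the text: 28 of 44 cases and 29 of 237 controls were exposed.
a, n_cases = 28, 44
b, n_controls = 29, 237
c = n_cases - a       # unexposed cases
d = n_controls - b    # unexposed controls

exposed_among_cases = exposure_proportion(a, a + c)       # a / (a + c)
exposed_among_controls = exposure_proportion(b, b + d)    # b / (b + d)

print(f"Cases exposed:    {exposed_among_cases:.0%}")     # 64%
print(f"Controls exposed: {exposed_among_controls:.0%}")  # 12%
```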
The next article will be devoted to measures of association: How do you actually compare cumulative incidences in cohort studies? And what measure of association can be used in case-control studies?
References:
1. Measuring disease frequency: epidemiological studies and routine data. http://www.epiresult.com/methods/10/how-to-measure-disease-frequency.html. Accessed May 17, 2010.