Sample size calculation in cross-sectional studies
I often see people being a bit anxious when it comes to sample size calculations: you have to find the correct formula for the type of study you are conducting and think about the figures you need to put into the equation. I recently had a client with the same problem when setting up a cross-sectional study. As I think it might be helpful to others, I explain here the solutions we found to assist the client with this issue. Two different tools that are freely available online were used.
Set assumptions

Suppose you are doing a cross-sectional study on the smoking prevalence among male and female university students. If you have a clear idea about the difference in the prevalence of smoking between these two groups (based on other studies), then the Open Epi website (Open Source Epidemiologic Statistics for Public Health) might be the right tool for a quick sample size calculation. Go to Sample Size, then to Cohort/RCT and Enter new data. Assume you want a two-sided confidence level of 95% (a significance level of 5%), a power of 80%, two equal groups, and you expect the prevalence of smoking among female university students to be 35% and among males to be 50%. Then you will get the following result after entering the data:
Varying assumptions
However, if you are not so sure about the prevalences to be expected in both groups, you need to repeat this exercise for different values (that is, for different effect sizes: the difference between the two prevalences). The sample sizes could then be as follows (note that here I only present the calculations based on Kelsey):

Note that the above gives the total sample size, so each group will be half (e.g. N=190, females = 95 and males = 95)
Of course you can also change the significance level, power etc.
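For readers who prefer to script this "what if" exercise instead of re-entering values on the website, here is a minimal Python sketch of the Kelsey formula for two proportions as it is commonly stated. The function name `kelsey_sample_size`, its parameters and the fixed 60% male prevalence in the loop are illustrative assumptions, not part of Open Epi, and the results may differ slightly from the website because of rounding.

```python
from math import ceil
from statistics import NormalDist


def kelsey_sample_size(p1, p2, alpha=0.05, power=0.80, ratio=1.0):
    """Return (n1, n2) for comparing two proportions (Kelsey formula).

    ratio is n2 / n1. Names and defaults are illustrative assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + ratio * p2) / (1 + ratio)  # weighted average prevalence
    n1 = ((z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) * (ratio + 1)
          / (ratio * (p1 - p2) ** 2))
    n1 = ceil(n1)
    n2 = ceil(ratio * n1)
    return n1, n2


if __name__ == "__main__":
    # Keep males at 60% (an assumed value) and vary the female prevalence.
    for p_female in (0.30, 0.35, 0.40, 0.45, 0.50):
        n1, n2 = kelsey_sample_size(p_female, 0.60)
        print(f"females {p_female:.0%}, males 60%: total sample size {n1 + n2}")
```

With 30% versus 60% this sketch gives 44 per group, so 88 in total, which is the Open Epi figure quoted further down in this post.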
While working on this, we found a free program, G*Power 3, that makes it easy to show graphically the different sample sizes displayed in the table above.
After downloading the program, go to the tab ‘Protocol of power analysis’ and select the following:

Then click on ‘calculate’ and you will get the sample size for the above values of the parameters. Then click on ‘X-Y plot for a range of values’ and change the second line: as a function of ‘Effect size w’ from 0.1 in steps of 0.01 through to 0.5. This results in the following graph:

Open Epi gave a sample size of 88 for an effect size of 30% (prevalence among females 30% and among males 60%), and this is in agreement with the graph, which gives a sample size of 87 for an effect size of 0.3. Note that due to the scale of the graph you might not be able to read off the exact sample size, but just click on the tab Table (next to Graph) and you will get the exact numbers.
If you now want a graph in which the power varies instead of the effect size (keeping the effect size the same), change the second line as follows: as a function of ‘Power (1 – β err prob)’ from 0.6 in steps of 0.01 through to 0.95. In the graph that you get you will see that for a power of 0.8 you need a sample size of 87. In the graph above you see the same sample size when you go to effect size 0.3.
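The link between the two tools can be sketched in a few lines of Python. This is only a rough check under stated assumptions: that the G*Power analysis is a chi-square test with 1 degree of freedom at alpha = 0.05, and that the two groups are of equal size. The function names are made up for this illustration and are not G*Power functions. The sketch converts two prevalences into Cohen's effect size w and then computes the power for a given total sample size.

```python
from math import sqrt
from statistics import NormalDist

_norm = NormalDist()


def cohens_w(p1, p2):
    """Cohen's w for a 2x2 table with two equally sized groups."""
    p_bar = (p1 + p2) / 2
    w_sq = 0.0
    for p in (p1, p2):
        # Each group holds half of the sample; compare the cell proportions
        # with those expected if the prevalence were p_bar in both groups.
        w_sq += (0.5 * p - 0.5 * p_bar) ** 2 / (0.5 * p_bar)
        w_sq += (0.5 * (1 - p) - 0.5 * (1 - p_bar)) ** 2 / (0.5 * (1 - p_bar))
    return sqrt(w_sq)


def chi2_power_df1(n, w, alpha=0.05):
    """Power of a chi-square test with 1 degree of freedom and total size n."""
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    shift = sqrt(n) * w  # square root of the noncentrality parameter
    return _norm.cdf(shift - z_crit) + _norm.cdf(-shift - z_crit)


def smallest_n(w, power=0.80, alpha=0.05):
    """Smallest whole total sample size that reaches the requested power."""
    n = 2
    while chi2_power_df1(n, w, alpha) < power:
        n += 1
    return n


if __name__ == "__main__":
    print(f"w for 30% vs 60%: {cohens_w(0.30, 0.60):.3f}")
    print(f"power at N=87, w=0.3: {chi2_power_df1(87, 0.3):.3f}")
    print(f"smallest N for 80% power, w=0.3: {smallest_n(0.3)}")
```

Under these assumptions, prevalences of 30% and 60% correspond to a w of roughly 0.30, which is why the two tools agree so closely here. With w = 0.3 the 80% line is crossed at about 87.2, so a reading of 87 from the graph and a rounded-up value of 88 are both consistent with it.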
Unfortunately, these programs will not be able to assist you if you have a more complex sampling design, e.g. multi-stage sampling. In those instances it might be wise to contact a statistician specialized in this type of sample size calculation.
Could you help me, please?
I am working on a cross-sectional study to establish normal hematological values in my country (Sudan), and I find it difficult to calculate the sample size.
It's so helpful.
But I found it difficult to determine the ratio (r). My study is about the comparison of outcomes between stroke patients with a single risk factor and those with multiple risk factors. Would you help me? Thanks a lot in advance.
Maria, Indonesia
Dear Maria, Normally you take an equal number of patients in both groups, so r would be 1. Hope this helps. Kind regards, Annette
Dear Annette,
First of all, like all the others, I found the information on this page immensely helpful. However, I had a question: I had heard in epidemiology lectures that the lower the prevalence of the condition one needs to study, the higher the sample size required, whereas using the above calculator (from the Open Epi website), keeping the design effect, population and CI constant, the sample size keeps on increasing as one increases the prevalence of the problem under study, up to 50%. I have been wondering why that is so. Any thoughts?
Dear Nitya, The calculator is correct: the sample size increases until a prevalence of 50% (in that case there are as many people with as without the disease) and then decreases again. Please see the following link, which might assist with explaining this: http://www.caribvet.net/es/system/files/formation_fc_4_3asalmanvep-epi-sample20size20for20prevalence20estimation.pdf. Kind regards, Annette
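The pattern discussed in this exchange can be reproduced with a small Python sketch of the usual formula for estimating a single proportion, n = z² p(1-p) / d², for a large population and without a design effect. The function names and the default precision values are assumptions chosen for illustration. The second function uses a relative precision (the margin is a fraction of the prevalence itself), which is one possible reading of the lecture statement that rarer conditions need larger samples.

```python
from math import ceil
from statistics import NormalDist


def n_absolute(p, d=0.05, conf=0.95):
    """Sample size to estimate prevalence p within +/- d (absolute margin)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


def n_relative(p, rel=0.20, conf=0.95):
    """Sample size to estimate p within +/- rel * p (relative margin)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * (1 - p) / (rel ** 2 * p))


if __name__ == "__main__":
    for p in (0.10, 0.30, 0.50, 0.70, 0.90):
        print(f"p={p:.2f}  absolute: {n_absolute(p)}  relative: {n_relative(p)}")
```

With a fixed absolute margin the required sample size peaks at 50% and is symmetric around it, while with a fixed relative margin it keeps growing as the prevalence gets smaller.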
Please, can you kindly help me/teach me how to calculate the sample size required to achieve a statistically significant research work? I am working on the prevalence of chronic kidney disease in Lagos, Nigeria. The population of Lagos is presently put at 18 million, and a similar study done in Kano put the prevalence at 26%. It is a descriptive study and I would like to use the multi-stage stratified random sampling technique. Please bail me out, as I have been on it for over 5 months.
kind regards.....
akinsiku
Dear Dr Akinsiku Adedamole, If you do a straightforward cross-sectional study to determine prevalence, you could make use of the following website to determine the sample size: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm Go to 'Sample size', 'Cohort/RCT' and fill in the required figures. However, if you want to do it multi-stage, stratified, I am not able to assist you. You should try to find a statistician who can do this for you. Good luck. Kind regards, Annette
Good day. Can you kindly put me through and help calculate the sample size for my proposed study, of assessment of thyroid function in sick neonates admitted into the neonatal intensive care unit. It is a descriptive cross-sectional study, the similar study i saw has prevalence of 43% . Your input will help me achieve a statistically significant work.
Dear Fakeye, Thanks for your message. The sample size calculation for a study on the prevalence of thyroid function would be straightforward. I use the following website for this: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Go to sample size, proportion and fill in a proportion of 43%. Then ‘calculate’. However, I am not sure whether this is exactly your research aim. If it is more complicated, rather try a statistician who is specialised in this. Kind regards, Annette
Good day,
Can you kindly enlighten me on how to calculate the sample size for my proposed study;
Title is timing of equilibration of hemoglobin after transfusion in children.
It is a cross sectional study looking at when actually should be the standard time for checking packed cell volume after transfusing anemic patients.
In a related study the prevalence of severe anemia requiring transfusion was stated as 10-30%.
Pls kindly tell me which formula i should use, that will give me a statistically significant result.
Thanks, God bless you.
Dear Abiola, Thanks for your message. The sample size calculation for a study on the prevalence of severe anemia would be straightforward. I use the following website for this: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Go to sample size, proportion and fill in proportions between 10 and 30%. Then ‘calculate’. However, I would not be able to assist if the focus is on determining the best time for checking packed cell volume. Rather try a statistician who is specialised in this. Kind regards, Annette
good day, i would like your advice on the appropriate sample size formula to use in a cross sectional study i want to conduct among neonates. its titled normal ecg pattern in healthy neonates in Lagos University hospital. the estimated prevalence is 50%.
i look forward to hearing from you,
many thanks,
Omolabake
Dear Omolabake, Thanks for your email. If I understand it correctly, you would like to do a study in which you want to show the prevalence of a normal ECG pattern in healthy neonates. You estimate that the prevalence will be 50%. I use the following website for this: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Go to sample size, proportion and leave everything as it is (default is 50%). Then 'calculate'. Hope this helps. Kind regards, Annette
Good day!
I would like to know your advice as to how I can best calculate my sample size. I am doing a cross-sectional study on workers/inspectors. I am trying to see the association between those who pass out defective items and those with visual defects (I plan to have them tested). There are no previous studies on this that I can find. The total population of inspectors/workers is 1500. Please help me if you can.
thanks!
Dear Ana, Thanks for your message. I always use the following link to calculate sample sizes: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. In this case you go to sample size, CC/RCT. You need to have an idea of how many have visual defects and what the prevalence of passing out defective items is in both groups. If you fill in these estimates you can calculate the required sample size. You could change the assumptions and recalculate. However, as you have no idea on these values, you should consider testing all inspectors (census). Especially because I assume the prevalence of visual defects that really lead to pass out defective items might not be that high. I hope this helps. Kind regards, Annette
Dear Annette.
I hope you can come up with some advice on how to calculate a sample size in my cross-sectional study.
My study is about knowledge among patients in anticoagulation therapy. I have only one group of patients (no comparison group) who are going to complete a questionnaire about their self-assessed knowledge and their real knowledge. My goal is to investigate whether there is a correlation between their self-assessed knowledge and their real knowledge.
Can I do a sample size calculation ??
Best Regards
Maria
Dear Maria, Thanks for your message. I always use the following site for sample size calculations: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. However, this site does not allow you to calculate a sample size for a correlation between 2 variables in one group. Is this your only question? Or are you e.g. also interested in the prevalence of a certain level of self-assessed knowledge? In the latter case you could choose sample size, proportion, on the website and calculate the number needed to address that question. Sorry that I can't help you any further with this. Kind regards, Annette
I propose to study the prevalence and significant risk factors of cervical cancer in a state of India. I consider only cases, retrospectively. There are no controls proposed in my study. Can I call this study a retrospective study or a cross-sectional study? How do I determine the sample size for the study? It is a non-interventional study.
Regards
Srikala
Dear Srikala, I am a bit confused that you say that you don't have controls. For the risk factors I assume you will be comparing those with cervical cancer with those without. Otherwise how will you assess the risk factors? If you do that, you can determine the sample size using the cross-sectional study formula. I use the following website for that: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Kind regards, Annette
Am doing a study to validate the use of a prognostic score in our setting and annually, we have 130 patients in the unit where am doing the study. My biggest problem is calculating the sample size for my study. Kindly advise me on how to calculate a sample size for my study. Thank you
Nakandha.
Dear Nakandha, I am sorry, but I can't help you with this calculation as it is not a straightforward prevalence or incidence or effect study. Please contact a statistician to help you out. Kind regards, Annette
Dear Annette
Hi, I am Tahani from Sudan. I am doing a cross-sectional study regarding the quality of laboratory services in primary health care centers. No previous study has been done on this area in Sudan. I really need your help in calculating the sample size of primary health care centers from a total of 434 distributed over 7 localities. I really need your help urgently.
Dear Tahani, What I understand from your message is that you just want to assess the quality of lab services in PHC centres. It is not a study e.g. assessing prevalence. As it is a descriptive study you would like to have a sample strategy, as well as a sample size, that gives you a large enough group of representative sites. There is no sample size calculation for that as far as I know. You could e.g. sample 10% of all sites in all 7 areas. Or a larger % (the more you have, the better it is) if that is feasible time/budget wise. Kind regards, Annette
Dear Annette, can I use this formula to calculate the sample of PHCCs?
n = N z² p(1-p) / ((N-1) e² + z² p(1-p))
n = sample size
z = standard normal value corresponding to the 95% confidence level = 1.96
p = since there are no available studies on the quality of laboratory services at PHCCs, we will assume that 50% of health centers have very high quality laboratory services, therefore p = 0.5
1-p = 0.5
e = absolute precision (margin of error) = 0.15, so e² = (0.15)²
N = total number of PHCCs = 434
n = (434 × 1.96 × 1.96 × 0.5 × 0.5) / ((0.15 × 0.15 × 433) + (1.96 × 1.96 × 0.5 × 0.5))
n = 39
thank you in advance
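The arithmetic in the comment above can be checked with a short Python sketch of the same finite-population formula. The function name `finite_population_n` is made up for this illustration; the inputs (N = 434, p = 0.5, e = 0.15, z = 1.96) are the ones given in the comment. Whether a precision of 0.15 is acceptable for the study is a separate question that the sketch does not answer.

```python
from math import ceil


def finite_population_n(N, p=0.5, e=0.15, z=1.96):
    """Sample size for a proportion with finite population correction.

    N: population size, p: expected proportion,
    e: absolute precision, z: standard normal value for the confidence level.
    """
    variance = p * (1 - p)
    return N * z ** 2 * variance / ((N - 1) * e ** 2 + z ** 2 * variance)


if __name__ == "__main__":
    n = finite_population_n(434)
    print(f"unrounded: {n:.2f}, rounded up: {ceil(n)}")
```

Rounded up, this agrees with the n = 39 worked out by hand in the comment.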
Dear Annette,
What formula should we use if we are to compare more than 2 subgroups of risk factor?
For example:
I would like to test whether there is association between “GP’s practice on assessing their patients’ smoking behavior (outcome / dependent variable)” and “GP’s age (risk factor / independent variable)”.
I classify the outcome into 2 level of ordinal scale: 0 = bad practice; 1 = good practice.
The age variable is classified into 4 groups, also of ordinal scale: 0 = 50 yo, based on previous studies.
How should i calculate the sample size then, since there are 4Ps (not only P1 and P2)?
Thank you in advance
Regards,
Nani, indonesia
(Very sorry for sending this message twice. Some wording error on the previous one when i was trying to explain about age classification)
Dear Nani, Thanks for your message. Indeed, the programme that I normally use for calculating sample sizes does not accommodate 4 groups. (See http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm). However, is this just one of the analyses you are interested in (comparing age groups of GPs) or is this the main research question of your study? Do you not have a primary outcome that contains 2 groups? You could base your sample size on that variable. Kind regards, Annette
hi am doing a study to assess the co-infection rates of hepatitis c and HIV viruses in intravenous drug users. i have a total of 69 samples.please advise on how to calculate the sample size.
Dear Keisha, In this case, where you just want to check what the prevalence of co-infection is, it is most important that you have a representative sample of participants in your study. If you have any idea what the prevalence would be (from earlier studies/literature), you could calculate what the required sample size should be using the following website: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm (go to sample size and then choose proportion). Kind regards, Annette
Hi, would you assist with determination of sample sizes. I am working on a cross sectional study in factory. I have 100 factories as the study population grouped into ten categories. I know there is a percentage to determine the number of factories that i could use as target population. Do you have any info or any reference?
Dear Jacy, Is the 'participant' the factory? Or are you using employees in the factory? In the latter case the sample size calculation is for a multi-level design and not that easy. In the first case you could use the following website for your calculation: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Depending on what you are looking at, e.g. the prevalence of something in all factories or a prevalence difference between groups, you choose under sample size either the option proportion or cohort (also for cross-sectional studies). Kind regards, Annette
Dear Annette
Hi, I am Ebrahim from Yemen. I am doing a cross-sectional study regarding the appropriateness of indications and the diagnostic yield of EGD and colonoscopy and statification level of patients according to strict recommendations or guidelines. How can I calculate the sample size?
Dear Ebrahim, Thanks for your message. However, you don't indicate what type of study design you are using. Have a look at the following website http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm which can assist you with calculating sample sizes for prevalence studies, cross-sectional/cohort/RCT studies, and case control studies. Good luck. Kind regards, Annette
I need a suitable formula to calculate the sample size for a case-control study, and I have no data in my country about the disease that I will study.
Dear Rihab, Please go to http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm and fill in the details before you say 'calculate'. If you don't have the info on exposure of the controls, try different values (same with the exposure of the cases or OR) and see how the sample sizes differ. Also keep in mind what is possible in practice. Kind regards, Annette
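Trying different values, as suggested in the reply above, can also be done in a few lines of Python. The sketch below is only an illustration under assumptions: an unmatched case-control design with one control per case, a Kelsey-type formula for two proportions, and made-up function names and candidate values. It derives the exposure among cases from the assumed exposure among controls and the odds ratio, and then prints the number of cases needed for each combination.

```python
from math import ceil
from statistics import NormalDist


def exposure_in_cases(p0, odds_ratio):
    """Exposure proportion among cases implied by p0 (controls) and the OR."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


def cases_needed(p0, odds_ratio, alpha=0.05, power=0.80):
    """Cases needed with one control per case (Kelsey-type formula)."""
    p1 = exposure_in_cases(p0, odds_ratio)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2  # average exposure over cases and controls
    return ceil(2 * z ** 2 * p_bar * (1 - p_bar) / (p1 - p0) ** 2)


if __name__ == "__main__":
    # Assumed candidate values; replace them with plausible ones for your setting.
    for p0 in (0.10, 0.20, 0.40):
        for odds_ratio in (1.5, 2.0, 3.0):
            n = cases_needed(p0, odds_ratio)
            print(f"controls exposed {p0:.0%}, OR {odds_ratio}: {n} cases, {n} controls")
```

A table like this makes it easy to see which assumptions lead to a sample size that is feasible in practice.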
Hello... We are doing a multi-cross sectional research comparing customer loyalty in Sweden and Spain. Do the samples need to be of equal size when running the t-test, like a 50% equal amount of answers from both countries? Kindly, thank you
Dear Jacqueline, A t-test can be performed with unequal sample sizes, but the smaller sub-sample will determine the statistical power of the test. So make sure that the smallest sample is large enough. Kind regards, Annette
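The point about the smaller sub-sample can be illustrated with a rough Python sketch. It uses a normal approximation to the power of a two-sided two-sample t-test and an assumed standardised difference of d = 0.5; both the function name and the group sizes are chosen for illustration only, and the exact t-test power will be slightly lower.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n1, n2, d=0.5, alpha=0.05):
    """Approximate power of a two-sided two-sample t-test (normal approximation).

    d is the assumed standardised mean difference (Cohen's d).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n1 * n2 / (n1 + n2))  # driven mainly by the smaller group
    return NormalDist().cdf(shift - z_crit)


if __name__ == "__main__":
    for n1, n2 in ((64, 64), (30, 200), (30, 1000), (30, 30)):
        print(f"n1={n1}, n2={n2}: approximate power {approx_power(n1, n2):.2f}")
```

In this sketch, 64 respondents per country give more power than 30 in one country and 200 in the other, even though the second design has many more respondents in total. Adding respondents to the larger group helps only up to a point.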
dear annette,
I am doing a cross-sectional study on the nutritional status of teenage girls in the lower socio-economic group in Pakistan, and no data on the prevalence of malnutrition in this group are available. How can I calculate the sample size?
Please advise me. Thanks
Dear Fatima, You can use the following website http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm and choose sample size, proportion and insert 50% for the prevalence if unknown. Good luck. Kind regards, Annette