# Sample size calculation in cross-sectional studies

I often see people getting a bit anxious when it comes to sample size calculations: you have to find the correct formula for the type of study you are conducting and think about the figures you need to put into the equation. I recently had a client with exactly this problem when setting up a cross-sectional study. As I think it might be helpful to others, I explain here the solutions we found to assist the client, using two different tools that are freely available online.

## Set assumptions

Suppose you are doing a cross-sectional study on the smoking prevalence among male and female university students. If you have a clear idea about the difference in the prevalence of smoking between these two groups (based on other studies), then the OpenEpi website (Open Source Epidemiologic Statistics for Public Health) might be the right tool for a quick sample size calculation. Go to Sample Size, then to Cohort/RCT and Enter new data. Assume you want a two-sided confidence level of 95% (i.e. a significance level of 5%), a power of 80%, two equal groups, and you expect the prevalence of smoking among female university students to be 35% and among males to be 50%. Then you will get the following result after entering the data:
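For readers who like to check such a result by hand, here is a minimal sketch of the Kelsey formula for comparing two proportions, using the assumptions above (95% confidence, 80% power, equal groups, 35% versus 50%). The function name `kelsey_group_size` and its parameters are my own illustrative choices, not part of OpenEpi, and the website may round slightly differently.

```python
from math import ceil
from statistics import NormalDist


def kelsey_group_size(p1, p2, alpha=0.05, power=0.80, ratio=1.0):
    """Size of group 1 for comparing two proportions (Kelsey formula).

    ratio is n2 / n1. Names are illustrative, not taken from OpenEpi.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    p_bar = (p1 + ratio * p2) / (1 + ratio)        # pooled prevalence
    n1 = ((z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) * (ratio + 1)
          / (ratio * (p1 - p2) ** 2))
    return ceil(n1)


n_per_group = kelsey_group_size(0.50, 0.35)  # males 50%, females 35%
print(n_per_group, 2 * n_per_group)          # prints: 171 342
```

Under these assumptions the formula gives 171 students per group, 342 in total.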

## Varying assumptions

However, if you are not so sure about the prevalences to be expected in both groups, you need to repeat this exercise for different values (actually for different effect sizes; the difference between the two prevalences). The sample sizes could then be as follows (note that here I only present the calculations based on Kelsey):

*Note that the above gives the total sample size, so each group will be half (e.g. N=190, females = 95 and males = 95)*
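Repeating the calculation for several effect sizes can also be scripted. The sketch below assumes a fixed prevalence of 30% among females and a range of male prevalences; these values are illustrative assumptions and not a copy of the original table. It uses the Kelsey formula for two equal groups, and the helper name `kelsey_total` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def kelsey_total(p1, p2, alpha=0.05, power=0.80):
    """Total sample size for two equal groups (Kelsey formula)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2                     # pooled prevalence
    per_group = ceil(2 * z ** 2 * p_bar * (1 - p_bar) / (p1 - p2) ** 2)
    return 2 * per_group


female = 0.30  # assumed prevalence among females
totals = {}
for male_pct in (40, 45, 50, 55, 60):
    totals[male_pct] = kelsey_total(male_pct / 100, female)
    print(f"females 30%, males {male_pct}%: total N = {totals[male_pct]}")
```

With these inputs, 30% versus 50% gives a total of 190 (95 per group), the same figure as in the note above, and the total shrinks quickly as the difference between the groups grows.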

Of course you can also change the significance level, power etc.

While working on this, we found a program online that makes it easy to plot the different sample sizes shown in the table above: G*Power 3.

After downloading the program, go to the tab ‘Protocol of power analysis’ and select the following:

Then click on ‘calculate’ and you will get the sample size for the above values of the parameters. Then click on ‘X-Y plot for a range of values’ and change the second line: as a function of ‘Effect size w’ from 0.1 in steps of 0.01 through to 0.5. This results in the following graph:

OpenEpi gave a sample size of 88 for an effect size of 30% (prevalence among females 30% and among males 60%), and this is in agreement with the graph, which gives a sample size of 87 for an effect size of 0.3. Note that due to the scale of the graph you might not be able to read off the exact sample size; just click on the tab Table (next to Graph) to get the exact numbers.
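To see why the two tools agree, it helps to translate the two prevalences into the effect size w. The sketch below assumes that the G*Power analysis was a chi-square test with 1 degree of freedom and two equal groups (the post does not show the settings), and it uses a normal approximation for the required total sample size rather than G*Power's own algorithm. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def effect_size_w(p1, p2):
    """Cohen's w for a 2x2 table with two equal-sized groups."""
    p_bar = (p1 + p2) / 2
    return abs(p1 - p2) / (2 * sqrt(p_bar * (1 - p_bar)))


def approx_total_n(w, alpha=0.05, power=0.80):
    """Approximate total N for a chi-square test with 1 degree of freedom."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return (z / w) ** 2


print(round(effect_size_w(0.30, 0.60), 3))  # w for 30% versus 60%
print(round(approx_total_n(0.3), 1))        # total N for w = 0.3
```

Under these assumptions, prevalences of 30% and 60% correspond to a w of about 0.30, and the approximate total sample size for w = 0.3 lies between 87 and 88, which fits both the graph and the OpenEpi result.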

If you now want a graph in which the power varies instead of the effect size (keeping the effect size the same), change the second line as follows: as a function of ‘Power (1 – β err prob)’ from 0.6 in steps of 0.01 through to 0.95. In the resulting graph you will see that for a power of 0.8 you need a sample size of 87, the same sample size that the graph above shows for an effect size of 0.3.
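The relation between power and sample size can also be looked at the other way round: for a given total sample size, what power do you get? The sketch below again assumes a chi-square test with 1 degree of freedom and w = 0.3; for that special case the power can be written with the normal distribution alone. The function name `chi2_power_df1` is hypothetical and this is not how G*Power is implemented internally.

```python
from math import sqrt
from statistics import NormalDist


def chi2_power_df1(n_total, w, alpha=0.05):
    """Power of a chi-square test with 1 df for a total sample size n_total."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = w * sqrt(n_total)  # square root of the noncentrality parameter
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


for n in (60, 70, 80, 87, 88, 100, 120):
    print(n, round(chi2_power_df1(n, 0.3), 3))
```

With these assumptions the power passes 0.8 between a total sample size of 87 and 88, in line with the value read from the graph.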

Unfortunately, these programs cannot assist you if you have a more complex sampling design, e.g. multi-stage sampling. In those instances it might be wise to contact a statistician who specializes in this type of sample size calculation.

Could you help me, please?

I am working on a cross-sectional study to establish normal hematological values in my country (Sudan), and I find it difficult to calculate the sample size.

It's so helpful.

But I found it difficult to determine the ratio (r). My study is about the comparison of outcomes between stroke patients with a single risk factor and those with multiple risk factors. Would you help me? Thanks a lot in advance.

Maria, Indonesia

Dear Maria, Normally you take an equal number of patients in both groups, so r would be 1. Hope this helps. Kind regards, Annette

Dear Annette,

First of all, like all the others, I found the information on this page immensely helpful. However, I have a question: I had heard in epidemiology lectures that the lower the prevalence of the condition one needs to study, the higher the sample size required. Yet using the above calculator (from the OpenEpi website), keeping the design effect, population and CI constant, the sample size keeps increasing as one increases the prevalence of the problem under study up to 50%. I have been wondering why that is so. Any thoughts?

Dear Nitya, The calculator is correct: the sample size increases until a prevalence of 50% (in that case there are as many people with as without the disease) and then decreases again. Please see the following link, which might help to explain this: http://www.caribvet.net/es/system/files/formation_fc_4_3asalmanvep-epi-sample20size20for20prevalence20estimation.pdf Kind regards, Annette
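The pattern described in this reply can be illustrated with the usual large-population formula for estimating a single proportion, n = z² p(1 − p) / d². The sketch below assumes an absolute margin of error d of 5 percentage points and 95% confidence; the function name `n_for_proportion` is hypothetical and the website may apply additional corrections (such as a finite population size).

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(p, margin=0.05, confidence=0.95):
    """Sample size to estimate prevalence p within +/- margin (large population)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


for pct in (10, 30, 50, 70, 90):
    print(f"prevalence {pct}%: n = {n_for_proportion(pct / 100)}")
```

Because p(1 − p) is largest at p = 0.5, the required sample size peaks at a prevalence of 50% and is symmetric around it.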

Please, can you kindly help me or teach me how to calculate the sample size required to achieve a statistically significant research work? I am working on the prevalence of chronic kidney disease in Lagos, Nigeria. The population of Lagos is presently put at 18 million, and a similar study done in Kano put the prevalence at 26%. It is a descriptive study and I would like to use the multi-stage stratified random sampling technique. Please bail me out, as I have been on it for over 5 months.

kind regards.....

akinsiku

Dear Dr Akinsiku Adedamole, If you do a straightforward cross-sectional study to determine prevalence, you could use the following website to determine the sample size: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm Go to 'Sample size', 'Proportion' and fill in the required figures. However, if you want to do it multi-stage and stratified, I am not able to assist you. You should try to find a statistician who can do this for you. Good luck. Kind regards, Annette

Good day. Can you kindly put me through and help calculate the sample size for my proposed study, an assessment of thyroid function in sick neonates admitted to the neonatal intensive care unit? It is a descriptive cross-sectional study; a similar study I saw reported a prevalence of 43%. Your input will help me achieve a statistically significant work.

Dear Fakeye, Thanks for your message. The sample size calculation for a study on the prevalence of thyroid function would be straightforward. I use the following website for this: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Go to sample size, proportion and fill in a proportion of 43%. Then ‘calculate’. However, I am not sure whether this is exactly your research aim. If it is more complicated, rather try a statistician who is specialised in this. Kind regards, Annette

Good day,

Can you kindly enlighten me on how to calculate the sample size for my proposed study?

The title is: timing of equilibration of hemoglobin after transfusion in children.

It is a cross-sectional study looking at what the standard time should be for checking packed cell volume after transfusing anemic patients.

In a related study the prevalence of severe anemia requiring transfusion was stated as 10-30%.

Please kindly tell me which formula I should use to get a statistically significant result.

Thanks, God bless you.

Dear Abiola, Thanks for your message. The sample size calculation for a study on the prevalence of severe anemia would be straightforward. I use the following website for this: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Go to sample size, proportion and fill in proportions between 10 and 30%. Then ‘calculate’. However, I would not be able to assist if the focus is on determining the best time for checking packed cell volume. Rather try a statistician who is specialised in this. Kind regards, Annette

Good day, I would like your advice on the appropriate sample size formula to use in a cross-sectional study I want to conduct among neonates. It is titled "Normal ECG pattern in healthy neonates in Lagos University hospital". The estimated prevalence is 50%.

i look forward to hearing from you,

many thanks,

Omolabake

Dear Omolabake, Thanks for your email. If I understand it correctly, you would like to do a study in which you want to show the prevalence of a normal ECG pattern in healthy neonates. You estimate that the prevalence will be 50%. I use the following website for this: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Go to sample size, proportion and leave everything as it is (the default is 50%). Then 'calculate'. Hope this helps. Kind regards, Annette

Good day!

I would like your advice on how I can best calculate my sample size. I am doing a cross-sectional study on workers/inspectors. I am trying to see the association between those who pass defective items and those with visual defects (I plan to have them tested). There are no previous studies on this that I could find. The total population of inspectors/workers is 1500. Please help me if you can.

thanks!

Dear Ana, Thanks for your message. I always use the following link to calculate sample sizes: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. In this case you go to sample size, Cohort/RCT. You need to have an idea of how many have visual defects and what the prevalence of passing defective items is in both groups. If you fill in these estimates you can calculate the required sample size. You could change the assumptions and recalculate. However, as you have no idea of these values, you should consider testing all inspectors (a census), especially because I assume the prevalence of visual defects that really lead to passing defective items might not be that high. I hope this helps. Kind regards, Annette

Dear Annette.

I hope you can give me some advice on how to calculate a sample size for my cross-sectional study.

My study is about knowledge among patients on anticoagulation therapy. I have only one group of patients (no comparison group), who are going to complete a questionnaire about their self-assessed knowledge and their real knowledge. My goal is to investigate whether there is a correlation between their self-assessed knowledge and their real knowledge.

Can I do a sample size calculation?

Best Regards

Maria

Dear Maria, Thanks for your message. I always use the following site for sample size calculations: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. However, this site does not allow you to calculate a sample size for a correlation between 2 variables in one group. Is this your only question? Or are you, for example, also interested in the prevalence of a certain level of self-assessed knowledge? In the latter case you could choose sample size, proportion, on the website and calculate the number needed to address that question. Sorry that I can't help you any further with this. Kind regards, Annette

I propose to study the prevalence and significant risk factors of cervical cancer in a state of India. I consider only cases, retrospectively. There are no controls proposed in my study. Can I call this study a retrospective study or a cross-sectional study? How do I determine the sample size for the study? It is a non-interventional study.

Regards

Srikala

Dear Srikala, I am a bit confused that you say that you don't have controls. For the risk factors I assume you will be comparing those with cervical cancer with those without. Otherwise how will you assess the risk factors? If you do that, you can determine the sample size using the cross-sectional study formula. I use the following website for that: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Kind regards, Annette

I am doing a study to validate the use of a prognostic score in our setting, and annually we have 130 patients in the unit where I am doing the study. My biggest problem is calculating the sample size for my study. Kindly advise me on how to calculate it. Thank you

Nakandha.

Dear Nakandha, I am sorry, but I can't help you with this calculation as it is not a straightforward prevalence, incidence or effect study. Please contact a statistician to help you out. Kind regards, Annette

Dear Annette

Hi, I am Tahani from Sudan. I am doing a cross-sectional study on the quality of laboratory services in primary health care centers. No previous study has been done in this area in Sudan. I really need your help in calculating the sample size of primary health care centers from a total of 434 distributed over 7 localities. I need your help urgently.

Dear Tahani, What I understand from your message is that you just want to assess the quality of lab services in PHC centres. It is not a study assessing, for example, prevalence. As it is a descriptive study you would like to have a sampling strategy, as well as a sample size, that gives you a large enough group of representative sites. There is no sample size calculation for that as far as I know. You could, for example, sample 10% of all sites in all 7 areas, or a larger percentage (the more you have, the better it is) if that is feasible in terms of time and budget. Kind regards, Annette

Dear Annette, can I use this formula to calculate the sample of PHCCs?

n = N z² p(1-p) / ((N-1) e² + z² p(1-p))

n= sample size

z = area under normal curve corresponding to 95% confidence level =1.96

p = since there is no available studies for quality of laboratory services at PHCCs, we will consider 50 % of health centers have very high quality of laboratory services, therefore P= 0.5

1-P = 0.5

e² = absolute precision or relative error or coefficient of variation = (0.15)²

N= total number of PHCCs

n=(434× 1.96 × 1.96 × 0.5 ×0.5 )/((0.15×0.15×433)+(1.96×1.96×0.5×0.5))

n=39

thank you in advance
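The arithmetic in this comment can be checked with a few lines of code. The sketch below simply evaluates the formula as the commenter wrote it, with the values given (N = 434, z = 1.96, p = 0.5, e = 0.15); the function name `finite_population_n` is hypothetical, and whether this formula suits the study design is a separate question.

```python
from math import ceil


def finite_population_n(N, p=0.5, e=0.15, z=1.96):
    """n = N z^2 p(1-p) / ((N-1) e^2 + z^2 p(1-p))"""
    numerator = N * z ** 2 * p * (1 - p)
    denominator = (N - 1) * e ** 2 + z ** 2 * p * (1 - p)
    return numerator / denominator


n = finite_population_n(434)
print(round(n, 2), ceil(n))  # prints: 38.94 39
```

The same function with N = 453, p = 0.11 and e = 0.05 gives 114 after rounding up, which is the figure quoted in the reply to Ngozi further down this thread.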

Dear Annette,

What formula should we use if we are to compare more than 2 subgroups of risk factor?

For example:

I would like to test whether there is association between “GP’s practice on assessing their patients’ smoking behavior (outcome / dependent variable)” and “GP’s age (risk factor / independent variable)”.

I classify the outcome into 2 level of ordinal scale: 0 = bad practice; 1 = good practice.

The age variable is classified into 4 groups, also of ordinal scale: 0 = 50 yo, based on previous studies.

How should i calculate the sample size then, since there are 4Ps (not only P1 and P2)?

Thank you in advance

Regards,

Nani, indonesia

(Very sorry for sending this message twice. Some wording error on the previous one when i was trying to explain about age classification)

Dear Nani, Thanks for your message. Indeed the programme that I normally use for calculating sample sizes does not accommodate 4 groups (see http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm). However, is this just one of the analyses you are interested in (comparing age groups of GPs) or is this the main research question of your study? Do you not have a primary outcome that contains 2 groups? You could base your sample size on that variable. Kind regards, Annette

Hi, I am doing a study to assess the co-infection rates of hepatitis C and HIV in intravenous drug users. I have a total of 69 samples. Please advise on how to calculate the sample size.

Dear Keisha, In this case, where you just want to check what the prevalence of co-infection is, it is most important that you have a representative sample of participants in your study. If you have any idea what the prevalence would be (from earlier studies or the literature), you could calculate the required sample size using the following website: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm (go to sample size and then choose proportion). Kind regards, Annette

Hi, would you assist with determination of sample sizes. I am working on a cross sectional study in factory. I have 100 factories as the study population grouped into ten categories. I know there is a percentage to determine the number of factories that i could use as target population. Do you have any info or any reference?

Dear Jacy, Is the 'participant' the factory? Or are you using employees in the factory? In the latter case the sample size calculation is for a multi-level design and not that easy. In the first case you could use the following website for your calculation: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Depending on what you are looking at, e.g. the prevalence of something in all factories or a prevalence difference between groups, you choose under sample size either the option proportion or cohort (also for cross-sectional studies). Kind regards, Annette

Dear Annette

Hi, I am Ebrahim from Yemen. I am doing a cross-sectional study on the appropriateness of indications and the diagnostic yield of EGD and colonoscopy, and the stratification level of patients according to strict recommendations or guidelines. How can I calculate the sample size?

Dear Ebrahim, Thanks for your message. However, you don't indicate what type of study design you are using. Have a look at the following website http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm which can assist you with calculating sample sizes for prevalence studies, cross-sectional/cohort/RCT studies, and case control studies. Good luck. Kind regards, Annette

I need a suitable formula to calculate the sample size for a case-control study, and I have no data in my country about the disease that I will study.

Dear Rihab, Please go to http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm and fill in the details before you click 'calculate'. If you don't have the info on exposure of the controls, try different values (the same goes for the exposure of the cases or the OR) and see how the sample sizes differ. Also keep in mind what is possible in practice. Kind regards, Annette

Hello... We are doing a multi-cross-sectional research project comparing customer loyalty in Sweden and Spain. Do the samples need to be of equal size when running the t-test, i.e. an equal number of answers from both countries? Kindly, thank you

Dear Jacqueline, A t-test can be performed with unequal sample sizes, but the smaller sub-sample will determine the statistical power of the test. So make sure that the smallest sample is large enough. Kind regards, Annette

dear annette,

I am doing a cross-sectional study on the nutritional status of teenage girls from the lower socio-economic group in Pakistan, and no data on the prevalence of malnutrition in this group are available. How can I calculate the sample size?

Please advise me. Thanks

Dear Fatima, You can use the following website http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm and choose sample size, proportion and insert 50% for the prevalence if unknown. Good luck. Kind regards, Annette

Thank you for this enriching information, but I couldn't see any suitable calculation for my study. I am doing a cross-sectional study to determine the Quality Improvement at Health Profession Institute in Yemen. There is one main campus and 10 branches. What is my sample size? And how can I determine the sample size in each targeted category, e.g. students, staff or stakeholders?

Thank you

Dear Sir,

One of my students wants to do a cross-sectional study on lipid changes in shift workers. But in the previous studies the values are given as means. How do we calculate the sample size now, given that we need values in percentages to calculate it?

Regards

Dear, Assuming you do a descriptive study, you can use the formula under sample size, proportion on the following website: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm and use 50% as the prevalence value (as you should assume that you do not have evidence on the size yet). Annette

Dear Annette, using a cross-sectional study, we are assessing the risk of thromboembolism in hospitalized patients in surgery, orthopedic and medical wards based on Caprini scoring. As it is a descriptive study, how should I describe my variables: %, proportion or mean? Or should I have a P value for the same group?

Dear Hiwa, I am not sure what your research question is. If you want to compare the risk of thromboembolism using the Caprini scoring (which looks to me like a continuous variable) between 3 groups of patients (surgery, orthopedic and medical wards), you should do an ANOVA test and see if the p value is significant (hence there is a different risk between the groups). If you want to look at all hospitalized patients together and get a 'prevalence' of at-risk patients (using a cut-off point of the Caprini scoring), then you just present the proportion of those at risk out of the whole group. Hope this helps. Kind regards, Annette

Hi,

I am doing a prediction study, more specifically a correlation and regression analysis. My study is to compare physical and psychological variables in dialysis patients. I need help on how I should determine my sample size.

Dear Emilda, The information on your study (design, main outcome of interest) is limited, but I find the following website very useful for sample size calculations: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm Hope this helps. Kind regards, Annette

Dear Annette,

Thank you for your speedy reply. My sample is a homogeneous sample of hemodialysis patients, and I will be comparing their physical variables, such as the level of blood serotonin, with psychosocial variables such as illness intrusiveness, quality of life, diet etc. I would like to know whether it is sufficient to determine the sample size by the proportion method, or whether I should use a mean difference method based on the first 20 samples. I am actually collecting samples from various cities and towns throughout the state.

I read your responses and they are very helpful.

I am studying for an MPH at Jimma University in Ethiopia and I am now going to start my thesis on tuberculosis infection prevention practice in health facilities. I find it difficult to determine the sample size of the facilities. Thanks for your help

Dear Sultan Ebrahim, As this is not an epidemiological study for which you need to calculate a number of patients required to show e.g. an association, it is here most important that you have a random sample of all health facilities in the region or country (e.g. 10%). Annette

I am doing a cross-sectional study on the utilisation of reproductive health services by women aged 15-49 years. How do I calculate the sample size?

hi

I'm doing a study and I'm not sure which type it is.

I'm going to compare a disease group and a control group; I will only use a questionnaire and carry out a dental caries examination. I also do not know the sample size. Please help me.

Dear Noor, are you comparing people who are sick with healthy controls, and do you want to know whether dental health differs between the groups as a consequence of disease? Then you have a cohort study in which the disease is the exposure and dental health is the outcome. The sample size calculation can easily be done using the following website: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm Kind regards, Annette

Thank you very much, it was helpful.

But I will also compare the treatments that the patients have taken for their disease, to see their effect on dental health. There will be 5 groups, so would it also be called a cohort study?

thanks for your kindness

Hi,

I want to do a prevalence study on type diabetes patients. The number of patients attending the clinic per month is about 200. How can I calculate the sample size for this study?

Dear Raja,

Dear Sir, please kindly put me through on calculating the sample size of my research. It is about the knowledge, perception and practice relating to HIV/AIDS at the time of booking among antenatal attendees. What prevalence should i go for, I used the prevalence of knowledge relating to HIV/AIDS among antenatal attendees and i discovered, the higher the prevalence, the smaller my sample size. I want a large sample size, how do I adjust?

Thanks, Sally

Dear Sally, If you have one group in which you want to assess the prevalence of something, it is clear that when the outcome is more prevalent, it is easier to find it in a smaller group. In general you want an optimum sample size: just enough persons to show what you want to show, because more persons will lead to increased costs. That is why you do a sample size calculation in the first place. If you want to include a larger group than the sample size calculation gives you based on your expected prevalence, that is no problem. It will increase the precision of your prevalence estimate (narrower confidence interval). Kind regards, Annette
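The point about precision can be made concrete with the simple normal-approximation (Wald) confidence interval for a proportion. The sketch below is only an illustration: the prevalence of 50% and the sample sizes are assumed values, and the function name `ci_half_width` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(p, n, confidence=0.95):
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


for n in (100, 200, 385, 800):
    print(n, round(100 * ci_half_width(0.5, n), 1), "percentage points")
```

Quadrupling the sample size halves the width of the interval, so a larger sample buys precision, but with diminishing returns.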

Hi,

Can you please assist me? I wish to determine the BMI in my country of 280,000 inhabitants, where the prevalence of overweight and obesity is 16.8% in males and 32% in females. How big a sample size is required?

I was trying to use openepi but am unsure what to put in 'Percent of Unexposed with Outcome:' row.

Thanks for your help

Dear Anique, As the prevalence is lower in males compared to females, you should regard the females as exposed and the males as non-exposed; so insert both prevalence rates. Kind regards, Annette

Dear sir/madam,

I am going to study the status of EmONC among health facilities within the zone. There are 102 HCs, 3 hospitals and 4 private clinics in the zone which provide maternity services. My study design is cross-sectional. The expected outcomes are EmONC or non-EmONC facilities. I want to use statistical software such as SPSS for the analysis. Could you help me by providing any formulas or guides to calculate the recommended sample size, and tell me which analysis method is appropriate for my work?

Kind Regards,

Alex

Dear Alemayehu, What I understand is that you want to determine whether health facilities within a certain area are EmONC or non-EmONC. In that case it is better not to take a sample, but to determine the status of all facilities in the area. If that is not possible practically, you should take a random sample. However, then you ideally need to have an idea of what % will have EmONC (or non-EmONC) status. You can use the following website: http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm and go to Sample Size, proportion. Furthermore, I would probably only sample the HCs and include all 3 hospitals and 4 private clinics. Kind regards, Annette

Dear Annette,

Thank you for your speedy reply. I want to do a study on the prevalence of DVT among hospitalized patients in six tertiary hospitals in Nigeria with a combined bed space of 800.

I need your advice:

Do I need to adjust for a correction/design effect because I am using six centres?

Thanks

Including a design effect in the sample size calculation is required for complex sampling designs, mainly for RCTs. However, as it is difficult to decide on its size and your sampling strategy is still fairly simple, I don't think I would include it. However, make sure that you get a separate random sample for each of the centres. Annette

Dear Annette, I need your help with my study. I'm doing a descriptive study on practice towards cervical cancer screening among 453 students. The prevalence is 11% and I want a large sample of at least 300. Thank you.

Dear Ngozi, I am not sure why you want to sample, as you indicate you want to do the study among 453 students. Why not include them all? If you check at http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm how large a sample you need for a prevalence study in which you expect the prevalence to be 11%, this will be 114 (if you have a population of 453). Hope this helps. Kind regards, Annette

Hi Annette,

I would like to do a cross-sectional study on the prevalence of kidney failure among patients attending a referral hospital. How do I calculate my sample size?

Dear Anne, Visit http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm and go to sample size, proportion. Fill in the required data, population size and prevalence (do you have any prevalence data from other studies available, then use these otherwise 50) and the rest you can leave the same. Then calculate the sample size. Good luck. Kind regards, Annette

hello sir.

I will be doing a study on ''immunohistochemical expression of HUMAN EPIDERMAL GROWTH FACTOR RECEPTOR AND GRADING OF UROTHELIAL CARCINOMA BLADDER'' and I am very confused about it. What study design would it be? I think it is cross-sectional, as I would do it on specimens coming into our lab and no follow-up of patients will be done. Secondly, I don't know how to calculate the sample size: no such study has been done and I don't know the prevalence of bladder carcinomas. Somebody told me to calculate it by looking at many articles, but I really don't know how. Also, I don't want the sample size to be too large, as the carcinoma is not very common, so it would be a problem to complete the study. I hope you will have some help for me. Thank you

Dear Mehar, It is not clear to me what the aim of your study is. You need to base your sample size on your main outcome. Don't you have some idea of the number of specimens coming into your lab over a certain period of time? and if this seems sufficient to do your study on? Kind regards, Annette

No, there is no such record, but an audit was done at another hospital, covering 2 years in 3 hospitals of the area, which found 297 in total and 267 from one local hospital. I am using the formula with z, a and p (prevalence) [so sorry, I can't put the formula here], and now the only problem is to get this prevalence figure.

Hello,

I wish to ask for the formula that Epi Info uses to calculate the sample size for a cross-sectional survey.

Dear Akumengwa, You can find that information on the following website http://saber.salud.gob.sv/openepi/Documentation/SSCohortdoc.htm Kind regards, Annette