Quantitative research can be used to test a theory and is therefore often deductive in nature. It is a structured and objective approach to research, with the general aim of producing inferences at the population level. In quantitative research, knowledge is produced by quantifying concepts and then verifying or falsifying hypotheses through statistical analysis.

1. Quantitative approaches/designs

A quantitative study design is a plan or protocol for conducting a scientific investigation, which enables us to (1) translate a conceptual hypothesis into an operational hypothesis, and (2) statistically test this formulated hypothesis. We can distinguish the following quantitative research designs:

Overview of different quantitative research designs


1.1 Randomized controlled trial (RCT)

The randomized controlled trial is the strongest form of experimental design. A design can be described as experimental if (1) there is randomisation, (2) predictions can be made about cause and effect, and (3) a comparison is made between two or more intervention groups. The following is a graphic representation of an RCT design.

Randomized controlled trial


1.2 Quasi-experimental design

This design is often used when random assignment is not possible (e.g. for ethical reasons). Apart from the lack of randomisation, it is similar to an RCT, with two or more intervention groups being compared.

Quasi-experimental design


1.3 Ecological Study

An ecological study examines the correlation between (the frequency of) exposure and disease outcomes at the population level. As more and more relevant data becomes available through health databases, ecological studies are often easy to perform at low cost. Disadvantages of ecological studies include the difficulty of determining cause-effect relationships, regional differences in diagnostics and exposure measurement, a lack of information on confounding variables, and the phenomenon of the ecological fallacy.

An example of an ecological approach could be a study where the correlation between total dietary fat intake and the incidence of breast cancer deaths is being examined through a cross-country comparison, as depicted in the figure below.

Example of an ecological study


1.4 Cohort study

A cohort study is an observational and longitudinal research design that is used to examine how groups evolve over time. By following these various groups, we can keep track of which individuals develop a certain disease over time, and determine if certain group characteristics correlate with the likelihood of contracting or developing this disease. A cohort is a group of individuals who share a common characteristic or experience within a defined period (e.g. born in a certain year, exposed to a certain drug, student in a certain class). We then try to compare different cohort groups based on the assumption that the different groups are in many ways similar, except for the characteristic the cohort division was based on. The following figure provides an example.

Example of a cohort study

The goal is to find out if the cohort characteristic (in this case alcohol consumption) potentially contributes to the incidence of a disease (in this case liver cancer). It is important to note that in cohort studies, all participants have to be ‘healthy’ at the beginning of the study, i.e. people who already had liver cancer in 2008 would be excluded from participating in the example above. When conducting a cohort study, we have to be very alert to potential confounding variables, and control for them in our statistical analysis if necessary. For instance, if the cohort of alcohol consumers has a greater proportion of men than the cohort of non-alcohol consumers, a higher prevalence of liver cancer in the alcohol-consuming group could also be (partly) attributable to the fact that men are in general more susceptible to developing this type of cancer than women. In that case gender is a confounding variable that needs to be controlled for in the analysis. One major issue, however, is that we do not always know which potential confounding variables exist, which makes it hard to control for them.
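Controlling for a confounder by stratification, as in the gender example above, can be sketched in a few lines of Python. Every count below is hypothetical and chosen only for illustration:

```python
# Hypothetical counts for the alcohol / liver-cancer example, split by
# gender. All numbers are invented purely to illustrate confounding.
# Each entry is (cases of liver cancer, group size).
drinkers = {"men": (40, 600), "women": (10, 400)}
non_drinkers = {"men": (12, 300), "women": (7, 700)}

def crude_rate(cohort):
    """Disease rate ignoring gender entirely."""
    cases = sum(c for c, n in cohort.values())
    total = sum(n for c, n in cohort.values())
    return cases / total

print(f"Crude rate, drinkers:     {crude_rate(drinkers):.3f}")      # 0.050
print(f"Crude rate, non-drinkers: {crude_rate(non_drinkers):.3f}")  # 0.019

# Stratum-specific rates: comparing drinkers with non-drinkers *within*
# each gender removes the distortion caused by the two cohorts having a
# different gender mix.
for gender in ("men", "women"):
    c1, n1 = drinkers[gender]
    c0, n0 = non_drinkers[gender]
    print(f"{gender}: drinkers {c1 / n1:.3f} vs non-drinkers {c0 / n0:.3f}")
```

In this fabricated data the within-stratum contrast is smaller than the crude contrast, i.e. part of the crude difference was due to the gender mix rather than alcohol itself.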

Cohort studies can be conducted both retrospectively and prospectively, with the prospective approach being the more robust approach as you have more ways to ensure validity and reduce error. Prospective cohort studies are seen as the most reliable way of conducting observational epidemiological studies. The results from long-term prospective cohort studies are considered to be of superior quality to those obtained from retrospective cohort studies or cross-sectional studies.

1.5 Meta-analysis

A meta-analysis is a statistical analysis of the outcomes of multiple other studies (e.g. effect size, power). By conducting a meta-analysis, we gain a clearer picture of the overall effect that is found for the collective of studies, and it generally means that the outcome is less prone to bias and error.
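As a minimal sketch of the pooling step, a fixed-effect meta-analysis weights each study's effect size by the inverse of its variance, so that precise studies count for more. The study effects and standard errors below are invented:

```python
import math

# (effect size, standard error) per study -- hypothetical values
studies = [
    (0.30, 0.10),
    (0.45, 0.20),
    (0.25, 0.15),
]

# Inverse-variance weights: studies with a small SE get a large weight.
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")  # 0.309 (SE 0.077)
```

Note that the pooled standard error is smaller than that of any single study, which is why a meta-analytic outcome is generally less prone to error.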

1.6 Case-control study

Case-control studies are retrospective observational studies that are often used to identify factors that may contribute to a medical condition by comparing subjects who have that medical condition (i.e. the cases) with subjects who do not have that medical condition but are otherwise similar (i.e. the controls). We can then compare the prevalence of exposure to a hypothesized risk factor among cases and controls, and determine whether people with the medical condition have indeed been exposed more frequently to the supposed causal attribute. Usually the number of control subjects included in the study is slightly larger than the number of cases.

1.7 Cross-sectional study

A cross-sectional study is an observational study that involves the analysis of data collected from a population, or a representative subset, at one specific point in time. A cross-sectional study measures both the exposure and the outcome of interest at the same point in time. Cross-sectional studies differ from case-control studies in that they aim to include very large samples that are representative of the whole population. Case-control studies, on the other hand, generally include only a small number of individuals with a specific disease outcome, and compare these individuals with a small healthy control group who are otherwise similar. Because the control group in case-control studies is specifically selected for its similarity to the diseased group, the control group is often highly specific and not representative of the entire population.

The following figure illustrates a cross-sectional research design.

Cross-sectional study design

Cross-sectional studies are useful for establishing the odds ratio and relative risk for developing a disease based on someone’s exposure status to a risk factor. The data can be gathered specifically for the intended study, but more frequently cross-sectional studies depend on data that was originally collected for another purpose.
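Computing the two measures mentioned above from a 2×2 exposure/outcome table is straightforward; the counts here are hypothetical:

```python
# 2x2 table (hypothetical counts):
#                diseased  healthy
# exposed            30       170
# unexposed          15       285
a, b = 30, 170   # exposed:   diseased, healthy
c, d = 15, 285   # unexposed: diseased, healthy

relative_risk = (a / (a + b)) / (c / (c + d))  # ratio of the two risks
odds_ratio = (a * d) / (b * c)                 # cross-product ratio

print(f"Relative risk: {relative_risk:.2f}")  # 3.00
print(f"Odds ratio:    {odds_ratio:.2f}")     # 3.35
```

Here exposed subjects have three times the risk of disease; the odds ratio is close to the relative risk because the outcome is rare in both groups.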

1.8 Summary of quantitative research designs

In conclusion, the following figure provides a hierarchical overview of the aforementioned research designs, displaying the ‘strength of evidence’ they provide for potential causality.

Hierarchy of research designs


2. Methods of data collection

  • Observations
  • Scales
  • Physiological measurement / biomarkers
  • Interviews
  • Questionnaires
  • Surveys


3. Sampling in quantitative research

In quantitative research, the aim generally is to obtain insights that can be applied at the population level. Probability sampling (or random sampling) is the gold standard for obtaining a representative sample and findings that can reasonably be generalized to the population. For this purpose, it is important to obtain a good sampling frame: a list of all (or as many as possible) cases/participants in a population. Any mismatch between the sampling frame and the conceptually defined population can create errors. The sampling ratio is the number of cases in the sample divided by the number of cases in the population or sampling frame (i.e. the proportion of the population included in the sample).

We discriminate between the following 4 main forms of probability sampling:

  • Simple random: Create a sampling frame for all cases and then select cases using a completely random procedure (e.g. a random number generator).
  • Systematic: A random sample in which the researcher selects cases at predetermined intervals (e.g. every fifth or tenth case) from a sampling frame. Using this fixed sampling interval creates a quasi-random sampling method.
  • Cluster: The first step is to divide a big population into multiple clusters, and randomly select a certain number of these clusters (e.g. the Netherlands can be clustered into 393 municipalities, from which we randomly select 40). Within each selected cluster, we create new and smaller clusters, and randomly select a proportion of those (e.g. the selected 40 municipalities can each be divided into 15 neighbourhoods, and in each municipality we randomly select 4 neighbourhoods for our sample). Cluster sampling can have multiple steps to continuously ‘zoom in’ on a population and select a geographically scattered sample. This sampling method is therefore often used to cover wide geographic areas.
  • Stratified: A random sample in which the researcher first identifies a set of mutually exclusive and exhaustive categories, divides the sampling frame by these categories, and then uses random selection to select cases from each category. This way, the researcher ensures the sample is more representative of the population, at least with regard to the predetermined categories and characteristics (e.g. if a certain population consists of 60% women and 40% men, stratified sampling allows us to draw random samples amongst men and women so that the sample preserves the 60/40 gender ratio).
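Two of these methods, systematic and stratified sampling, can be sketched as follows. The sampling frame and its 60/40 gender split are fabricated for illustration:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Fabricated sampling frame: 1000 cases, 60% women ("F"), 40% men ("M").
population = [{"id": i, "gender": "F" if i < 600 else "M"} for i in range(1000)]

# Systematic sampling: every 10th case from the frame -> 100 cases.
systematic = population[::10]

# Stratified sampling with proportional allocation per stratum.
def stratified_sample(frame, key, n):
    strata = {}
    for case in frame:
        strata.setdefault(case[key], []).append(case)
    sample = []
    for cases in strata.values():
        k = round(n * len(cases) / len(frame))  # stratum's share of n
        sample.extend(random.sample(cases, k))
    return sample

sample = stratified_sample(population, "gender", 100)
women = sum(1 for case in sample if case["gender"] == "F")
print(len(systematic), len(sample), women)  # 100 100 60
```

The stratified sample reproduces the population's 60/40 ratio exactly, whereas a simple random sample would only match it approximately.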

Due to pragmatic or ethical reasons we cannot always use a form of probability sampling, and therefore we can also consider nonprobability sampling for some quantitative studies.

4. Analysis of quantitative data

Before we can conduct a statistical analysis, we first have to determine which independent variables and dependent variables we want to use and what kind of data these variables are comprised of. We will now discuss several types of variables.

Categorical variable: A qualitative (non-numerical) variable that has two or more categories. We can distinguish between three different subtypes of categorical variables:

  • (1). Nominal variable: A variable with two or more categories, but with no hierarchical difference between these categories (e.g. nationality, political party preference, occupation, gender).
  • (2). Dichotomous variable: A variable with exactly two categories (e.g. male/female, yes/no, left handed/right handed, heads/tails).
  • (3). Ordinal variable: A variable with two or more categories, but with a hierarchical order between these categories. An example can be ‘income level’ when differentiating between a low, middle and high income group. Likert scales are often at the ordinal level of measurement, meaning that the answers on a Likert scale indicate a ranking between different answer categories.

Continuous variables:
These are quantitative variables. We can distinguish between two different subtypes of continuous variables:

  • (1). Interval variable: This variable has a numerical value which can be measured along a continuum (e.g. the temperature in degrees Celsius, or someone’s IQ). There is no absolute zero, but we can measure distance between categories.
  • (2). Ratio variable: Similar to an interval variable, with the additional condition that a measurement of 0 implies that there is none of that variable; there is an absolute zero. This also implies you cannot have negative values. Examples are length, mass, distance and salary.

The following table provides an overview of which statistical analysis to use based on the type of variables we are working with and the research question we aim to answer.

Overview statistical analyses


5. Strengths of quantitative research designs

The following characteristics can generally be perceived as strengths of quantitative research designs:

  • Not clouded by subjectivity
  • Can establish causality
  • High reliability
  • Relatively easy to reproduce a study
  • Able to analyse large data sets and include a lot of participants
  • Possibility to make substantiated claims about hypothesized relations based on statistical significance levels (p-values)


6. Weaknesses of quantitative research designs

The following characteristics can generally be perceived as weaknesses of quantitative research designs:

  • Focuses only on what currently exists, ignores possibilities of what could be
  • “Hard” science approach; loss of human element
  • Some important concepts are hard to measure: e.g., intelligence, social class, class struggle
  • Not much flexibility in the process of collecting, analysing and interpreting data
  • Questionable validity


7. Validity/Reliability in quantitative research

While reliability always refers to measurement within a study, the term validity can be more confusing because it is used in multiple contexts. We can distinguish between measurement validity, internal validity and external validity.

7.1 Measurement reliability

Reliability refers to the attribute of consistency in measurement: if a study were reproduced with a similar group of respondents in a similar context, the results would be similar. The reliability of a study is a continuum rather than an all-or-nothing matter. We can distinguish between the following types of reliability:

  • (1). Test-retest reliability: This refers to consistency of a measurement across time. It means that a measurement delivers the same outcome even when we measure at different moments.
  • (2). Internal consistency: Internal consistency means that the different items within a single test/questionnaire tend to show a consistent interrelatedness. The reliability of an instrument is determined by assessing how accurately the items that supposedly reflect the same construct yield similar results. It means that the results for different items for the same construct within a single measure should be consistent. There are multiple ways for assessing the consistency of items within a single test:
    • Split-half reliability: The split-half reliability can be determined by correlating the pairs of scores obtained from equivalent halves of a test administered only once to a representative sample. If the scores obtained on one half of the test correlate strongly with the scores obtained on the other half of the test, the split-half reliability is high. It can however be challenging to divide a single test in two nearly equivalent halves.
    • Inter-item correlation: Whereas in the split-half method we divide the test items in two equal groups, with inter-item correlation we assess how the individual test items correlate amongst themselves. If multiple test items really are supposed to measure the same construct, then the correlation between these test items should be high.
  • (3). Inter-observer reliability: This type of reliability entails that different observers give consistent estimates of the same phenomenon they are assessing. It means that the scores assigned by different observers show a high correlation.
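The split-half approach described above can be sketched by correlating the odd-item totals with the even-item totals across respondents. The item scores below are invented, and the final Spearman-Brown step (an adjustment for test length, not discussed above) is a common addition:

```python
# rows = respondents, columns = 6 items of one test (invented scores)
scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 2],
]

def pearson(x, y):
    """Plain Pearson correlation, no external libraries."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

odd_half = [sum(row[0::2]) for row in scores]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in scores]  # items 2, 4, 6

r_half = pearson(odd_half, even_half)
split_half = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction
print(f"Half-test correlation: {r_half:.2f}, "
      f"split-half reliability: {split_half:.2f}")
```

A high correlation between the two halves indicates that the items measure the same construct consistently; the same `pearson` helper could be applied to pairs of individual items to assess inter-item correlation.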

Reliability can be improved in 4 ways:
(1). Clearly conceptualize and define the constructs that are being examined. When wanting to measure if and to what degree a patient is depressed, we first need a clear description of what we mean by ‘depression’.
(2). Increase the level of measurement. For example, instead of measuring blood pressure as either high or low, we could also use a 7-point Likert scale with a more detailed division.
(3). Use multiple indicators of the same construct.
(4). Pilot testing and making revisions to the way of measuring accordingly.

7.2 Measurement validity

Measurement validity indicates how well a construct is operationalized in a measurement. It gives us an idea on how well a measurement actually represents the phenomenon we are investigating. For instance, if we want to measure the construct of depression, then the measurement tool we use should accurately reflect the concept and definition of depression that we are adhering to. Roughly speaking, we can discriminate between 4 main types of measurement validity:

  • (1). Face validity: Face validity simply means that it seems plausible that the indicator we are using really measures the intended construct. It addresses the question: On first glance, to what extent does a test seemingly cover the concept it is supposed to measure? Face validity is largely a subjective judgement call.
  • (2). Content validity: Content validity is concerned with capturing the full content of a construct in a measurement. It gives an indication to what extent the construct we want to investigate is being covered by our measuring instrument. Content relevance entails that all test items of a measurement/test are relevant for the construct we want to assess, while content coverage entails that all the sub-domains that a construct is comprised of are covered adequately with the various test items.
  • (3). Criterion validity: The extent to which the measurement corresponds to a reliable external source that measures the same construct. There are two subtypes of criterion validity:
    • Concurrent validity: Concurrent validity means that a new measurement correlates highly with a pre-existing indicator that measures the same construct. Ideally, we identify the gold standard measuring tool for comparison to assess the concurrent validity of the tool we are currently using.
    • Predictive validity: The degree to which an indicator accurately predicts future events. For instance, a preventive screening instrument for breast cancer has a high predictive validity if it can accurately predict who will develop breast cancer in the future.
  • (4). Construct validity: Construct validity reflects the degree to which a test measures what it purports to measure. There are two subtypes of construct validity:
    • Convergent validity: The degree to which multiple indicators that supposedly measure the same construct act alike.
    • Discriminant validity: The degree to which indicators of a construct are negatively associated with indicators of opposing constructs. It furthermore entails that measures for unrelated constructs should show no correlation.

7.3 Internal validity

Internal validity means that the independent variable, and nothing else, influences the dependent variable. It entails a proper demonstration of causal inferences within a study. In other words, internal validity reflects the extent to which a cause-effect conclusion based on a study is justified and robust. Therefore, internal validity is only relevant for studies in which we aim to establish a causal relationship. For most observational studies the concept of internal validity is irrelevant. There are many potential threats to internal validity:

  • (1). History effect: An external event that unintentionally influences the dependent variable in the midst of a longitudinal study.
  • (2). Selection bias: Especially in a research design without random assignment, there is a risk that the groups unintentionally differ on characteristics that could influence the dependent variable. Pre-testing can give us an indication if the groups are similar with regard to the dependent variable prior to a possible intervention.
  • (3). Maturation: As an experiment or a study progresses, there are various biological, psychological or emotional processes that can unintendedly affect the dependent variable. For instance, during an experiment of several hours it is likely that participants become tired and lose focus. Pre-tests and the usage of control groups can be helpful to control for maturation effects.
  • (4). Mortality / Attrition: Mortality refers to participants dropping out of a longitudinal study partway through. It becomes very troublesome to produce reliable causal inferences based on the outcomes of a study if a substantial number of participants dropped out in the midst of the study.
  • (5). Testing effects: When a participant is given a pre-test prior to an intervention, it is possible that the pre-test itself triggers a learning effect, affecting the outcome on the dependent variable in the post-test. If testing effects occur, it becomes difficult to ascribe any changes in the dependent variable (completely) to the intervention.
  • (6). Instrument change: This means that the instrument used to measure the dependent variable changes during the experiment. This can apply to materialistic measuring instruments (such as a weighing scale or blood pressure monitor), but it can also apply to observers (such as a researcher becoming more or less strict with his grading criteria).
  • (7). Compensatory behaviour: As the study advances, participants in the control group might observe beneficial changes in the intervention group with regard to the dependent variable. This might spark a drive and desire amongst participants in the control group to also improve their outcome on the dependent variable, for instance by increasing their efforts. Alternatively, the differences in outcome might lead to demoralisation amongst participants in the control group, which can negatively impact the dependent variable.
  • (8). Statistical regression towards the mean: When participants in a study are selected based on the fact their scores on the dependent variable are at the extremities of a certain continuum prior to the study, the intervention of the study might not have the intended effects. For instance, if you select a group of participants based on their extremely high blood pressure for testing a new medicine to reduce hypertension, chances are that their blood pressure by chance will be lower at the post-test. As blood pressure can fluctuate naturally and the participants were selected while (and because) their blood pressure was peaking, it is likely the post-test outcomes will be favourable, while this may not be due to the new medicine itself. In other words, if participants are selected based on extreme outliers in their initial test scores, it is likely their scores will regress towards the mean in subsequent tests, regardless of the intervention.
  • (9). Diffusion of treatment / Contamination: This happens when participants of different intervention groups communicate and learn about each other’s treatments. It is possible that this communication unintentionally affects the outcomes for the different groups. For instance, if the intervention in one group is a relaxation exercise to reduce anxiety levels, and some people from the control group become aware of this intervention and its effect, they might utilize this knowledge and use the same technique to reduce their own anxiety levels. This hampers comparative evaluation between the control group and the intervention group.
  • (10). Experimenter expectations: A researcher can unintentionally affect an experiment’s outcomes by unconsciously behaving in different ways towards the control group and the intervention group. The researcher can inadvertently steer the outcomes of a study in a certain direction this way.
  • (11). Placebo effect: The mere fact of participating in a study, or of being exposed to an intervention that a participant believes will have an effect, can influence the outcomes on the dependent variable in such a way that an effect is indeed noticeable, regardless of whether the intervention is effective.
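The regression-towards-the-mean threat (point 8) can be demonstrated with a small simulation: select the participants with the most extreme pre-test ‘blood pressure’ readings and measure them again with no intervention at all. The values and the fluctuation model are hypothetical:

```python
import random

random.seed(1)  # fixed seed for a reproducible run

TRUE_MEAN = 120  # everyone's true underlying blood pressure (hypothetical)

def measure():
    # one reading = true value + random day-to-day fluctuation (sd = 10)
    return TRUE_MEAN + random.gauss(0, 10)

pre = [measure() for _ in range(1000)]

# Select the 50 participants with the highest pre-test readings.
extremes = sorted(range(1000), key=lambda i: pre[i], reverse=True)[:50]

pre_mean = sum(pre[i] for i in extremes) / 50
post_mean = sum(measure() for _ in extremes) / 50  # fresh readings, no drug

print(f"Pre-test mean of selected extremes: {pre_mean:.1f}")
print(f"Post-test mean (no intervention):   {post_mean:.1f}")
# The post-test mean falls back towards 120 even though nothing was done.
```

The selected group's average drops substantially at the post-test purely because their pre-test readings were extreme by chance, which is exactly the effect that could be mistaken for a successful treatment.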

7.4 External validity

External validity refers to the degree to which the results of a study can be generalized to the wider population and to a broader context. External validity mainly applies to experimental studies. We can distinguish between the following forms of external validity:

  • (1). Population generalization: Population generalization is concerned with the question: to what extent can we generalize a study’s outcomes to a whole population? For instance, if we conducted a study with patients with diabetes aged 60+ in the city of Amsterdam, to whom can we generalize any relevant findings? Do the findings apply to diabetic patients from other age groups and from other cities for instance? These are all questions regarding the population generalizability of a study.
  • (2). Naturalistic generalization: The extent to which we can generalize findings from an artificial and highly controlled experimental setting to the real world. There are two phenomena important within this context:
    • Reactivity: In an experimental setting, people might respond differently than they would in the ‘real world’ due to their awareness of the experimental setting. 
    • Mundane realism: Mundane realism is concerned with the extent to which conditions in an experimental setting are similar to conditions in the real world.
  • (3). Theoretical generalization: Theoretical generalization addresses the question if we can accurately generalize findings from a specific experiment to more abstract theories and relations.