Digging Into The CDC’s Data On Preterm Birth

PhD student Andrea Sevilla-Clark reviewed more than 50 years’ worth of pregnancy data released by the Centers for Disease Control and Prevention (CDC) and developed an interactive web application called CDC NatView to make it easy for others to explore the large dataset.

The paper is the first study to investigate risk factors associated with preterm birth (PTB) in the United States using CDC natality data from 1968 to 2021. The study reveals a concerning upward trend in late PTBs. It also highlights significant racial disparities, particularly between African American and White populations, in PTB rates, education, body mass index, and access to prenatal care.

Preterm birth, defined as the delivery of a baby before 37 weeks of gestation, is a significant health issue that affects millions of families worldwide each year. The early arrival of a newborn not only presents immediate health challenges but also has long-term implications for both the baby and the family. Preterm infants are at an increased risk for a range of complications, from respiratory distress and infections to developmental delays and chronic health conditions. Understanding the causes, risks, and preventive measures associated with preterm birth is crucial for expectant parents, healthcare providers, and society as a whole.

The study highlighted key findings, including the rise of late PTBs, the influence of maternal age and interpregnancy intervals on PTB risk, and the persistent disparities between African American and White populations.

CDC NatView Data

Sevilla-Clark and the team also developed CDC NatView, an open-source RShiny web application that allows easy exploration and visualization of the CDC natality data, enabling further research and understanding of PTB risk factors and maternal morbidities. The web application enables users to explore birth records by showing how PTB rates and risk factors have changed over time. It also shows associations and relationships between maternal characteristics like race, age, BMI, and PTB outcomes, as well as how multiple risk factors might work together to influence PTB risk. 

The findings underscore the importance of PTB prevention, particularly among high-risk groups. Key interventions include reducing health disparities by addressing social and economic factors, ensuring women have access to early, regular, high-quality prenatal care, and educating women about risk factors like interpregnancy intervals and body mass index.

We sat down with Sevilla-Clark to find out more about the paper and why she thinks it is important to do research on women’s health.

Q: What made you want to research women’s health?

Women’s health is as important for women themselves as it is for the entire society. In particular, women’s pregnancy health is an important part of women’s holistic health and wellness. 

However, persistent bottlenecks hinder healthy pregnancies, including:
(1) Adverse pregnancy outcomes, such as premature birth, preeclampsia, and gestational diabetes, which contribute to maternal and fetal mortality and morbidity;

(2) Persistent disparities in pregnant women’s health, which must be addressed to ensure adequate healthcare for pregnant women across different groups in society.

Using large amounts of data and machine learning can be game-changing in tackling these issues. 

Healthy pregnancies mean healthy women and children and, thus, a healthy and thriving community and society.

 

Q: What did you discover from the CDC Natality dataset?

We made a few key findings. Firstly, we found that the increase in the preterm birth rate has been driven largely by the late preterm category, that is, births between 34 and 36 weeks of gestation.

We also confirmed the racial disparities that have been reported in the literature, namely between the African American and White populations. These appear to be driven by socioeconomic and lifestyle factors, for example highest educational attainment and pre-pregnancy BMI. The African American population exhibited a statistically significantly higher proportion of high pre-pregnancy BMI (overweight and obese BMI brackets) and lower levels of educational attainment (e.g. some college or less) compared to the White population.

Maternal age has also been steadily increasing over time, which is consistent with higher educational attainment in women over the years. We also confirmed that shorter intervals between pregnancies are linked to higher preterm birth rates.

This study demonstrates how the CDC dataset can be used to conduct large-scale longitudinal analyses of preterm birth trends and risk factors in the U.S. The development of the CDC NatView application also provides a valuable open-source tool for other researchers to easily explore this data and generate insights to enhance our understanding of preterm birth.

PTB incidence by race (1995-2021). The figure shows the count-normalized rate of PTB incidence by race, with the African American population experiencing the highest incidence by a large margin.
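
To make the definitions concrete, here is a minimal Python sketch of how a birth can be labeled from its gestational age and how a per-group PTB rate can be computed. It is an illustration only, not the code behind the study or CDC NatView. It assumes gestational age is recorded in completed weeks, and it reads "count-normalized rate" as preterm births divided by all births in the same group, which is an interpretation rather than something the paper excerpt spells out. The function names, the "earlier preterm" label, and the toy records are all hypothetical.

```python
from collections import Counter

PRETERM_CUTOFF_WEEKS = 37      # preterm: delivery before 37 weeks of gestation
LATE_PRETERM_START_WEEKS = 34  # late preterm: 34 to 36 completed weeks


def ptb_category(gestation_weeks: int) -> str:
    """Classify a birth by completed weeks of gestation (assumed unit)."""
    if gestation_weeks >= PRETERM_CUTOFF_WEEKS:
        return "not preterm"
    if gestation_weeks >= LATE_PRETERM_START_WEEKS:
        return "late preterm"
    return "earlier preterm"  # hypothetical label for births before 34 weeks


def ptb_rate_by_group(records):
    """Return {group: preterm births / all births in that group}.

    `records` is an iterable of (group_label, gestation_weeks) pairs.
    """
    births = Counter()
    preterm = Counter()
    for group, weeks in records:
        births[group] += 1
        if ptb_category(weeks) != "not preterm":
            preterm[group] += 1
    return {group: preterm[group] / births[group] for group in births}


# Made-up toy records for illustration, NOT CDC data.
toy_records = [
    ("A", 39), ("A", 36), ("A", 40), ("A", 38),
    ("B", 35), ("B", 33), ("B", 39), ("B", 40), ("B", 41),
]
rates = ptb_rate_by_group(toy_records)
```

Dividing by each group's own birth count is what lets groups of very different sizes be compared on the same scale, which raw counts would not allow.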

 

Q: What does the CDC NatView app do?

We developed the CDC NatView to make it easy for others to explore this large dataset. While this study and web app are geared more towards researchers and health professionals, the insights gained could eventually lead to better prenatal care practices and interventions to reduce preterm birth rates. This would benefit expecting mothers and families by decreasing the chances of complications and lifelong health issues associated with preterm birth.

This is exciting for researchers, clinicians, and public health professionals interested in maternal and child health. The study uncovered concerning trends in preterm births and created a valuable tool to help further understand and potentially prevent preterm births, which can lead to infant mortality and health issues.

The CDC NatView tool can be used by anyone interested in exploring trends and risk factors related to preterm birth. They could use the web application to easily interact with and visualize more than 50 years’ worth of CDC pregnancy data.

They can explore how various risk factors, maternal demographics, and other aspects like prenatal care are associated with preterm birth outcomes. The insights generated could potentially inform clinical practices, public health policies, and interventions designed to reduce preterm birth rates and related racial disparities. For example, clinicians could emphasize to patients the importance of adequate prenatal care and pregnancy spacing.

 

Q: What’s your next step in this research? 

Future work will focus on expanding the CDC NatView tool to include more maternal health factors, support more complex queries for understanding how factors interact, and automatically pull the latest CDC data as it becomes available each year.

Our PRAISE lab is also examining bias and fairness through a causal lens. This goes beyond analyzing the data at the observational level, that is, finding correlations with specific subsets of features, and aims to understand the data-generating process and how it contributes to bias in the target outcome. For example, how is the occurrence of preterm birth driven by race at a fundamental level? Simply looking at the proportions of preterm outcomes conditioned on race does not give us the full story.