Anthony Li Medicine | Engineering | Data Science

One Paper A Day: Reviewing population health paper "Variation In Health Outcomes: The Role Of Spending On Social Services, Public Health, And Health Care, 2000-09"

TLDR

This is a 2016 population health paper that examines the association between state spending on social services and public health, relative to health care, and health outcomes. Link to paper can be found here.

Why should you be interested in this paper?

  • If you are a public health practitioner, you might be interested in how public health affects health outcomes despite not contributing directly to patient care.
  • With limited resources, spending on healthcare has to be balanced between public health, social services and frontline healthcare services.

Key learning points

Methods

  • Retrospective longitudinal study of the fifty states and the District of Columbia for the period 2000-09 (giving 510 state-year observations).
  • Dependent variables: Eight measures of state-level health outcomes (e.g. asthma, mental health, activity limitations). Other health outcomes include state-level mortality rates per 100,000 population for acute myocardial infarction, lung cancer and T2DM, and post-neonatal mortality rates.
  • Independent variables: Spending on social services and public health relative to the spending on healthcare in each state.
  • Spending ratios used in the analysis:
    • Social services and public health spending are combined in the numerator because public health addresses the social and environmental determinants of health for the population as a whole, as opposed to medical care delivered to individuals.
    • The ratio of social to health spending was calculated with two alternative denominators: publicly funded health care spending (Medicare and Medicaid) and total public and private health care spending in each state.
  • Covariates: Factors related to demographic characteristics, economics and the availability of health care resources that might confound the association between state funding and health. These factors were adjusted for.
  • Data analysis:
    • Descriptive statistics and graphs to characterise state-level variation in health outcomes, in the ratio of social to health spending, and in the component parts of the ratio.
    • To estimate associations between the ratio of social to health spending and each of the eight health outcomes, the authors fitted a separate multivariate linear regression model for each outcome as a function of the state's ratio of social to health spending, using annual data for 2000-09 with spending variables lagged one and two years.
    • Models were also fitted with candidate covariates, including percentage of population, percentage white, unemployment rate, percentage of children living in single-parent households, and number of primary care physicians and hospitals per 100,000 population.
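The ratio and lag construction described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code: the spending figures are hypothetical, the function names (`spending_ratio`, `add_lags`, `ols_slope_intercept`) are illustrative, and the regression is a single-predictor OLS fit that omits the covariates the paper adjusted for.

```python
# Minimal sketch: social-to-health spending ratio, within-state lags, and a
# single-predictor OLS fit. All figures below are hypothetical.

def spending_ratio(social, public_health, health_care):
    """(Social services + public health spending) / health care spending."""
    return (social + public_health) / health_care


def add_lags(panel, lags=(1, 2)):
    """Map (state, year) -> {lag: ratio from `lag` years earlier in the same state}.

    State-years without a complete set of lagged values are dropped.
    """
    lagged = {}
    for state, year in panel:
        values = {k: panel.get((state, year - k)) for k in lags}
        if all(v is not None for v in values.values()):
            lagged[(state, year)] = values
    return lagged


def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (social, public health, health care) spending for one state.
spending = {
    ("StateA", 2000): (300.0, 20.0, 100.0),
    ("StateA", 2001): (290.0, 10.0, 100.0),
    ("StateA", 2002): (270.0, 10.0, 100.0),
    ("StateA", 2003): (300.0, 30.0, 110.0),
}
panel = {key: spending_ratio(*values) for key, values in spending.items()}
lagged = add_lags(panel)
print(lagged)  # only 2002 and 2003 have both one- and two-year lags
```

In a full analysis, `ols_slope_intercept` would be replaced by a multivariable model that includes the lagged ratios alongside the covariates listed above.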

Results

  • State level variation in spending and outcomes
    • Average ratio of social to health spending in 2000-09 was 3.09 and the median was 3.02.
    • Mean share of state GDP devoted to healthcare spending was 14.1 percent while 12.2 percent was devoted to social services spending.
  • Associations between spending and health outcomes
    • Primary and secondary multivariate analyses showed that higher ratios of social to health spending were associated with significantly better outcomes for six of the eight health outcomes.

Thoughts

  • To understand the impact of this paper, we need to understand its definition of social services, which covers:
    • Education: primary, secondary and higher education
    • Income support: e.g. cash assistance and general relief for low-income or needs-tested beneficiaries of public welfare programs, plus Supplemental Nutrition Assistance Program funding
    • Transportation: spending on sidewalks, highways and mass transit systems
    • Environment: e.g. sanitation and programming
    • Public safety: law enforcement and fire protection
    • Housing: aid for public or private housing, and community development
  • Such social service spending might not be comparable in countries outside the US. For instance, in Singapore, supplemental nutritional assistance programs are not common. This may call into question the external validity of the paper.
  • Future studies of this nature would also benefit from being more specific about the public health or social service intervention under study. The paper's conclusion is very broad: increases in public health or social spending correlate with improvements in health outcomes.

One Paper A Day: Reviewing clinical paper "Immunogenicity and reactogenicity of BNT162b2 booster in ChAdOx1-S-primed participants (CombiVacS): a multicentre, open-label, randomised, controlled, phase 2 trial"

TLDR

This paper describes the immunogenicity and reactogenicity of mixing vaccines: a Pfizer-BioNTech (BNT162b2) booster given to participants primed with one dose of the AstraZeneca (ChAdOx1-S) vaccine. Paper can be found here

Why should you be interested in this paper?

  • To end the ongoing COVID-19 pandemic, it is paramount to increase vaccination rates globally. However, vaccination campaigns can be limited by vaccine manufacturing, logistics, safety halts, geopolitics, etc. Flexible vaccine combinations could ease some of these constraints by allowing inoculation to continue with whichever vaccine is available.
  • Additionally, emerging literature suggests that mixing vaccine types across the first and booster doses can potentially improve immunogenicity and, thus, vaccine efficacy.
  • With emerging variants of concern, this approach may provide broader protection, since different vaccine platforms present different parts of SARS-CoV-2 to the immune system (e.g. spike protein only vs whole inactivated virus).

Key learning points

Methods

  • Study design and characteristics: CombiVacS is a phase 2, multicentre, open-label, randomised, controlled trial done at five university hospitals in Spain. The hypothesis was that, in participants primed with one dose of the AstraZeneca vaccine, immunogenicity after a Pfizer-BioNTech dose would be superior to no further vaccination. Participants with a previous COVID-19 infection or who had already received a second vaccine dose (regardless of manufacturer) were excluded.
  • Randomisation and masking: Participants were randomly assigned (2:1) to receive one intramuscular injection of BNT162b2 or to continue under observation (control group).
  • Procedures: Participants were monitored on days 0, 7 and 14 for adverse effects of vaccination. Blood samples were also drawn at these visits for safety and immunology monitoring.
  • Outcomes: The immune response to vaccination was analysed at day 14 after vaccination: (1) titers of antibodies against the SARS-CoV-2 spike protein; (2) as a secondary immunogenicity outcome, neutralising antibody titers; (3) as a measure of the cellular response, IFN-gamma cytokine production against the SARS-CoV-2 spike protein.
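A 2:1 allocation like the one described can be produced with permuted blocks. The sketch below is illustrative only: the block size of 3, the arm labels and the function name `block_randomise` are assumptions, not the trial's documented procedure.

```python
import random


def block_randomise(n_participants, seed=None):
    """Allocate participants 2:1 (BNT162b2 : control) using shuffled blocks of 3.

    Each block contains two intervention slots and one control slot, so the
    2:1 ratio holds exactly after every completed block.
    """
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_participants:
        block = ["BNT162b2", "BNT162b2", "control"]
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_participants]


arms = block_randomise(9, seed=42)
print(arms.count("BNT162b2"), arms.count("control"))  # 6 3
```

Fixed blocks of 3 make the next allocation partly predictable in an open-label trial, which is why trials often vary the block size.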

Results

  • Geometric mean titers of antibodies specific to the SARS-CoV-2 receptor binding domain were 7756.68 in the intervention group versus 99.84 in the control group 14 days after the second (booster) dose, and 4353.51 versus 90.05 after 7 days.
  • Geometric mean titers of antibodies specific to the SARS-CoV-2 spike protein were 3684.87 in the intervention group versus 101.2 in the control group at 14 days, and 2246.25 versus 102.5 at 7 days.
  • Consistent with other studies, this study found better immune responses with longer intervals between the first and second doses. The same was true of the side-effect profile.
  • Adverse reactions to vaccination were more common in women than in men. Interestingly, women also had a stronger immune response than men. Thromboembolic events are more common with the AstraZeneca vaccine, and anaphylaxis is more common with the Pfizer-BioNTech vaccine.
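Geometric mean titers are the exponentiated mean of log-transformed titers, which suits antibody data that span orders of magnitude. A minimal sketch; the function names are illustrative and the example titers are made up, not trial data.

```python
import math


def geometric_mean_titer(titers):
    """Exponentiated mean of log titers; all titers must be positive."""
    if not titers or any(t <= 0 for t in titers):
        raise ValueError("titers must be a non-empty list of positive values")
    return math.exp(sum(math.log(t) for t in titers) / len(titers))


def gmt_ratio(intervention, control):
    """Fold difference between the two groups' geometric mean titers."""
    return geometric_mean_titer(intervention) / geometric_mean_titer(control)


print(round(geometric_mean_titer([10, 1000]), 6))  # 100.0
```

Applied to the reported day-14 receptor binding domain GMTs, the fold difference is 7756.68 / 99.84, roughly 78-fold.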

Thoughts

  • A promising but incomplete study; further findings on heterologous vaccination schedules are expected.
  • A limitation is that the control arm did not receive a homologous second dose. This was because use of the AstraZeneca vaccine had been suspended in Spain, where the study was conducted.
  • Another limitation is the relatively small sample size. However, as the study is ongoing, future results might be more convincing.

One Paper A Day: Reviewing public health paper "Assessing the Association Between Social Gatherings and COVID-19 Risk Using Birthdays"

TLDR

This paper describes the association between social gatherings (e.g. birthdays) and COVID-19 infections. Paper can be found here

Why should you be interested in this paper?

  • The authors considered studying the impact of different types of social contact on COVID-19 infections challenging for 2 reasons:
    • Large dataset required
    • Problems of confounding
  • This paper overcomes these problems by using household members’ birthdays as a surrogate marker of social contact

Key Learning points

Data sources

  • Data on household birthdays and household COVID-19 infections between 1st January 2020 and 8th November 2020 were taken from a large commercial private insurance database.
  • The data contains linkages for all household members enrolled on a single insurance plan.

Methods

  • Statistical analysis:
    • The authors hypothesised that COVID-19 infections would occur within the 2-week period after a birthday. As a falsification analysis, they studied the correlation between birthdays and infections occurring 4-8 weeks before the birthday.
    • The likelihood of COVID-19 infection was assumed to be associated with:
      • social distancing behaviour of household members
      • COVID-19 prevalence among the social groups in which any type of social gathering occurs
    • The association between birthdays and COVID-19 infections was adjusted for the stage of the pandemic, because prevalence varies by stage (e.g. lower in the early stages, higher in later stages).
    • A multivariable linear model of COVID-19 diagnosis in a household in a given week, as a function of a birthday occurring in the household 2 weeks prior, county and week fixed effects, and family-level covariates.
  • Subgroup analysis:
    • Adult birthday vs child birthday
    • Milestone birthday events, e.g. 40yo vs 50yo vs 60yo birthday celebrations
    • Voters attending mass rallies
    • Counties with different social distancing guidelines
  • Sensitivity analysis:
    • Birthdays might disproportionately fall in later stages of the pandemic, when prevalence is high, which could produce a spurious association driven by other risk factors rather than by a birthday gathering. This was effectively mitigated by repeating the analysis by calendar quarter and by demographic characteristics (e.g. US census region, household size, share of households with any children, industry of the household’s primary insurance subscriber, etc.).
    • Infections 4-8 weeks before a household birthday were analysed. A correlation between birthdays and infections in this window would suggest that the main hypothesis is not correct.
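The exposure and falsification windows can be sketched as flags on household-week rows, followed by a crude comparison of diagnosis rates per 10,000 household-weeks. This is a simplified illustration under assumptions: week indices are integers, "2 weeks prior" is read as a birthday 1-2 weeks before the diagnosis week, the household data and helper names (`exposure_flags`, `build_rows`, `rates`) are hypothetical, and there are no county or week fixed effects or covariates as in the paper's model.

```python
def exposure_flags(week, birthday_weeks):
    """Return (recent, placebo) flags for one household-week.

    recent:  a household birthday fell 1-2 weeks before this week (main exposure)
    placebo: a household birthday falls 4-8 weeks after this week, i.e. the
             week is 4-8 weeks before a birthday (falsification window)
    """
    recent = any(1 <= week - b <= 2 for b in birthday_weeks)
    placebo = any(4 <= b - week <= 8 for b in birthday_weeks)
    return recent, placebo


def build_rows(households, weeks):
    """households: id -> (birthday_weeks, diagnosis_weeks); one row per household-week."""
    rows = []
    for birthday_weeks, diagnosis_weeks in households.values():
        for week in weeks:
            recent, placebo = exposure_flags(week, birthday_weeks)
            rows.append((recent, placebo, week in diagnosis_weeks))
    return rows


def per_10k(outcomes):
    """Diagnoses per 10,000 household-weeks (NaN if there are no rows)."""
    return 10_000 * sum(outcomes) / len(outcomes) if outcomes else float("nan")


def rates(rows, use_placebo=False):
    """(rate with the flag set, rate without it) for the chosen window."""
    idx = 1 if use_placebo else 0
    flagged = [diagnosed for *flags, diagnosed in rows if flags[idx]]
    unflagged = [diagnosed for *flags, diagnosed in rows if not flags[idx]]
    return per_10k(flagged), per_10k(unflagged)


# Hypothetical households: birthday weeks and weeks with a COVID-19 diagnosis.
households = {"h1": ([5], {6}), "h2": ([20], set())}
rows = build_rows(households, range(1, 11))
print(rates(rows))  # (5000.0, 0.0)
```

In the paper, the analogous comparison comes from a regression with county and week fixed effects; a placebo-window difference near zero is what the falsification test expects if the main association reflects the birthday gathering.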

Results

  • In households with a birthday in the preceding 2 weeks, 36.4 per 10,000 individuals had a COVID-19 diagnosis, compared with 27.8 per 10,000 individuals at baseline. Within the fifth decile of the population by COVID-19 infections, 28.7 per 10,000 individuals had a COVID-19 diagnosis.
  • In other words, a birthday in the household in the preceding 2 weeks was associated with a 31% relative increase in the likelihood of a household COVID-19 infection.
  • This association was not changed by other factors such as political inclinations, precipitation in the county and county-level shelter-in-place policies.
  • Children's birthdays were associated with a higher risk of COVID-19 infection than adult birthdays.
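Assuming 27.8 per 10,000 is the comparison rate for households without a recent birthday, the 31% relative increase follows directly from the two rates:

```latex
\text{relative increase} = \frac{36.4 - 27.8}{27.8} = \frac{8.6}{27.8} \approx 0.31
```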

Thoughts

  • The results are unsurprising given the nature of birthdays. Attendees are likely not wearing masks while eating, singing birthday songs, etc. In addition, contact time is likely to be prolonged, with celebrations lasting at least an hour.
  • The idea of studying the impact of social gatherings through birthdays from an insurance database is novel and refreshing.
  • However, the limitations are clear. Social gatherings take many forms e.g. marriage anniversaries, baby showers, funerals, weddings, etc. It is unlikely that a birthday celebration is a good approximation of the kind of contact and infective transmissibility these other social gatherings entail.
  • Finally, a Singapore study in The Lancet, published here, gave a more detailed breakdown of the risk associated with each kind of social gathering. It was accompanied by COVID-19 seroprevalence data for the studied population, which provides a more robust epidemiological understanding of COVID-19 transmission and transmissibility.

One Paper A Day: Reviewing EID NLP paper "Evaluation and Verification of the Global Rapid Identification of Threats System for Infectious Diseases in Textual Data Sources"

TLDR

This paper proposes a new NLP-based emerging infectious disease (EID) surveillance system known as the Global Rapid Identification of Threats System (GRITS). Paper can be found here.

Why should you be interested in this paper?

GRITS is one of the few NLP-based EID systems described in the literature. It was developed by EcoHealth Alliance to identify EIDs in digital textual data sources. Other systems include DANIEL, BIOCASTER and PULS.

Key learning points

Limitations of traditional biosurveillance systems

  • Limited geographic coverage
  • Poor quality of underlying healthcare infrastructure in certain parts of the world
  • Lack of laboratory capacity
  • Governments may be reluctant to announce EIDs because of possible economic and reputational harm
  • Furthermore, traditional systems are designed to detect known infectious disease threats (e.g. Influenza, Ebola, etc.)
  • Other complementary systems to traditional systems: Syndromic disease surveillance, digital biosurveillance

Method

  • Data: 150,000 articles collected over a 2-3 year period. Analysts manually assigned disease labels. A subset of 12,000 articles was used for training and 3,500 articles for testing.
  • Search function: Articles are indexed with Elasticsearch, which also assigns relevance scores to individual terms using TF-IDF. Feature metadata extracted from each article is also used to sort search results.
  • Feature extraction: Disease related and contextual features are extracted from texts. Features are stored as annotations on the text. They are also further categorised as disease, pathogens, symptoms, hosts and modes of transmission.
    • Python’s standard pattern-matching libraries and the NLTK package are used to match keywords against a variety of compiled ontologies of terms related to infectious disease and public health:
      • Biocaster ontology
      • GRITS ontology
      • HealthMap disease labels
      • The Disease Ontology
      • USGS topographic feature vocab
      • WordNet
    • Dates are extracted using the Stanford SUTime Java library
    • Case counts are identified using the CLiPS Pattern library’s search module
  • Classifier training, verification and evaluation
    • The input is a vector of features extracted by GRITS’ NLP algorithms.
    • scikit-learn (SKLearn) logistic regression is used with a one-vs-rest multiclass classifier to train a model that predicts the disease referred to in the text.
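The classification step can be sketched end to end: keyword matching against a small ontology produces a feature vector, and one binary logistic regression per disease is trained in a one-vs-rest scheme, with the prediction going to the class whose classifier scores highest. scikit-learn provides this via `OneVsRestClassifier(LogisticRegression())`; the sketch below reimplements it in plain Python so it is self-contained. The mini-ontology, training sentences, learning rate and epoch count are hypothetical, and the binary bag-of-keywords vector is only a stand-in for GRITS' richer annotations.

```python
import math
import re

# Hypothetical mini-ontology: term -> category (GRITS uses much larger ontologies).
ONTOLOGY = {
    "measles": "disease", "dengue": "disease", "influenza": "disease",
    "virus": "pathogen", "flavivirus": "pathogen",
    "fever": "symptom", "rash": "symptom", "cough": "symptom",
    "mosquito": "mode_of_transmission", "airborne": "mode_of_transmission",
}
VOCAB = sorted(ONTOLOGY)


def extract_features(text):
    """Binary vector: 1.0 if an ontology term appears as a word in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1.0 if term in tokens else 0.0 for term in VOCAB]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=300):
    """Binary logistic regression fitted by stochastic gradient descent on log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * error * xj for wj, xj in zip(w, xi)]
            b -= lr * error
    return w, b


def train_one_vs_rest(X, labels):
    """One binary classifier per class: that class vs all others."""
    return {c: train_logistic(X, [1.0 if lab == c else 0.0 for lab in labels])
            for c in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose binary classifier gives the highest probability."""
    def score(c):
        w, b = models[c]
        return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return max(models, key=score)


# Hypothetical labelled snippets standing in for annotated articles.
train = [
    ("Measles outbreak: children with fever and rash", "measles"),
    ("Dengue cases rise after mosquito season; fever reported", "dengue"),
    ("Influenza virus spreads, airborne, cough and fever", "influenza"),
    ("Rash and fever in unvaccinated school, measles confirmed", "measles"),
    ("Flavivirus transmitted by mosquito, dengue alert", "dengue"),
    ("Seasonal influenza with cough across the region", "influenza"),
]
X = [extract_features(text) for text, _ in train]
models = train_one_vs_rest(X, [label for _, label in train])
print(predict(models, extract_features("Suspected measles: rash seen in clinic")))  # measles
```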

Results

  • Precision 64%, recall 63% and F1 score 0.317. Presumably, these are the average precision, recall and F1 scores across all diseases tested by the researchers. Detailed performance for individual diseases was published in the original article.
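For reference, F1 is the harmonic mean of precision and recall, and 2(0.64)(0.63)/(0.64 + 0.63) is about 0.635, roughly double the reported 0.317. One possible explanation, offered only as an assumption, is per-disease (macro) averaging: the mean of per-class F1 scores can sit well below the F1 of the averaged precision and recall. The sketch below illustrates this with made-up counts; it does not reproduce the paper's numbers, and a reporting error is also possible.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(counts_by_class):
    """Unweighted mean of per-class precision, recall and F1."""
    scores = [precision_recall_f1(*counts) for counts in counts_by_class.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


def harmonic_f1(precision, recall):
    """F1 computed directly from a single precision/recall pair."""
    return 2 * precision * recall / (precision + recall)


# Made-up (tp, fp, fn) counts: class A is precise but misses cases, class B the reverse.
counts = {"A": (10, 0, 30), "B": (10, 30, 0)}
macro_p, macro_r, macro_f1 = macro_average(counts)
print(macro_p, macro_r, round(macro_f1, 3), harmonic_f1(macro_p, macro_r))  # 0.625 0.625 0.4 0.625
```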

Thoughts

  • No comparison was made with other classifiers. Mostly a descriptive paper about a new EID surveillance system powered by NLP.
  • Unclear description of how the data was obtained. Presumably the data was extracted from HealthMap.
  • Otherwise, it is a succinct explanation of the essential methods to develop a similar system.
  • I also agree that the system is useful for public health specialists and epidemiologists. It can complement existing surveillance systems to improve accuracy and promptness of reporting new EID and instituting relevant public health actions.

One Paper A Day: Reviewing dataset paper "A Dataset for Multilingual Epidemiological Event Extraction"

Preface

This is the first article in the series One Paper A Day. In this series, I commit to reviewing and summarising one article every day. The papers reviewed are likely to fall under machine learning, software engineering or medicine.

TLDR

This paper essentially provides a corpus for development of NLP tools and techniques for processing text related to Emerging Infectious Diseases (EID). Paper can be found here.

Why should you be interested in this paper?

  • Suitable for people developing NLP tools for surveillance of EID.
  • There are few other EID corpora available on the web.
  • Published in 2020, the paper is fairly recent (relative to this blog post) and includes a comprehensive literature review of both NLP techniques and corpora.

Key learning points

Literature review

  • Event extraction methods applied on EID news sources can be divided into three types:
    • Pattern based: Rules and templates that extract events from text through the representation and exploitation of expert knowledge
    • Data driven: Statistical techniques to discover the relations in text
    • Hybrid of pattern based and data driven approaches
  • Published event extraction methods reviewed include:
    • NLP applied to internet search data to look for specific EID trends, e.g. influenza
      • Sources of internet search data: Google, Twitter, Yahoo

Methods

  • Corpus creation:
    • Data extracted from online ProMED articles from 1 Aug 2013 to 31 Aug 2018
    • Data cleaning done to remove boilerplate content from each article
    • Data filtering to extract languages of interest
    • K-means clustering applied to articles
    • Deduplication done to remove potential duplicates, using the ONION tool
    • Control dataset created with HuffPost news articles from 2012 to 2018
  • Corpus statistics:
    • Training set comprises 10,000 English articles and 2,996 French articles
    • Human annotated datasets, consisting of 444 English articles and 2,722 French articles, provided the ground truth to evaluate models developed.
  • DANIEL for extraction:
    • Segmentation
    • Event detection
    • Event localisation
    • Output is a disease-location pair
  • Text classification model:
    • Supervised classification methods evaluated: naive Bayes, random forest, neural network
  • Performance metrics considered:
    • Recall and precision considered for evaluation of DANIEL
    • Recall, precision, F measure metrics for evaluation of text classification models
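As far as I understand, ONION flags text as duplicate when a large share of its word n-grams have already been seen in earlier text. The sketch below is similar in spirit but is not ONION's implementation: it works on whole documents, and the function names, n = 5 and the 0.5 threshold are arbitrary choices for illustration.

```python
def ngrams(tokens, n):
    """Set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def deduplicate(documents, n=5, threshold=0.5):
    """Keep a document only if at most `threshold` of its n-grams were already seen."""
    seen = set()
    kept = []
    for doc in documents:
        grams = ngrams(doc.lower().split(), n)
        if not grams:
            kept.append(doc)  # too short to judge; keep it
            continue
        overlap = len(grams & seen) / len(grams)
        if overlap <= threshold:
            kept.append(doc)
            seen |= grams  # only n-grams of kept documents count as "seen"
    return kept


docs = [
    "cholera outbreak reported in the northern province today",
    "cholera outbreak reported in the northern province today again",
    "avian influenza detected in poultry farm near the border",
]
print(deduplicate(docs))  # the near-duplicate second article is dropped
```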

Results

  • DANIEL:
    • F score of 75%, precision of 60% for English articles
    • Precision of 74%, recall of 83% for French articles
  • Text classification model (English)
    • Highest precision of 80% by Random Forest
    • Highest recall of 76% by Neural Network
    • Highest F measure of 74% by Random Forest
  • Text classification model (French)
    • Highest precision of 80% by Random Forest
    • Highest recall of 74% by Naive Bayes
    • Highest F measure of 67% by Random Forest
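Assuming DANIEL's F score is the standard F1 (the harmonic mean of precision P and recall R), the missing figures can be derived from the reported ones: the French F1 from its precision and recall, and the implied English recall from its F score and precision (subject to rounding in the reported percentages).

```latex
F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.74 \times 0.83}{0.74 + 0.83} = \frac{1.2284}{1.57} \approx 0.78

R = \frac{F_1 P}{2P - F_1} = \frac{0.75 \times 0.60}{2 \times 0.60 - 0.75} = \frac{0.45}{0.45} = 1.0
```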

Key contributions

  • Large multilingual EID dataset available here
  • Baseline performance of NLP techniques for event extraction and text classification methods