Disclaimer: The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given data set, and should not be used in the context of making policy decisions without external consultation from scientific experts.

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.

To cite this case study please use:

Wright, Carrie and Ontiveros, Michael and Jager, Leah and Taub, Margaret and Hicks, Stephanie C. (2020). https://github.com/opencasestudies/ocs-youth-disconnection-case-study. Disparities in Youth Disconnection.

To access the GitHub repository for this case study, see: https://github.com/opencasestudies/ocs-bp-youth-disconnection.

You may also access and download the data using our OCSdata package. To learn more about this package, including examples, see the package documentation. Here is how you would install this package:

install.packages("OCSdata")

This case study is part of a series of public health case studies for the Bloomberg American Health Initiative.


The total reading time for this case study is calculated via koRpus and shown below:

Reading Time | Method
84 minutes | koRpus

Readability Score:

A readability index estimates the reading difficulty level of a particular text. Flesch-Kincaid, FORCAST, and SMOG are three common readability indices that were calculated for this case study via koRpus. These indices provide an estimation of the minimum reading level required to comprehend this case study by grade and age.

Text language: en 
Index | Grade | Age
Flesch-Kincaid | 8 | 13
FORCAST | 10 | 15
SMOG | 11 | 16

Please help us by filling out our survey.

Motivation


First, let’s discuss the meaning of the term “youth disconnection”.

According to Measure of America (a nonpartisan project of the nonprofit Social Science Research Council that is focused on opportunity in the United States), disconnected youth are:

“young people between the ages of 16 and 24 who are neither working nor in school”

The group states that such disconnection hinders these individuals from acquiring the skills and forming the relationships necessary for a successful adulthood.

The group goes on to state that:

“people who experience a period of disconnection as young adults go on to earn less and are less likely to be employed, own a home, or report good health by the time they reach their thirties”

Disconnected youth are also referred to as opportunity youth, a term with the added positive connotation that supporting these young people can benefit not only the individuals themselves, but also their communities and society.

Good news: According to this report, youth disconnection has generally been decreasing over the past 7 years.

Bad news: The same report shows racial and ethnic disparities, with some groups showing increased rates of disconnection.

In this case study, we will expand beyond the Measure of America annual report to take a deeper look at differences in disconnection between subgroups of youths. Identifying youths who are particularly at risk of disconnection, or who are already disconnected, can help inform the design of targeted prevention and re-engagement strategies. To do this, we use the following article as the motivation for this case study:

Mendelson, T., Mmari, K., Blum, R. W., Catalano, R. F. & Brindis, C. D. Opportunity Youth: Insights and Opportunities for a Public Health Approach to Reengage Disconnected Teenagers and Young Adults. Public Health Rep 133, 54S-64S (2018).

The article describes strategies for preventing disconnection and re-engaging disconnected youth, and explains how such interventions could have a large positive impact on opportunity youth across the entire trajectory of their lives, as well as on future generations. It also confirms that there are disparities among different racial/ethnic groups.

Main Questions


Our main questions:

  1. How have youth disconnection rates in the United States changed since 2008?
  2. In particular, how have these rates changed for different gender and racial/ethnic groups? Are any groups particularly disconnected?

Learning Objectives


In this case study, we will demonstrate how to import and wrangle data available in a Portable Document Format (PDF) file. We will especially focus on using packages and functions from the tidyverse, such as dplyr and ggplot2. The tidyverse is a collection of packages created by RStudio. Even students who are already familiar with other R programming packages may find that these packages make data science in R more legible and intuitive.

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data Science Learning Objectives:

  1. How to import text from PDF files using images and the magick package
  2. How to apply action verbs in dplyr for data wrangling
  3. How to reshape data by pivoting between “long” and “wide” formats and separating columns into additional columns (tidyr)
  4. How to fill in data based on previous values (tidyr)
  5. How to create data visualizations with ggplot2 that are in a similar style to an existing image
  6. How to add images to plots using cowplot
  7. How to create effective bar plots for multiple comparisons, including adding gaps between bars, adding figure legends to the plot area, and adding comparison lines (ggplot2)

Statistical Learning Objectives:

  1. Implementation of the Mann-Kendall trend test
  2. Interpretation of the Mann-Kendall trend test
  3. Differences between linear regression and the Mann-Kendall trend test


We will begin by loading the packages that we will need:

library(here)
library(pdftools)
library(tesseract)
library(magick)
library(knitr)
library(readr)
library(dplyr)
library(stringr)
library(magrittr)
library(tidyr)
library(tibble)
library(ggplot2)
library(directlabels)
library(cowplot)
library(forcats)
library(Kendall)
library(patchwork)
library(DT)
library(OCSdata)

Packages used in this case study:

Package | Use in this case study
here | to easily load and save data
pdftools | to import PDF documents
magick | to import images and extract text from images
tesseract | to extract text from images with magick
knitr | to show images in reports
readr | to save files
dplyr | to filter, subset, join, add rows to, and modify the data
stringr | to manipulate strings
magrittr | to pipe sequential commands
tidyr | to change the shape or format of tibbles to wide and long, to drop rows with NA values, to separate a column into additional columns, and to fill in values based on previous values
tibble | to create tibbles
ggplot2 | to create plots
directlabels | to add labels directly to lines in plots
cowplot | to add images to plots
forcats | to reorder factors for plots
Kendall | to implement the Mann-Kendall trend test in R
patchwork | to combine plots
DT | to create interactive tables
OCSdata | to access and download OCS data files

The first time we use a function, we will use the :: operator to indicate which package it comes from (for example, dplyr::filter()). This is only necessary when two loaded packages have functions with the same name, but we include it here to make clear where each function we use comes from.
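As a minimal sketch of what the :: prefix does, the snippet below calls each function through its package namespace. It uses only stats and utils, which ship with R, and the vector x is made-up toy data, not data from this case study. The stats::filter() call shows why overlapping names matter. After library(dplyr), a plain filter() call refers to dplyr::filter(), which subsets rows, whereas stats::filter() applies a linear filter to a numeric series.

```r
# Toy data for illustration only (not from the case study)
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# `pkg::fun()` calls `fun` from the namespace of `pkg` explicitly,
# so a reader can see which package each function comes from
med <- stats::median(x)
top <- utils::head(sort(x, decreasing = TRUE), 3)

# stats and dplyr both export a function named `filter`; loading dplyr
# masks stats::filter(). Writing the package name removes the ambiguity.
# Here stats::filter() computes a 2-point moving average of x
# (the first value is NA because there is no earlier point to average with).
moving_avg <- stats::filter(x, rep(1 / 2, 2), sides = 1)

print(med)
print(top)
print(moving_avg)
```

Running it prints 3.5, then 9 6 5, then a time series of moving averages that starts with NA.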

Context


So how does youth disconnection happen and what impact does it have?

There are many known risk factors, identified across a variety of contexts (family, friends, school, community, and society), including:

  • poverty (disconnected youth are nearly twice as likely to live in poverty and receive Medicaid)
  • racial/ethnic disparities (findings suggest that these persist even when controlling for income)
  • residential environment (in 2016, 24% of people aged 16-24 in the rural South were disconnected, compared with a national average of 11.7%)
  • poor academic performance
  • poor mental health
  • substance use disorders
  • parental unemployment
  • trauma exposure
  • association with socially deviant peers
  • school policies such as “one strike and you’re out”, a zero-tolerance school expulsion policy that has been shown to increase dropout and incarceration rates

These risk factors make it more likely for young people to miss out on education, training, and networking that can act as a foundation for a successful career.

There are also many known negative consequences associated with youth disconnection including but not limited to:

  • chronic unemployment
  • poverty
  • poor mental health and poor general health (in a 2002 study, youths disconnected for 6 or more months were 3 times more likely to develop depression or another mental health disorder)
  • criminal behavior (in a 2002 study, youths disconnected for 6 or more months were 5 times more likely to have a criminal record)
  • incarceration
  • early mortality