Data and Research Question

Youth Risk Behavior Survey 2023

The Youth Risk Behavior Survey (YRBS) is a national survey that monitors health-related behaviors among high school students, including weapon carrying and associated risk factors.

Dataset Overview

  • Source: Centers for Disease Control and Prevention (CDC)
  • Year: 2023
  • Target Population: High school students
  • Sample Size: Approximately 19,000 students nationwide

Research Question

How do logistic regression, lasso, k-nearest neighbors, and tree-based models compare in predicting school-based weapon carrying among adolescents based on risk and protective factors?
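
As a rough sketch of how the four model families in the question could be declared with parsnip (loaded by tidymodels): the engine choices (glm, glmnet, kknn, rpart, ranger), the tuned parameters, and the object names below are assumptions made for illustration, not choices stated in these slides.

```r
library(tidymodels)

# Plain logistic regression, fit with base R's glm().
logit_spec <- logistic_reg() %>%
    set_engine("glm")

# Lasso: penalized logistic regression; mixture = 1 means a pure L1 penalty.
# The penalty is left as a placeholder to be tuned later.
lasso_spec <- logistic_reg(penalty = tune(), mixture = 1) %>%
    set_engine("glmnet")

# k-nearest neighbors, with the number of neighbors left to tune.
knn_spec <- nearest_neighbor(mode = "classification", neighbors = tune()) %>%
    set_engine("kknn")

# Tree-based models: a single decision tree and a random forest.
tree_spec <- decision_tree(mode = "classification", cost_complexity = tune()) %>%
    set_engine("rpart")

forest_spec <- rand_forest(mode = "classification", trees = 500) %>%
    set_engine("ranger")
```

These are specifications only; nothing is fit until a spec is combined with data, and the engine packages are needed at that point.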

Key Variables for the Research Question

The dataset covers a wide range of health-related behaviors. The variables relevant to this research question are:

  1. Outcome
    • Weapon Carrying (Carried a weapon on school property)
  2. Predictors
    • Traumatic Experiences
    • School Safety Perceptions
    • Bullying Experiences
    • Family Support
    • Social Media Use
    • Peer Relationships

Data Preprocessing

library(tidyverse)
library(tidymodels)
library(dissertationData)
library(here)
data(clean_yrbs_2023)
# Add your data preprocessing code here

Exploratory Data Analysis

We will do it in class…

Creation of the Dataset

# This is an example of how to create a dataset for a model.
# You can use this as a template to create your own dataset.


analysis_data <- clean_yrbs_2023 %>%
    # Keep the outcome and the candidate predictors.
    select(
        WeaponCarryingSchool, AttackedInNeighborhood, Bullying,
        SexualAbuseByOlderPerson, ParentalPhysicalAbuse, ParentSubstanceUse,
        ParentIncarceration, SchoolConnectedness, ParentalMonitoring,
        UnfairDisciplineAtSchool, Homelessness
    ) %>%
    # Drop respondents with a missing outcome.
    filter(!is.na(WeaponCarryingSchool)) %>%
    # Subtract 1 from the numeric codes of these variables,
    # then store the result as factors again.
    mutate(across(
        c(
            ParentSubstanceUse, ParentIncarceration, SchoolConnectedness,
            ParentalMonitoring, UnfairDisciplineAtSchool
        ),
        ~ factor(as.numeric(.x) - 1)
    ))
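
To make the recoding step concrete, here is a small sketch of what `as.numeric(.x) - 1` followed by `factor()` does to a two-level factor. The "No"/"Yes" levels and the vector `x` are hypothetical; the actual levels in clean_yrbs_2023 are not shown in these slides.

```r
# Hypothetical two-level factor standing in for one of the recoded columns.
x <- factor(c("No", "Yes", "Yes", NA), levels = c("No", "Yes"))

# as.numeric() on a factor returns the level codes (1 for "No", 2 for "Yes"),
# so subtracting 1 gives a 0/1 coding. Missing values stay missing.
recoded <- as.numeric(x) - 1
recoded

# Converting back to a factor gives the levels "0" and "1".
recoded_factor <- factor(recoded)
levels(recoded_factor)
```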

Splitting the Dataset
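
The cross-validation code uses an object named analysis_train, so one possible way to create it is sketched here with rsample's `initial_split()`. The 80/20 proportion, the seed, stratifying on the outcome, and the names analysis_split and analysis_test are assumptions, not settings given in these slides. The small simulated data frame is a hypothetical stand-in that is only built when analysis_data does not already exist, so the sketch can run on its own.

```r
library(tidymodels)

# Hypothetical stand-in data, used only if analysis_data is not already defined.
if (!exists("analysis_data")) {
    set.seed(1)
    analysis_data <- tibble(
        WeaponCarryingSchool = factor(
            sample(c(0, 1), 200, replace = TRUE, prob = c(0.8, 0.2))
        ),
        Bullying = factor(sample(c(0, 1), 200, replace = TRUE))
    )
}

set.seed(2023) # arbitrary seed so the split is reproducible

# Assumed settings: 80% training, 20% testing, stratified on the outcome.
analysis_split <- initial_split(
    analysis_data,
    prop = 0.8,
    strata = WeaponCarryingSchool
)

analysis_train <- training(analysis_split)
analysis_test <- testing(analysis_split)
```

The test set is put aside until the final evaluation; model comparison and tuning use analysis_train only.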

Cross-Validation

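# analysis_train is the training set from the splitting step.
# Split it into 5 folds for cross-validation.
analysis_folds <- vfold_cv(analysis_train, v = 5)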
analysis_folds
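
To see what each fold contains, the sketch below runs the same `vfold_cv()` call on the built-in mtcars data, which is a stand-in and not the YRBS data. The names toy_folds and first_split are hypothetical. Each fold keeps about four fifths of the rows for fitting (the analysis set) and holds out the rest for evaluation (the assessment set).

```r
library(tidymodels)

set.seed(2023) # arbitrary seed so the fold assignment is reproducible

# mtcars (32 rows) stands in for the training data.
toy_folds <- vfold_cv(mtcars, v = 5)

# Each row of toy_folds holds one split; pull out the first one.
first_split <- toy_folds$splits[[1]]

# Rows used to fit the model in this fold.
n_fit <- nrow(analysis(first_split))

# Rows held out to evaluate the model in this fold.
n_eval <- nrow(assessment(first_split))

c(fit = n_fit, evaluate = n_eval)
```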