library(tidyverse)
library(tidymodels)
library(dissertationData)
library(here)
data(clean_yrbs_2023)
# Add your data preprocessing code here
Data and Research Question
Youth Risk Behavior Survey 2023
The Youth Risk Behavior Survey (YRBS) is a national survey that monitors health-related behaviors among high school students, including weapon carrying and associated risk factors.
Dataset Overview
- Source: Centers for Disease Control and Prevention (CDC)
- Year: 2023
- Target Population: High school students
- Sample Size: Approximately 19,000 students nationwide
Research Question
How do logistic regression, lasso, k-nearest neighbors, and tree-based models compare in predicting school-based weapon carrying among adolescents based on risk and protective factors?
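As a rough sketch of what comparing these model families could look like in tidymodels, the block below declares one parsnip specification per family. The engines (`glm`, `glmnet`, `kknn`, `rpart`), the penalty value, and the number of neighbors are illustrative assumptions, not choices made in these notes; the engine packages need to be installed before any of the specifications can be fit.

```r
library(tidymodels)

# One specification per model family named in the research question.
# Engines and hyperparameter values are placeholders (assumptions).
model_specs <- list(
  logistic = logistic_reg() %>%
    set_engine("glm"),
  lasso = logistic_reg(penalty = 0.01, mixture = 1) %>%  # mixture = 1 is the lasso penalty
    set_engine("glmnet"),
  knn = nearest_neighbor(neighbors = 5) %>%
    set_engine("kknn") %>%
    set_mode("classification"),
  tree = decision_tree() %>%
    set_engine("rpart") %>%
    set_mode("classification")
)

names(model_specs)
```

In practice the penalty and the number of neighbors would usually be tuned rather than fixed.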
Key Variables for the Research Question
The dataset includes information on various health-related behaviors:
- Outcome
  - Weapon Carrying (Carried a weapon on school property)
- Predictors
  - Traumatic Experiences
  - School Safety Perceptions
  - Bullying Experiences
  - Family Support
  - Social Media Use
  - Peer Relationships
Data Preprocessing
Exploratory Data Analysis
We will do this step together in class…
Creation of the Dataset
# This is an example of how to create a dataset for a model.
# You can use this as a template to create your own dataset.
analysis_data <- clean_yrbs_2023 %>%
  select(
    WeaponCarryingSchool, AttackedInNeighborhood, Bullying,
    SexualAbuseByOlderPerson, ParentalPhysicalAbuse, ParentSubstanceUse,
    ParentIncarceration, SchoolConnectedness, ParentalMonitoring,
    UnfairDisciplineAtSchool, Homelessness
  ) |>
  filter(!is.na(WeaponCarryingSchool)) %>%
  mutate(across(
    c(
      ParentSubstanceUse, ParentIncarceration, SchoolConnectedness,
      ParentalMonitoring, UnfairDisciplineAtSchool
    ), ~ as.numeric(.x) - 1
  )) %>%
  mutate(across(
    c(
      ParentSubstanceUse, ParentIncarceration, SchoolConnectedness,
      ParentalMonitoring, UnfairDisciplineAtSchool
    ), ~ factor(.x)
  ))
Splitting the Dataset
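The cross-validation step uses an object called `analysis_train`, which has to come from a train/test split. A minimal sketch of such a split with rsample is shown below on a small simulated data frame. The toy data, its "No"/"Yes" outcome levels, the seed, and the 75/25 proportion are all assumptions made for illustration; applying the same calls to `analysis_data` would produce the training set.

```r
library(tidymodels)

set.seed(2023)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical toy data standing in for analysis_data
toy_data <- tibble(
  WeaponCarryingSchool = factor(
    sample(c("No", "Yes"), size = 500, replace = TRUE, prob = c(0.9, 0.1))
  ),
  Bullying = factor(sample(c("No", "Yes"), size = 500, replace = TRUE))
)

# Hold out 25% of rows for testing (assumed proportion),
# stratifying on the outcome so both sets keep a similar outcome rate
toy_split <- initial_split(toy_data, prop = 0.75, strata = WeaponCarryingSchool)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

toy_split
```

Stratifying on the outcome is a common choice when the outcome is rare, since a purely random split could leave very few positive cases in one of the sets.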
Cross-Validation
analysis_folds <- vfold_cv(analysis_train,
  v = 5
)
analysis_folds
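Once folds exist, each candidate model can be fit on every fold and scored on the held-out part. The sketch below shows one way this might look for a plain logistic regression, using `fit_resamples()` on simulated data. The toy data frame, the formula, and the choice of metrics are assumptions for illustration, not the analysis used in these notes.

```r
library(tidymodels)

set.seed(2023)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical toy data standing in for analysis_train
toy_train <- tibble(
  WeaponCarryingSchool = factor(
    sample(c("No", "Yes"), size = 200, replace = TRUE, prob = c(0.8, 0.2))
  ),
  Bullying = factor(sample(c("No", "Yes"), size = 200, replace = TRUE)),
  SchoolConnectedness = factor(sample(0:1, size = 200, replace = TRUE))
)

# Five folds, stratified on the outcome (stratification is an assumption here)
toy_folds <- vfold_cv(toy_train, v = 5, strata = WeaponCarryingSchool)

# Bundle the formula and the model into a workflow
logit_wf <- workflow() %>%
  add_formula(WeaponCarryingSchool ~ .) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

# Fit on each fold's analysis set and evaluate on its assessment set
logit_res <- fit_resamples(
  logit_wf,
  resamples = toy_folds,
  metrics = metric_set(roc_auc, accuracy)
)

# Average performance across the five folds
collect_metrics(logit_res)
```

Because the toy predictors are random noise, the metrics here carry no meaning; the point is only the shape of the workflow.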