Kashamala Arif, Kelly Zhang, Stephen Syl-Akinwale, Mith Patel
Rows: 68815 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): month_grouping, month_abbv, component, land_border_region, area_of...
dbl (2): fiscal_year, encounter_count
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 54939 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): month_grouping, month_abbv, land_border_region, state, demographic,...
dbl (2): fiscal_year, encounter_count
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
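As the readr messages above suggest, declaring the column types up front avoids the type guessing and the message that goes with it. The following is a minimal sketch only: it parses a made-up two-row inline sample rather than the real CSV files, and uses three of the column names reported above (`fiscal_year`, `month_abbv`, `encounter_count`); `sample_csv` and `sample_reports` are hypothetical names.

```r
library(readr)

# Hypothetical two-row sample standing in for the real CSV file
sample_csv <- I("fiscal_year,month_abbv,encounter_count\n2020,OCT,12\n2020,NOV,7\n")

# Declaring the column types up front avoids the column-specification message
sample_reports <- read_csv(
  sample_csv,
  col_types = cols(
    fiscal_year = col_double(),
    month_abbv = col_character(),
    encounter_count = col_double()
  )
)
```

For the real files, the same `col_types` argument (or `show_col_types = FALSE`) could be passed alongside the file path.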
Introduction
Understanding border enforcement trends is essential for shaping immigration policies and addressing regional disparities in border encounters. This report analyzes U.S. Customs and Border Protection (CBP) encounter data, which includes deportations, asylum requests, penalties for unauthorized crossings, and other enforcement actions. By examining encounters across geographic regions and demographic groups, we aim to identify which states experience the highest encounters and how these encounters vary by type.
The dataset used in this analysis comes from CBP’s live systems and covers interactions at the Northern Land Border, Southwest Land Border, and Nationwide (air, land, and sea) entry points. The data is divided into two CSV files: cbp_resp.csv, which contains regional encounter data with 12 variables, and cbp_state.csv, which details encounters by U.S. state with 9 variables. Key variables include fiscal year, border region, state, demographic group, citizenship, and type of enforcement authority (Title 8 or Title 42). Numeric variables, such as encounter counts, allow for statistical modeling and visualization. Because the data is drawn from live systems, it reflects recent activity, making it a valuable resource for tracking shifts in border activity over time.
How does citizenship relate to the type of encounter across different border regions?
Introduction
This question explores how citizenship relates to the type of encounter (Title 8 vs. Title 42) across different land border regions. It matters because it can show whether certain nationalities face different border enforcement actions depending on where they cross. Understanding these relationships helps us analyze whether specific groups are more likely to be processed under public health measures (Title 42) than under standard immigration law (Title 8), and how this varies regionally. The encounter counts also indicate which region to focus on, based on the states where encounters are most frequent.
Title 8 inadmissibles are people who are denied entry to the United States because they have violated immigration laws. This can include people who have misrepresented their identity or who have a criminal history.
A Title 8 apprehension occurs when the U.S. Border Patrol (USBP) detains someone who is not legally in the United States.
Title 42 expulsions are removals of people from the United States by the government on the basis of a public health threat. The policy was used to expel migrants at the U.S.-Mexico border during the COVID-19 pandemic.
Approach
Starting from the state-level counts, we used a map to find where encounters are most concentrated by encounter type, which tells us where to focus. The Mercator map shows which states are treated as border states, and a jitter layer is overlaid to show how many cases fall in each state of the region, colored by title of authority. We then relate title of authority (Title 8) to citizenship.
For our approach we wanted to use two plots to help tell the story:
A Mercator map of the Northern and Southwest borders showing Title 8 vs. Title 42 cases.
A bar chart showing in detail what is happening in those places.
# Map calibration
us_states <- ne_states(country = "United States of America", returnclass = "sf")

north_border_states <- c("Washington", "Idaho", "Montana", "North Dakota", "Minnesota",
                         "Michigan", "New York", "Vermont", "New Hampshire", "Maine")
south_border_states <- c("California", "Arizona", "New Mexico", "Texas")

# Label each state with its border region and drop non-border states
us_states <- us_states |>
  mutate(region = case_when(
    name %in% north_border_states ~ "Northern Land Border",
    name %in% south_border_states ~ "Southwest Land Border"
  )) |>
  filter(!is.na(region))
Analysis
# Data cleaning
border_summary <- regional_reports |>
  filter(land_border_region %in% c("Northern Land Border", "Southwest Land Border")) |>
  group_by(land_border_region, title_of_authority, demographic, encounter_type) |>
  summarize(encounter_count = sum(encounter_count, na.rm = TRUE)) |>
  ungroup()
`summarise()` has grouped output by 'land_border_region', 'title_of_authority',
'demographic'. You can override using the `.groups` argument.
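The message above is informational: `summarize()` drops only the last grouping level by default. Passing `.groups = "drop"` returns an ungrouped result directly, which makes the trailing `ungroup()` unnecessary and silences the message. A minimal sketch on invented toy data (`toy_reports` and its values are hypothetical, not CBP figures):

```r
library(dplyr)

# Toy stand-in for regional_reports; values are invented for illustration
toy_reports <- tibble(
  land_border_region = c("Northern Land Border", "Northern Land Border",
                         "Southwest Land Border"),
  title_of_authority = c("Title 8", "Title 8", "Title 42"),
  encounter_count = c(10, 5, 20)
)

toy_summary <- toy_reports |>
  group_by(land_border_region, title_of_authority) |>
  summarize(
    encounter_count = sum(encounter_count, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped tibble and suppress the message
  )
```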
# Matching data to respective regions
us_map_data <- us_states |>
  left_join(border_summary, by = c("region" = "land_border_region")) |>
  mutate(encounter_count_scaled = log(encounter_count + 1))
Warning in sf_column %in% names(g): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.
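The warning above arises because each region appears on many state rows and on many summary rows, so every state is repeated once per matching summary row. The sketch below illustrates this on invented toy tables (all names ending in `toy_`, `joined_` and `region_totals` are hypothetical) and shows one possible alternative, collapsing the summary to one row per region before joining. Whether that collapse is appropriate depends on what the map is meant to show; it is not necessarily what the authors intended.

```r
library(dplyr)

# Toy stand-ins; state names and counts are invented for illustration
toy_states <- tibble(
  name = c("Texas", "Arizona", "Maine"),
  region = c("Southwest Land Border", "Southwest Land Border", "Northern Land Border")
)
toy_summary <- tibble(
  land_border_region = c("Southwest Land Border", "Southwest Land Border",
                         "Northern Land Border"),
  title_of_authority = c("Title 8", "Title 42", "Title 8"),
  encounter_count = c(100, 40, 8)
)

# Joining directly repeats every state once per matching summary row;
# declaring the relationship makes that duplication explicit and quiet
joined_all <- toy_states |>
  left_join(toy_summary, by = c("region" = "land_border_region"),
            relationship = "many-to-many")

# Collapsing to one row per region first gives a many-to-one join
region_totals <- toy_summary |>
  group_by(land_border_region) |>
  summarize(encounter_count = sum(encounter_count), .groups = "drop")

joined_totals <- toy_states |>
  left_join(region_totals, by = c("region" = "land_border_region"))
```

In the toy data, `joined_all` has more rows than `toy_states`, while `joined_totals` keeps exactly one row per state.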
# North and Southern Border Map
# Separate later
ggplot(data = us_map_data) +
  geom_sf(aes(fill = encounter_count_scaled), color = "white", size = 0.4) +
  geom_jitter(
    aes(
      x = jitter(longitude, amount = 1.5),
      y = jitter(latitude, amount = 1.5),
      color = title_of_authority
    ),
    width = 0.5, height = 0.5,
    alpha = 1, size = 1
  ) +
  scale_fill_continuous(trans = 'reverse') +
  # scale_fill_viridis_c(option="cividis",end = 1, begin = 0) +
  scale_color_viridis_d(option = "turbo", name = "Title of Authority", end = .7, begin = .5) +
  theme_void() +
  labs(
    title = "Border Encounters by Title of Authority",
    subtitle = "Comparison of Title 8 vs. Title 42 across U.S. Borders",
    fill = "Encounters",
    caption = "Source: CBP Data\nEncounter count has been log-scaled so that differences in counts are noticeable",
    shape = "Title of Authority"
  ) +
  theme(
    axis.title = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.line.x = element_blank(),
    plot.caption = element_text(colour = "grey50", size = 7)
  )
# Cleaning for countries of origin
# countries <- state_reports |>
#   distinct(citizenship)
get_continent <- function(country) {
  # Dataset appears to blur the line on what 'other' might entail for these cases
  north_americans <- c("CANADA", "MEXICO")
  south_americans <- c("BRAZIL", "COLOMBIA", "ECUADOR", "PERU", "VENEZUELA")
  central_americans <- c("GUATEMALA", "HONDURAS", "EL SALVADOR", "NICARAGUA")
  caribbeans <- c("HAITI", "CUBA")
  europeans <- c("RUSSIA", "ROMANIA", "UKRAINE")
  asians <- c("CHINA, PEOPLES REPUBLIC OF", "INDIA", "PHILIPPINES", "MYANMAR (BURMA)", "TURKEY")
  others <- c("OTHER")

  if (country %in% asians) {
    return("Asia")
  } else if (country %in% north_americans) {
    return("North America")
  } else if (country %in% south_americans) {
    return("South America")
  } else if (country %in% europeans) {
    return("Europe")
  } else {
    return("Other")
  }
}

df_continents <- regional_reports %>%
  mutate(continent = sapply(citizenship, get_continent))
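The same lookup can be written in vectorized form with `case_when()`, which avoids calling the function once per row through `sapply()`. The sketch below is an illustration, not the authors' code: `assign_continent` and `toy_citizenship` are hypothetical names, and the country groupings are copied from `get_continent()`. Note that in `get_continent()` the Central American and Caribbean lists are defined but never tested, so those countries fall through to "Other"; the sketch reproduces that behavior rather than changing it.

```r
library(dplyr)

# Vectorized lookup; country groupings copied from get_continent()
assign_continent <- function(country) {
  case_when(
    country %in% c("CHINA, PEOPLES REPUBLIC OF", "INDIA", "PHILIPPINES",
                   "MYANMAR (BURMA)", "TURKEY") ~ "Asia",
    country %in% c("CANADA", "MEXICO") ~ "North America",
    country %in% c("BRAZIL", "COLOMBIA", "ECUADOR", "PERU", "VENEZUELA") ~ "South America",
    country %in% c("RUSSIA", "ROMANIA", "UKRAINE") ~ "Europe",
    TRUE ~ "Other"  # includes Central American and Caribbean countries, as in the original
  )
}

# Invented toy input to show the mapping
toy_citizenship <- tibble(citizenship = c("MEXICO", "INDIA", "HAITI", "UKRAINE"))
toy_continents <- toy_citizenship |>
  mutate(continent = assign_continent(citizenship))
```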
# Bar chart showing detailed relationships
df_filtered <- df_continents %>%
  filter(title_of_authority == "Title 8")

df_bar_chart <- df_filtered %>%
  group_by(continent, encounter_type) %>%
  summarise(total_encounters = sum(encounter_count), .groups = "drop")

# Create the bar chart
ggplot(df_bar_chart, aes(x = continent, y = total_encounters, fill = encounter_type)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Total Encounters by Continent and Encounter Type under Title 8",
    x = "Places of Citizenships People Hold",
    y = "Total Encounters",
    fill = "Encounter Type"
  ) +
  # One fill color per encounter type
  scale_fill_manual(values = c(
    "Inadmissibles" = "darkblue",
    "Expulsions" = "cyan",
    "Apprehensions" = "lightblue"
  )) +
  # Show the y-axis in millions
  scale_y_continuous(
    labels = scales::label_number(scale = 1e-6, suffix = "M"),
    breaks = seq(0, 3.5e6, by = 5e5)
  ) +
  theme_minimal()
Discussion
In this analysis we aimed to ask questions that challenge common assumptions about the border. The map shows that, of the two border regions, the Southwest border has by far the stronger concentration of encounters. It also shows that Title 8 cases are the more common type at the Southwest border. To explore this further, we created a bar chart focusing solely on Title 8 cases. The bar chart shows that “Apprehensions”, or instances of detention, were the most common outcome at these borders, with a significant proportion involving individuals from South America. We also observed that fewer South Americans were classified as “Inadmissible”, a category that applies to individuals who are denied entry into the U.S. at a port of entry because they do not meet the legal requirements for admission. From this analysis, we conclude that enforcement actions under Title 8 at the Southwest border primarily result in apprehensions, particularly for individuals from South America, rather than inadmissibility determinations. This suggests that many South Americans attempting to cross the border do so without authorization, leading to detention rather than denial of entry at a port of entry. The lower number of South Americans classified as inadmissible may indicate that fewer individuals from this region attempt to enter the U.S. through official ports of entry, or that those who do have stronger legal grounds for admission.
Do certain states have more encounters with a specific demographic, and how do these encounter types differ?
Introduction
Understanding which states experience the highest numbers of border encounters, and how these encounters differ by type, provides important information for immigration policy development and resource allocation. Our analysis focuses on identifying patterns in encounter types across the most active border states to reveal how enforcement actions vary geographically. This matters because different regions may take different approaches to migration management depending on local conditions, political factors, and available resources. By examining these state-level differences, we can better understand approaches to border enforcement across the United States and identify potential disparities in how encounters are processed.
Approach
To analyze whether certain states have more encounters with a specific demographic, we will use:
A bar chart showing the top 4 states by total encounter count, with 3 bars per state representing the 3 most common demographic groups encountered in that state. We will use a bar chart of the same format to analyze encounter types across the same 4 states, which makes comparison easier for viewers.
To analyze how encounter types differ across states, we will use:
A bar chart showing the top 4 states by total encounter count, with one bar per encounter type within each state. This visualization clearly displays which states have the highest volume of encounters and how the distribution of encounter types varies across them. A bar chart is well suited to this comparison because bar heights make it easy to compare the volume of each encounter type both within and across states.
A heatmap displaying the relationship between states (top 4) and encounter types, with color intensity representing the number of encounters. This visualization provides a different perspective on the same data, making it easier to identify patterns and outliers in how states handle different types of encounters. The heatmap uses color mapping to represent encounter frequencies, allowing quick visual identification of high-activity combinations.
Analysis
# Identify top 4 states by total encounters first
state_totals <- state_reports %>%
  group_by(state) %>%
  summarize(total_encounters = sum(encounter_count, na.rm = TRUE)) %>%
  arrange(desc(total_encounters)) %>%
  head(4)

top_states <- state_totals$state

# Then filter data for top 4 states
top_states_data <- state_reports %>%
  filter(state %in% top_states)

# Create a more detailed dataset that separates apprehensions and inadmissibles
# (before, it was merging them)
encounter_by_state <- top_states_data %>%
  group_by(state, title_of_authority, demographic) %>%
  summarize(encounters = sum(encounter_count, na.rm = TRUE)) %>%
  ungroup()
`summarise()` has grouped output by 'state', 'title_of_authority'. You can
override using the `.groups` argument.
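The `arrange(desc(...))` followed by `head(4)` pattern used to pick the top 4 states can also be expressed with `slice_max()`, which states the intent directly. A minimal sketch on invented toy data (`toy_state_reports` and its counts are hypothetical, not CBP figures):

```r
library(dplyr)

# Toy stand-in for state_reports; counts are invented for illustration
toy_state_reports <- tibble(
  state = c("TX", "TX", "AZ", "CA", "FL", "NY", "WA"),
  encounter_count = c(50, 30, 40, 35, 20, 5, 3)
)

# Total per state, then keep the 4 largest totals (ties broken by row order)
top_states_toy <- toy_state_reports |>
  group_by(state) |>
  summarize(total_encounters = sum(encounter_count, na.rm = TRUE)) |>
  slice_max(total_encounters, n = 4, with_ties = FALSE)
```

Setting `with_ties = FALSE` guarantees exactly four rows even when two states share a total, which matches the behavior of `head(4)`.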
# Create more specific encounter type categories based on the data
# NOTE: the demographics printed later in this report use the label
# "UC / Single Minors"; if that is the label in state_reports, "UC" will not
# match here and those rows will be counted as "Inadmissibles"
encounter_by_state <- encounter_by_state %>%
  mutate(encounter_type = case_when(
    title_of_authority == "Title 42" ~ "Expulsions",
    title_of_authority == "Title 8" & demographic %in% c("FMUA", "Single Adults", "UC") ~ "Apprehensions",
    title_of_authority == "Title 8" ~ "Inadmissibles",
    TRUE ~ "Other"
  ))

# Preparing for visualization
state_encounter_viz <- encounter_by_state %>%
  group_by(state, encounter_type) %>%
  summarize(total = sum(encounters, na.rm = TRUE)) %>%
  ungroup()
`summarise()` has grouped output by 'state'. You can override using the
`.groups` argument.
# The bar chart of encounter types for the top 4 states described in the approach
ggplot(state_encounter_viz, aes(x = state, y = total, fill = encounter_type)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c(
    "Apprehensions" = "#1F77B4",
    "Inadmissibles" = "#2CA02C",
    "Expulsions" = "#FF7F0E"
  )) +
  # Format the y-axis to show numbers in millions in a more readable way
  scale_y_continuous(
    labels = scales::label_number(scale = 1e-6, suffix = "M"),
    breaks = seq(0, 3.5e6, by = 5e5)
  ) +
  labs(
    title = "Encounter Types Across Top 4 Border States",
    x = "State",
    y = "Number of Encounters (Millions)",
    fill = "Encounter Type",
    caption = "Encounter type is the category of encounter based on Title of Authority and component \n (Title 8 for USBP = Apprehensions; Title 8 for OFO = Inadmissibles; Title 42 = Expulsions)"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5, size = 12, face = "bold"),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10),
    plot.title = element_text(size = 16, face = "bold")
  )
`summarise()` has grouped output by 'state'. You can override using the
`.groups` argument.
print(top_demographics)
# A tibble: 12 × 3
# Groups: state [4]
state demographic total_encounters
<chr> <chr> <int>
1 AZ Single Adults 1287304
2 AZ FMUA 701962
3 AZ UC / Single Minors 116265
4 CA Single Adults 1279066
5 CA FMUA 435503
6 CA UC / Single Minors 39798
7 FL Single Adults 420753
8 FL FMUA 169736
9 FL Accompanied Minors 4664
10 TX Single Adults 3185466
11 TX FMUA 1601885
12 TX UC / Single Minors 400698
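The chunk that builds `top_demographics` is not shown in the rendered output. One way a table of this shape (top 3 demographics per state, still grouped by state) could be produced is sketched below. This is an assumption about the approach, not the authors' code, and it runs on invented toy data; `toy_state_reports` and `top_demographics_toy` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for the top-state data; counts are invented for illustration
toy_state_reports <- tibble(
  state = c("TX", "TX", "TX", "TX", "FL", "FL", "FL", "FL"),
  demographic = c("Single Adults", "FMUA", "UC / Single Minors", "Accompanied Minors",
                  "Single Adults", "FMUA", "UC / Single Minors", "Accompanied Minors"),
  encounter_count = c(90, 60, 30, 5, 40, 20, 1, 4)
)

top_demographics_toy <- toy_state_reports |>
  group_by(state, demographic) |>
  # Keep the result grouped by state so the next step works per state
  summarize(total_encounters = sum(encounter_count, na.rm = TRUE),
            .groups = "drop_last") |>
  # Within each state, keep the 3 demographics with the largest totals
  slice_max(total_encounters, n = 3, with_ties = FALSE)
```

In the toy data, Florida's third-ranked group is "Accompanied Minors" while Texas's is "UC / Single Minors", mirroring the pattern in the printed table.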
# The bar chart of demographics for the top 4 states mentioned in point 1 of the approach
ggplot(top_demographics, aes(x = state, y = total_encounters, fill = demographic)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c(
    "Single Adults" = "#1F77B4",
    "FMUA" = "#2CA02C",
    "UC / Single Minors" = "#FF7F0E",
    "Accompanied Minors" = "#FF0000"
  )) +
  # Format the y-axis to show numbers in millions, matching the axis title
  scale_y_continuous(labels = scales::label_number(scale = 1e-6, suffix = "M")) +
  labs(
    title = "Encounters Across Top 4 Border States \n By Demographic",
    x = "State",
    y = "Number of Encounters (Millions)",
    fill = "Demographic",
    caption = "FMUA = Individuals in a Family Unit, UC = Unaccompanied Children"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5, size = 12, face = "bold"),
    axis.text.y = element_text(size = 10),
    axis.title = element_text(size = 12),
    plot.title.position = "plot",
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10),
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
  )
# Create a heatmap for the top 4 states and encounter types
# Use the state_encounter_viz data prepared earlier
heatmap_data <- state_encounter_viz %>%
  # Log transform to better visualize differences
  mutate(log_count = log10(total + 1))

# Heatmap time
ggplot(heatmap_data, aes(x = encounter_type, y = state, fill = log_count)) +
  geom_tile(color = "white", linewidth = 0.5) +  # Add borders between tiles
  scale_fill_viridis_c(
    option = "inferno",
    name = "Encounter Count",
    limits = c(0, 7),
    breaks = seq(0, 6, by = 2)
  ) +
  geom_text(
    aes(label = scales::comma(total)),
    color = "white", size = 3.5,
    fontface = "bold"
  ) +
  labs(
    title = "Heatmap of Border Encounter Types by State",
    subtitle = "Showing counts for top 4 states with most encounters",
    x = "Encounter Type",
    y = "State"
  ) +
  theme_minimal() +
  theme(
    axis.text = element_text(size = 11, face = "bold"),
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    panel.grid = element_blank(),  # Remove grid lines
    axis.text.x = element_text(hjust = 0.5)
  )
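To see why the heatmap fill uses `log10(total + 1)` rather than raw counts, consider two figures quoted later in this report: about 5 expulsions in Florida and about 3.3M apprehensions in Texas. The short sketch below (variable names are hypothetical) shows that the raw values differ by several orders of magnitude, while the transformed values fit comfortably inside the 0 to 7 fill limits used in the plot.

```r
# Approximate counts quoted in the discussion of this report
counts <- c(florida_expulsions = 5, texas_apprehensions = 3.3e6)

# log10(x + 1) keeps zero counts defined and compresses the range
log_counts <- log10(counts + 1)

# How many times larger the biggest raw count is than the smallest
raw_ratio <- counts[["texas_apprehensions"]] / counts[["florida_expulsions"]]

print(round(log_counts, 2))
print(raw_ratio)
```

On a linear color scale the smaller cells would be nearly indistinguishable from zero; on the log scale each unit of color corresponds to a tenfold change in count.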
Discussion
Do certain states have more encounters with a specific demographic?
A clear outlier in demographics is Florida, the only state with accompanied minors among its top 3 encounter demographics; in the other three states, the third most common demographic is unaccompanied minors. From our research, unaccompanied minors usually cross borders to flee poverty, violence, or exploitation. It makes sense that Texas, Arizona, and California, which all border Mexico, have higher numbers of unaccompanied minors than Florida does. Single adults are by far the most common demographic group in all four states.
How do encounter types differ across states?
The two visualizations reveal clear differences in encounter types across border states. Texas leads all categories with the highest numbers of apprehensions (3.3M), expulsions (1.5M), and inadmissibles (394K). While apprehensions dominate across all states, Florida shows a unique pattern with virtually no expulsions (only 5) despite significant apprehensions (590K). Arizona and California maintain similar apprehension-to-expulsion ratios (about 3:1), while Texas shows roughly 2:1.
The heatmap highlights that inadmissibles constitute the smallest category everywhere, with Texas and Arizona processing the most. These variations may reflect different regional enforcement approaches shaped by geography, infrastructure, and local migration patterns.