# making theme for all plots so that everything has a consistent look
otter_theme <- function(
base_family = "Palatino",
base_size = 12,
...
) {
theme_minimal(
base_family = base_family,
base_size = base_size,
...
) +
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.major.y = element_line(color = "#e0e0e0", linewidth = 0.4),
plot.title = element_text(
size = rel(1.4),
face = "bold",
color = "#1a1a1a",
margin = margin(b = 6)
),
plot.subtitle = element_text(
size = rel(1.0),
color = "grey40",
margin = margin(b = 10)
),
plot.caption = element_text(
size = rel(0.75),
color = "#999999",
hjust = 0,
margin = margin(t = 12)
),
axis.title = element_text(size = rel(0.9), color = "#666666"),
axis.text = element_text(size = rel(0.85), color = "#999999"),
legend.background = element_blank(),
legend.position = "bottom",
legend.key = element_blank(),
legend.title = element_text(face = "bold", size = rel(0.9)),
legend.text = element_text(size = rel(0.85)),
plot.margin = margin(t = 10, r = 10, b = 10, l = 10),
plot.title.position = "plot",
plot.caption.position = "plot",
panel.grid = element_blank()
)
}
Measuring Global Tuberculosis Severity By Mortality Rates
Introduction
According to the World Health Organization (WHO), tuberculosis (TB) is the leading cause of death worldwide from a single infectious agent [1]. TB can remain asymptomatic for long periods before reactivating, and it requires prolonged multi-drug treatment. That treatment demands sustained adherence from the patient, which can be prohibitive for low- and middle-income households. Consequently, TB places a disproportionate burden on countries with limited healthcare infrastructure and a greater proportion of the population living below the poverty line. The dataset we chose aggregates variables indicative of the burden TB imposes on each country from 2000 to 2023. It was uploaded to Tidy Tuesday on November 11, 2025, under the title “World Health Organization Tuberculosis Burden Data,” and comes from the getTBinR package by Sam Abbott [2]. The dataset has 5117 rows (yearly country observations) and 18 columns, covering 215 countries. Of these, the following countries have incomplete temporal coverage: Curaçao, Montenegro, Serbia, Sint Maarten (Dutch part), South Sudan, and Timor-Leste. South Sudan and Timor-Leste lack earlier data because they gained independence in 2011 and 2002, respectively. The countries are classified into the following WHO-designated regions: Eastern Mediterranean, Europe, Africa, Western Pacific, Americas, and South-East Asia.
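The incomplete-coverage check described above can be reproduced by counting the distinct years recorded for each country. The following is a minimal base R sketch that assumes only `country` and `year` columns; `toy_tb` is a small hypothetical stand-in for the real `tb_data`, not actual WHO records.

```r
# Minimal sketch: flag countries whose records do not span the full 2000-2023 window.
# `toy_tb` is hypothetical illustration data standing in for tb_data.
toy_tb <- data.frame(
  country = c(rep("Country A", 24), rep("South Sudan", 13), rep("Timor-Leste", 22)),
  year = c(2000:2023, 2011:2023, 2002:2023)
)

full_span <- 2000:2023

# Split the years by country, then count distinct years and find the first year observed
years_by_country <- split(toy_tb$year, toy_tb$country)
n_years <- vapply(years_by_country, function(y) length(unique(y)), integer(1))
first_year <- vapply(years_by_country, min, integer(1))

# Countries observed in fewer years than the full window have incomplete coverage
incomplete <- names(n_years)[n_years < length(full_span)]
print(first_year[incomplete])
```

The same two lines (`n_years`, `first_year`) applied to the real data would list each incompletely covered country alongside its first recorded year.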
While effective treatments for tuberculosis have existed for several decades, TB persists as a public health threat globally. Contemporary TB is predominantly a socio-economic disease, disproportionately affecting populations with inadequate living or working conditions and limited access to regular medical screening [3]. We chose this dataset to explore which factors contribute to, are affected by, or indirectly indicate trends in contemporary TB. The dataset covers countries around the globe, providing valuable information on TB trends across diverse climatic, welfare, healthcare-access, and socioeconomic contexts.
Tuberculosis Cases Resulting in Mortalities in High Impact Regions
Introduction
According to WHO, Africa, the Western Pacific, and South-East Asia are the regions most affected by tuberculosis [1]. While many developed nations saw substantial decreases in TB incidence and mortality in recent decades, TB remains a public health crisis in these regions. Accordingly, our project focuses on these three regions, with the additional objective of identifying clusters of elevated TB mortality.
Given that we are specifically interested in contemporary TB, we restrict our analysis to 2003-2023. Among the high-burden regions of Africa, the Western Pacific, and South-East Asia, which region saw the greatest change in TB deaths per estimated TB case? Based on those results, we then take a more in-depth look at the region with the highest case fatality rate (estimated mortality per 100K population divided by estimated incidence per 100K population) to examine geographical and temporal trends at the country level.
Approach
To answer this question, we will create a line graph and a choropleth map. The line graph will compare the case fatality rate (CFR) from 2003 to 2023 across the most affected WHO regions. A line graph is well suited to temporal trends: the connecting lines show the year-to-year rate of change, with steeper slopes indicating more rapid shifts. Each region is drawn in a different color, so we can compare how the CFR changed over time across regions.
The choropleth will then show country-level trends for the region with the highest CFR. Africa consistently displayed the highest mortality per estimated case. The choropleth is especially useful because it allows direct comparison between countries while contextualizing the statistics by overlaying them on a map. It also lets us see whether severity clusters spatially within the region or appears more sporadic.
Analysis
To begin our analysis, we filter the WHO dataset to our regions of interest (Africa, South-East Asia, and the Western Pacific) and to the years 2003-2023, covering the two most recent decades. Because the first plot takes a region-wide view, we group the data by region and year rather than by individual country. Within each region-year, the countries' mortality rates per 100K population are summed and divided by the sum of their incidence rates per 100K population, giving the percentage of estimated cases that resulted in death across the region. Because per-100K rates rather than raw counts are summed, this aggregation is not weighted by country population.
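To make the region-level calculation concrete, the sketch below applies the same ratio-of-sums formula to a few made-up per-100K rates (hypothetical values, not WHO estimates) in base R, and contrasts it with simply averaging each country's own ratio.

```r
# Hypothetical per-100K rates for three countries in two regions (illustration only)
rates <- data.frame(
  region = c("A", "A", "B"),
  mort_100k = c(50, 10, 20),
  inc_100k = c(200, 100, 400)
)

# Ratio of sums, as in grouped_df: sum(mortality rates) / sum(incidence rates) * 100
region_sums <- aggregate(cbind(mort_100k, inc_100k) ~ region, data = rates, FUN = sum)
region_sums$pct_mortality <- region_sums$mort_100k / region_sums$inc_100k * 100
# Region A: 60 / 300 * 100 = 20; region B: 20 / 400 * 100 = 5

# For contrast, the mean of each country's own ratio gives a different answer for region A
country_pct <- rates$mort_100k / rates$inc_100k * 100
mean_of_ratios <- vapply(split(country_pct, rates$region), mean, numeric(1))
# Region A: mean of 25 and 10 = 17.5; region B: 5
```

The two approaches agree only when a region has a single country or all countries share the same incidence rate, so the choice of formula matters for the regional comparison.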
# restrict to the three regions of interest and to 2003-2023; other regions are not needed here
countries <- c("Western Pacific", "South-East Asia", "Africa")
q1_df <- drop_na(tb_data)
q1_df <- q1_df |>
filter(g_whoregion %in% countries) |>
filter(!year %in% c(2000, 2001, 2002))
# calculating the proportion of TB cases resulting in mortalities so we have a variable of analysis in q1
grouped_df <- q1_df |>
group_by(g_whoregion, year) |>
mutate(
mort_sum = sum(e_mort_100k),
inc_sum = sum(e_inc_100k),
e_diff_100k = (mort_sum / inc_sum) * 100,
ind_diff = (e_mort_100k / e_inc_100k) * 100
) |>
ungroup()
# joining with coordinates for second plot so that we can map mortalities on geography
joined_q1 <- left_join(africa_map, q1_df, by = c("name_long" = "country"))
joined_q1 <- joined_q1 |>
mutate(
decade = (floor(year / 10) * 10)
) |>
group_by(decade, name_long, geom) |>
summarize(
decade_mort = mean(e_mort_100k),
decade_inc = mean(e_inc_100k),
.groups = "drop"
) |>
mutate(
decade_diff = (decade_mort / decade_inc) * 100
) |>
filter(!is.na(decade_diff))
q1_line_plot <- ggplot(
grouped_df,
aes(x = year, y = e_diff_100k, color = g_whoregion)
) +
geom_point() +
geom_line() +
labs(
title = "Africa Sees the Greatest Proportion of Deaths From\n Confirmed TB Cases",
subtitle = "TB cases in Africa, Southeast Asia, and the Western Pacific resulting in death (2003-2023).",
color = "Region",
x = "Year",
y = "Proportion of Mortality (%)",
caption = "Figure 1"
) +
otter_theme() +
scale_color_manual(
values = c(
"Africa" = "#d7301f",
"South-East Asia" = "#2c7fb8",
"Western Pacific" = "#41ab5d"
)
) +
scale_y_continuous(labels = scales::label_percent(scale = 1))
q1_line_plot
For the second plot, we use the spData package to obtain geometries (geom objects) for African countries, which we join with our initial data frame from Question 1 to allow geographical analysis. Here the proportion of deaths per estimated case is calculated at the country level and grouped by decade to capture change over time: within each decade, a country's mean mortality rate per 100K is divided by its mean incidence rate per 100K.
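The decade grouping relies on rounding each year down to the start of its decade. A minimal base R sketch of the same binning and ratio-of-means step, using one hypothetical country with made-up rates, shows how a partial decade (such as 2003-2009 or 2020-2023) still yields a comparable average.

```r
# One hypothetical country with made-up per-100K rates (illustration only)
toy <- data.frame(
  country = "Country X",
  year = c(2003, 2009, 2015, 2021, 2023),
  mort_100k = c(40, 30, 20, 12, 8),
  inc_100k = c(200, 200, 200, 200, 200)
)

# Map each year to the start of its decade: 2003 -> 2000, 2015 -> 2010, 2021 -> 2020
toy$decade <- floor(toy$year / 10) * 10

# Average the rates within each decade, then take the ratio of the means (as decade_diff)
decade_means <- aggregate(cbind(mort_100k, inc_100k) ~ country + decade, data = toy, FUN = mean)
decade_means$decade_diff <- decade_means$mort_100k / decade_means$inc_100k * 100
# 2000s: 35 / 200 * 100 = 17.5; 2010s: 20 / 200 * 100 = 10; 2020s: 10 / 200 * 100 = 5
```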
q1_map_plot <- ggplot(joined_q1) +
geom_sf(aes(fill = decade_diff), color = "white", linewidth = 0.2) +
scale_fill_viridis_c(
option = "mako",
name = "TB Mortalities by Decade",
direction = -1,
labels = scales::label_percent(scale = 1)
) +
otter_theme() +
theme(
axis.text = element_blank(),
legend.key.width = unit(1, "cm")
) +
labs(
title = "Central Africa Has the Greatest Proportion of\nTB Mortalities From Confirmed Cases",
subtitle = "Proportion of mortalities from confirmed TB cases per 100k population in Africa \n (2000s-present).",
caption = "Figure 2"
) +
facet_wrap(~decade)
q1_map_plot
African countries missing from the map are classified by WHO as belonging to the Eastern Mediterranean region. These include Morocco, Libya, Sudan, Somalia, and Tunisia. All countries shown belong to the “Africa” region as defined by WHO.
Discussion
Figure 1 shows the percentage of estimated TB cases that resulted in death in the regions most affected by TB: Africa, South-East Asia, and the Western Pacific. Mortality per case decreased overall in all three regions from 2003 to 2023. This decrease is consistent with the expansion of diagnostic and treatment capacity reported by WHO. However, funding gaps remain a significant threat to equitable access to TB prevention and treatment.
Despite this decrease, Africa still has the highest mortality per case of the three regions, while the Western Pacific has the lowest. The declining trends also show a recovery from the mortality spikes around 2020-2021, during the height of the COVID-19 pandemic.
Figure 2 examines Africa at the country level from 2003 to 2023, grouped by decade, with the rightmost map showing the current, ongoing decade. Because the 2003 cutoff from Question 1 also applies here, the 2000s panel covers 2003-2009 and the 2020s panel covers only 2020-2023, while the 2010s panel spans a full ten years. The maps show decade averages of rates rather than raw totals, which keeps the panels comparable despite these unequal spans. Figure 2 reflects the decrease in severity seen in Figure 1, but it also reveals country-specific changes. In particular, during the 2000s the Central African Republic had the highest case fatality rate in the region, at 46.9%. Central Africa appears to be the most affected part of the region, despite improvements since the 2000s.
What Is Driving Africa’s TB Death Toll?
Introduction
From Question 1, we established that Africa consistently had the highest TB case fatality rate among the most affected regions. This raises the question: what is driving Africa’s disproportionately high TB death toll? HIV, a viral infection that suppresses the immune system, can form a deadly syndemic with TB. HIV-positive individuals not only have an elevated risk of TB infection but also face higher risks of latent TB reactivation and death. Co-treatment is further complicated by pharmacokinetic interactions between medications commonly used to treat each disease. The dataset separates TB mortality into HIV-attributable (e_mort_tbhiv_num, e_mort_tbhiv_100k) and total (e_mort_num, e_mort_100k) components, along with the WHO region classification (g_whoregion), allowing us to directly quantify the role of HIV co-infection.
Given the scale of both the HIV and TB epidemics in sub-Saharan Africa, we are interested in understanding how these two diseases seem to interact at the population level. How has HIV’s effect on TB mortality in Africa changed across the years, from 2000 to 2023? Does the HIV-TB burden remain uniquely concentrated in Africa today compared to other WHO regions?
Approach
To answer this question, we will create a stacked area chart and a stacked horizontal bar chart. The stacked area chart will display Africa’s total TB mortality from 2000 to 2023, decomposed into HIV-attributable and non-HIV components. A stacked area chart is ideal for this because it shows both the total magnitude and the relative composition over time, making it easy to see how the HIV-driven share has shifted. The use of color fill distinguishes the two mortality components.
The stacked horizontal bar chart will compare all six WHO regions by their TB mortality rate per 100,000 population in 2023, with each bar split into TB-only and HIV-attributable portions. A stacked bar chart is effective for cross-regional comparison because it allows the reader to simultaneously compare total mortality across regions while also seeing how much of each region’s burden is driven by HIV co-infection. This analysis will reveal whether the HIV-TB syndemic is uniquely concentrated in Africa or a global phenomenon.
Analysis
First, we filter the WHO dataset to the Africa region. We then calculate the estimated TB deaths among people with and without HIV co-infection, obtaining non-HIV deaths by subtracting HIV-attributable deaths from total TB deaths. This decomposition shows the share of TB mortality in Africa attributable to HIV co-infection, which we plot as a stacked area chart.
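The decomposition can be sketched in base R as below. `toy_africa` holds invented numbers (not WHO estimates); only the column names `e_mort_num` and `e_mort_tbhiv_num` follow the dataset. Non-HIV deaths are the difference between total and HIV-attributable deaths, and the HIV share is HIV-attributable deaths divided by the total.

```r
# Hypothetical country-year rows for one region (invented values, illustration only)
toy_africa <- data.frame(
  year = c(2000, 2000, 2023, 2023),
  e_mort_num = c(300, 100, 200, 100),
  e_mort_tbhiv_num = c(120, 40, 60, 30)
)

# Sum total and HIV-attributable deaths per year across countries
yearly <- aggregate(cbind(e_mort_num, e_mort_tbhiv_num) ~ year, data = toy_africa, FUN = sum)

# Non-HIV deaths = total deaths - HIV-attributable deaths
yearly$nonhiv_deaths <- yearly$e_mort_num - yearly$e_mort_tbhiv_num

# Share of all TB deaths attributable to HIV co-infection
yearly$hiv_share <- yearly$e_mort_tbhiv_num / yearly$e_mort_num
# 2000: 160 / 400 = 0.4; 2023: 90 / 300 = 0.3
```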
# Prepare Q2 data: Africa TB mortality decomposed by HIV status over time
q2_africa <- tb_data %>%
filter(g_whoregion == "Africa") %>%
drop_na(e_mort_num, e_mort_tbhiv_num) %>%
group_by(year) %>%
summarize(
hiv_deaths = sum(e_mort_tbhiv_num),
nonhiv_deaths = sum(e_mort_num) - sum(e_mort_tbhiv_num)
) %>%
pivot_longer(
cols = c(hiv_deaths, nonhiv_deaths),
names_to = "cause",
values_to = "deaths"
) %>%
mutate(
cause = factor(
cause,
levels = c("nonhiv_deaths", "hiv_deaths"),
labels = c("TB Only", "TB-HIV Co-infection")
)
)
# Prepare Q2 data: HIV share of TB mortality by country and region (2023)
q2_hiv_share <- tb_data %>%
filter(year == 2023) %>%
drop_na(e_mort_100k, e_mort_tbhiv_100k) %>%
filter(e_mort_100k > 0) %>%
mutate(hiv_share = e_mort_tbhiv_100k / e_mort_100k)
# stacked area chart of Africa TB mortality by HIV status to compare the two components
q2_plot1 <- q2_africa %>%
ggplot(aes(x = year, y = deaths / 1000, fill = cause)) +
geom_area(alpha = 0.85) +
scale_fill_manual(
values = c("TB Only" = "#2c7fb8", "TB-HIV Co-infection" = "#d7301f"),
name = "Cause of Death"
) +
scale_x_continuous(breaks = seq(2000, 2023, by = 4)) +
scale_y_continuous(labels = scales::label_comma()) +
labs(
title = "HIV-Driven TB Deaths in Africa Have Fallen Dramatically",
subtitle = "Total TB mortality in Africa (thousands), decomposed by HIV co-infection status\n (2000-2023).",
x = "Year",
y = "TB Deaths (thousands)",
caption = "Figure 3"
) +
otter_theme()
q2_plot1
For the second plot, we compare TB-HIV co-infection mortality in Africa with that in the other WHO-designated regions. This plot shows how much more heavily TB-HIV co-infection affects Africa than other regions.
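Region-level rates for this comparison are computed by pooling counts (summing deaths and population across countries) rather than averaging country rates, so populous countries carry more weight. The base R sketch below illustrates the difference; the column names mirror the dataset, but the values are invented for illustration.

```r
# Hypothetical 2023 rows: two countries in region R1, one in R2 (invented values)
toy_region <- data.frame(
  g_whoregion = c("R1", "R1", "R2"),
  e_mort_num = c(1000, 50, 300),
  e_mort_tbhiv_num = c(400, 5, 10),
  e_pop_num = c(1e6, 4e6, 2e6)
)

# Pooled (population-weighted) rates per 100K: sum of deaths / sum of population
pooled <- aggregate(
  cbind(e_mort_num, e_mort_tbhiv_num, e_pop_num) ~ g_whoregion,
  data = toy_region, FUN = sum
)
pooled$total_mort_rate <- pooled$e_mort_num / pooled$e_pop_num * 1e5
pooled$hiv_mort_rate <- pooled$e_mort_tbhiv_num / pooled$e_pop_num * 1e5
pooled$nonhiv_rate <- pooled$total_mort_rate - pooled$hiv_mort_rate
# R1: 1050 / 5e6 * 1e5 = 21 total, of which 8.1 HIV-attributable

# For contrast: an unweighted mean of country rates gives each country equal weight
country_rate <- toy_region$e_mort_num / toy_region$e_pop_num * 1e5
unweighted <- vapply(split(country_rate, toy_region$g_whoregion), mean, numeric(1))
# R1: mean of 100 and 1.25 = 50.625
```

The pooled value is the death rate for the region's population as a whole, which is what the stacked bars are meant to compare.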
# stacked bar chart: 2023 TB mortality per 100K by region, split into TB-only and HIV-attributable portions
q2_region_rates <- tb_data %>%
filter(year == 2023) %>%
drop_na(e_mort_num, e_mort_tbhiv_num, e_pop_num) %>%
group_by(g_whoregion) %>%
summarize(
total_mort_rate = sum(e_mort_num) / sum(e_pop_num) * 100000,
hiv_mort_rate = sum(e_mort_tbhiv_num) / sum(e_pop_num) * 100000
) %>%
mutate(g_whoregion = fct_reorder(g_whoregion, total_mort_rate))
q2_plot2 <- q2_region_rates %>%
mutate(nonhiv_rate = total_mort_rate - hiv_mort_rate) %>%
pivot_longer(
cols = c(hiv_mort_rate, nonhiv_rate),
names_to = "cause",
values_to = "rate"
) %>%
mutate(
cause = factor(
cause,
levels = c("nonhiv_rate", "hiv_mort_rate"),
labels = c("TB Only", "TB-HIV Co-infection")
)
) %>%
ggplot(aes(x = rate, y = g_whoregion, fill = cause)) +
geom_col(width = 0.6, alpha = 0.9) +
scale_fill_manual(
values = c("TB Only" = "#2c7fb8", "TB-HIV Co-infection" = "#d7301f"),
name = "Cause of Death"
) +
scale_x_continuous(
expand = expansion(mult = c(0, 0.15))
) +
labs(
title = "HIV-TB Co-Mortality Remains an Outsized African Crisis",
subtitle = "TB mortality per 100K by region, decomposed by HIV co-infection status (2023)",
x = "TB Mortality per 100K Population",
y = NULL,
caption = "Figure 4"
) +
otter_theme() +
theme(
panel.grid.major.y = element_blank(),
panel.grid.major.x = element_line(color = "#e0e0e0", linewidth = 0.4)
)
q2_plot2
Discussion
Figure 3 zooms in on Africa over time and tells a more hopeful story. Total TB deaths peaked around 2004-2006 (~900K) and have declined steadily since. The key insight is what drove this decline. The share of TB mortality driven by HIV co-infection (red) decreased substantially, from about half of total deaths in 2000 to roughly one fourth in 2023. This likely reflects the massive rollout of antiretroviral therapy (ART) across sub-Saharan Africa starting in the mid-2000s through programs such as PEPFAR and the Global Fund [4]. ART, a daily HIV treatment, entered large-scale production in 2003, and its accessibility for low-income households increased tenfold by 2008 [5]. Consistent with this, TB deaths driven by HIV co-infection decline more steeply around 2008. Meanwhile, TB-only deaths (blue) have declined more modestly, suggesting that improvements in TB-specific treatment and detection have been slower.
Figure 4 shows that Africa’s TB mortality rate per 100K population is dramatically higher than that of every other region, and a large share of it (the red portion, roughly a third) is driven by HIV co-infection. South-East Asia also has a high overall TB burden, but almost all of it is TB-only; the HIV component is tiny. This suggests that the TB crisis looks fundamentally different depending on the region: in Africa it is a dual TB-HIV epidemic, whereas elsewhere it is primarily a TB problem on its own.
Presentation
Our presentation can be found here.
Data
Abbott, Sam. “Access and Summarise World Health Organization Tuberculosis Data.” Samabbott.co.uk, 2018, samabbott.co.uk/getTBinR/. Accessed 10 Feb. 2026.
References
[1] “1.1 TB Incidence.” Who.int, 2024, www.who.int/teams/global-programme-on-tuberculosis-and-lung-health/tb-reports/global-tuberculosis-report-2025/tb-disease-burden/1-1-tb-incidence.
[2] Abbott, Sam. “Access and Summarise World Health Organization Tuberculosis Data.” Samabbott.co.uk, 2018, samabbott.co.uk/getTBinR/. Accessed 4 Mar. 2026.
[3] Lönnroth, Knut, et al. “Drivers of Tuberculosis Epidemics: The Role of Risk Factors and Social Determinants.” Social Science & Medicine, vol. 68, no. 12, June 2009, pp. 2240–2246, https://doi.org/10.1016/j.socscimed.2009.03.041.
[4] El-Sadr, Wafaa M., et al. “Scale-up of HIV Treatment through PEPFAR: A Historic Public Health Achievement.” Journal of Acquired Immune Deficiency Syndromes, vol. 60, suppl. 3, 2012, pp. S96–S104, https://doi.org/10.1097/QAI.0b013e31825eb27b.