The Genomics of Cuisine

Mapping Flavor Compound Relationships Across 20 World Cuisines

Cindy, Reinesse, Hannah

2026-05-13

Question, Data & Dashboard

How do ingredient patterns reveal the relationships, uniqueness, and hidden connections between world cuisines?

39,774 recipes × 20 cuisines × 6,703 ingredients.

Six interactive tabs

Tab                       What it answers
Cuisine Overview          Dataset summary, TF-IDF signature grid, interactive heatmap
Cuisine Family Tree       UPGMA dendrogram, surprise scores, cross-tab navigation
Cuisine Explorer          Profile (radar, donut) + comparison (TF-IDF, shared ingredients)
How Ingredients Connect   Flavor compound network (visNetwork)
Ingredient Explorer       Single-ingredient search with auto-fill chips
Global Cuisine Map        Clickable choropleth (leaflet + rnaturalearth)

Methods

TF-IDF:

Finds ingredients that are frequent in a cuisine but rare elsewhere:

\[\text{TF-IDF}(i,c) = \underbrace{\frac{\text{count}(i,c)}{\text{total}(c)}}_{\text{TF}} \times \underbrace{\log\!\left(\frac{20}{\text{cuisines using } i}\right)}_{\text{IDF}}\]

  • Salt → IDF = 0 (all 20 cuisines use it, so log(20/20) = 0)
  • Mirin → high IDF (only Japanese)
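As a rough illustration of the formula above, here is a minimal Python sketch on toy data. It is not the dashboard's actual code: the function name `tf_idf`, the toy recipes, and the reading of total(c) as the total number of ingredient occurrences in cuisine c are all assumptions made for this example.

```python
import math
from collections import Counter

def tf_idf(recipes):
    """recipes: list of (cuisine, [ingredients]) pairs.
    Returns {cuisine: {ingredient: TF-IDF score}}."""
    # count(i, c): how often ingredient i occurs in cuisine c
    counts = {}
    for cuisine, ingredients in recipes:
        counts.setdefault(cuisine, Counter()).update(ingredients)

    n_cuisines = len(counts)  # 20 in the real dataset, 3 in the toy data
    # number of cuisines that use each ingredient at least once
    cuisines_using = Counter()
    for counter in counts.values():
        cuisines_using.update(counter.keys())

    scores = {}
    for cuisine, counter in counts.items():
        # assumption: total(c) = all ingredient occurrences in cuisine c
        total = sum(counter.values())
        scores[cuisine] = {
            ing: (n / total) * math.log(n_cuisines / cuisines_using[ing])
            for ing, n in counter.items()
        }
    return scores

# Hypothetical toy data, not taken from the real recipe set
toy = [
    ("japanese", ["salt", "mirin", "soy sauce"]),
    ("japanese", ["salt", "mirin"]),
    ("italian", ["salt", "olive oil"]),
    ("mexican", ["salt", "cumin", "olive oil"]),
]
scores = tf_idf(toy)
signature = max(scores["japanese"], key=scores["japanese"].get)
```

In this toy set, salt appears in every cuisine and scores zero, while mirin, which only the Japanese recipes use, becomes the Japanese signature ingredient.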

Jaccard similarity + UPGMA:

Each cuisine = a binary set of ingredients (present/absent):

\[J(A,B) = \frac{|A \cap B|}{|A \cup B|}\]
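The Jaccard and UPGMA steps can be sketched in plain Python as follows. This is an illustrative version under stated assumptions, not the project's implementation: the clustering distance is taken to be 1 − J(A,B), the helper names (`jaccard`, `average_distance`, `upgma`) are hypothetical, and the four ingredient sets are invented toy data.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two ingredient sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def average_distance(members_a, members_b, dist):
    """UPGMA linkage: mean distance over all cross-cluster leaf pairs."""
    total = sum(dist[frozenset((x, y))] for x in members_a for y in members_b)
    return total / (len(members_a) * len(members_b))

def upgma(labels, dist):
    """Naive average-linkage clustering.
    Returns (tree as nested tuples, list of (left, right, height) merges)."""
    clusters = [(label, [label]) for label in labels]  # (subtree, leaf members)
    merges = []
    while len(clusters) > 1:
        # pick the pair of clusters with the smallest average distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(clusters[p[0]][1], clusters[p[1]][1], dist),
        )
        (tree_i, mem_i), (tree_j, mem_j) = clusters[i], clusters[j]
        merges.append((tree_i, tree_j, average_distance(mem_i, mem_j, dist)))
        merged = ((tree_i, tree_j), mem_i + mem_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0], merges

# Hypothetical toy ingredient sets
cuisines = {
    "japanese": {"salt", "soy sauce", "mirin", "rice"},
    "korean": {"salt", "soy sauce", "sesame oil", "rice"},
    "italian": {"salt", "olive oil", "basil", "tomato"},
    "greek": {"salt", "olive oil", "feta", "tomato", "oregano"},
}
# assumption: distance = 1 - Jaccard similarity
dist = {
    frozenset((a, b)): 1 - jaccard(cuisines[a], cuisines[b])
    for a, b in combinations(cuisines, 2)
}
tree, merges = upgma(list(cuisines), dist)
```

On this toy input the two East Asian sets merge first, then the two Mediterranean sets, and the final merge joins the two groups. For 20 cuisines a library routine is more practical; average linkage in standard hierarchical clustering packages corresponds to UPGMA.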

  • Surprise score = \(J(A,B) \times \frac{\text{geo\_dist}}{\max(\text{geo\_dist})}\) — upweights pairs that are culinarily close despite being far apart
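A small sketch of the surprise score follows. The source does not say how geo_dist is measured, so this example assumes great-circle (haversine) distance between one representative point per cuisine. The coordinates and similarity values below are rough, hypothetical placeholders, and the function names are illustrative only.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def surprise_scores(similarity, coords):
    """similarity: {(a, b): Jaccard value}; coords: {cuisine: (lat, lon)}.
    Returns {(a, b): J(a, b) * geo_dist / max(geo_dist)}."""
    geo = {pair: haversine_km(coords[pair[0]], coords[pair[1]]) for pair in similarity}
    max_geo = max(geo.values())
    return {pair: similarity[pair] * geo[pair] / max_geo for pair in similarity}

# Hypothetical representative points (approximate lat, lon)
coords = {
    "japanese": (36.0, 138.0),
    "korean": (36.0, 128.0),
    "mexican": (23.0, -102.0),
}
# Hypothetical Jaccard values, for illustration only
similarity = {
    ("japanese", "korean"): 0.6,
    ("japanese", "mexican"): 0.3,
    ("korean", "mexican"): 0.3,
}
scores = surprise_scores(similarity, coords)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here the Japanese and Korean sets are the most similar pair, but the two points are close together, so the pair ranks last. A moderately similar pair separated by an ocean ranks first, which is the behavior the score is designed to surface.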

Design Choices & Limitations

Design choices

  • Green primary palette: high contrast on white, accessible for color vision deficiency, and thematically tied to food and freshness. Region colors are reserved for the dendrogram and map only.
  • Single accent color per chart: an early version with continent-based color-coding across 20 colors was overwhelming.
  • Chart type matched to data structure: radar for multi-axis profiles, donut for part-to-whole, heatmap for the pairwise matrix, network for graph data, choropleth for geography.
  • Tab order follows a natural inquiry arc: dataset overview → cuisine relationships → explore one cuisine → zoom into ingredients → map.

Limitations

  • Representational bias: Yummly is U.S.-based, so recipes skew toward Western home cooks. “Cajun/Creole” is treated as a peer of “Chinese” even though it represents a regional subculture.
  • Ingredient normalization: raw strings like “chicken breast” and “boneless chicken” are treated as separate ingredients. FlavorDB partially addresses this, but many strings fall into “Other”.
  • Cuisine-level aggregation: Indian cuisine encompasses hundreds of regional traditions, and a single TF-IDF vector flattens that diversity.
  • Static flavor backbone: the Ahn et al. (2011) data is 15 years old and likely has sparser coverage of non-Western ingredients.