Chapter 18 Sample Allocation Model (SAM)

18.1 What is the Sample Allocation Model (SAM)?

The Sample Allocation Model (SAM) identifies an optimal surveillance strategy for areas with no cases of CWD, given the natural spread of the disease and the costs associated with surveillance. The SAM framework provides three model modes that allow users to flexibly integrate the probability of disease spread with historical sampling data and/or expense data to understand (1) the probability that any given area is presently disease-free, and (2) how best to allocate a surveillance budget so that an introduction of CWD is detected as early as possible.

18.2 What Questions Does it Answer?

Question 1. What is the unseen infection state, given up to 3 years of historical sampling without finding a positive case? Mode 1 of SAM combines prior years of sampling with the unobserved spread of CWD to estimate the probability that each sub-administrative area is presently disease-free. Results can reveal sub-administrative areas that may be "blind spots" in disease surveillance, where too little testing has occurred and CWD could be present and spreading silently. Conversely, results can pinpoint sub-administrative areas where sampling has been sufficient over time that CWD would likely have been detected by now if it were present. This mode provides, for each sub-administrative unit, the probability that it is presently disease-free and the probabilities that it has 0.5% and 1.0% CWD prevalence.
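
As a rough illustration of why repeated negative sampling raises the probability of disease freedom, the sketch below applies Bayes' rule to a single unit. It is not SAM's model: it assumes a fixed prior probability of infection, a constant prevalence across the look-back years (no spread), independent samples from a large population, and a test sensitivity parameter. The function names are hypothetical.

```python
from typing import Sequence


def detection_probability(n_samples: int, prevalence: float, sensitivity: float = 1.0) -> float:
    # Chance that at least one of n sampled animals tests positive if the unit is infected
    # (binomial approximation: large population, independent samples).
    return 1.0 - (1.0 - prevalence * sensitivity) ** n_samples


def posterior_free(prior_infected: float, yearly_samples: Sequence[int],
                   prevalence: float, sensitivity: float = 1.0) -> float:
    # Probability of missing an infection in every look-back year
    miss = 1.0
    for n in yearly_samples:
        miss *= 1.0 - detection_probability(n, prevalence, sensitivity)
    p_free = 1.0 - prior_infected
    # Bayes' rule: all-negative results are certain if the unit is free,
    # and occur with probability `miss` if it is infected.
    return p_free / (p_free + prior_infected * miss)


# Hypothetical inputs: 5% prior infection risk, 0.5% prevalence if infected
print(posterior_free(0.05, [0, 0, 0], 0.005))        # no sampling: posterior equals the prior
print(posterior_free(0.05, [200, 150, 300], 0.005))  # three years of negative tests
```

The gap between these two numbers is what separates a "blind spot" from a well-sampled unit; SAM additionally lets the infection state evolve between years, which this sketch ignores.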

Note: To compute the probability of freedom from disease after sampling has occurred, and given the clustering tendencies of hosts, use the Probability of Disease Freedom Using Clustering Model.

Question 2. Given the answer to Question 1, where should we look for CWD this coming year, and how much effort should we spend given our agency's CWD surveillance budget? Mode 2 of SAM builds on Mode 1, using the estimated probability of disease freedom and the user-provided surveillance budget to allocate sampling strategically across the entire jurisdiction. The resulting allocation constitutes the "optimal control" for a fixed surveillance budget. Here, optimal control means the allocation that minimizes how far CWD has silently spread across the jurisdiction by the time it is first detected, given the capped amount of sampling dollars available to make that detection. In other words, optimal control is the best estimated balance of surveillance sampling under a budget to ensure that CWD is detected as early as possible. Mode 2 provides the same output as Mode 1, plus a state-level estimate of the expected delay (in years) until first detection of CWD, and recommended sampling with the corresponding probabilities of disease prevalence for multiple future years. These sampling prescriptions remain valid for future years only as long as CWD remains undetected.
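
To make the idea of spreading a fixed budget across units concrete, here is a greedy heuristic that spends the budget one sample at a time wherever an extra sample most reduces the chance of an infection going unnoticed, per dollar. This is a swapped-in technique for illustration, not SAM's optimal-control solver: it looks at a single year and assumes a fixed prevalence if a unit is infected. The `Unit` fields and `greedy_allocate` are hypothetical names; `expense` and `capacity` mirror the per-unit inputs described in 18.4, and `p_infected` could, for example, be taken as 1 minus the State 0 probability from a Mode 1 run.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    name: str
    p_infected: float  # hypothetical input, e.g. 1 - P(state 0) from a Mode 1 run
    expense: float     # per-sample cost in this unit
    capacity: int      # maximum harvest capacity (upper bound on samples)


def greedy_allocate(units: List[Unit], budget: float, prevalence: float = 0.005) -> Dict[str, int]:
    alloc = {u.name: 0 for u in units}
    remaining = budget
    while True:
        best, best_gain = None, 0.0
        for u in units:
            n = alloc[u.name]
            if n >= u.capacity or u.expense > remaining:
                continue  # unit is at capacity or the next sample is unaffordable
            # Drop in P(infected and missed) from the (n+1)-th sample, per dollar:
            # (1-p)^n - (1-p)^(n+1) = p * (1-p)^n
            gain = u.p_infected * prevalence * (1.0 - prevalence) ** n / u.expense
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:
            return alloc  # budget exhausted or no sample adds any benefit
        alloc[best.name] += 1
        remaining -= best.expense


units = [Unit("A", 0.20, 40.0, 300), Unit("B", 0.02, 25.0, 500)]
print(greedy_allocate(units, budget=10_000))
```

Because each additional sample in a unit yields a smaller gain than the one before, picking the best gain per dollar at each step is a reasonable heuristic, but with unequal per-sample costs it is not guaranteed to be optimal.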

Question 3. Given the answers to Questions 1 and 2, what budget would improve this year's surveillance program (i.e., reduce silent spread up to the moment of first detection)? Mode 3 builds on Modes 1 and 2 to consider how optimal control changes under a smaller or larger annual budget. It optimizes sampling strategies over a range of budgets to identify how to achieve earlier detection and more strategic allocation relative to current practices, minimizing cost while maintaining sufficient sampling. Mode 3 provides the same output as Modes 1 and 2, plus a cost analysis that helps plan the annual budget needed to bolster an agency's surveillance program.
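
The diminishing returns that a cost analysis explores can be seen in a deliberately simple sketch: one unit, a constant prevalence (it ignores the annual growth rate), the whole budget spent in that unit up to its harvest capacity, and a geometric model for the number of years an infection goes undetected. This is not SAM's cost analysis, and `expected_missed_years` and `budget_sweep` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def expected_missed_years(budget: float, expense: float, capacity: int,
                          prevalence: float = 0.005) -> float:
    # Samples affordable this year, capped by harvest capacity
    n = min(int(budget // expense), capacity)
    # Annual probability of detecting an infection that is present
    d = 1.0 - (1.0 - prevalence) ** n
    if d == 0.0:
        return math.inf  # no samples: the infection is never detected
    # Mean number of failed years before the first detection (geometric distribution)
    return (1.0 - d) / d


def budget_sweep(base_budget: float, expense: float, capacity: int,
                 multipliers: Sequence[float] = (0.5, 1.0, 1.5, 2.0)) -> List[Tuple[float, float]]:
    # Evaluate smaller and larger budgets relative to the user's budget
    return [(m * base_budget, expected_missed_years(m * base_budget, expense, capacity))
            for m in multipliers]


for budget, delay in budget_sweep(20_000, expense=50.0, capacity=1000):
    print(f"budget {budget:>9,.0f}: {delay:.2f} expected undetected years")
```

Each step up in budget buys a smaller reduction in delay, and once the harvest capacity is reached extra money buys nothing in this toy model.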

Note: SAM produces surveillance targets based on sampling optimization over a budget. It does not, however, produce standard errors or statistical confidence. To determine sample sizes necessary to reach statistical assurance that CWD is absent in an area, refer to the Statistical Sample Size Quotas Using Clustering Model or the Efficient Sample Size Calculator.
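
For contrast with SAM's budget-driven targets, the classic large-population formula for the number of all-negative samples needed to claim freedom at a design prevalence p with confidence C is n = ln(1 - C) / ln(1 - p * Se), where Se is test sensitivity. The sketch below is that generic textbook calculation, assuming independent random sampling and no host clustering; it is not the method used by the Statistical Sample Size Quotas Using Clustering Model or the Efficient Sample Size Calculator, and `freedom_sample_size` is a hypothetical name.

```python
import math


def freedom_sample_size(confidence: float, design_prevalence: float, sensitivity: float = 1.0) -> int:
    # Smallest n with P(at least one positive | prevalence p) >= confidence,
    # assuming independent random samples from a large population.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - design_prevalence * sensitivity))


print(freedom_sample_size(0.95, 0.05))  # 59
print(freedom_sample_size(0.95, 0.01))  # 299
```

Clustering of infected hosts typically pushes these numbers up, which is why the clustering-aware models referenced above are the appropriate tools for statistical assurance.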

18.3 Abbreviated Tutorial

  1. Part of the input required for SAM is the output of a previously run Risk-weight Surveillance Quotas model. Before running SAM, ensure that a Risk-weight Surveillance Quotas ("Risk") model has been executed for the target season-year.
  2. Run the Sample Allocation Model (SAM) from the CWD Data Warehouse.
  3. Select the model mode:
    • Explore probabilities of disease status (Mode 1).
    • Explore probabilities of disease status and obtain surveillance sampling targets (Mode 2).
    • Explore probabilities of disease status, obtain surveillance sampling targets, and run the cost analysis (Mode 3).
  4. Provide the other required inputs, including the Risk model to be used by SAM.
  5. Explore the model logs, input file, and output files from the model execution.
  6. Explore the visualizations from the model execution.
  7. If the model did not run, check the model logs to identify which required data were missing.

18.4 Parameters Needed to Execute the Model

  • Model type: Select ‘Sample Allocation Model (SAM)’ from the drop-down list.

  • Reference name: Label the model execution.

  • Applicable season-year (Optional): Label the season-year of the run to assist in documentation.

  • Notes (Optional): Enter any remarks about the model execution.

  • Mode:

    Select Mode 1 to: Determine the posterior infection state only (i.e., to answer Question 1). [Shortest runtime]

    Select Mode 2 to: Determine the posterior infection state then allocate samples (i.e., to answer Question 2). [Moderate runtime]

    Select Mode 3 to: Determine the posterior infection state, allocate samples, then conduct a cost analysis to further improve optimal control (i.e., to answer Question 3). [Longest runtime]

  • Season-year: The target season-year to plan for; this should be the upcoming (or a future) season-year.

  • Total budget: The total annual budget that can be spent on CWD surveillance across the whole agency in the upcoming season-year. Not required for Mode 1.

  • Look-back period: SAM can be used to compute probability of disease freedom using 0-3 prior years of historical sampling data.

    Look-back period of 0 years: Proceeds by assuming all sub-administrative areas are disease-free.

    Look-back period of 1 to 3 years: Computes the probability of disease freedom based on 1-3 prior years of sampling data. This option is especially recommended if introduction risk has changed in any sub-administrative area since the look-back period.

    Note: Look-back period is limited to 3 historical years because SAM assumes that introduction risk is static. Changes in introduction risk include new CWD outbreaks in neighboring areas, additional avenues of anthropogenic prion introduction, etc.

  • Annual growth rate: The rate that governs the annual increase in CWD prevalence (once established) in a region. The default value is 0.2, which reflects the rise in prevalence from 0.5% to 1% over approximately five to seven years.

  • Risk model: A previously executed Risk-weight Surveillance Quotas model for the agency (applicable to the target season-year).

  • For each sub-administrative unit:

    • Consider: A check box to indicate which sub-administrative units should be included in the model, i.e., the sub-administrative units where sampling will take place. Sub-administrative units where CWD has already been detected should be excluded.

    • Expense: The per-sample cost of surveillance in that sub-administrative unit (i.e., a single sample-level cost that incorporates all costs of testing, such as equipment, personnel time, and procurement). An expense is required for each included sub-administrative unit.

      Note: This input will later be integrated with the Per-Sample Cost Analysis model, which calculates per-sample costs for each sub-administrative unit from provided expenses, so that users can import its results into SAM as the expense estimates. This integration is not yet available.

    • Maximum harvest capacity: The harvest capacity specific to each included sub-administrative unit. The model uses a default value of 1,000 for fields left unspecified.
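
The parameter rules above can be summarized as a pre-flight check before launching a run. This is a hypothetical sketch: `UnitInput`, `validate_inputs`, and the `cwd_detected` flag are illustrative names, not fields of the CWD Data Warehouse, and the checks encode only the rules stated in this section (a budget for Modes 2 and 3, a 0-3 year look-back period, an expense for every included unit, exclusion of units where CWD has been detected, and the 1,000 default for harvest capacity).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

DEFAULT_HARVEST_CAPACITY = 1000  # default applied to unspecified capacity fields


@dataclass
class UnitInput:
    name: str
    consider: bool                              # the "Consider" check box
    expense: Optional[float] = None             # per-sample cost
    max_harvest_capacity: Optional[int] = None  # left blank -> default
    cwd_detected: bool = False                  # hypothetical flag, not a SAM field


def validate_inputs(mode: int, total_budget: Optional[float], look_back_years: int,
                    units: List[UnitInput]) -> Tuple[List[str], Dict[str, int]]:
    errors: List[str] = []
    if mode not in (1, 2, 3):
        errors.append("mode must be 1, 2, or 3")
    if mode in (2, 3) and (total_budget is None or total_budget <= 0):
        errors.append("a positive total budget is required for Modes 2 and 3")
    if not 0 <= look_back_years <= 3:
        errors.append("look-back period must be 0 to 3 years")
    capacities: Dict[str, int] = {}
    for u in units:
        if not u.consider:
            continue  # unit excluded from the model
        if u.cwd_detected:
            errors.append(f"{u.name}: CWD already detected; this unit should be excluded")
        if u.expense is None or u.expense <= 0:
            errors.append(f"{u.name}: a per-sample expense is required")
        capacities[u.name] = (u.max_harvest_capacity
                              if u.max_harvest_capacity is not None
                              else DEFAULT_HARVEST_CAPACITY)
    return errors, capacities


errors, caps = validate_inputs(2, 50_000, 3, [UnitInput("A", True, 40.0),
                                              UnitInput("B", True, None, 250)])
print(errors)  # ['B: a per-sample expense is required']
print(caps)    # {'A': 1000, 'B': 250}
```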

18.5 Output Details

  • Mode 1 Output:
    • Results:
      • posterior_results.csv : For each sub-administrative unit, the posterior probability of disease prevalence at present is given for 3 “states”. State 0 corresponds to being disease-free, state 1 corresponds to a 0.5% prevalence, and state 2 corresponds to a 1.0% prevalence.

        For example, for Unit A, if State 0 has a value of 0.99, State 1 is 0.001, and State 2 is 0.0001, this means that, given the model specifications, Unit A has an estimated 99% probability of being free from CWD, a 0.1% estimated probability of having 0.5% prevalence, and a 0.01% estimated probability of having 1.0% prevalence.

    • Visualization:
      • For the posterior results, users choose a "state" to visualize, and the results are shown on a map of the agency in which sub-administrative units are shaded by the probability of the chosen state (i.e., a choropleth).
  • Mode 2 Output:
    • Results:
      • posterior_results.csv : As described above.
      • objective.csv : A single value representing the average expected delay for the first detection of CWD for the agency.
      • probability_prevalence_0_5.csv : For each sub-administrative unit, the current probability of CWD having 0.5% prevalence and the probability at each year for 30 years, assuming no CWD detection.
      • probability_prevalence_1_0.csv : For each sub-administrative unit, the current probability of CWD having 1.0% prevalence and the probability at each year for 30 years, assuming no CWD detection.
      • probability_disease_free.csv : For each sub-administrative unit, the current probability of having no CWD and the probability at each year for 30 years, assuming no CWD detection.
      • sample_efforts.csv: For each sub-administrative unit, the recommended sampling effort for 30 years to maintain optimal control, assuming no CWD detection. Sampling effort is the proportion of the total available samples (under the budget constraint) that are used for CWD testing.
      • sample_size.csv: For each sub-administrative unit, the recommended sample sizes for 30 years to maintain optimal control, assuming no CWD detection.
    • Visualization:
      • For the posterior results, users would have the same visualization options described above.
      • For the probability and sampling results over 30 years, users will be able to choose which result to visualize and the results will be shown on a time series figure, depicting the change of probabilities and sampling over time.
  • Mode 3 Output:
    • Results:
      • posterior_results.csv : As described above.
      • objective.csv : As described above.
      • probability_prevalence_0_5.csv : As described above.
      • probability_prevalence_1_0.csv : As described above.
      • probability_disease_free.csv : As described above.
      • sample_efforts.csv: As described above.
      • sample_size.csv: As described above.
      • cost_analysis.csv : The expected delay for the first detection of CWD for the agency at various budgets, in comparison to the user-input budget.
    • Visualization:
      • For the posterior, probability, and sampling results, users have the same visualization options described above.
      • The cost analysis is displayed as a line graph showing how the expected delay until first detection of CWD changes across budgets, compared with the budget provided.
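
Once a run finishes, the result files can be post-processed with ordinary tools. The sketch below assumes a hypothetical layout for posterior_results.csv (columns `unit`, `state_0`, `state_1`, `state_2`; the real headers may differ) and uses made-up rows purely for illustration. It flags possible blind spots whose probability of being disease-free falls below a chosen threshold, and totals the cost of one year of recommended sample sizes from the per-sample expenses; all function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical posterior_results.csv layout with made-up rows (real headers may differ)
EXAMPLE_POSTERIOR = """unit,state_0,state_1,state_2
A,0.99,0.008,0.002
B,0.80,0.15,0.05
"""


def load_posterior(text: str) -> Dict[str, Tuple[float, float, float]]:
    # Map each unit to its (state 0, state 1, state 2) probabilities
    reader = csv.DictReader(io.StringIO(text))
    return {row["unit"]: (float(row["state_0"]), float(row["state_1"]), float(row["state_2"]))
            for row in reader}


def blind_spots(posterior: Dict[str, Tuple[float, float, float]],
                min_p_free: float = 0.95) -> List[str]:
    # Units whose probability of being disease-free (state 0) is below the threshold
    return sorted(u for u, (p_free, _, _) in posterior.items() if p_free < min_p_free)


def first_year_cost(sample_sizes: Dict[str, int], expenses: Dict[str, float]) -> float:
    # Cost of one year of recommended samples at each unit's per-sample expense
    return sum(n * expenses[u] for u, n in sample_sizes.items())


posterior = load_posterior(EXAMPLE_POSTERIOR)
print(blind_spots(posterior))  # ['B']
```

For a real run, the file contents can be passed in with `open("posterior_results.csv", newline="").read()`, after adjusting the column names to match the actual headers. Comparing `first_year_cost` with the total budget is a quick sanity check on a sampling plan.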

18.6 Details on the Theory

Wang J, Hanley B, Thompson N, Gong Y, Walsh D, Huang Y, Gonzalez-Crespo C, Booth J, Caudell J, Miller L, Schuler K. Strategic Planning of Prevention and Surveillance for Emerging Diseases and Invasive Species. In press.

18.7 Code

The code is publicly available at https://github.com/Cornell-Wildlife-Health-Lab/sample-allocation-model.