Chapter 20 Statistical Sample Quotas Model using Clustering

20.1 What is the Statistical Sample Quotas Model using Clustering?

The Statistical Sample Quotas Model using Clustering leverages naturally occurring biological groupings, and the correlation in disease status within those groupings, to plan a statistically valid simple random sampling scheme for disease surveillance.

20.2 What Question Does it Answer?

Question 1. How many individuals must I test, without finding a positive case, to have a high probability that disease prevalence in the population is at or below a predetermined percentage?
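As a rough point of reference only, the same question can be answered with the classical calculation that ignores clustering. The sketch below is not the method of this model (see Section 20.6 for that); it assumes a large population, independent hosts, and a test that never returns a false positive. The function name and the default confidence of 0.95 are illustrative assumptions.

```python
import math


def negatives_needed(design_prevalence: float,
                     sensitivity: float = 1.0,
                     confidence: float = 0.95) -> int:
    """Smallest n such that n negative tests in a row would occur with
    probability at most (1 - confidence) if the true prevalence were
    equal to design_prevalence.

    Assumption: large population and independent hosts (no clustering).
    """
    if not 0.0 < design_prevalence < 1.0:
        raise ValueError("design_prevalence must be strictly between 0 and 1")
    if not 0.0 < sensitivity <= 1.0:
        raise ValueError("sensitivity must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")

    # Chance that one randomly chosen host tests positive.
    p_detect = design_prevalence * sensitivity

    # All n tests are negative with probability (1 - p_detect) ** n;
    # solve (1 - p_detect) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_detect))


print(negatives_needed(0.01))                     # perfect test
print(negatives_needed(0.01, sensitivity=0.95))   # imperfect test
```

Under these simplifying assumptions, a 1% design prevalence at 95% confidence calls for 299 negative tests with a perfect test. The quotas reported by the model can differ from this figure because the model accounts for finite population size and for correlation in disease status among hosts that share a cluster.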

20.3 Output Details

  • Sample size: A map and table containing the number of samples that must test negative for disease from each sub administrative unit to have high probability that the underlying disease prevalence is at or below the desired level.

20.4 Abbreviated Tutorial

  1. Choose your model parameters (see below).
  2. Run the model with the chosen parameters.
  3. Look at the map to see the number of animals that need to be tested without finding a positive case to ensure there is a high probability that disease prevalence in the population is at or below your desired level.

20.5 Parameters Needed to Execute the Model

  • Model type: Select ‘Statistical Sample Quotas Model using Clustering’ from the drop-down list.

  • Reference name: Label the run.

  • (Optional) Applicable season year: Label the season-year. This label is not used in the model execution and is intended to assist the provider in documenting the model execution.

  • (Optional) Notes: Enter any additional remarks about the run.

  • Season-year: Select one season-year for which to determine sample quotas.

  • Host density: The number of hosts that reside in one square kilometer of land area. Enter either this value or the population size.

  • Population size: The number of hosts that reside in the sub administrative unit. Enter either this value or the host density.

  • Average cluster size: The average number of hosts per cluster in each sub administrative unit. An integer between 1 (one host per cluster) and the total population size (a single cluster containing the entire population). Note: The software automatically ensures that the cluster size does not exceed the population size.

  • Correlation in disease status: Correlation in disease status between hosts sharing a cluster. A decimal between 0 (disease status among hosts in a cluster is independent) and 0.995 (disease status is nearly perfectly correlated among hosts sharing a cluster).

  • Sensitivity of the diagnostic test: The probability that the test correctly declares an infected host positive. A decimal between 0 (not sensitive, the test never declares a true positive) and 0.999 (nearly perfect sensitivity, the test almost always declares a true positive).
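The ranges listed above can be checked before a run with a few lines of code. The following is a minimal sketch, not the software's own validation: the class, field, and function names are hypothetical, the land area `area_km2` is an assumed extra input needed to turn a host density into a population size, and rounding the result to a whole number of hosts is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaInputs:
    """Hypothetical container for the inputs of one sub administrative unit."""
    average_cluster_size: int
    correlation: float                      # correlation in disease status
    sensitivity: float                      # diagnostic test sensitivity
    population_size: Optional[int] = None
    host_density: Optional[float] = None    # hosts per square kilometer
    area_km2: Optional[float] = None        # assumed; used only with host_density


def resolve_inputs(raw: QuotaInputs) -> QuotaInputs:
    """Validate the documented ranges and return inputs with a population size."""
    # Exactly one of host density or population size is expected.
    if (raw.population_size is None) == (raw.host_density is None):
        raise ValueError("provide either host density or population size")

    if raw.population_size is not None:
        population = raw.population_size
    else:
        if raw.area_km2 is None:
            raise ValueError("host density requires a land area in km^2")
        # Assumption: convert density to a whole number of hosts by rounding.
        population = round(raw.host_density * raw.area_km2)

    if population < 1:
        raise ValueError("population size must be at least 1")
    if raw.average_cluster_size < 1:
        raise ValueError("average cluster size must be at least 1")
    if not 0.0 <= raw.correlation <= 0.995:
        raise ValueError("correlation must be between 0 and 0.995")
    if not 0.0 <= raw.sensitivity <= 0.999:
        raise ValueError("sensitivity must be between 0 and 0.999")

    # Mirror the documented behavior: cluster size never exceeds population size.
    cluster = min(raw.average_cluster_size, population)
    return QuotaInputs(cluster, raw.correlation, raw.sensitivity,
                       population_size=population)


checked = resolve_inputs(
    QuotaInputs(average_cluster_size=50, correlation=0.2, sensitivity=0.95,
                host_density=2.5, area_km2=10.0)
)
print(checked.population_size, checked.average_cluster_size)
```

In this example a density of 2.5 hosts per square kilometer over 10 square kilometers gives 25 hosts, so the requested cluster size of 50 is reduced to 25.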

20.6 Details on the Theory

Booth JG, Hanley BJ, Hodel FH, Jennelle CS, Guinness J, Them CE, Mitchell CI, Ahmed MS, Schuler KL. 2024. Sample Size for Estimating Disease Prevalence in Free-Ranging Wildlife Populations: A Bayesian Modeling Approach. Journal of Agricultural, Biological and Environmental Statistics, 29, 438–454. https://doi.org/10.1007/s13253-023-00578-7.

Booth JG, Hanley BJ, Thompson NE, Gonzalez-Crespo C, Christensen SA, Jennelle CS, Caudell JN, Delisle Z, Guinness J, Hollingshead NA, Them CT, Schuler KL. Management Agencies Can Leverage Animal Social Structure for Wildlife Disease Surveillance. Journal of Wildlife Diseases. https://doi.org/10.7589/JWD-D-24-00079.