Chapter 2 What is the CWD Data Warehouse?
2.1 Wildlife agencies are faced with a challenge
Chronic wasting disease (CWD) is a fatal disease of white-tailed deer and other cervid species. Since 1967, when the disease was first discovered in captive mule deer in Colorado, CWD has spread across North America. Preventing further spread of the disease and controlling it after introduction have proven extremely difficult.
Disease surveillance, including efforts to detect new introductions quickly and to measure changes in prevalence where the disease is already established, is an essential component of the disease response plans enacted by wildlife agencies across the country. But where should wildlife agencies focus their efforts? How much sampling is enough to determine with confidence whether an area is ‘free of disease’, or whether prevalence is increasing or decreasing? What does an efficient and effective disease surveillance plan look like?
The CWD Data Warehouse was built to give wildlife agencies the quantitative tools needed to answer these questions. It supports the development of efficient and effective surveillance plans that are grounded in the best available science and informed by mathematics and data science. The Warehouse can also serve as the foundation of a data management system for agencies that need a reliable and efficient way to manage and process their surveillance data.
2.2 A coalition of wildlife agencies
Given that wildlife agencies throughout North America are asking similar questions and exploring similar solutions, it makes sense to face those challenges together by sharing experiences, data, and tools.
The CWD Data Warehouse is a product of the Surveillance Optimization Project for Chronic Wasting Disease (SOP4CWD). SOP4CWD has been funded by the Michigan Department of Natural Resources and the New York State Department of Environmental Conservation and is jointly led by the Cornell Wildlife Health Lab (CWHL) and the Boone and Crockett Quantitative Wildlife Center at Michigan State University. The U.S. Geological Survey (USGS) contributes to the project as a research partner. More information about the project can be found on the SOP4CWD project page of the CWHL website.
The CWD Data Warehouse project is led by the Cornell Wildlife Health Lab. The application development team at DJ Case has managed the technical development of the system.
2.3 Warehouse Concepts
2.3.1 Data - Models - Visualizations
The CWD Data Warehouse is designed around a conceptual model with three components: data, models, and visualizations.
The Data component includes the datasets (or “collections”) stored in the Warehouse. Currently, there are six collections, including cervid samples and CWD test results, as well as ancillary datasets relevant to CWD surveillance and management. The Data collections serve as inputs to the Models component, a set of mathematical models and analyses that users can parameterize and execute. The Visualizations component provides data visualization, reporting, and dashboarding functionality. Visualizations take the Data and Models as inputs and present them as interactive maps, graphs, and tabular summaries.
A common workflow starts with data entry, continues with model execution, and ends with data visualization, used to explore model results, to present data, or both.
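The Data, Models, and Visualizations workflow can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the record fields, the area names, the made-up records, and the toy "model" (a naive apparent prevalence, positives divided by samples tested) are assumptions chosen for illustration and do not represent the Warehouse's actual collections or models.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One record in a hypothetical samples-and-test-results collection."""
    season_year: str   # e.g. "2020-21"
    area: str          # hypothetical surveillance area name
    positive: bool     # CWD test result


# Data: a tiny, made-up collection standing in for data entry.
samples = [
    Sample("2020-21", "Unit A", False),
    Sample("2020-21", "Unit A", True),
    Sample("2020-21", "Unit A", False),
    Sample("2020-21", "Unit B", False),
    Sample("2021-22", "Unit A", False),
]


# Model: apparent prevalence (positives / tested) by season-year and area.
def apparent_prevalence(records):
    counts = defaultdict(lambda: [0, 0])  # key -> [positives, tested]
    for r in records:
        c = counts[(r.season_year, r.area)]
        c[0] += int(r.positive)
        c[1] += 1
    return {key: (pos, n, pos / n) for key, (pos, n) in counts.items()}


# Visualization: a plain-text tabular summary of the model output.
def render_table(results):
    lines = [f"{'Season-year':<12}{'Area':<8}{'Pos':>4}{'Tested':>8}{'Prev':>7}"]
    for (sy, area), (pos, n, prev) in sorted(results.items()):
        lines.append(f"{sy:<12}{area:<8}{pos:>4}{n:>8}{prev:>7.2f}")
    return "\n".join(lines)


print(render_table(apparent_prevalence(samples)))
```

Keeping each stage as a separate function mirrors the three-component design: the same data can feed different models, and the same model output can feed different visualizations.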
2.3.2 The Season-year
Season-year is a concept used throughout the Warehouse in data collections, models, and visualizations. It refers to the annual period from July 1 to June 30. For most wildlife agencies, the CWD surveillance cycle does not begin on January 1. Instead, CWD surveillance activities are planned annually, typically in the spring or summer, and focus on sample collection during hunting seasons, which may extend from September through January or later. Therefore, in the Warehouse, data are aggregated by Season-year, and models and visualizations are also based on Season-year rather than calendar year. In the Warehouse, Season-year is written as the four digits of the first year, followed by a hyphen and the last two digits of the second year (e.g., 2020-21).
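The Season-year rule can be expressed as a short function. This is a minimal Python sketch; the function name season_year is hypothetical and not taken from the Warehouse, and it assumes only the July 1 to June 30 boundary and the labeling convention described above.

```python
from datetime import date


def season_year(d: date) -> str:
    """Return the Season-year label (July 1 to June 30) for date d.

    Hypothetical helper for illustration only.
    """
    # July through December start a new season; January through June
    # belong to the season that began the previous July.
    start = d.year if d.month >= 7 else d.year - 1
    # Four digits of the first year, a hyphen, then the last two digits
    # of the second year (zero-padded, e.g. 2008-09).
    return f"{start}-{(start + 1) % 100:02d}"


print(season_year(date(2020, 11, 15)))  # 2020-21 (fall hunting season)
print(season_year(date(2021, 1, 10)))   # 2020-21 (late-season sample)
print(season_year(date(2021, 6, 30)))   # 2020-21 (last day of the season)
print(season_year(date(2021, 7, 1)))    # 2021-22 (first day of the next)
```

Because January through June dates map back to the season that began the previous July, a deer sampled in January 2021 is counted in the 2020-21 Season-year together with samples from the preceding fall.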
2.3.3 Data Sharing
Interagency data sharing is a principal aim of the Warehouse design. Sharing data and information across jurisdictional boundaries is critical for situational awareness and for collectively developing a deeper understanding of CWD to support better data-driven decisions.
Data sharing is carefully and securely managed through systemwide user-level data access restrictions. All CWD Data Warehouse partners and their users are required to understand and adhere to the data use policy, as well as each individual wildlife agency’s own data use and sharing policies.
In the Warehouse, all wildlife agency partners can access generalized and aggregate data that are considered important for CWD surveillance planning and situational awareness. However, access to sensitive data is restricted. The Warehouse does not contain any personally identifiable information.
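This section does not describe how the Warehouse enforces user-level restrictions. The Python sketch below only illustrates the general idea of sharing aggregate data across agencies while withholding record-level detail. The Record fields, the agency and county names, the choice of coordinates as the "sensitive" field, the grouping by agency, season-year, and county, and the visible_view function are all hypothetical assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A record-level sample (hypothetical fields for illustration)."""
    agency: str        # owning wildlife agency
    season_year: str   # e.g. "2020-21"
    county: str        # generalized location, shareable in aggregate
    positive: bool     # CWD test result
    latitude: float    # treated here as sensitive, record-level detail
    longitude: float


def visible_view(records, user_agency):
    """Split records into what a user from user_agency may see.

    Hypothetical policy: a user sees full records owned by their own agency,
    but only counts of tested and positive samples, grouped by agency,
    season-year, and county, for records owned by other agencies.
    """
    own = [r for r in records if r.agency == user_agency]
    tested, positive = Counter(), Counter()
    for r in records:
        if r.agency != user_agency:
            key = (r.agency, r.season_year, r.county)
            tested[key] += 1
            positive[key] += int(r.positive)
    # Coordinates are never copied into the aggregates, so other agencies'
    # record-level locations are not exposed.
    aggregates = {k: {"tested": tested[k], "positive": positive[k]} for k in tested}
    return own, aggregates


records = [
    Record("Agency A", "2020-21", "County X", False, 42.1, -84.5),
    Record("Agency B", "2020-21", "County Y", True, 42.7, -73.8),
    Record("Agency B", "2020-21", "County Y", False, 42.6, -73.9),
]
own, aggregates = visible_view(records, "Agency A")
print(len(own))    # 1
print(aggregates)  # {('Agency B', '2020-21', 'County Y'): {'tested': 2, 'positive': 1}}
```

A production system would enforce rules like these at the database or API layer and apply each agency's own data use and sharing policies, which this sketch does not attempt.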