Chapter 10 Sample Collection

The Sample Collection represents cervid tissue samples collected for CWD surveillance and includes properties that describe the animal from which the sample was taken, conditions and circumstances under which a sample was collected, and test results. The scope of the collection includes samples from all sources of mortality, including hunter-harvest programs, road kills, clinical suspect cases, captive cervids, and targeted removal efforts.

10.1 Understanding and using the Sample Collection

There are several key features and properties of the Sample Collection that users should understand to effectively use the Collection:

unique sample IDs
sample sources
samples with multiple tests
custom properties
samples associated with disease management areas (DMAs)
confidential status
season-year
location data
bulk data importation from a CSV file

10.1.1 Unique sample IDs

Sample ID is a required field in the Sample Collection and must be unique per Provider. The Provider should use the unique identifier, such as a barcode, that it normally uses to track samples and associate them with individual animals.

Although Sample ID and Sample source are the only required Sample fields, other fields may be important for other functionality. Providers should understand the Data, Models, and Visualizations of the CWD Data Warehouse to determine which data to include. For instance, a Visualization that depicts sample positivity rate by sub-administrative area depends on the Sub-administrative area field and a test result. A Model may depend on other fields, such as the sample source or the age and sex of the animal from which the sample was collected. See the Models and Visualization sections to determine if and how specific Sample Collection properties should be entered.

10.1.2 Sample source

Sample source is a required field that indicates the collection category, purpose, or origin of the sample, including hunter-harvest programs, road kills, clinical suspect cases, captive cervids, and targeted removal efforts.

10.1.3 Samples with multiple tests

The Sample Collection can accommodate situations in which a sample has been tested multiple times, such as when confirmatory testing is used.

If a sample has only one test, the Test status of the sample is based on the Result of that Test. For instance, a sample with a single test that has a Result of Not Detected is considered a CWD-negative sample when used in Models or Visualizations.

If a sample has more than one test, the Definitive property should be used to indicate which Test determines the Test status of the sample. Only a single test should be marked as Definitive.

If a sample has multiple Tests, but no Test has been marked as Definitive, then the Result from the Test with the most recent Date Submitted will determine the Test status of the sample. If submission dates are not provided, then the Test with the most recent creation date (the date when it was entered into the Warehouse) will determine the Test status of the sample.
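
The selection rules above can be sketched in code. The following Python is a minimal illustration, not the Warehouse's implementation: the `CwdTest` class, its attribute names, and the `resolve_test_status` function are hypothetical. It assumes that when more than one test is flagged Definitive, the most-recent rule is applied among the flagged tests (as described for the Selected Definitive hidden property in Section 10.3), and that the creation date is used whenever any candidate test lacks a Date Submitted.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Optional


@dataclass
class CwdTest:
    """Hypothetical stand-in for one Test record on a sample."""
    result: str
    created: datetime                      # when the test was entered into the Warehouse
    date_submitted: Optional[date] = None
    definitive: bool = False


def resolve_test_status(tests: List[CwdTest]) -> Optional[str]:
    """Return the Result that determines the sample's Test status."""
    if not tests:
        return None
    # Prefer tests flagged Definitive; otherwise consider every test.
    flagged = [t for t in tests if t.definitive]
    candidates = flagged or tests
    # Assumption: fall back to creation date if any submission date is missing.
    if all(t.date_submitted is not None for t in candidates):
        chosen = max(candidates, key=lambda t: t.date_submitted)
    else:
        chosen = max(candidates, key=lambda t: t.created)
    return chosen.result
```

With a screening test and a later confirmatory test, flagging the confirmatory test as Definitive makes its Result the Test status regardless of the submission dates.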

10.1.4 Custom properties

The Provider Administrator may add custom properties to the Sample Collection. This feature may be useful if a Provider collects additional data not accounted for in the standard Sample Collection schema. Custom properties are limited to string fields (up to 255 characters).

Custom properties are created in the Provider Administration Interface under the Sample tab. These properties will then be available and editable through the User Interface and through the API.

10.1.5 Samples associated with disease management areas (DMAs)

Through the Provider Administration Interface under the Sample tab, the Provider Administrator may define a list of acceptable values for the Disease Management Area property. The allowed values limit data entry for any new Samples entered through the User Interface (values appear as a drop-down list) or through the API.

10.1.6 Confidential status

In some situations, it may be necessary to prevent or limit access to data from a specific Sample. If a Sample’s Confidential property is set to TRUE (checkbox is checked), the Sample can be viewed or edited only by a Provider Administrator. Samples marked as confidential are excluded from all Models and Visualizations. For example, Confidential samples will be excluded from a Surveillance Activity visualization, which a wildlife agency may be using to track surveillance program activities during a surveillance period.

If a sample is given a Confidential status, the Confidential Reason property becomes available. Common reasons for making a sample confidential are given: Data quality, Duplicate sample, Not submitted for testing, and Sensitive.

A Provider Administrator may generate a list of Samples marked as confidential by applying a filter to the Sample Collection list view.

10.1.7 Season-year

Samples are collected throughout the year. However, for most Providers, the majority of samples are collected during hunting seasons, which run from late summer to early winter depending on the Provider. This sampling period typically spans two calendar years (it crosses January 1). Therefore, Samples are aggregated by Season-year, which is the one-year period from July 1 through June 30 of the following year. This grouping makes sense for several reasons. Providers typically plan their sampling effort around the hunting season rather than the calendar year. For most wildlife agencies, sampling rates are lowest in the summer, around June and July. July 1 also corresponds with the start of most agencies’ fiscal years.

The Season-year is given as the four digits of the start year and final two digits of the end year. For example, the 2020-21 Season-year spans from July 1, 2020, through June 30, 2021.
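
As an illustration of the July 1 to June 30 rule, the following Python sketch derives a Season-year label from a date. The function name `season_year` is hypothetical and this is not the Warehouse's own code.

```python
from datetime import date


def season_year(d: date) -> str:
    """Return the Season-year label (YYYY-YY) for a date."""
    # Dates from July 1 onward belong to the Season-year that starts in the
    # same calendar year; earlier dates belong to the one that started the
    # previous July.
    start = d.year if d.month >= 7 else d.year - 1
    return f"{start}-{(start + 1) % 100:02d}"
```

For example, `season_year(date(2021, 1, 3))` and `season_year(date(2020, 7, 1))` both return `"2020-21"`.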

10.1.8 Location data

The location where a sample is collected is important for disease surveillance. Although no location fields are required, many features and functions, such as Models and Visualizations, depend on the sub-administrative area for analysis and data summary.

Because the sub-administrative area concept is integrated throughout the Warehouse, the list of sub-administrative areas is set when a Provider joins the Warehouse. Sub-administrative areas are expected to be areas used for CWD surveillance (such as counties or wildlife management units). They are also expected to remain constant over time (including their spatial extent), to be non-overlapping, and to cover the entire administrative area of the Provider.

The exact location where a Sample was collected can also be recorded. If the exact location is known, for instance as determined by GPS or as indicated by a hunter, it can be entered using the interactive map on the Sample edit dialog or by providing latitude and longitude values.

Alternatively, if the Provider uses a reference grid, such as the Public Land Survey System or a custom grid which the Provider has created specifically for CWD Surveillance, the sample location may be determined based on a reference grid code (a unique identifier). If a Provider uses a reference grid for sample locations, the Provider Administrator should contact the Warehouse Development Team and submit the grid as a CSV file. The file should include, at a minimum, three fields: Latitude, Longitude, and Grid Code. Reference grid codes and associated latitude/longitude locations are not accessible through the Warehouse User Interface. However, they can be retrieved through the API.
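
The following Python sketch illustrates how a reference grid file and the grid code lookup might behave. It is a sketch under stated assumptions: the grid codes and coordinates are made-up placeholder values, the sample keys (`grid_code`, `latitude`, `longitude`) and function names are hypothetical, and the actual lookup is performed inside the Warehouse. The two behaviors shown (existing coordinates are not overwritten, and grid codes match case-sensitively) follow the Location properties described in Section 10.3.

```python
import csv
import io

# Placeholder grid file with the three minimum fields; all values are invented.
GRID_CSV = """Grid Code,Latitude,Longitude
A-01,43.05,-89.40
A-02,43.05,-89.35
"""


def load_grid(text: str) -> dict:
    """Read the grid CSV into {grid code: (latitude, longitude)}."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["Grid Code"]: (float(row["Latitude"]), float(row["Longitude"]))
        for row in reader
    }


def apply_grid_lookup(sample: dict, grid: dict) -> dict:
    """Fill latitude/longitude from the grid code unless already present."""
    if sample.get("latitude") is not None and sample.get("longitude") is not None:
        return sample  # existing coordinates are assumed to be more precise
    code = sample.get("grid_code")
    if code in grid:  # dict lookup is case-sensitive
        lat, lon = grid[code]
        return {**sample, "latitude": lat, "longitude": lon}
    return sample
```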

The precision of a sample location may be described using the Geolocation Precision field, which is an enumerated list that includes Exact, County, PLSS Section (Subdivision Level 1), and other predefined values. The sample location precision determines if and how the sample data can be used in Models and Visualizations. Review the specific Model and Visualization documentation to determine spatial precision requirements.

10.1.9 Bulk data importation from a CSV file

Providers that manage their sample data externally, such as in a separate database, can import sample data by uploading a comma-separated value (CSV) file. Bulk importation must be configured through the Provider Administration Interface and includes a data “normalization” process in which fields and values are mapped to those in the Sample Collection.

When a CSV file is uploaded, previously imported samples can be updated and new samples can be added. The Sample ID field is used to determine whether a sample already exists in the Sample Collection and will be updated, or whether a new Sample should be created. Previously imported samples NOT in the current import are not affected. Therefore, Providers only need to import new and updated samples.

After the bulk importation and sample normalization processes are configured, direct sample data creation and editing through the User Interface is disabled, and all updates must be provided via the bulk importation process. This safety feature ensures the external database remains the master version of the data; all changes are made there and then transferred to the Warehouse.

Only a user with the Provider Administrator role can set up and configure the bulk importation and sample normalization processes. After the processes are set up, they can be run by users with either the Provider Administrator or the Sample Importer role.

10.2 Visualizations that directly use the Sample Collection

The Sample Collection may also be used in several Visualizations that describe the number of samples collected through surveillance activities or disease status of sub-administrative units.

10.3 Properties

An asterisk denotes a required field.

Animal Details

Field name	Definition	Allowed values
Sample ID*	A sample identification code; must be unique at the Provider level	any string
External ID	A second external id to uniquely identify the sample other than Sample ID; allows external API intersections to store database identifiers that can be used to align data in external systems; in the UI, this field is not editable and is only visible when it has a value; must be unique at the Provider level	any string
Species	The species from which the sample came	Axis deer, Black-tailed deer, Caribou, Eld’s deer, Elk, Fallow deer, Moose, Mule deer, Muntjac, Pere David’s deer, Red deer, Reindeer, Sambar deer, Sika, Tufted deer, White-tailed deer, Hybrid, Unknown
Sex	The sex of the animal	Female, Male, Unknown
Age group	The general age category of the animal sampled (Fawn: <1.5 years; Yearling: >=1.5 to <2.5 years; Adult: >=2.5 years)	Adult, Yearling, Fawn, No age
Sample source*	The collection category, purpose, or origin of the sample	Captive cervid facility, Clinical suspect, Hunter harvest, Illegal import, Removal for crop damage, Research, Road kill, Targeted removal, Unknown
Sub-administrative area	The sub-administrative area (county or equivalent) in which the sample originated	Provider sub-administrative areas
Confidential	Indicates whether a sample should not be delivered to models or visualizations	True, False

Location

Field name	Definition	Allowed values
Location reference grid code	For Providers that use a grid reference system for sample location, this field serves as a “lookup” and will populate the latitude and longitude fields. If latitude and longitude properties are already filled, they are assumed to be more precise and will not be updated if a grid code is provided.	For the lookup functionality to generate latitude and longitude values, the value entered must match an item in the list of grid codes provided to the CWD Data Warehouse Administrator (case-sensitive).
Latitude	The latitude (in decimal degrees) where the animal died	Number in decimal format
Longitude	The longitude (in decimal degrees) where the animal died	Number in decimal format
Geolocation Source	The origin of the latitude and longitude values	GPS, Geocoded, Online Map, Submitter provided
Geolocation Precision	A text description of the spatial precision of the provided location of the sample.	Exact, Town, County, PLSS Township, PLSS Section (Subdivision Level 1), PLSS Quarter Section (Subdivision Level 2), Not mappable, Street, Interpolated
Disease management area	The surveillance region or zone as determined by the Provider.	Value list is configured in the Provider Administration Interface.
Address (Street Address, City, Administrative Area, Postal Code)	Address associated with the sample; fields not used for geocoding

Hunter

The Hunter sub-entity refers to the individual who harvested the animal from which the sample was taken.

Properties that could be used to collect personally identifiable information (PII), such as hunter name or contact information, are not provided for data privacy and security purposes.

Field name	Definition	Allowed values
Confirmation number	Unique number or code associated with a hunter-harvested animal; also commonly referred to as a deer tag or carcass tag number; for many Providers, this number is used by hunters to retrieve CWD testing results for harvested animals
License ID	The hunter’s license number or identification number

Collector

The Collector sub-entity refers to the individual who collected the sample. For some Providers, this entity may be a contracted organization that collects roadkill carcasses or an individual who collects samples from hunters and delivers them to the Provider.

Note: The Collection Details section (described above) includes fields for a Meat Processor/Taxidermist and Cervid Facility which may be involved in the sample collection process. Therefore, the Collector section is intended only for other types of collectors and may be used as needed by the Provider.

Field name	Definition	Allowed values
Name	Full name of Collector
Email	Collector’s email address
Phone	Collector’s phone number
Address (Street Address, City, State, Postal Code)	Collector’s address
Animal ID	Unique number or code used by the Collector to identify the sample or animal
Source	Additional information from the Collector about how the sample was acquired
Collector ID	Unique number or code used to identify the Collector, such as a license or contract number

Test

The Test section includes the results from one or more CWD tests of the sample.

Field name	Definition	Allowed values
Type	Type of test	IHC; ELISA
Tissue	Type of tissue tested	Lymph node, Obex
Result*	Test result	Detected, Inconclusive, Not Detected, Pending, Not tested
Lab	Name of the lab which conducted the test	text string
Test ID*	Test result identification number or code that uniquely identifies this test result for this sample, such as a lab accession number	text string
Date submitted	Date on which the sample was submitted for testing	MM/DD/YYYY
Date received	Date on which the test result was received by the Provider	MM/DD/YYYY
Status	Relevant comment on the status or outcome of the test	text string
Definitive	Indicates whether the test result is the one that determines the Test status of the sample; for example, a confirmatory test would be marked as Definitive	TRUE (checked), FALSE (unchecked)

Custom properties

Samples may have one or more custom properties, which are defined by fields added in the Provider Administration Interface. Custom properties appear as a list of fields at the bottom of the Sample modal.

Field name	Definition	Allowed values
User-defined	User-defined	string

Hidden properties

The Sample Collection includes fields that are not visible or editable through the User Interface. These fields are used internally to support Warehouse functionality.

Field name	Definition	Allowed values
Agency Management Unit (read only)	This property is populated by middleware based on the value of the _sub_administrative_area collection in conjunction with Provider configuration made on the Provider Administrative Interface Sub-administrative area tab	internal id
Season-year	The annual period during which the sample was collected (YYYY-YY, a four-digit year followed by the subsequent two-digit year). This property is read-only and is calculated from the values of Date_harvested, Date_sampled, or the moment of creation. The only exception to this calculation rule is creation or update by an API user. In this case, a valid Season-year value may be provided directly, overriding the calculation. This method allows legacy data to be created explicitly without adding potentially misleading dates to the data merely to get a sample properly categorized.
Raw Sample	The raw external representation of imported sample data
Raw Test	The raw external representation of imported test data
Test Status (read only)	The value of the selected or default definitive test result or whether there are no tests, or no tests with a value
Selected Definitive (read only)	A Test-level property that identifies which Result to use in reporting and when delivering a Result to model executions in situations where no test has been flagged as “Definitive” or multiple tests have been flagged as “Definitive”. If no single test is flagged as “Definitive”, the one with the most recent Date Submitted will be used. If more than one test is flagged as “Definitive”, then the most recent “Definitive” test, based upon Date Submitted, will be used. If submission dates are not provided, then the test with the most recent creation date (the date when it was entered into the Warehouse) will be used.	Boolean

10.3.1 Configuring bulk sample data importation

A Provider Administrator can configure the Provider for bulk sample data import through the Provider Administration Interface.

From the Provider Administration Interface, select the Sample Normalization tab and follow the instructions provided through the Sample normalization configuration wizard. The Sample Normalization can be reconfigured by repeating this wizard. If the Sample Normalization process is reconfigured, existing samples in the Sample Collection are re-normalized using the updated configuration.

Pre-normalized data (source data) are stored within the sample record and may be viewed in the Audit Log associated with each sample and through the CWD Data Warehouse API.

Sample and test data in one or two files

The sample normalization process is designed to accommodate a variety of data structures. Sample and test data can be uploaded as a single combined CSV file or as separate CSV files. In the latter situation, samples and tests must be relatable using a shared identifier, such as a barcode.

Multiple tests per sample

As described earlier, the Sample Collection can store multiple tests per sample. The sample normalization process is designed to handle this situation and can accommodate a variety of data structures.

If the uploaded sample data CSV file includes test data, multiple tests may be configured using different source fields. If the sample and test data are uploaded as separate CSV files, the test data may have more than one row corresponding to a single sample. Each test must have an ID (Test ID) that uniquely identifies that test for the sample with which it is associated.

10.3.2 Sample Normalization Wizard

10.3.2.1 Step 1. Overview

The Sample Normalization overview page provides a brief introduction to the wizard.

An option to delete existing sample data is given if the Sample Collection already includes samples. If the Sample data are deleted, they cannot be restored. A pop-up confirmation is provided to discourage accidental deletion. If all Sample Collection data are deleted, any existing Sample Normalization configuration remains in place.

If the Sample Normalization process has already been configured for the Provider, an option to delete the current configuration is given.

10.3.2.2 Step 2. Sample data ID

Upload a comma-separated values (CSV) file containing representative Sample data. The CSV file does not need to be large or contain all of a Provider’s Sample data. However, it must include sufficient data to represent all possible enumerated values that a Provider collects, so they can be mapped onto the system’s schema.

Note: The data you upload must be real sample data and will be stored and used for the purpose of configuring the normalization process. The Review step (Step 6) will re-normalize the data you provide, so you can verify your configuration.

After uploading the CSV file, select the field that provides the unique ID for each sample. This unique ID will populate the Sample ID field in the Sample Collection.

When CSV data are imported, the unique ID determines whether the sample already exists in the Sample Collection. If the sample already exists, it will be updated with the provided data. If the unique ID is not found, a new sample will be created.

After selecting the unique ID field, the Sample data will be imported. This process may take time as the Sample data are examined for the next steps in the normalization process. When the process is completed, the user is prompted to press the Continue button.

10.3.2.3 Step 3. Sample data normalization

The Sample data normalization page presents a list of fields in the Sample Collection. (Note that this is not a list of the fields in the CSV file.) Fields marked with a red ‘X’ icon have not yet been configured.

Each Sample Collection property must be configured, even if no Source field or default value will be provided. The first step is the selection of an Input Method – the method that will be used for deriving the Destination property from the Source field. Depending on the specific Destination property, the input methods may include: No value, Enumerated Value, and Direct Value. Several Destination properties, such as Sub-administrative area and Test, have unique input methods.

Input Methods

No value

This option is always shown and should be selected when the Sample Collection property is not represented in the CSV file data. An option to provide a default value for the Sample Collection property may be presented if logical for the Destination property.

Enumerated Value

This option is shown when a Sample Collection property has a list of allowed values (e.g., Species). Select this option when a field in the CSV file corresponds to the property in the Sample Collection. When the Source field that contains the data is chosen, a list of unique values derived from the previously uploaded CSV file is presented, and corresponding values in the Destination property must be selected.
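
Conceptually, an Enumerated Value configuration is a lookup table from each unique Source value to an allowed Destination value. The Python sketch below illustrates the idea only; the source codes (`WTD`, `MD`, `ELK`), the mapping name, and the function are hypothetical, and the Warehouse stores this mapping through the wizard rather than in code.

```python
# Hypothetical mapping from a Provider's species codes to allowed Species values.
SPECIES_MAP = {
    "WTD": "White-tailed deer",
    "MD": "Mule deer",
    "ELK": "Elk",
}


def normalize_enumerated(source_value: str, mapping: dict) -> str:
    """Translate a Source value into its configured Destination value."""
    try:
        return mapping[source_value]
    except KeyError:
        # A value that was never mapped cannot be normalized.
        raise ValueError(f"unmapped source value: {source_value!r}") from None
```

This is why the representative CSV file should contain every enumerated value a Provider collects: a value that never appeared during configuration has no entry in the mapping.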

Direct Value

For Sample Collection properties that contain free text, numeric, or date values, this option is presented. After selecting a Source field, additional configuration may be required, depending on the Sample Collection property. For instance, the Date harvested property requires a date format (such as MM/DD/YYYY) to translate the Source field data into the Destination property.
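
To show why a date format is needed for a Direct Value date, the following Python sketch translates a format string such as MM/DD/YYYY into a parser. It assumes only the three tokens YYYY, MM, and DD; the tokens the Warehouse actually accepts may differ, and the function name is hypothetical.

```python
from datetime import date, datetime

# Assumed format tokens and their strptime equivalents.
_TOKENS = {"YYYY": "%Y", "MM": "%m", "DD": "%d"}


def parse_source_date(raw: str, date_format: str = "MM/DD/YYYY") -> date:
    """Parse a Source field value using the configured date format."""
    fmt = date_format
    for token, directive in _TOKENS.items():
        fmt = fmt.replace(token, directive)
    # strptime raises ValueError if the value does not match the format.
    return datetime.strptime(raw.strip(), fmt).date()
```

The same Source value can mean different dates under different formats (for example, 03/04/2021 under MM/DD/YYYY versus DD/MM/YYYY), so the format must be stated explicitly.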

Sub-administrative areas

Normalization of this property is performed by “exact ignore case matching” to sub-administrative area names. Apart from letter case, the data in the Source field must exactly match (including spelling and abbreviations) the full name of the sub-administrative area (such as “Green County”) or an alias for that sub-administrative area. Aliases can be configured for each sub-administrative area in the Provider Administration Interface under the Sub-administrative area tab. Although the list of aliases should be set before reaching this step in the Sample Normalization Wizard, the user may save their progress at this point, edit that list, then return to this page.
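
The matching rule can be sketched as follows. This Python example assumes that “exact ignore case matching” means a whole-string comparison that disregards letter case only, with no trimming or partial matching; the function name, the alias structure, and the sample aliases are hypothetical.

```python
from typing import Dict, List, Optional


def match_sub_admin_area(source_value: str,
                         areas: Dict[str, List[str]]) -> Optional[str]:
    """Return the full area name matching a Source value, or None.

    `areas` maps each full sub-administrative area name to its aliases.
    """
    wanted = source_value.casefold()
    for name, aliases in areas.items():
        if wanted == name.casefold():
            return name
        if any(wanted == alias.casefold() for alias in aliases):
            return name
    return None


# Hypothetical configuration: one area with two aliases.
AREAS = {"Green County": ["Green", "GRN"]}
```

Under these assumptions, `"green county"` and `"grn"` both resolve to `"Green County"`, while `"Green Co."` does not match until it is added as an alias.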

Tests

If test data will be uploaded as a separate CSV file and not as part of the sample data CSV file, select “No value” as the input type and configure this property on the “Test data normalization” step. If “Test results” is selected (indicating the test data are included in the Sample data CSV file already uploaded), a secondary list of Destination properties is given. If the Sample data CSV file includes data from multiple tests per sample, select “Add another test” to configure normalization for the additional tests. This process can accommodate a variety of Source test data structures.

Season-year

A Season-year spans the dates July 1 to June 30. For example, a date of January 3, 2021, would have a Season-year of 2020-21. In the Season-year property configuration, if you select “No value”, a value for Season-year will be calculated based upon the values of Date Harvested or Date Sampled (in that order). If you select “Direct value”, the value provided must match the format YYYY-yy, where YYYY is a 4-digit year and yy is the 2-digit year immediately following YYYY (e.g., 2020-21). If an input value does not match that format, the sample will be rejected with an error.
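
A Provider may wish to check Season-year values before import when using the “Direct value” method. The following Python sketch is one way to test the YYYY-yy format described above; the function name is hypothetical and the Warehouse's own validation may differ in detail.

```python
import re

_SEASON_YEAR = re.compile(r"([0-9]{4})-([0-9]{2})")


def is_valid_season_year(value: str) -> bool:
    """Check that a value is YYYY-yy and that yy is the year after YYYY."""
    match = _SEASON_YEAR.fullmatch(value)
    if match is None:
        return False
    start, end = int(match.group(1)), int(match.group(2))
    # Compare only the last two digits so that 1999-00 is accepted.
    return end == (start + 1) % 100
```

Here `"2020-21"` passes, while `"2020-22"` (non-consecutive years) and `"2020-2021"` (four-digit end year) do not.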

10.3.2.4 Step 4. Test data ID

The sample normalization process can accommodate the situation in which test data are not included in the Sample data CSV file and are instead in a separate CSV file. If this situation does not apply, this portion of the Wizard can be skipped. If it does apply, upload a CSV file containing representative test data. The CSV file does not need to be large or contain all of a Provider’s test data. However, it must contain sufficient data to represent all possible enumerated values that a Provider collects, so they can be mapped onto the system’s schema.

Note: The data you upload must be real test data and will be stored and used for the purpose of configuring the normalization process. The review step will re-normalize the data you provide so you can verify your configuration.

After uploading the CSV file, select the source field that contains the unique ID corresponding to Sample ID in the sample data.

If the test data contain multiple records corresponding to multiple tests per sample, select the unique test ID field, which provides the ID that identifies each test uniquely for each sample. For example, this may be a lab accession number. The unique test ID does not need to be unique across all samples. However, it must be unique per Sample ID. Combined, the unique Sample ID (identifying each sample) and the unique test ID identify unique tests across all samples.

If you choose to include a unique test ID, any test data without a unique test ID will be ignored and will not be imported.

When a test data CSV file is imported, the unique IDs for the sample and test will be used together to determine whether the test already exists in the Sample Collection. If the test already exists, it will be updated with the provided data. If an existing test is not found, a new test will be created.
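
The combined key can be illustrated with a short Python sketch. The column names `sample_id` and `test_id` and the function name are hypothetical, and the choice that a later row replaces an earlier row with the same key is an assumption made for the example.

```python
from typing import Dict, List, Tuple


def index_tests(rows: List[dict]) -> Tuple[Dict[tuple, dict], List[dict]]:
    """Index test rows by (Sample ID, Test ID); collect rows lacking a Test ID."""
    indexed: Dict[tuple, dict] = {}
    ignored: List[dict] = []
    for row in rows:
        test_id = (row.get("test_id") or "").strip()
        if not test_id:
            ignored.append(row)  # no unique test ID: row is not imported
            continue
        # Assumption: a later row with the same key replaces the earlier one.
        indexed[(row["sample_id"], test_id)] = row
    return indexed, ignored
```

Because the key is the pair of IDs, the same Test ID (for example, `"1"`) can appear under two different samples without conflict.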

After selecting the unique ID field for the Sample ID and optionally a unique test ID field, the test data CSV file will be imported. This process may take time as the data are examined for the next steps in the normalization process. When the process is completed, the user is prompted to press the Continue button.

10.3.2.5 Step 5. Test data normalization

As on the sample data normalization page, the test data normalization page presents a list of properties that must be configured. Again, note that this is not a list of the fields in the CSV file. Fields marked with a red ‘X’ icon have not yet been configured. All fields must be configured, even if no Source field or default value will be provided in the CSV file.

Each property has a unique configuration process, and each process starts with the selection of an Input Method – the method that will be used for deriving the Destination property from the Source field. Depending on the specific Destination property, the input methods may include No value, Enumerated Value, and Direct Value.

If the test data CSV file includes data from multiple tests per row, select “Add another test” to configure normalization for the additional test(s). If the test data CSV file includes unique tests on separate rows, each test will be imported separately, and multiple tests will be associated with corresponding samples.

10.3.2.6 Step 6. Review

At this step, the sample normalization has been configured, and the standardized data are presented for review. This process may take some time if a large dataset has been provided. An option to download the standardized data is provided.

A pop-up is presented that indicates the number of records created, number of errors, and number of records ignored. Records are ignored if they already exist in the Sample Collection and have not been updated.

The user may also review the Import Log found in the Administrative section of the Navigation Menu to further review the import process and ensure that it is free from errors.

10.4 Bulk Importing Data after Configuring Sample Normalization

Configuring the bulk importation and sample normalization process disables sample data creation and management through the User Interface.

The Add Data button in the Sample Collection Interface is replaced with an Import Sample+Test Data button, which is used to add new samples and update existing samples. Clicking this button displays a pop-up that allows the User to upload a sample data CSV file and, optionally, a test data CSV file (if the sample normalization has been configured for that).

Importation uses an “upsert” process in which previously imported samples will be updated and new samples will be added. The Sample ID field (which is required and must be unique for all Provider samples) is used to determine if a sample already exists in the Sample Collection. Previously imported samples NOT in the current import are not affected and are not deleted. Therefore, Providers only need to import new and updated samples.
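
The “upsert” behavior can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Warehouse's implementation: the in-memory `collection` dictionary keyed by Sample ID, the function name, and the rule that incoming fields overlay existing fields are all hypothetical.

```python
from typing import Dict, List


def upsert_samples(collection: Dict[str, dict], incoming: List[dict]) -> dict:
    """Update existing samples and add new ones, keyed by Sample ID."""
    counts = {"created": 0, "updated": 0, "unchanged": 0}
    for row in incoming:
        sample_id = row["Sample ID"]
        existing = collection.get(sample_id)
        if existing is None:
            collection[sample_id] = dict(row)
            counts["created"] += 1
            continue
        # Assumption: incoming fields overlay the stored record.
        merged = {**existing, **row}
        if merged != existing:
            collection[sample_id] = merged
            counts["updated"] += 1
        else:
            counts["unchanged"] += 1
    return counts
```

Samples already in the collection but absent from `incoming` are never touched, which is why only new and updated samples need to be imported.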

10.5 Import Errors

Errors can occur during importation for a variety of reasons, such as:

a sample with no Sample ID
a field with a value not accounted for in the sample normalization configuration
a missing field in the CSV file

If an import produces errors, the import is canceled. No sample data are imported and existing sample data are not affected. The first 10 errors encountered are logged. These errors may be reviewed in the Import log in the Administrative section of the Navigation Menu.
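
The all-or-nothing behavior described above can be sketched in Python. The function names, the error message wording, and the in-memory `collection` are hypothetical; only the rules that any error cancels the import and that the first 10 errors are logged come from the description above. The sketch checks only for a missing or empty Sample ID.

```python
from typing import Dict, List

MAX_LOGGED_ERRORS = 10  # only the first 10 errors are logged


def validate_rows(rows: List[dict]) -> List[str]:
    """Return one message per row that lacks a usable Sample ID."""
    errors = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the CSV header
        if "Sample ID" not in row:
            errors.append(f"line {line_no}: missing field 'Sample ID'")
        elif not (row["Sample ID"] or "").strip():
            errors.append(f"line {line_no}: sample with no Sample ID")
    return errors


def import_all_or_nothing(collection: Dict[str, dict], rows: List[dict]) -> dict:
    """Import every row, or none of them if any row has an error."""
    errors = validate_rows(rows)
    if errors:
        # Import is canceled; the collection is left untouched.
        return {"imported": 0, "errors": len(errors),
                "logged": errors[:MAX_LOGGED_ERRORS]}
    for row in rows:
        collection[row["Sample ID"]] = dict(row)
    return {"imported": len(rows), "errors": 0, "logged": []}
```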

10.6 Import log

The Import log can be found in the Administrative section of the main menu and is accessible to users with the Provider Administrator or Sample Importer role. The Import log view provides a list of previously run imports by date and includes the number of records created, updated, not changed, or ignored, as well as the number of errors.

Opening a log entry will display a categorized list of affected records and will allow the user to “drill down” to individual samples. For newly created samples, the log will provide the source data and the normalized data. For updated samples, the log provides a list of changes, including the before and after import values. If an error was encountered, the source data and a brief error message are provided.

10.7 Sample Normalization and the API

For the Sample Collection, the CWD Data Warehouse API provides two special routes that make use of the sample normalization configuration if it has been created by the Provider:

POST - /api/sample/import
POST - /api/sample/upsert

While the standard API routes for the Sample Collection require data to conform to the Sample Collection schema, these special routes allow a Provider to submit source data that do not conform to the schema; the data are normalized upon import. Effectively, these special API routes allow the Provider to execute an import through the API.
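
As a rough sketch, a request to one of these routes might be prepared as shown below using Python's standard library. Only the route path comes from this chapter. The host name, the bearer-token authentication, the JSON request body, and the source field names are assumptions made for illustration; consult the CWD Data Warehouse API documentation for the actual request format. The function builds the request without sending it.

```python
import json
from typing import List
from urllib.request import Request

BASE_URL = "https://warehouse.example.org"  # hypothetical host name


def build_upsert_request(records: List[dict], token: str) -> Request:
    """Prepare (but do not send) a POST to the sample upsert route."""
    body = json.dumps(records).encode("utf-8")  # assumed JSON payload
    return Request(
        BASE_URL + "/api/sample/upsert",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


# Source records use the Provider's own field names (hypothetical here),
# which the sample normalization configuration maps onto the schema.
request = build_upsert_request([{"barcode": "A1", "spp": "WTD"}], token="...")
```

Sending the request (for example, with `urllib.request.urlopen`) is left out so that the sketch can be read and run without network access.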