Chapter 24 Disease Cluster Analysis Data Export
24.1 What is the Disease Cluster Analysis Data Export?
The Disease Cluster Analysis Data Export is a tool in the CWD Data Warehouse that prepares surveillance data for export to the external SaTScan™ software. SaTScan™ is an independent tool used to assess whether a cluster of CWD-positive tests is statistically significant in time, in space, or in both. The SaTScan™ software was developed under the joint auspices of (i) Martin Kulldorff, (ii) the National Cancer Institute, and (iii) the New York City Department of Health and Mental Hygiene.
24.2 What Questions Does it Answer?
Question 1. How can I use the SaTScan™ software with the CWD Data Warehouse? The Disease Cluster Analysis Data Export formats CWD Data Warehouse data for the SaTScan™ software, so managers using the CWD Data Warehouse do not need to reprocess their surveillance data.
Question 2. Is this cluster of positive cases statistically significant? After sampling, when positives are found, the SaTScan™ software tests whether the proportion of positives inside a candidate cluster is significantly higher than the proportion outside it.
Question 3. Is this cluster of positive cases geographically significant? The SaTScan™ software is used after sampling when positives are found to determine whether disease is randomly distributed over time or space.
24.3 Output Details
24.3.1 Outputs in the CWD Data Warehouse:
Case_File.csv: Contains all CWD-positive records, ready for upload into the ‘Case File’ box on the Input Tab of the SaTScan™ software.
Control_File.csv: Contains all CWD-non-detect records, ready for upload into the ‘Control File’ box on the Input Tab of the SaTScan™ software.
Coordinates_File.csv: Contains the latitude and longitude of every CWD-positive and CWD-non-detect record, ready for upload into the ‘Coordinates File’ box on the Input Tab of the SaTScan™ software.
SaTScan_user_inputs.csv: Contains parameter details of the data, such as the study period, for initializing the SaTScan™ software.
All_SaTScan_Data: A combined file of case, control, and coordinate information, for your records.
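Before importing, it can help to confirm that the three files agree with one another, for example that every location in the case and control files also has coordinates. The sketch below is illustrative only: the function name check_export is hypothetical, and the column names (Identifier, Number.of.Cases, Number.of.Controls, Latitude, Longitude) are assumed from the assignments made in the import steps in 24.3.1.1, so your export's headers may differ.

```python
import csv
from pathlib import Path


def check_export(folder):
    """Cross-check the exported case, control, and coordinates files.

    Hypothetical helper. Assumes the column names used in the SaTScan import
    steps: Identifier, Number.of.Cases, Number.of.Controls, Latitude, Longitude.
    Returns total cases, total controls, and location IDs lacking coordinates.
    """
    folder = Path(folder)

    def read_rows(name):
        # Read one exported CSV into a list of dicts keyed by header name.
        with open(folder / name, newline="") as f:
            return list(csv.DictReader(f))

    cases = read_rows("Case_File.csv")
    controls = read_rows("Control_File.csv")
    coords = read_rows("Coordinates_File.csv")

    located = {row["Identifier"] for row in coords}
    used = {row["Identifier"] for row in cases + controls}

    return {
        "total_cases": sum(int(r["Number.of.Cases"]) for r in cases),
        "total_controls": sum(int(r["Number.of.Controls"]) for r in controls),
        "missing_coordinates": sorted(used - located),
    }
```

For example, check_export("cwd_export") (a hypothetical download folder) returns a dictionary; a non-empty missing_coordinates list points to records SaTScan™ will not be able to place on the map.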
24.3.1.1 Uploading the case, control, and coordinates files into SaTScan™:
- Case File:
  - On the Input Tab of the SaTScan™ software, click the button with the ellipsis (...) next to the Case File box, then navigate to the Case_File.csv file on your machine.
  - An information window will appear; click Next.
  - In the Sample of the File Content window, set the number of rows to ignore to 0, check ‘First row is column name’, select the comma field separator and the double-quote indicator, then click Next.
  - In the Import SaTScan™ Variables for analysis window, select “Bernoulli Model”. In the Source File Variable column below, replace each “unassigned” entry: assign Location or Identifier to Identifier, Number of Cases to Number.of.Cases, and Date/Time to Year or Date. Click Next.
  - A window of optional settings will open; click Next again.
  - Click Import.
- Control File:
  - On the Input Tab of the SaTScan™ software, click the button with the ellipsis (...) next to the Control File box, then navigate to the Control_File.csv file on your machine.
  - An information window will appear; click Next.
  - In the Sample of the File Content window, set the number of rows to ignore to 0, check ‘First row is column name’, select the comma field separator and the double-quote indicator, then click Next.
  - “Bernoulli Model” should be grayed out; this is expected. In the Source File Variable column below, replace each “unassigned” entry: assign Location or Identifier to Identifier, Number of Controls to Number.of.Controls, and Date/Time to Year or Date (whichever you chose for the Case File above). Click Next.
  - Click Import.
- Coordinates File:
  - On the Input Tab of the SaTScan™ software, click the button with the ellipsis (...) next to the Coordinates File box, then navigate to the Coordinates_File.csv file on your machine.
  - An information window will appear; click Next.
  - In the Sample of the File Content window, set the number of rows to ignore to 0, check ‘First row is column name’, select the comma field separator and the double-quote indicator, then click Next.
  - “Latitude/Longitude Coordinates” should be selected. In the Source File Variable column below, replace each “unassigned” entry: assign Location to Identifier, Latitude (y-axis) to Latitude, and Longitude (x-axis) to Longitude. Click Next.
  - Click Import.
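The import wizard's job is to map each CSV column to a SaTScan™ variable. SaTScan™ can also read its own whitespace-delimited input files, in which each line lists the location ID first, then the case or control count and the time (or, for a latitude/longitude coordinates file, latitude then longitude), as described in the SaTScan™ User Guide. The sketch below writes that layout from the exported CSVs as a possible alternative to the wizard; the function name csv_to_satscan is hypothetical, and the column names are assumptions taken from the wizard assignments above.

```python
import csv


def csv_to_satscan(src, dest, columns):
    """Copy selected CSV columns, in order, to a whitespace-delimited file.

    Hypothetical helper. columns lists the source headers to write, e.g.
    ["Identifier", "Number.of.Cases", "Year"] for cases (assumed names that
    mirror the wizard assignments). Values must be non-empty and contain no
    spaces, because whitespace separates fields. Returns records written.
    """
    written = 0
    with open(src, newline="") as f_in, open(dest, "w") as f_out:
        for row in csv.DictReader(f_in):
            values = [row[name].strip() for name in columns]
            if any(not v or any(ch.isspace() for ch in v) for v in values):
                raise ValueError(f"empty or space-containing value in {values}")
            f_out.write(" ".join(values) + "\n")
            written += 1
    return written
```

A possible usage, with illustrative output names: csv_to_satscan("Case_File.csv", "case.cas", ["Identifier", "Number.of.Cases", "Year"]), the same call for Control_File.csv with Number.of.Controls, and ["Identifier", "Latitude", "Longitude"] for Coordinates_File.csv. The wizard remains the simpler route; this only makes the column mapping explicit.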
24.3.1.2 Other Parameter Settings:
- On the Input Tab of the SaTScan™ software, set the Time Precision to match the time column (Year or Date) used in the case and control files.
- Use the SaTScan_user_inputs.csv file as a reference to set the study period.
- Set the Coordinates to Lat/Long.
- On the Analysis Tab, under Retrospective Analyses, set the Type of Analysis to Purely Spatial, Purely Temporal, or Space-Time.
- Make sure “Bernoulli” is selected as the Probability Model.
- Select Scan for Areas with High Rates.
- On the Output Tab, point the results file to a text file on your machine (for example, an empty .txt file created beforehand).
- Click the large green arrow in the top toolbar of the SaTScan™ software to run the scan.
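The study period should cover every time value in the case and control files. As a cross-check against SaTScan_user_inputs.csv, the sketch below finds the earliest and latest values; the function name study_period is hypothetical, and the time column name (Year by default) is an assumption that should match whichever column you assigned to Date/Time.

```python
import csv


def study_period(paths, time_column="Year"):
    """Return (earliest, latest) time values across the given CSV files.

    Hypothetical helper. Assumes the time column holds either four-digit years
    (e.g. "2021") or zero-padded dates written as YYYY/MM/DD; both sort
    correctly as strings as long as a single format is used throughout.
    """
    values = []
    for path in paths:
        with open(path, newline="") as f:
            values.extend(row[time_column] for row in csv.DictReader(f))
    if not values:
        raise ValueError("no records found")
    return min(values), max(values)
```

For example, study_period(["Case_File.csv", "Control_File.csv"]) returns a pair of strings; if it disagrees with SaTScan_user_inputs.csv, check which season-years were selected when the model was executed.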
24.3.2 Outputs in the SaTScan™ software:
SaTScan™ produces many outputs; see the SaTScan™ User Guide (https://www.satscan.org/techdoc.html) for an explanation of each.
24.4 Getting the Most out of this Model
- Run the Disease Cluster Analysis Data Export to prepare the surveillance data from the CWD Data Warehouse.
- Download the prepared files from the attachments column of the model execution page of the CWD Data Warehouse.
- Download and install the external SaTScan™ software from https://www.satscan.org/.
- Use the SaTScan™ import wizard to import the prepared files into the SaTScan™ software (see 24.3.1.1).
- Follow the SaTScan™ User Guide (https://www.satscan.org/techdoc.html) to run and interpret the analysis.
24.5 Parameters Needed to Execute the Model
Model type: Select ‘Disease Cluster Analysis Data Export’ from the drop-down list.
Reference name: Label the execution.
(Optional) Applicable season-year: Label the season-year.
(Optional) Notes: Enter any additional remarks about the execution.
Species: Select the species of interest.
Season-year: Select one or more season-years of interest.
Scan type: Choose the scan statistic:
Discrete Bernoulli - Spatial (Exact Location): Use with binary (0/1) response data, where 0 represents CWD-negative and 1 represents CWD-positive. This scan statistic requires exact location data.
Discrete Bernoulli - Temporal: Use with binary (0/1) response data, where 0 represents CWD-negative and 1 represents CWD-positive. This scan statistic requires data to be reported with exact location and date.
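Both scan types start from per-animal 0/1 results, while SaTScan™ works with case and control counts per location and time. The sketch below shows that aggregation under an assumed (location_id, time, result) record layout; the function name aggregate_binary is hypothetical, and this is not the export's actual code, which is in the repository linked in 24.7.

```python
from collections import defaultdict


def aggregate_binary(records):
    """Collapse per-animal 0/1 test results into case and control counts.

    Hypothetical helper. records: iterable of (location_id, time, result)
    tuples, where result is 1 for CWD-positive and 0 for CWD-negative.
    Returns two dicts keyed by (location_id, time): cases and controls.
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for location_id, time, result in records:
        if result == 1:
            cases[(location_id, time)] += 1
        elif result == 0:
            controls[(location_id, time)] += 1
        else:
            raise ValueError(f"result must be 0 or 1, got {result!r}")
    return dict(cases), dict(controls)
```

For example, four records [("A", "2020", 1), ("A", "2020", 0), ("A", "2020", 0), ("B", "2021", 1)] give one case each at A/2020 and B/2021 and two controls at A/2020. With exact locations, most keys will hold a single animal.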
24.6 Details on the Theory
Kulldorff M. A spatial scan statistic. Communications in Statistics: Theory and Methods, 1997; 26:1481-1496.
Kulldorff M, Nagarwalla N. Spatial disease clusters: Detection and Inference. Statistics in Medicine, 1995; 14:799-810.
Kulldorff M, Athas W, Feuer E, Miller B, Key C. Evaluating cluster alarms: A space-time scan statistic and brain cancer in Los Alamos. American Journal of Public Health, 1998; 88:1377-1380.
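The references above define the scan statistic that SaTScan™ evaluates. To show the mechanics, the sketch below implements the Bernoulli log-likelihood ratio from Kulldorff (1997) and a simplified purely spatial circular scan with a Monte Carlo p-value. It is an illustration only, not SaTScan™'s implementation: all function names are hypothetical, each record is treated as its own location, distance ties are broken by list order, windows are capped at an assumed 50% of records, and distances use the haversine formula.

```python
import math
import random


def xlogy(x, y):
    """x * ln(y), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c, n, C, N):
    """Bernoulli log-likelihood ratio for one window (high rates only).

    c, n: cases and total records (cases + controls) inside the window.
    C, N: cases and total records in the whole study area.
    """
    if n == 0 or n == N or c / n <= (C - c) / (N - n):
        return 0.0
    inside = xlogy(c, c / n) + xlogy(n - c, (n - c) / n)
    out_c, out_n = C - c, N - n
    outside = xlogy(out_c, out_c / out_n) + xlogy(out_n - out_c, (out_n - out_c) / out_n)
    null = xlogy(C, C / N) + xlogy(N - C, (N - C) / N)
    return inside + outside - null


def haversine_km(a, b):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def max_llr(points, labels, max_fraction=0.5):
    """Largest LLR over circular windows centred on each point.

    points: list of (lat, lon); labels: parallel list of 0/1.
    Each window is a centre plus its nearest neighbours, up to max_fraction
    of all points (an assumed cap); distance ties are broken by list order.
    """
    N, C = len(points), sum(labels)
    limit = max(1, int(max_fraction * N))
    best = 0.0
    for centre in points:
        order = sorted(range(N), key=lambda j: haversine_km(centre, points[j]))
        c = 0
        for n, j in enumerate(order[:limit], start=1):
            c += labels[j]
            best = max(best, bernoulli_llr(c, n, C, N))
    return best


def monte_carlo_p(points, labels, replicates=999, seed=1):
    """Observed max LLR and its Monte Carlo p-value.

    Case labels are shuffled over the fixed locations (the Bernoulli null);
    p = (replicates with max LLR >= observed + 1) / (replicates + 1).
    """
    rng = random.Random(seed)
    observed = max_llr(points, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(replicates):
        rng.shuffle(shuffled)
        if max_llr(points, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (replicates + 1)
```

For example, observed, p = monte_carlo_p(points, labels) on a list of (latitude, longitude) pairs and matching 0/1 results returns the strongest cluster's LLR and its p-value. SaTScan™ adds much more (tie handling, temporal and space-time windows, secondary clusters), so use it for real analyses.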
24.7 Code
The code is at https://github.com/Cornell-Wildlife-Health-Lab/disease-cluster-analysis-data-export-v2.