Best practices for REDCap setup for export to statistical packages
Compiled by the CTSI BERD Program / Last updated: November 11, 2015
The following practices make the data easier to work with once it is exported to a statistical package. Our experience comes mostly from SAS exports, but most of these practices will make R, Stata, and SPSS users happier as well.
Use an appropriate database design.
- Use a longitudinal database for collecting the same variables multiple times
- If there are only a few repeated measures in a cross-sectional database, make the variable names and labels consistent (e.g., sbp1, sbp3, sbp6 for systolic blood pressure measured at 1, 3, and 6 months)
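Consistent names pay off at analysis time because the time point can be recovered from the name itself. The following Python sketch assumes the export has already been read into a list of dictionaries (for example with the `csv` module); the records, values, and the helper `wide_to_long` are hypothetical. It reshapes the sbp1/sbp3/sbp6 columns into one row per record and month.

```python
import re

# Hypothetical records from a cross-sectional project, one row per
# participant, with systolic blood pressure at months 1, 3, and 6.
records = [
    {"record_id": "1", "sbp1": 128, "sbp3": 124, "sbp6": 121},
    {"record_id": "2", "sbp1": 141, "sbp3": None, "sbp6": 135},
]

# Consistent naming (stem "sbp" + month number) lets one pattern find every
# repeated measure and extract its time point, e.g. "sbp6" -> 6.
SBP_PATTERN = re.compile(r"^sbp(\d+)$")


def wide_to_long(rows, pattern=SBP_PATTERN, id_field="record_id"):
    """Reshape one row per record into one row per (record, month)."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            match = pattern.match(name)
            # Skip fields that are not repeated measures, and missing values.
            if match is None or value is None:
                continue
            long_rows.append(
                {id_field: row[id_field], "month": int(match.group(1)), "sbp": value}
            )
    return long_rows


for long_row in wide_to_long(records):
    print(long_row)
```

The same reshaping can be done with SAS arrays or R's `tidyr::pivot_longer()`; in every case it relies on the names following a single pattern.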
Use short (up to about 20 characters) variable names that are meaningful.
- The analyst has to retype the variable name many times during the analysis
- SAS variable names must be no more than 32 characters long
- A “checkbox” question creates one variable per choice, named by appending three underscores and the choice code to the original variable name (e.g., race___1), so at least four characters are added
- Do NOT use the autonaming feature: it generates overly long names
- Consider using a prefix to group similar variables, e.g., diagnosis_htn, diagnosis_copd, diagnosis_dm2
- If you have many related variables, consider a single stem with a numeric suffix (e.g., q01, q02, …, q79, q80; dose1-dose9), and use descriptive labels to explain what each one measures. Zero-padding the numbers (q01 rather than q1) keeps the variables in order when they are sorted alphabetically.
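The naming rules above can be checked mechanically before the project goes into production. This Python sketch is illustrative, not a REDCap feature: it assumes the `field___code` naming of exported checkbox variables, the 32-character SAS limit, and a rule that names use only lowercase letters, digits, and underscores and start with a letter (verify this against your REDCap version). The helpers `exported_names` and `check_field` and the example fields are hypothetical.

```python
import re

SAS_MAX_NAME = 32  # longest variable name SAS accepts

# Assumed naming rule: a lowercase letter first, then lowercase letters,
# digits, or underscores.
VALID_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def exported_names(field_name, field_type, choice_codes=()):
    """Variable names one field produces in the export.

    A checkbox field becomes one variable per choice, named
    field_name + "___" + choice code (e.g. race___1).
    """
    if field_type == "checkbox":
        return [f"{field_name}___{code}" for code in choice_codes]
    return [field_name]


def check_field(field_name, field_type, choice_codes=()):
    """Return a list of naming problems for one field (empty if none)."""
    problems = []
    if not VALID_NAME.match(field_name):
        problems.append(f"{field_name}: use only lowercase letters, digits, underscores")
    for name in exported_names(field_name, field_type, choice_codes):
        if len(name) > SAS_MAX_NAME:
            problems.append(f"{name}: {len(name)} characters (limit {SAS_MAX_NAME})")
    return problems


print(check_field("diagnosis_htn", "yesno"))
# A 31-character checkbox name fits on its own, but not once "___1" is added.
print(check_field("comorbidity_history_at_baseline", "checkbox", [1, 2]))
```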
Do NOT use very long question labels.
- These labels appear in tables and figures wherever the variable is referenced, and labels that are too long make the report difficult to read
- SAS has a limit of 256 characters for labels, and they will be truncated beyond that. One line of text is about 80 characters.
- A “checkbox” question adds the actual choice to the end of the label as “(Choice = …)”, which can lengthen the label substantially if the choices are long. Again, SAS truncates everything beyond 256 characters, so the choice itself, the only part of the label that distinguishes one checkbox variable from another, may be lost.
- Avoid special characters (non-keyboard characters, HTML formatting) in question labels; they may not display correctly in a report produced in a different format.
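To see how a checkbox label can lose its choice text, the Python sketch below builds labels in the “(Choice = …)” pattern described above and reports what a 256-character limit would keep and drop. The exact spacing and capitalization of the suffix in a real export may differ, and the question text and helper names are hypothetical.

```python
SAS_MAX_LABEL = 256  # SAS truncates variable labels beyond this length


def checkbox_labels(question_label, choices):
    """Labels for each checkbox variable: the question plus "(Choice = ...)".

    The suffix format is an assumption based on the pattern described in
    the text; check an actual export for the exact wording.
    """
    return [f"{question_label} (Choice = {choice})" for choice in choices]


def truncation_report(labels, limit=SAS_MAX_LABEL):
    """For each over-long label, return (text SAS keeps, text SAS drops)."""
    return [(label[:limit], label[limit:]) for label in labels if len(label) > limit]


# Hypothetical, deliberately over-long question label, built by repetition.
question = ("Has a doctor ever told you that you have any of these conditions? " * 4).strip()
labels = checkbox_labels(question, ["Hypertension", "COPD"])

for kept, dropped in truncation_report(labels):
    print(f"kept {len(kept)} characters, dropped: {dropped!r}")
```

Both labels end up identical after truncation, so the two checkbox variables can no longer be told apart from their labels.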
Use short (up to about 20 characters) form names that are meaningful.
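- REDCap automatically creates a variable recording each form's completion status by appending the string “_complete” to the form name. The combined name must fit within the variable name limit: with SAS's 32-character limit and the 9 characters of “_complete”, the form name can be at most 23 characters.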
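The arithmetic is simply 32 − 9 = 23. A small Python check, with hypothetical form names and helper functions, makes it easy to screen a list of forms before building the project:

```python
SAS_MAX_NAME = 32              # SAS variable-name limit
COMPLETE_SUFFIX = "_complete"  # appended by REDCap to each form name


def completion_variable(form_name):
    """Name of the automatic form-status variable for a form."""
    return form_name + COMPLETE_SUFFIX


def max_form_name_length(limit=SAS_MAX_NAME):
    """Longest form name whose completion variable still fits the limit."""
    return limit - len(COMPLETE_SUFFIX)


print(max_form_name_length())  # 23
for form in ["demographics", "baseline_physical_exam", "baseline_physical_examination"]:
    status = "ok" if len(completion_variable(form)) <= SAS_MAX_NAME else "too long"
    print(completion_variable(form), status)
```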
Choose the appropriate field type for each variable.
- Set the appropriate validation (number, date, etc.) when applicable: the statistical software needs to distinguish character variables from numeric ones and to know the format of the dates
- Avoid free-text and note fields for data that needs to be analyzed.
- Avoid multiple-answer checkboxes unless more than one answer is genuinely possible; each checkbox choice is exported as a separate variable, essentially a separate question, which often complicates the analysis.
- Consider radio buttons or a drop-down list as an alternative to free text and multiple-answer checkboxes
- Do NOT use very long text for the choices; it will show up in the variable labels and in tables
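The cost of skipping validation becomes visible as soon as unvalidated entries are parsed. The Python sketch below, with hypothetical values and helper names, lists free-text entries that would not parse as numbers, or as dates in an assumed YYYY-MM-DD format. An entry such as 11/05/2015 is also ambiguous (5 November or 11 May), which date validation avoids by fixing the format at data entry.

```python
from datetime import datetime


def non_numeric_entries(values):
    """Non-blank entries that cannot be parsed as numbers.

    Many import routines read a whole column as character if even one
    entry fails to parse, so each of these can block numeric analysis.
    """
    bad = []
    for value in values:
        text = value.strip()
        if not text:
            continue  # blank means missing, not invalid
        try:
            float(text)
        except ValueError:
            bad.append(value)
    return bad


def unparseable_dates(values, fmt="%Y-%m-%d"):
    """Non-blank entries that do not match the expected date format."""
    bad = []
    for value in values:
        text = value.strip()
        if not text:
            continue
        try:
            datetime.strptime(text, fmt)
        except ValueError:
            bad.append(value)
    return bad


# Hypothetical entries typed into unvalidated text fields.
weights_kg = ["72.5", "80", "", "68 kg", "n/a", "9O"]  # "9O" uses the letter O
visit_dates = ["2015-11-05", "11/05/2015", "Nov 5", ""]

print(non_numeric_entries(weights_kg))  # ['68 kg', 'n/a', '9O']
print(unparseable_dates(visit_dates))   # ['11/05/2015', 'Nov 5']
```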