
Best practices for REDCap setup for export to statistical packages


Compiled by the CTSI BERD Program / Last updated: November 11, 2015

The following practices ensure that the data will be easy to work with once exported to a statistical package. Our experience comes mostly from SAS exports, but most of these will make R, Stata, or SPSS users happier as well.

Use an appropriate database design.

  • Use a longitudinal database for collecting the same variables multiple times
  • If there are only a few repeated measures in a cross-sectional database, make the variable names and labels consistent (e.g., sbp1, sbp3, sbp6 for systolic blood pressure measured at 1, 3, and 6 months)
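
Consistent names pay off at analysis time because the time point can be recovered from the name mechanically. The following is a minimal Python sketch, assuming a naming pattern of a lowercase stem followed by the month number (as in sbp1, sbp3, sbp6). The function name and the sample record are hypothetical and only illustrate the idea.

```python
import re

# Assumed naming pattern: lowercase stem + numeric time point, e.g. "sbp3".
NAME_PATTERN = re.compile(r"^([a-z_]+?)(\d+)$")

def wide_to_long(record):
    """Split names like 'sbp3' into (stem, month, value) rows, sorted."""
    rows = []
    for name, value in record.items():
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # not a repeated measure (for example, an ID field)
        stem, month = match.group(1), int(match.group(2))
        rows.append((stem, month, value))
    return sorted(rows)

# Made-up example record, not real data.
record = {"record_id": "001", "sbp1": 120, "sbp3": 118, "sbp6": 115}
print(wide_to_long(record))
```

With inconsistent names (say sbp_1mo, sbp3, SBP_six) the same reshaping would need a hand-written mapping for every variable.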

Use short (up to about 20 characters) variable names that are meaningful.

  • The analyst has to retype the variable name many times during the analysis
  • SAS variable names must be no more than 32 characters long
  • A “checkbox” question creates multiple variables where at least three characters are added to the original variable name
  • Do NOT use the autonaming feature – it generates overly long names
  • Consider using a prefix to group similar variables. Example: Diagnosis_HTN, Diagnosis_COPD, Diagnosis_DM2
  • If you have many variables, consider using a single name with a numeric suffix for a series of related variables (e.g., Q01, Q02, …, Q79, Q80, or Dose1-Dose9). Use descriptive labels to provide a meaningful description.
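
The length rules above can be checked before a project goes live. The sketch below is illustrative only: it assumes that a checkbox field is exported once per choice as the field name plus three underscores plus the choice code. That export format is an assumption to verify against your own export, and the function names are hypothetical.

```python
SAS_NAME_LIMIT = 32  # SAS variable names must be no more than 32 characters

def exported_name_length(name, is_checkbox=False, longest_code=""):
    """Length of the longest exported variable name for one field.

    Assumption: a checkbox field becomes name + "___" + choice code,
    so at least three characters are added to the original name.
    """
    extra = len("___") + len(longest_code) if is_checkbox else 0
    return len(name) + extra

def fits_sas(name, is_checkbox=False, longest_code=""):
    """True if the longest exported name stays within the SAS limit."""
    return exported_name_length(name, is_checkbox, longest_code) <= SAS_NAME_LIMIT

print(fits_sas("diagnosis_htn"))
print(fits_sas("a" * 30, is_checkbox=True, longest_code="1"))
```

A 30-character name is legal on its own but fails once it is used for a checkbox field, which is one reason to aim for about 20 characters.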

Do NOT use very long question labels.

  • These labels will be used in tables / figures everywhere the variable is referenced, and make the report difficult to read if too long
  • SAS has a limit of 256 characters for labels, and they will be truncated beyond that. One line of text is about 80 characters.
  • A “checkbox” question will add the actual choice at the end of the label as “(Choice = ….)”, which can lengthen the label substantially, especially if the choices are long. Again, SAS will truncate the part beyond 256 characters, potentially losing the actual choice from the label.
  • Avoid special characters (non-keyboard characters, HTML formatting) in question labels – they might not show up well in a report that uses a different format.
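
To see how quickly a checkbox label can run past the SAS limit, here is a rough Python sketch. The suffix format is modeled on the "(Choice = ….)" pattern described above and may not match the exported text exactly; the function names are hypothetical.

```python
SAS_LABEL_LIMIT = 256  # SAS truncates labels beyond 256 characters

def checkbox_label(question_label, choice_text):
    """Build the label for one checkbox choice (assumed suffix format)."""
    return f"{question_label} (Choice = {choice_text})"

def truncated_chars(label):
    """Number of characters SAS would cut from the end of the label."""
    return max(0, len(label) - SAS_LABEL_LIMIT)

# Made-up example: a three-line question (about 240 characters) with a short choice.
label = checkbox_label("x" * 240, "None of the above")
print(len(label), truncated_chars(label))
```

Because the choice text sits at the end of the label, it is the first thing lost to truncation.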

Use short (up to about 20 characters) form names that are meaningful.

  • An automatic variable indicating the completion of the form is created from the form name with the string “_complete” appended. The total name has to fit in the variable name limit.
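
The arithmetic behind this limit is simple, as the short Python sketch below shows. It uses the 32-character SAS limit mentioned earlier; the function names are hypothetical.

```python
SAS_NAME_LIMIT = 32          # SAS variable name limit
COMPLETE_SUFFIX = "_complete"  # string appended to the form name

def max_form_name_length():
    """Longest form name whose completion variable still fits in SAS."""
    return SAS_NAME_LIMIT - len(COMPLETE_SUFFIX)

def completion_variable(form_name):
    """Name of the automatic completion variable for a form."""
    return form_name + COMPLETE_SUFFIX

print(max_form_name_length())
print(completion_variable("demographics"))
```

The suffix takes 9 characters, leaving 23 for the form name, so a target of about 20 characters keeps a small margin.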

Choose the appropriate field type for each variable.

  • Set appropriate validation (number, date, etc.) when applicable – the statistical software needs to be able to separate character variables from numeric ones and to know the format of the dates
  • Avoid free-text and note fields for data that needs to be analyzed.
  • Avoid multiple-answer checkboxes unless multiple answers are possible – each checkbox is essentially a separate question, often complicating the analysis.
  • Consider radio buttons or a drop-down list as an alternative to free text and multiple-answer checkboxes
  • Do NOT have very long text for the choices – they will show up in the variable labels and in tables


