Best practices for REDCap setup for export to statistical packages

Compiled by the CTSI BERD Program / Last updated: November 11, 2015

The following practices help ensure that the data will be easy to work with once exported to a statistical package. Our experience comes mostly from SAS exports, but most of these practices will make R, Stata, or SPSS users happier as well.

Use an appropriate database design.

  • Use a longitudinal database for collecting the same variables multiple times
  • If there are only a few repeated measures in a cross-sectional database, make the variable names and labels consistent (e.g., sbp1, sbp3, sbp6 for systolic blood pressure measured at 1, 3, and 6 months)
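
One reason consistent names help: the analyst can recover the time point from the name itself and reshape the data mechanically. The following is a minimal Python sketch, assuming an exported record is available as a dictionary; the function name `wide_to_long` and the `month` key are hypothetical, chosen only for illustration.

```python
import re

def wide_to_long(record, stem):
    """Turn variables named <stem><number> into one row per time point."""
    # Matches e.g. "sbp1", "sbp3", "sbp6" when stem is "sbp"
    pattern = re.compile(rf"^{re.escape(stem)}(\d+)$")
    rows = []
    for name, value in record.items():
        match = pattern.match(name)
        if match:
            rows.append({"month": int(match.group(1)), stem: value})
    # Order the rows by time point
    return sorted(rows, key=lambda row: row["month"])

# Example record using the naming pattern suggested above
record = {"record_id": 1, "sbp1": 120, "sbp3": 118, "sbp6": 125}
long_rows = wide_to_long(record, "sbp")
```

With inconsistent names (say sbp_1mo, sbp3, systolic6) the same reshaping would need a hand-written mapping for every variable.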

Use short (up to about 20 characters) variable names that are meaningful.

  • The analyst has to retype the variable name many times during the analysis
  • SAS variable names must be no more than 32 characters long
  • A “checkbox” question creates one variable per choice, and each of these has at least three characters added to the original variable name
  • Do NOT use the autonaming feature – it generates overly long names
  • Consider using a prefix to group similar variables. Example: Diagnosis_HTN, Diagnosis_COPD, Diagnosis_DM2
  • If you have many variables, consider using a single name with a numeric suffix for a series of related variables (e.g., Q01, Q02, …, Q79, Q80; Dose1-Dose9). Use labels to provide a meaningful description.
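
The length rules above can be checked before any data are collected. The sketch below is one possible check in Python, not a REDCap or SAS tool. It assumes the fields are supplied as (name, type, choice codes) tuples, and it assumes that a checkbox field exports one variable per choice named with three underscores plus the choice code; verify that pattern against your own export.

```python
SAS_NAME_LIMIT = 32    # SAS variable name limit cited above
SUGGESTED_LIMIT = 20   # guideline length from this document

def exported_names(field_name, field_type, choice_codes=()):
    """Return the variable names a field is assumed to produce on export."""
    if field_type == "checkbox":
        # Assumption: one variable per choice, named <field>___<code>
        return [f"{field_name}___{code}" for code in choice_codes]
    return [field_name]

def check_names(fields):
    """Collect (name, reason) pairs for names that break either limit."""
    problems = []
    for field_name, field_type, codes in fields:
        if len(field_name) > SUGGESTED_LIMIT:
            problems.append((field_name, "longer than the suggested 20 characters"))
        for name in exported_names(field_name, field_type, codes):
            if len(name) > SAS_NAME_LIMIT:
                problems.append((name, "exceeds the SAS 32-character limit"))
    return problems

# Hypothetical fields, for illustration only
fields = [
    ("sbp1", "text", ()),
    ("diagnosis_of_chronic_conditions", "checkbox", ("1", "2")),
]
problems = check_names(fields)
```

Here the checkbox name is 31 characters long, so it fits the SAS limit on its own, but every variable derived from it does not.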

Do NOT use very long question labels.

  • These labels will be used in tables / figures everywhere the variable is referenced, and make the report difficult to read if too long
  • SAS has a limit of 256 characters for labels, and they will be truncated beyond that. One line of text is about 80 characters.
  • A “checkbox” question will add the actual choice at the end of the label as “(Choice = ….)”, which can lengthen the label substantially, especially if the choices are long. Again, SAS will truncate the part beyond 256 characters, potentially losing the actual choice from the label.
  • Avoid special characters (non-keyboard characters, HTML formatting) in question labels – they might not show up well in a report that uses a different format.
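
To see how a long checkbox label can lose its choice text, the following Python sketch builds the combined label and shows what falls beyond 256 characters. The exact label format is an assumption based on the “(Choice = ….)” description above, and both function names are hypothetical.

```python
SAS_LABEL_LIMIT = 256  # SAS label limit cited above

def checkbox_label(question_label, choice_text):
    """Build the combined label for one checkbox choice (assumed format)."""
    return f"{question_label} (Choice = {choice_text})"

def lost_to_truncation(label, limit=SAS_LABEL_LIMIT):
    """Return the part of the label that would be cut off past the limit."""
    return label[limit:]

# A 250-character question label leaves almost no room for the choice text
long_question = "x" * 250
full_label = checkbox_label(long_question, "Hypertension")
lost = lost_to_truncation(full_label)
```

In this example the choice name ends up entirely in the truncated part, so every variable from that checkbox question would carry the same label in SAS.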

Use short (up to about 20 characters) form names that are meaningful.

  • An automatic variable indicating the completion of the form is created from the form name with the string “_complete” appended. The total name has to fit in the variable name limit.
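
Because “_complete” is 9 characters long, a form name can be at most 23 characters if the completion variable is to fit in the 32-character SAS limit. A minimal Python sketch of that arithmetic follows; the function names are hypothetical.

```python
SAS_NAME_LIMIT = 32  # SAS variable name limit cited above

def completion_variable(form_name):
    """Name of the automatic completion variable for a form."""
    return f"{form_name}_complete"

def fits_sas_limit(form_name, limit=SAS_NAME_LIMIT):
    """True if the completion variable name fits within the limit."""
    return len(completion_variable(form_name)) <= limit

# Longest form name that still fits: 32 - len("_complete") = 23 characters
max_form_name_length = SAS_NAME_LIMIT - len("_complete")
```

Staying near the suggested 20 characters leaves a small margin below this hard limit.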

Choose the appropriate field type for each variable.

  • Set appropriate validation (number, date, etc.) when applicable – the statistical software needs to be able to separate character variables from numeric ones and to know the format of the dates
  • Avoid free-text and note fields for data that needs to be analyzed.
  • Avoid multiple-answer checkboxes unless multiple answers are possible – each checkbox is essentially a separate question, often complicating the analysis.
  • Consider radio buttons or a drop-down list as an alternative to free text and multiple-answer checkboxes
  • Do NOT have very long text for the choices – they will show up in the variable labels and in tables