Data standardization
Data standardization allows users to manage export files more easily and to analyze variables within the same category more effectively. The domain-based export regroups variables by visit (including unscheduled visits and surveys) and by domain. It is available for CSV and SAS exports.
Defining domains
The Data standardization page is available under the Study design navigation item. On this page, you can view the domains added to your study, and add, edit, and delete domains.
To add a new domain:
- Navigate to the ‘Study design’ section.
- Open the ‘Data standardization’ subtab.
- Select the ‘+Add’ button.
- In the pop-up window, enter the domain name and abbreviation.
- Select the ‘Add’ button.
To edit or delete a domain, select the relevant option from the three-dot menu next to the domain name.
When you edit a domain, you can assign fields to it (e.g., dem_height and vis1_height).
After selecting fields, you can give them domain-specific variable names (e.g., height for both fields) in the first right-hand column.
Domain-specific variable names have a maximum length of 32 characters, are lowercase, and can only contain alphanumeric characters and the character ‘_’.
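The naming rules above can be checked before entering names in the interface. The following is a minimal sketch, not part of the product: the function name is hypothetical, and it assumes that "alphanumeric" means the ASCII characters a-z and 0-9 and that a name must contain at least one character.

```python
import re

# Rules from the text: at most 32 characters, lowercase,
# only alphanumeric characters and '_'.
# Assumption: ASCII letters and digits only, and at least one character.
_DOMAIN_VARIABLE_NAME = re.compile(r"[a-z0-9_]{1,32}")


def is_valid_domain_variable_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rules."""
    return _DOMAIN_VARIABLE_NAME.fullmatch(name) is not None
```

For example, is_valid_domain_variable_name("height") returns True, while "Height" (uppercase) and "body-height" (hyphen) are rejected.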
An overview of the mappings between fields and domains can also be exported from the page. The file contains the domain abbreviation, domain name, domain variable name, parent type (visit/repeating data/survey), parent's name, and field information.
If visit numbers are assigned to visits on the Study Structure page, those numbers are used in the ‘Visit number’ column of exports that are grouped by domain.
Grouping data by domain when exporting
When you export participant data, you can group the data by domain. This option is available for the CSV and SAS file types. If either file type is selected, you can choose between ‘Do not group data’ (default) and ‘Group data by domain’.
When data is grouped by domain, the exported file has the following structure, with one line per Visit/Repeating Data instance/Survey instance:
- Participant Id
- Participant Status
- Site Abbreviation
- Randomization Id
- Randomization Group
- Randomized On
- Participant Creation Date
- Visit name
- Visit number (as set on the Study Structure page)
- Type (Visit, Repeating data, Survey)
- Name (of the Visit, Repeating data, Survey)
The following rules apply to the exported data:
- All dates (field values & metadata) are exported in YYYY-MM-DD hh:mm (date and time), or YYYY-MM-DD (date) format.
- Checkboxes are exported in the [domain variable name]_[option name] format, with a value of 1 representing a checked option, and 0 representing an unchecked option.
- Number and date fields are exported in the [domain variable name]_number and [domain variable name]_date format.
- Grid fields are exported in the [domain variable name]_[row name]_[column name] format.
- Row and column names are cut off at 15 characters.
- Data marked as missing is handled the same way as in our other exports.
- Form blinding permissions are taken into account when exporting data. If the user is blinded, the related cells are empty in the export.
- View randomization permissions are taken into account when exporting data. Randomization information is included only for sites where the user has View randomization permissions.
- Generated variable names are limited to 64 characters.
- A field variable list is added per domain ([domain abbreviation]_variablelist.[filetype]).
- If the user does not have decrypt permissions for the site the participant is assigned to, the encrypted value will be exported as *encrypted*.
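The column naming and date conventions listed above can be illustrated with a short Python sketch, which may help when writing scripts that process a grouped export. It is not the export's actual implementation. All function names are hypothetical. It assumes that over-long generated names are cut at the end, that option, row, and column names are used as given, and that hh:mm is a 24-hour time.

```python
from datetime import datetime

MAX_GRID_PART_LENGTH = 15       # row and column names are cut off at 15 characters
MAX_GENERATED_NAME_LENGTH = 64  # generated variable names are limited to 64 characters


def _limit(name: str) -> str:
    # Assumption: an over-long generated name is simply cut at the end.
    return name[:MAX_GENERATED_NAME_LENGTH]


def checkbox_column(variable: str, option: str) -> str:
    """Build a [domain variable name]_[option name] column name."""
    return _limit(f"{variable}_{option}")


def number_date_columns(variable: str) -> tuple[str, str]:
    """Build the _number and _date column names for a number and date field."""
    return _limit(f"{variable}_number"), _limit(f"{variable}_date")


def grid_column(variable: str, row: str, column: str) -> str:
    """Build a [domain variable name]_[row name]_[column name] column name."""
    row = row[:MAX_GRID_PART_LENGTH]
    column = column[:MAX_GRID_PART_LENGTH]
    return _limit(f"{variable}_{row}_{column}")


def parse_exported_date(value: str) -> datetime:
    """Parse a value in YYYY-MM-DD hh:mm or YYYY-MM-DD format."""
    # Try the date-and-time format first, then fall back to the date-only format.
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized export date: {value!r}")
```

For example, checkbox_column("smoking", "never") gives smoking_never. A script reading the export should also expect empty cells for blinded forms and the literal value *encrypted* for values the user cannot decrypt, so those cells should be skipped before parsing.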