Data Quality Checks
Several data quality checks are available to detect data quality issues in loaded data, allowing admins to take corrective action to keep customer data accurate, complete, and reliable:
- Duplicate IDs
- Referential Integrity
- Outliers
- Control File Validation
- Missing Values
- Validate Column for a specific pattern
Admins can define a data quality check rule by adding a few parameters. Data quality check results are returned in real time. A context columns parameter allows admins to get more insight while previewing data quality check violations.
Rules and rule groups can be scheduled to run, or added as a job in the job orchestration flow.
Duplicate IDs
The Duplicate IDs check identifies duplicate values in a table column selected by the Admin using a Duplicate Check Quality Template. Admins use a Named Query to enter values for the table object and table key columns. This check uncovers duplicate violations in dimensions and facts, which is not covered in Nitro standard loading validation.
The Duplicate Check Quality Template includes the following features:
- More than one column can be selected for the Key Column parameter
- Users can specify a filter condition on the table object
- The key column result displays the number of duplicate rows found
SQL scripts inside quality check templates are flexible and scalable. The template accommodates multiple columns passed through one parameter, and the script dynamically adapts, keeping the logic intact. This can be used when running a duplicate check on more than one column as key, or when creating partition keys from more than one column.
To use the Duplicate IDs check:
- Log into the NAC.
- Select Data Quality from the Data Insights menu.
- Select the Rules Catalog tab.
- Select the New Quality Check button.
- Enter or select the following values:
- Display Name
- Description
- Quality Check Template - select Check Duplicates
- Connector
- Table
- Key Column
- Additional Column(s) for Select Clause
- Select Save.
- Select Run. Results are displayed below the Data Quality Check Details.
The Referential Integrity check finds records in the child table that have attribute values not present in the parent table. Admins use the Named Query template to enter values for the parent table object, parent table join column, child table object, and child table join column parameters.
Referential Integrity
To use the Referential Integrity check:
- Log into the NAC.
- Select Data Quality from the Data Insights menu.
- Select the Rules Catalog tab.
- Select the New Quality Check button.
- Enter or select the following values:
- Display Name
- Description
- Quality Check Template - select Check Referential Integrity
- Connector
- Table
- Key Column
- Additional Column(s) for Select Clause
- Select Save.
- Select Run. Results are displayed below the Data Quality Check Details.
Outliers
The Outliers data validation check finds values in a column that fall outside of a range calculated for the user by using statistical measures on the historical values for that column. If the value falls outside the calculated range, it is flagged as an outlier. This check can be used to determine if weekly/monthly forecast sales data from sales vendors are normal for a particular brand, product, or market combination.
To use the Outliers check:
- Log into the NAC.
- Select Data Quality from the Data Insights menu.
- Select the Rules Catalog tab.
- Select the New Quality Check button.
- Enter or select the following values:
- Display Name
- Description
- Quality Check Template - select Check Outliers
- Alert Threshold (optional)
- Connector
- Table
- Outlier Column
- Partitioning Column
- Date field to be filtered for segregating incremental data
- Time window in days
- Table Filter Condition (optional)
- Select Save.
- Select Run. Results are displayed below the Data Quality Check Details.
Control File Validation
The control file validation check ensures the summed value of the metric columns grouped over the common dimension columns in the transaction table matches the values in the metric columns of the control table. This ensures every transactional record is accounted for in the appropriate dimension, and rounding errors do not occur.
To use the Control File Validation check:
- Log into the NAC.
- Select Data Quality from the Data Insights menu.
- Select the Rules Catalog tab.
- Select the New Quality Check button.
- Enter or select the following values:
- Display Name
- Description
- Quality Check Template - select Control File Validation
- Alert Setting
- Control Table Parameters
- Transaction Table Parameters
- Select Save.
- Select Run. Results are displayed below the Data Quality Check Details.
Missing Values
The Data Quality Check can be used to detect missing (NULL) values in a column that exceed a specified percentage. This helps detect issues where data needs to be fixed and resubmitted in a short amount of time.
For example, a customer might require all fields in an uploaded table have a non-NULL value because it contains critical order information. Any records that violate this requirement are detected so the data can be fixed immediately and resubmitted.
To create a new data quality rule:
-
Log into the NAC.
-
Select Data Quality from the Data Insights menu.
-
Select the Rules Catalog tab.
-
Select the New Quality Check button.
-
Complete the following fields in the Data Quality Check Details section:
- Display Name
- Name
- Description
- Quality Check Template - select Check Missing Values
- Alert Threshold (optional)
- Select the following the following values in the Parameters section:
- Connector
- Table
- Validate Column(s) - columns on which to run a missing value check
- Missing Value Percentage cutoff - Enter a percentage tolerance for empty values
- Table Filter Condition (optional)
- Additional column(s) for Select clause (optional)
- Select Preview or Save.
Validate Column for a specific pattern
The Validate Column for a specific pattern allows admins to perform pattern validation checks on a specific table column to verify that data follows a prescribed pattern. For example, Nitro can ensure phone numbers, zip-codes, email addresses, and composite key validations are formatted correctly.
To use the Validate Column for a specific pattern check:
-
Log into the NAC.
-
Select Data Quality from the Data Insights menu.
-
Select the Rules Catalog tab.
-
Select the New Quality Check button.
-
Enter or select the following values:
- Display Name
- Description
- Quality Check Template - select Validate Column for a specific pattern
- Alert Threshold
- Connector
- Table
- Validate Column
- Column Pattern Values
- Table Filter Condition
- Additional Column(s) for Select Clause
- Select Preview or Save.