Custom Connectors

Custom connectors contain custom schema and database objects. Like industry data connectors, these operate over SFTP.

Creating the Metadata Package

The metadata package defines the structure of the data. The package includes three .yml metadata files:

  • FTP yml - describes the format of the CSV file being loaded, number of columns, data types, restrictions for each. This is used during the data validation process of Nitro data loads.
  • STG yml - describes the format of the table to be used to support the data as it is loaded in Redshift, and the load pattern into the staging layer
  • ODS yml - describes the long term format of the table in Nitro and the load pattern of the data as it is loaded into the ODS layer

The .yml files define the structure of the data loaded into the connector.

To create the metadata package:

  1. Create a top-level connector folder to hold all of the metadata driving the structure and behavior of the connector. This is the repository of metadata for the connector.
  2. Ensure to specify a namespace and leverage it throughout. For example: hr_employee_master__c
  3. Create a sub-directory within this folder named Tables.
  4. Create three .yml files in the Tables folder for each file loaded as a part of this connector. For example:
    Data filename: hr_employee_master_extract_01012020.csv
    • FTP .yml filename: hr_employee_master_FTP__c.yml
    • STG .yml filename: hr_employee_master_STG__c.yml
    • ODS .yml filename: hr_employee_master_ODS__c.yml
  5. Specify the following in the FTP.yml file:
    • name - name of the object in metadata
    • enabled - Enter a value of true or false. If true, the table is active. If false, the table is inactive.
    • csvDelimiter - The delimiter used in the raw data surrounded by double quotes
    • skipHeader - If headers are included, enter a value of true. Otherwise, enter false.
    • textQualifier - (defaults to Double Quote ") – indicates why this type of character is used to encode a single attribute text string; for example, if one needed to use a double quote for the following string <003,"A Double Randomized Placebo Controlled Study", 'Cholecap'>, and then the field within supports commas and new line characters.
  6. columns:
    • name - Name of the field to be used in the database
    • notNull - Enter true for required, false for not required
    • type - Set to the field type. Supported field types are those supported in Amazon RedShift.
    • columnBehavior: Can have 1 of 3 values; cannot be changed if yml contains a formula field
    • PRESENT - Column is in source data and in redshift table
    • NOT_PRESENT - Column is not in source data and is not in redshift table
    • PRESENT_NOT_POPULATED (ods table only) - Column is not in source data and column is required in ods for other processes such as star load
    • colOrder
    • Columns are ordered ascending numerically. If colOrder: is not populated, it follows columns with values in the order of the columns on the yml
Copy
---
name: sample_FTP__c
skipHeader: true
textQualifier: "
csvDelimiter: |
columns:
- name: sample_id__c
  type: varchar(16)
  identity: false
  notNull: true
- name: samplename__c
  type: varchar(100)
- name: transaction_date__c
  type: date
- name: price__c
  type: decimal(12,2)
schema: FTP

For the STG.yml file:

  1. Copy and paste the contents of the FTP.yml into the STG .yml file.
  1. Specify the following in the STG.yml file:
  • name - name of the object in metadata. This must match the filename.
  • schema (Required) - Must be STG for staging schema objects. This is the name of the schema where the table object is being loaded.
  • primaryKey - (optional) identifies which of the table attributes is used as the primary key, which is required for ODS load processes like "Upsert"
  • loadPattern - (optional) if using the jb_ftp_intelligentload__v job, this identifies which FTP pattern is used to load the data into the STG table
  • rollingDateKey - (optional) if using jb_ftp_intelligentload__v job, AND using the pt_ftp_rolling_load__v load pattern, this field represents the date field in the staging table used for the rolling load window.
  • columns - This should be a copy from what was done in the FTP .yml
  • type - (required) - Redshift data type / length (e.g varchar(255), numeric(12,2), boolean, datetime, date).
  • nonNull - (default false) - indicates that the column may not be empty, which is useful if the attribute will be a Primary Key.
  • default - (default empty) - field value will be accepted if no value is provided by the CSV. (e.g. might use 'Japan' to label all new country fields with the Japan location)
  • compression - (default ZSTD) - compression technique from Redshift isused to physically store the field on Redshift drives.
  • formula - (optional) - Used to identify the special formula used to add derived value to the field – must either be one of the Nitro resolved examples below, or a valid Redshift constant function (e.g. sysdate)

Composite key derived fields might look like < vFormula.concat('|', account_id__c, transaction_date__c, product_id__v) > See Configurable Composite Key for more details.

For the ODS.yml file:

  1. Copy and paste the contents of the STG.yml into the ODS .yml file.
  2. Specify the following in the ODS.yml file:
  • name - (required) - Name of the table object used to refer to the object in other metadata code. For example "sample_ODS__c" conforms to the naming standards for a custom object in the ODS schema.
  • tableName - (required) - name of the physical table created in the ODS schema - should be lowercase, for example "sample_ods__c".
  • schema - (required) - Must be ODS for ODS schema objects. Name of the schema where the table object is being loaded.
  • primaryKey - (optional) - identifies which of the table attributes is used as the primary key, and which is required for ODS load processes like "Upsert".
  • loadPattern - (optional) - if using the jb_ftp_intelligentload__v job, identifies which FTP pattern is used to load the data into the ODS table.
  • columns - The following column options column may also be specified:
  • type - (required) - Redshift data type / length (e.g varchar(255), numeric(12,2), boolean, datetime, date)
  • notNull - (default: false) - indicates the column may not be empty, which is useful if the attribute is a Primary Key.
  • default - (default: empty) - field value is accepted if a value is not provided by the .csv (e.g. might use ‘Japan’ to label all new country fields with the Japan location.)

To complete the metadata package:

  1. Zip the tables sub-directory.
  2. Upload the zipped file to your connector.

Load Patterns

The load definitions set in the .yml files determine the method used to load the data to Redshift in each step when using the SFTP job load, jb_ftp_intelligentload__v, in the STG table, three different methods may be used as loadPatterns.

Load patterns for STG

Pattern Name

Description

Awareness of Slowly Changing Dimensions

pt_ftp_stg_rawload__v

Standard load of csv data into STG after validation in FTP table that csv matches FTP table definitions

Pattern cannot be used for slowly changing dimensions

pt_ftp_stg_rawformulaload__v

Load of csv data into STG after validation in FTP table that csv matches FTP table definitions. Allows for extra columns to be created using formulas. This pattern must be used if using vFormula in the STG yml.

Pattern cannot be used for slowly changing dimensions

pt_ftp_stg_batch_rawload__v

Concatenates all data files with the same constant value from the FTP upload folder into STG table before loading into ODS. Full list of constant values supported:

  • _dx_
  • _sx_
  • _px_
  • _rx_nmkt_
  • _rx_
  • _patient_activity_

Supported date patterns:

  • YYYYMM
  • YYYYMMDD

Pattern cannot be used for slowly changing dimensions.

pt_ftp_stg_batch_rawformulaload__v

Concatenates all data files with the same constant value from the FTP upload folder into STG table before loading into ODS. Allows for extra columns to be created using formulas. This pattern must be used if using vFormula in the STG yml. Full list of constant values supported:

  • _dx_
  • _sx_
  • _px_
  • _rx_nmkt_
  • _rx_
  • _patient_activity_

Supported date patterns:

  • YYYYMM
  • YYYYMMDD
Pattern cannot be used for slowly changing dimensions.

Load patterns for ODS

FTP ODS Load Patterns use No Schema Binding so that views are retained when an ODS table is dropped. Views are automatically recreated when new data is loaded using FTP.

Pattern Name

Description

Prerequisites

Awareness of Slowly Changing Dimensions (SCDs)

pt_ftp_ods_genload__v

Truncate and reload

n/a

Truncate and reload doesn’t keep history for SCDs

pt_ftp_ods_upsertload__v

Upsert load based upon the primaryKey of both the STG / ODS table. 

Must have primaryKey attribute for both STG and ODS table

Upsert load doesn’t keep history for SCDs

pt_ftp_ods_rollingload__v

Rolling load based on rollingDateKey on the ODS yml. The ODS data will be truncated and replaced for all records where the rollingDateKey is in the same range as the file being loaded.

- Must have primaryKey attribute for both STG and ODS Table. 

- Must have rollingDateKey attribute for STG and ODS table

Rolling load doesn’t keep history for SCDs

pt_ftp_ods_replace_ed_latest__v

Loads data in an effective dated manor with a start_date__v/end_date__v for each record. With “replace’, all the active records in the ODS are updated with records from STG. With “latest”, Nitro only keeps track of the latest historical version of a record. 

Must have primaryKey attribute for both STG and ODS tableSTG table must have last_modified_date__v field. ODS tables must include columns start_date__v, end_date__v, and last_modified_date__v

Full understands populating start_date__v, end_date__v, system_last_modified_date__v

pt_ftp_ods_replace_ed_history__v

Loads data in an effective dated manor with a start_date__v/end_date__v for each record. With “replace’, all the active records in the ODS are updated with records from STG. With “history”, Nitro keeps track of the full change history for each record.

Must have primaryKey attribute for both STG and ODS tableSTG table must have last_modified_date__v field. ODS tables must include columns start_date__v, end_date__v, and last_modified_date__v

Full understands populating start_date__v, end_date__v, system_last_modified_date__v

pt_ftp_ods_upsert_ed_latest__v

Loads data in an effective dated manor with a start_date__v/end_date__v for each record. With “upsert’, records are updated and never deleted, just end dated. The primaryKey uses the end_date__v, so upserts are only looking for records where end_date__v is Null. With “latest”, Nitro only keeps track of the latest historical version of a record. 

Must have primaryKey attribute for both STG and ODS tableSTG table must have last_modified_date__v field. ODS tables must include columns start_date__v, end_date__v, and last_modified_date__v

Full understands populating start_date__v, end_date__v, system_last_modified_date__v

pt_ftp_ods_upsert_ed_history_v

Loads data in an effective dated manor with a start_date__v/end_date__v for each record. With “upsert’, records are updated and never deleted, just end dated. The primaryKey uses the end_date__v, so upserts are only looking for records where end_date__v is Null. With “history”, Nitro keeps track of the full change history for each record.

Must have primaryKey attribute for both STG and ODS tableSTG table must have last_modified_date__v field. ODS tables must include columns start_date__v, end_date__v, and last_modified_date__v

Full understands populating start_date__v, end_date__v, system_last_modified_date__v

pt_ftp_ods_upsert_deletion__v

Loads Symphony APLD and PTD claims data. This allows users to load Incremental files including previously delivered records that need to be updated.

Each file is delivered for a specific year/month. All records in the ODS where deletion_flag__v is marked true will be deleted prior to upserting all records from STG table.

This step must be performed prior to uploading the rest of the file since the same record may be restated in the file.

Must have deleteKey table attribute populated on STG and ODS tables with the date field that identifies records to be deleted/upserted.

Nitro deletes the previous record in ODS.

pt_ftp_ods_insertload__v

Appends data from the Staging table directly to the ODS table without requiring a primary key.

No support of extension objects

No awareness

Rolling Loads

Data providers often deliver files with the most recent 2 year history of transactions, delivered on a weekly or monthly basis. As far as the data provider is concerned, the two year period is the most accurate, and should replace anything there before. However, any historical data that isn’t in the most current 2 year history, should remain. For these types of deliverables, the rolling load pattern should be used.

ODS Rolling Partition Load Pattern

The ODS Rolling Partition load pattern allows customers to use a rolling load pattern with a partition key to load data into a table. The Rolling Partition load pattern supports multiple markets for the partition. This also allows admins to upload a file containing the different market data. This pattern should be used when data is provided as a rolling load, but for different sets of data into a single table (for example, sales data provided by market).

To use this feature, use load pattern (pt_ftp_ods_rollingpartitionload__v) and two keys: rollingDateKey and ftppartitionKey.

For example, looking at sales data, if partitionKey is set for market_id__v and rollingDateKey is set for week_ending_date__v, when a new file is introduced, all records that fall between the earliest and latest date for week_end_date__v are replaced with the data from the new file where the market_id__v values match.

Configurable Composite Key

In order to process data effectively in Nitro, customers need to build in ETL processing to their custom connector to compute a primary key. A simple formula can be used to specify a primary key that is a pipe-delimited composite of one or more dimensions and attributes in the source data.

Additional configuration is needed to the STG and ODS yaml files that define your custom connector.

  1. Define a composite key column in the STG yaml file, and specify your composite key configuration via a new column-level attribute called formula. The formula attribute takes a value of a vFormula definition. This definition pulls fields from the source data and concatenates them using the delimiter defined. For example:

  2. - name: composite_id__c
  3. compression: ZSTD
  4. enabled: true
  5. notNull: false
  6. type: varchar(50)
  7. formula: vFormula.concat('|', field_1__c, field_2__c, field_3__c)
  8. Set the top-level loadPattern attribute in the STG yaml file to pt_ftp_stg_formulaload__v.

  9. Set the top-level primaryKey attribute in the STG yaml file to the name of your composite key field.

  10. Define a column in the ODS yaml file to hold the result of the formula calculation. For example
  • - name: composite_id__c
  • compression: ZSTD
  • enabled: true
  • notNull: false
  • type: varchar(50)
  1. Set the top-level loadPattern attribute in the ODS yaml file to pt_ftp_ods_upsertload__v, pt_ftp_ods_rollingload__v, or pt_ftp_ods_genload__v.
  2. Set the top-level primaryKey attribute in the ODS yaml file to the name of your composite key field.

Since the table yaml references the pattern pt_ftp_stg_formulaload__v when uploading the metatdata package, include the pt_ftp_stg_formulaload__v.yml as part of the package. The zipped metadata file should contain at least two directories, one of which is the patterns directory with the referenced pattern yaml.

If applying a new composite key or formula field to only an existing tables, Gen FTP does not shapeshift. This means that when trying to create a composite key on top of an existing table, use pt_ftp_ods_genload__v which drops and recreates the ODS table on the initial load. Ensure to back up the data to reload using Nitro’s SFTP export feature. For subsequent loads, switch back to the rolling date or upsert pattern. If applying a composite key to a brand new table, this step is not needed.

Supporting Redshift Formulas

Setting default values in data coming in from a custom connector traditionally required extra ETL processing. Nitro allows Redshift formulas to be specified directly within the custom connector configuration. These formulas are evaluated as data and inserted or updated in Redshift.

Additional configuration is needed to the STG and ODS yaml files that define your custom connector.

  1. Define a new columnin the STG yaml file to be populated using a Redshift formula.
  2. Specify a new column-level attribute called formula with the Redshift formula you want to use. For example:

    • - name: last_modified_date__c
    • compression: ZSTD
    • enabled: true
    • notNull: false
    • type: timestamp
    • formula: sysdate
  3. Define a column in the ODS yaml file to hold the result of the formula calculation. For example:

    • - name: last_modified_date__v
    • compression: ZSTD
    • enabled: true
    • notNull: false
    • type: timestamp

Supporting Wildcards in the Filename Attribute of SFTP Control Files

When loading data into Nitro via a custom connector, a control file is required to initiate processing of that data. The control file refers to its corresponding data file using an specific file name. In many cases, this file name includes a time stamp different each time data loads to the Nitro SFTP server. Customers can use wildcards in the filename attribute of the control file to avoid ongoing manipulation of the control file.

Use the wildcard symbol ( *) in the uploadFilename attribute in the control file. Nitro searches for a matching file based on the pattern specified. If multiple files are returned, Nitro orders the results in ascending alphabetical order and processes only the first one. For example:

Given the following files on Nitro SFTP server:

  • VERTEO_XFACTORY_20190703.gz
  • VERTEO_XFACTORY_20190704.gz
  • VERTEO_XFACTORY_20190705.gz

And given the following uploadFilename:

  • KARYO_XFACTORY_*.csv

Nitro processes:

    VERTEO_XFACTORY_20190703.gz

Supporting Effective Dating

Effective dating support is the ability to track full comprehensive history on data. The data administrator can leverage an out of the box mechanism to apply effective dating logic to data coming into Nitro through a custom connector.

Adding Source Filenames as Data into the FTP Data Pipeline

Admins can specify the formula attribute of a column in the STG YAML to leverage an ETL variable labeled CDW_ftpCsvFilename. When used, Nitro replaces that variable with the complete filename of the associated file being loaded. This reduces the friction of onboarding data from sources that put critical file attributes in the filename rather than directly in the data itself.

Loading the Data

Data for industry connectors leverages Nitro's integrated SFTP server. See Loading Data Using SFTP for instructions.