Loading Data Using SFTP

Custom connectors and industry data connectors leverage Nitro’s integrated SFTP server to get data into Nitro’s database.

Files are removed from the FTP directory 180 days after file creation. Related jobs and IDs are retained, but the underlying file is removed.

Connecting to the SFTP Server

Credentials

SFTP user credentials are issued when Nitro Support provisions the connector. These credentials are unique for each connector and contain the following components:

  • User Name – for each specific connector
  • Password
  • API Key – 128-character reusable key placed in the .CTL control file to initiate Nitro ETL jobs

Hostname and Location

The hostname for the SFTP connection depends on the location of the Nitro cluster:

Region Endpoint
US cdw-02-sftp-us.veevanitro.com
EU cdw-sftp-eu.veevanitro.com
AP cdw-sftp-ap.veevanitro.com

All locations require the FTP protocol 'SFTP - SSH File Transfer Protocol.'

Customers with firewall rules defining where data from their corporate network can be sent must contact Nitro Support for the IP address to whitelist.

Uploading Data

After configuring the connection to the connector-specific SFTP location, load data files into the root directory of the connector SFTP site:

Copy
/upload/<Cluster Name>/<Instance Name>/<Connector Name>/

Data should be loaded via a .csv data file – preferably fully quoted, fully escaped, CSV RFC 4180 Standard Compliant, UTF8 file.

Nitro supports the following compression for uploading individual data files via SFTP:

  • gz
  • zip

Control Files

A Nitro control file is a text file with a .ctl extension, and copied into the /ctlfiles directory in the SFTP folder. A control file is required to upload data to Nitro and is used to is used to trigger a Nitro ETL process, indicating the underlying file has completed it's FTP upload and is ready to be processed.

Fields in the.ctl file can include:

  • uploadFilename - (required) – must refer to the name of the .csv text file being loaded
  • userAPIKey - (required) - API key issued by Nitro support/operations for the user loading the data
  • runTaskJob - (required) - identifies the name of the ETL job processed to load the file. For most load processes, Nitro recommends jb_ftp_intelligentload__v, as it detects the most appropriate load techniques for the data file being processed (as determined by the values set in the YML files).
  • Loading Patterns are noted on the YML files
  • targetTableName - (required) - name of the STG table definition used to identify the metadata being loaded. For example, sample_STG__c
  • maxErrorCount (optional) - number of failed rows Nitro processes before aborting the entire process. Admins can update the max error count attribute (maxErrorCount) in an FTP YAML file for a data source. This allows admins to load partial files with more than 1,000 errors without the ctl file (control file). The default value is 1,000. If a max error count is not provided, the default value is used.
  • userName - (optional) - username of the user loading the data

Example .ctl File

Copy
uploadFilename=sample_*.csv
# credentials used to run ETL job. JCI provides the key.
userAPIKey=abcd1234abcd
userName=sftp_username__v
# job / task sequence to run to load file
runTaskJob=jb_ftp_intelligentload__v
targetTableName=sample_stg__v
maxErrorCount=1000

Shapeshifting for FTP Intelligent Load

Changing file layouts for data being loaded into Nitro requires redshift tables to be adjusted to match the new file layout. Users can remove columns from a table and add columns to the end of existing redshift tables in FTP connectors without manually manipulating the redshift tables. The removed columns remain in the ODS, without new updates to these fields.

Processing Data

Data is processed by adding a control file to the ctl directory within the Connector root directory:

Copy
/upload/<Cluster Name>/<Instance Name>/<Connector Name>/ctlfile

Data Validation

Nitro performs preliminary validation on the data before loading it into the database. Validation includes:

  • Ensuring the number of columns in the inbound raw data matches what is expected in the FTP yml definition
  • Ensuring the data types as defined in the FTP yml definition are compatible with what is coming in from the raw data
  • Ensuring required fields are present in the raw data
  • Ensuring the field lengths in the raw data are supported by what is specified in the FTP yml definition

Nitro identifies errors and continues processing until the error threshold is reached, defined by the maxErrorCount control file attribute.

YAML File Validation

Nitro validates table YAMLs to detect invalid table, column name, and data type when the metadata is stored. If any of these are found during validation, an error occurs and the metadata is not stored. Since admins are notified in advance of loading table YAML files into Nitro if there is an issue with the file, they can correct any issues before the table is created.

The following validations apply to the table YAMLs.

<Table/ Column Name>

  • Contain only ASCII letters, digits, underscore characters (_), or dollar signs ($)
  • Begin with an alphabetic character or underscore character
  • Be between 1 and 127 characters in length
  • Contain no quotation marks and no spaces. Not be a reserved SQL keyword.

<Data Type>

  • Data type should be supported by RS