Loading Data Using SFTP
Custom connectors and industry data connectors leverage Nitro’s integrated SFTP server to get data into Nitro’s database.
Files are removed from the FTP directory 180 days after file creation. Related jobs and IDs are retained, but the underlying file is removed.
Connecting to the SFTP Server
SFTP user credentials are issued when Nitro Support provisions the connector. These credentials are unique for each connector and contain the following components:
- User Name – for each specific connector
- API Key – 128-character reusable key placed in the .CTL control file to initiate Nitro ETL jobs
Hostname and Location
The hostname for the SFTP connection depends on the location of the Nitro cluster:
All locations require the FTP protocol 'SFTP - SSH File Transfer Protocol.'
Customers with firewall rules defining where data from their corporate network can be sent must contact Nitro Support for the IP address to whitelist.
After configuring the connection to the connector-specific SFTP location, load data files into the root directory of the connector SFTP site:
/upload/<Cluster Name>/<Instance Name>/<Connector Name>/
Data should be loaded via a .csv data file – preferably fully quoted, fully escaped, CSV RFC 4180 Standard Compliant, UTF8 file.
Nitro supports the following compression for uploading individual data files via SFTP:
A Nitro control file is a text file with a .ctl extension, and copied into the /ctlfiles directory in the SFTP folder. A control file is required to upload data to Nitro and is used to is used to trigger a Nitro ETL process, indicating the underlying file has completed it's FTP upload and is ready to be processed.
Fields in the.ctl file can include:
- uploadFilename - (required) – must refer to the name of the .csv text file being loaded
- userAPIKey - (required) - API key issued by Nitro support/operations for the user loading the data
- runTaskJob - (required) - identifies the name of the ETL job processed to load the file. For most load processes, Nitro recommends jb_ftp_intelligentload__v, as it detects the most appropriate load techniques for the data file being processed (as determined by the values set in the YML files).
- Loading Patterns are noted on the YML files
- targetTableName - (required) - name of the STG table definition used to identify the metadata being loaded. For example, sample_STG__c
- maxErrorCount (optional) - number of failed rows Nitro processes before aborting the entire process. Admins can update the max error count attribute (maxErrorCount) in an FTP YAML file for a data source. This allows admins to load partial files with more than 1,000 errors without the ctl file (control file). The default value is 1,000. If a max error count is not provided, the default value is used.
- userName - (optional) - username of the user loading the data
Example .ctl File
# credentials used to run ETL job. JCI provides the key.
# job / task sequence to run to load file
Changing file layouts for data being loaded into Nitro requires redshift tables to be adjusted to match the new file layout. Users can remove columns from a table and add columns to the end of existing redshift tables in FTP connectors without manually manipulating the redshift tables. The removed columns remain in the ODS, without new updates to these fields.
Data is processed by adding a control file to the ctl directory within the Connector root directory:
/upload/<Cluster Name>/<Instance Name>/<Connector Name>/ctlfile
Nitro performs preliminary validation on the data before loading it into the database. Validation includes:
- Ensuring the number of columns in the inbound raw data matches what is expected in the FTP yml definition
- Ensuring the data types as defined in the FTP yml definition are compatible with what is coming in from the raw data
- Ensuring required fields are present in the raw data
- Ensuring the field lengths in the raw data are supported by what is specified in the FTP yml definition
Nitro identifies errors and continues processing until the error threshold is reached, defined by the maxErrorCount control file attribute.
Nitro validates table YAMLs to detect invalid table, column name, and data type when the metadata is stored. If any of these are found during validation, an error occurs and the metadata is not stored. Since admins are notified in advance of loading table YAML files into Nitro if there is an issue with the file, they can correct any issues before the table is created.
The following validations apply to the table YAMLs.
<Table/ Column Name>
- Contain only ASCII letters, digits, underscore characters (_), or dollar signs ($)
- Begin with an alphabetic character or underscore character
- Be between 1 and 127 characters in length
- Contain no quotation marks and no spaces. Not be a reserved SQL keyword.
- Data type should be supported by RS