Technology Used:
In the first three days of a Data Rocket implementation, customers go from zero, with no existing Talend or Snowflake implementation, to a configured environment in the Snowflake Data Cloud holding millions of rows of data. For Passerelle data engineer David Adams, who has worked in data engineering for 35 years, Data Rocket is nothing short of revolutionary.
“To do this from scratch would have taken many experienced team members and a lot of time – we are talking many, many months,” David said.
All in all, setup and configuration of Data Rocket takes about six hours. Passerelle engineers create a Snowflake account and customize scripts for the first data source. These scripts define roles, databases, schemas, tables, and all the metrics that will be used in Data Rocket dashboards.
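The scripts themselves are not shown in this article, but a minimal sketch of what a Snowflake setup script of this kind might contain is below. The role, database, schema, and table names are illustrative assumptions, not Passerelle's actual naming conventions.

```sql
-- Minimal sketch of a Snowflake setup script of the kind described above.
-- All object names (DATAROCKET_ADMIN, RAW_DB, STAGING, CUSTOMER) are
-- illustrative assumptions, not Data Rocket's actual conventions.
USE ROLE SECURITYADMIN;
CREATE ROLE IF NOT EXISTS DATAROCKET_ADMIN;
GRANT ROLE DATAROCKET_ADMIN TO ROLE SYSADMIN;

USE ROLE SYSADMIN;
CREATE DATABASE IF NOT EXISTS RAW_DB;
CREATE SCHEMA IF NOT EXISTS RAW_DB.STAGING;

-- Example landing table for the first data source
CREATE TABLE IF NOT EXISTS RAW_DB.STAGING.CUSTOMER (
    CUSTOMER_ID    NUMBER,
    CUSTOMER_NAME  VARCHAR,
    CUSTOMER_EMAIL VARCHAR,
    LOAD_TS        TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
);

GRANT USAGE ON DATABASE RAW_DB TO ROLE DATAROCKET_ADMIN;
GRANT USAGE ON SCHEMA RAW_DB.STAGING TO ROLE DATAROCKET_ADMIN;
GRANT SELECT, INSERT ON ALL TABLES IN SCHEMA RAW_DB.STAGING TO ROLE DATAROCKET_ADMIN;
```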
Once Snowflake setup is complete, a Talend Cloud account is created and configured: engineers create users, assign permissions, and set up environments, workspaces, projects, and remote engines.
With Snowflake and Talend accounts configured and credentials in hand, data can start flowing from a source system into the Snowflake Data Cloud. Engineers use Data Rocket’s Dynamic Data Ingestion Framework to test the first set of data.
First, a Passerelle engineer configures metadata for the data source; Data Rocket SQL scripts accelerate the population of that metadata. A scheduled Talend Job reads the metadata, extracts data from the associated source system, and ingests it into the transient staging layer of the Snowflake database. The data source can be a relational database, flat files, or a REST API. In a recent implementation, Passerelle engineers ingested millions of rows of data from a single table: while it took two hours to query the data from the source database, Snowflake ingested it in a matter of seconds.
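The structure of Data Rocket’s metadata layer is not detailed here, so the sketch below shows one plausible shape for a metadata control table and a bulk load into transient staging. Every table, column, and stage name in it is an assumption made for illustration.

```sql
-- Hypothetical metadata control table an ingestion job could read to decide
-- what to extract; the real Data Rocket metadata model is not shown in the
-- source, so every name here is an assumption.
CREATE SCHEMA IF NOT EXISTS RAW_DB.METADATA;

CREATE TABLE IF NOT EXISTS RAW_DB.METADATA.SOURCE_TABLES (
    SOURCE_SYSTEM    VARCHAR,   -- e.g. 'ERP_POSTGRES', 'SALES_REST_API'
    SOURCE_OBJECT    VARCHAR,   -- table name, file pattern, or REST endpoint
    TARGET_SCHEMA    VARCHAR,
    TARGET_TABLE     VARCHAR,
    LOAD_TYPE        VARCHAR,   -- 'FULL' or 'INCREMENTAL'
    WATERMARK_COLUMN VARCHAR,   -- column used for incremental extracts
    IS_ACTIVE        BOOLEAN DEFAULT TRUE
);

-- A scheduled job iterates over active entries and lands each extract in the
-- transient staging layer, for example via a bulk load from a named stage:
COPY INTO RAW_DB.STAGING.CUSTOMER (CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_EMAIL)
FROM @RAW_DB.STAGING.SRC_STAGE/customer/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```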
With data in the Snowflake Data Cloud, Data Rocket automatically conducts data quality checks and data masking.
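Snowflake’s dynamic data masking gives a sense of what the masking step can look like. The sketch below is a generic Snowflake masking policy plus a simple quality check, not Data Rocket’s actual rules; the role and column names carry over from the earlier assumed examples.

```sql
-- Generic Snowflake dynamic data masking sketch (an Enterprise Edition
-- feature); policy, role, and column names are illustrative assumptions,
-- not the Data Rocket defaults.
CREATE MASKING POLICY IF NOT EXISTS MASK_EMAIL AS (VAL STRING) RETURNS STRING ->
    CASE
        WHEN CURRENT_ROLE() IN ('DATAROCKET_ADMIN') THEN VAL
        ELSE '***MASKED***'
    END;

ALTER TABLE RAW_DB.STAGING.CUSTOMER
    MODIFY COLUMN CUSTOMER_EMAIL SET MASKING POLICY MASK_EMAIL;

-- A simple data quality check: count rows that arrive without a business key
SELECT COUNT(*) AS NULL_KEY_ROWS
FROM RAW_DB.STAGING.CUSTOMER
WHERE CUSTOMER_ID IS NULL;
```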
After completing the load into transient staging, the Talend Job triggers the persistent staging loading process, in which newly loaded data is compared with the data already held in the persistent layer and a snapshot history is created. Based on that comparison, the data is loaded in one of three ways. The Talend Job also compares new data with previously ingested historical data to detect changes, and Snowflake updates dynamically, handling new or deleted columns, table changes, and data type or structure changes, using Talend components orchestrated by the Dynamic Data Ingestion Job.
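The exact persistent staging logic belongs to Data Rocket; as a rough illustration under assumed table and column names, a Snowflake MERGE of the following shape can compare incoming rows with the current persistent-layer versions and maintain a snapshot history.

```sql
-- Rough illustration of a persistent-staging load: incoming rows from
-- transient staging are compared with the current versions in the persistent
-- layer; changed rows close out the old snapshot and new keys are inserted.
-- All names are assumptions, not Data Rocket's actual implementation.
MERGE INTO RAW_DB.PERSISTENT.CUSTOMER_HIST AS tgt
USING RAW_DB.STAGING.CUSTOMER AS src
    ON tgt.CUSTOMER_ID = src.CUSTOMER_ID
   AND tgt.IS_CURRENT = TRUE
WHEN MATCHED AND HASH(src.CUSTOMER_NAME, src.CUSTOMER_EMAIL)
             <> HASH(tgt.CUSTOMER_NAME, tgt.CUSTOMER_EMAIL) THEN
    -- close out the superseded snapshot; a follow-up insert writes the new version
    UPDATE SET IS_CURRENT = FALSE, VALID_TO = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
    INSERT (CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_EMAIL, VALID_FROM, IS_CURRENT)
    VALUES (src.CUSTOMER_ID, src.CUSTOMER_NAME, src.CUSTOMER_EMAIL,
            CURRENT_TIMESTAMP(), TRUE);
```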
20 hours with Data Rocket
After 20 hours with Data Rocket, a company with no previous data architecture can begin working with raw data from a source system, using a replicable framework that can be applied again and again. Within the first month of Data Rocket adoption, additional modules are implemented, including custom data quality dashboards, an audit and control framework, and a data security accelerator.