Data Rocket Implementation – What You Get in First 20 Hours

Data Rocket™ is an acceleration architecture for data estate modernization – but just how fast is Rocket fast? 

In the first three days of a Data Rocket implementation, customers go from zero, with no existing Talend or Snowflake implementation, to a configured environment in the Snowflake Data Cloud holding millions of rows of data. For Passerelle data engineer David Adams, who has worked in data engineering for 35 years, the Data Rocket is nothing short of revolutionary. 

“To do this from scratch would have taken many experienced team members and a lot of time – we are talking many, many months,” David said.  

Set-Up and Configuration 

All in all, set-up and configuration of Data Rocket takes about six hours. Passerelle engineers create a Snowflake account and customize scripts for the first data source. Scripts include roles, databases, schemas, tables and all the metrics that will be used in Data Rocket dashboards. 
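To make the setup step concrete, here is a minimal sketch of the kind of ordered DDL such a configuration script might emit for one data source. The object names (DR_RAW, STAGING, DR_ADMIN) and the exact statements are illustrative assumptions, not Data Rocket internals.

```python
# Hypothetical sketch: generate the ordered Snowflake DDL for one data
# source. Role, database, and schema names below are illustrative only.
def build_setup_ddl(database: str, schema: str, role: str) -> list[str]:
    """Return ordered DDL statements: role, database, schema, grants."""
    return [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"CREATE DATABASE IF NOT EXISTS {database};",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO ROLE {role};",
    ]

for stmt in build_setup_ddl("DR_RAW", "STAGING", "DR_ADMIN"):
    print(stmt)
```

Generating the statements from parameters, rather than hand-writing them per source, is what lets the same script be customized for each new data source.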

Once Snowflake setup is complete, a Talend Cloud account is created and configured: engineers create users, assign permissions, and set up environments, workspaces, projects and remote engines. 

Data Staging  

With Snowflake and Talend credentials in hand, and accounts configured, data can begin flowing from a source system into the Snowflake Data Cloud. Engineers use Data Rocket’s Dynamic Data Ingestion Framework to test the first set of data. 

First, a Passerelle engineer configures metadata from the data source; Data Rocket™ SQL scripts accelerate population of that metadata. A scheduled Talend Job reads the metadata and extracts data from the associated source system, ingesting it into the transient staging layer of the Snowflake database. The data source can be a relational database, files or a REST API. In a recent implementation, Passerelle engineers ingested millions of rows of data from a single table; while querying the data from the source database took two hours, Snowflake ingested it in a matter of seconds.  
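The metadata-driven pattern described above can be sketched as follows. In Data Rocket the metadata lives in Snowflake tables and the job is a Talend Job; here, as an assumed simplification, the registry is a plain list of dicts and the job is a query builder.

```python
# Illustrative sketch of metadata-driven extraction: a metadata registry
# tells one generic job what to pull from each source. All names below
# (tables, columns) are hypothetical examples.
source_metadata = [
    {"table": "orders",    "columns": ["id", "total", "placed_at"]},
    {"table": "customers", "columns": ["id", "name", "email"]},
]

def build_extract_query(entry: dict) -> str:
    """Generate the SELECT a scheduled job would run against the source."""
    cols = ", ".join(entry["columns"])
    return f"SELECT {cols} FROM {entry['table']}"

queries = [build_extract_query(e) for e in source_metadata]
```

Because the extraction logic is driven entirely by the registry, adding a new source table means adding a metadata row, not writing a new job.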

Data Quality and Data Masking  

With data in the Snowflake Data Cloud, Data Rocket automatically conducts data quality checks and data masking. After the load into transient staging completes, the Talend Job triggers the persistent staging load, in which newly loaded data is compared with data already in the persistent layer and a data snapshot history is created.  
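As a rough illustration of rule-driven masking applied while rows move through staging, the sketch below hashes some sensitive columns and redacts others. The rule set and the hashing choice are assumptions for the example, not Data Rocket internals.

```python
# Minimal sketch of rule-driven column masking. MASK_RULES maps a
# column name to a masking strategy; rules shown are hypothetical.
import hashlib

MASK_RULES = {"email": "hash", "ssn": "redact"}

def mask_row(row: dict) -> dict:
    """Return a copy of the row with sensitive columns masked."""
    masked = {}
    for col, value in row.items():
        rule = MASK_RULES.get(col)
        if rule == "hash":
            # Deterministic hash preserves joinability without exposing the value.
            masked[col] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        elif rule == "redact":
            masked[col] = "***MASKED***"
        else:
            masked[col] = value
    return masked
```

Hashing rather than redacting a key-like column keeps it usable for joins and duplicate detection downstream.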

Based on the comparison, the data is loaded in one of three ways:  

  • If no match is found, a new row is added. 
  • If a match is found with new values, the previous version of the record is closed out and a new version of the record is inserted.  
  • If a match is found with no differences, no changes are made. 
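The three cases above follow the familiar Type 2 slowly-changing-dimension pattern. The following sketch implements that comparison against an in-memory list standing in for the persistent layer; field names and the versioning columns (`valid_from`, `is_current`) are assumptions for the example.

```python
# Sketch of the three-way persistent-staging comparison (a Type 2 SCD
# pattern). The persistent store is a plain list here for illustration.
from datetime import date

def load_persistent(persistent: list[dict], incoming: dict, today: date) -> None:
    key = incoming["id"]
    current = next(
        (r for r in persistent if r["id"] == key and r["is_current"]), None
    )
    if current is None:
        # No match found: add a new row.
        persistent.append({**incoming, "valid_from": today, "is_current": True})
    elif {k: current[k] for k in incoming} != incoming:
        # Match found with new values: close out the old version,
        # insert the new one, preserving snapshot history.
        current["is_current"] = False
        persistent.append({**incoming, "valid_from": today, "is_current": True})
    # Match found with no differences: no changes are made.
```

Keeping superseded versions (with `is_current` set to false) rather than overwriting them is what produces the data snapshot history described above.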

The Talend Job also compares new data with the already-ingested historical data to detect changes. Snowflake is updated dynamically, including new or deleted columns, table changes, and data type or structure changes, using Talend components orchestrated by the Dynamic Data Ingestion Job.  
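A simplified sketch of this schema-drift handling: compare the source's current columns against the target table's columns and emit the ALTER statements needed to bring them in sync. The function name and the type-change strategy are assumptions for illustration.

```python
# Hypothetical sketch of schema-drift reconciliation: diff source vs.
# target column definitions and emit the corrective DDL.
def schema_drift_ddl(table: str, source_cols: dict, target_cols: dict) -> list[str]:
    """Both arguments map column name -> data type."""
    ddl = []
    for col, dtype in source_cols.items():
        if col not in target_cols:
            # Column newly added at the source.
            ddl.append(f"ALTER TABLE {table} ADD COLUMN {col} {dtype};")
        elif target_cols[col] != dtype:
            # Data type changed at the source.
            ddl.append(f"ALTER TABLE {table} ALTER COLUMN {col} SET DATA TYPE {dtype};")
    for col in target_cols:
        if col not in source_cols:
            # Column dropped at the source.
            ddl.append(f"ALTER TABLE {table} DROP COLUMN {col};")
    return ddl
```

In practice a real pipeline might soft-retire dropped columns instead of dropping them, to preserve history; the sketch takes the simplest path.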

20 Hours with Data Rocket 

After 20 hours with Data Rocket, a company with no previous data architecture can begin working with raw data from a source system, using a framework that can be replicated for each new source. Within the first month of Data Rocket adoption, additional modules are implemented, including custom data quality dashboards, an audit and control framework, and a data security accelerator.  
