
Lead Data Engineer at Luxoft USA Inc
Toronto, ON
About the Job
Project Description:
Large-scale migration of an on-premises Hadoop platform to AWS for a Canadian insurance company: refactoring 10,000 ETL jobs to native AWS services and moving Informatica and Qlik reporting to the cloud.
Responsibilities:
Design and develop ETL pipelines to ingest data into Hadoop from different data sources (files, mainframe, relational sources, NoSQL, etc.) using Informatica BDM.
Parse unstructured and semi-structured data such as JSON and XML using Informatica Data Processor.
Analyze Informatica PowerCenter jobs and redesign and develop them in BDM.
Design and develop efficient mappings and workflows to load data into data marts.
Perform gap analysis across legacy applications to migrate them to newer platforms/data marts.
Write efficient Hive, Impala, and PostgreSQL queries to extract data for ad hoc analysis (see the sketch after this list).
Identify performance bottlenecks in ETL jobs and tune them by enhancing or redesigning the jobs.
Work with Hadoop administrators and PostgreSQL DBAs to partition Hive tables, refresh metadata, and carry out other activities that improve the performance of data loading and extraction.
Tune the performance of ETL mappings and queries.
Write simple-to-medium-complexity shell scripts to preprocess files, schedule ETL jobs, etc.
Identify manual processes and queries in the data and BI areas, and design and develop ETL jobs to automate them.
Participate in daily scrums; work with vendor partners, the QA team, and business users at various stages of the development cycle.
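For illustration only (not part of the original posting): a minimal PySpark sketch of the kind of work described above, loading a partitioned Hive data-mart table and running an ad hoc query against it. All database, table, and column names are hypothetical.

from pyspark.sql import SparkSession

# Hypothetical job: load claims data into a partitioned Hive data-mart table,
# then run an ad hoc query against it. Names below are illustrative only.
spark = (
    SparkSession.builder
    .appName("claims-mart-load")
    .enableHiveSupport()
    .getOrCreate()
)

# Source data already ingested into Hadoop (e.g. by an Informatica BDM mapping)
claims = spark.table("staging.claims_raw")

# Partitioning by load date speeds up downstream extraction and ad hoc analysis
(
    claims.write
    .mode("overwrite")
    .partitionBy("load_date")
    .saveAsTable("mart.claims_dm")
)

# Ad hoc analysis query restricted to a single partition
spark.sql("""
    SELECT policy_type, COUNT(*) AS claim_count
    FROM mart.claims_dm
    WHERE load_date = '2024-01-31'
    GROUP BY policy_type
""").show()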
Mandatory Skills Description:
7+ years of experience designing and developing ETL jobs (Informatica or any other ETL tool)
3+ years of experience working on the Informatica BDM platform
Experience with the various execution modes in BDM, such as Blaze, Spark, Hive, and Native
3+ years of experience working on the Hadoop platform, writing Hive or Impala queries
5+ years of experience working with relational databases (Oracle, Teradata, PostgreSQL, etc.) and writing SQL queries
Deep knowledge of performance tuning for ETL jobs, Hadoop jobs, and SQL queries, including partitioning, indexing, and other techniques
Experience writing shell scripts
1+ years of experience working with AWS technologies for data pipelines and data warehouses
5+ years of experience building ETL jobs to load data warehouses and data marts
Awareness of Kimball and Inmon data warehouse methodologies
Experience working in an Agile Scrum methodology; should have used Jira, Bitbucket, Git, and Jenkins to deploy code from one environment to another
Experience working in a diverse, multicultural environment with different vendors and onsite/offshore vendor teams
Developer certifications in the Informatica product suite
Nice to have: 2+ years of experience with the AWS data stack (IAM, S3, Kinesis Streams, Kinesis Firehose, Lambda, Athena, Glue, Redshift, and EMR); see the sketch after this list
Exposure to other cloud platforms such as Azure and GCP is acceptable as well
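For illustration only (not part of the original posting): a minimal boto3 sketch of the kind of AWS data-stack task listed above, starting an ad hoc Athena query over data landed in S3. The region, bucket, database, and table names are hypothetical.

import boto3

# Hypothetical ad hoc query against an Athena table backed by S3 data.
# Region, database, table, and bucket names are illustrative only.
athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT policy_type, COUNT(*) AS claim_count FROM claims_dm GROUP BY policy_type",
    QueryExecutionContext={"Database": "insurance_mart"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)

print("Started Athena query:", response["QueryExecutionId"])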
Nice-to-Have Skills:
Experience with Spark jobs (Python or Scala) is an asset.
Knowledge of the broader Informatica product suite, such as IDQ, MDM, IDD, BDM, Data Catalog, and PowerCenter, is an asset.
Property & Casualty insurance industry knowledge is an added asset.