Data Engineer
Keywords
Information Engineering
Amazon S3
Big Data
Cloud Computing
MySQL
Apache Spark
ASP.NET
Advanced Business Application Programming (ABAP)
ADO.NET
Application Programming Interfaces (APIs)
Skills
ADO.NET, ASP.NET, AWS Glue, ABAP, DynamoDB, EC2, AWS EMR, Redshift, AWS S3, S3, AWS, Spark, Apache Spark, APIs, artificial intelligence, backend, big data, big data technologies, Autosys, cloud, cloud services, CloudFormation, analytics, data lake, logging, DML, data pipelines, data streaming, data transfer, data warehouses, database design, database testing, databases, Databricks, XML, ETL, Google Cloud, Data Engineer, Computer Science, Informatics, integration testing, Excel, SQL Server, Visual Studio Code, MySQL, MySQL Server, OOP, Oracle, Oracle 11g, Pandas, performance tuning, programming languages, Pytest, Python, REST web services, HANA, SAP technical, Sales & Distribution, SQL, stored procedures, shell scripting, Tableau, unit testing
Project history
03/2023
-
03/2023
Amit Paul
Data Engineer
Computing Framework: Spark API - RDD & Spark SQL
AWS Cloud Services: S3, Glue, EMR, Athena, Lambda
Tools/IDEs: PyCharm, Visual Studio Code, Microsoft Teams
Data Science: Python libraries - Pandas
10/2021
-
06/2022
Data Engineer
Brillio Technologies
Project: ETL for Pharma major client
In this big data project we built an AWS Glue transformation for Redshift Spectrum to move data from AWS S3 and other external sources, enabling data scientists to leverage GCP (Google Cloud Platform) environments for research, analysis, and experimentation. Built an automated Glue template / Lambda script on an EC2 instance to provision a platform based on batch data streaming for specific global partners and stakeholders. From there, data scientists request the relevant data after GCP creation in the S3 data lake. Built a data pipeline, separated the data transfer files from AWS S3, and enabled all components of ML and BI for research and analysis.
Activities / responsibilities:
* Analyzed the different source systems and determined how to extract data from each.
* Transformed and loaded the data into S3 with the help of Apache Spark on Databricks.
* Coordinated with the BI team to provide the data required for reporting.
* Designed and developed complex data pipelines for the customer.
* Wrote production-level code for logging and for querying shipping sales.
* Constructed ETL jobs to capture productivity and data quality checks (a brief sketch of the pipeline pattern follows below).
Environment: Python, Spark, Glue, S3, Lambda, CloudFormation, DynamoDB, CodePipeline, CodeBuild, Pytest, Step Functions, Athena, Snowflake, Autosys, Shell Scripting
Technologies: Python, Spark, AWS Glue, Lambda, S3, Athena, SageMaker, Redshift
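The sketch below illustrates the Glue/PySpark pattern described above: read from the S3 landing zone, apply transformations, and write a partitioned processed layer that Redshift Spectrum and Athena can query. It is a minimal, assumption-laden example; the bucket names, column names, and business rules are hypothetical placeholders, not the client's.

```python
# Minimal PySpark sketch of the S3-to-S3 transformation pattern described above.
# Bucket names, paths, and column names are illustrative placeholders, not client data.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pharma-etl-sketch").getOrCreate()

# Read raw landing data from S3 (hypothetical path and schema).
raw = spark.read.parquet("s3://example-raw-bucket/pharma/sales/")

# Basic cleansing plus a load-date column, standing in for the real business rules.
curated = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("quantity") > 0)
       .withColumn("load_date", F.current_date())
)

# Write back to the processed layer, partitioned so Redshift Spectrum / Athena can query it.
(curated.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("s3://example-processed-bucket/pharma/sales/"))
```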
07/2018
-
09/2019
Data Engineer
Enum Informatics Private Limited
Project: ETL for Retail domain client
In this big data project we extracted data from HANA DB using Informatica and loaded it into AWS S3. From there we performed the transformation using AWS Glue and wrote the data back to S3. The data sets residing on S3 were published to end users via Tableau.
Activities / responsibilities:
* Wrote Spark code to read, transform, and write the data in S3.
* Coordinated with the BI team to provide the data required for reporting.
* Validated the data sets between the source and target systems (a brief sketch follows below).
* Provided support for the existing pipelines in production.
Technologies: Python, Spark, AWS Glue, S3, Athena
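A minimal sketch of the source-to-target validation step listed above, assuming hypothetical S3 paths and a numeric "net_amount" column; the real checks against the HANA extract were more involved.

```python
# Sketch of the source-vs-target validation mentioned above: compare row counts and a
# column checksum between the landed HANA extract and the curated S3 output.
# Paths and the "net_amount" column are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hana-s3-validation-sketch").getOrCreate()

source = spark.read.csv("s3://example-landing/hana_extract/", header=True, inferSchema=True)
target = spark.read.parquet("s3://example-processed/retail/")

# Row-count check between source and target.
src_count, tgt_count = source.count(), target.count()
assert src_count == tgt_count, f"row count mismatch: {src_count} vs {tgt_count}"

# Column-level checksum on a numeric measure.
src_sum = source.agg(F.sum("net_amount")).collect()[0][0]
tgt_sum = target.agg(F.sum("net_amount")).collect()[0][0]
assert src_sum == tgt_sum, "net_amount totals differ between source and target"
```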
02/2018
-
06/2018
SAP Technical Consultant
Isquare Soft Technologies
HPE's new high-end storage platform drives the next wave of the Intelligent Edge and cloud choices. The work covered customer validation and the creation of new batch entities and customer batch numbers through the HPI and HPE tenants.
Created an implementation report for customer inquiry and validation, using interface implementation and the standard method PRESET_SALES_AREA to validate additional data under the sales header area. Documented and specified it for implicit implementation in a BAdI for Shipping Sales Requisition.
Roles & Responsibilities Undertaken
* Created the OData and REST web services report program to extract data in XML format using OOP and RFC methodologies.
* Created a class builder report using a classic BAdI and implemented it in XD01/XD02 to create and fetch customer details based on the shipping additional data screen fields, and updated KNVV through the standard method PRESET_SALES_AREA using the interface implementation.
* Performed unit testing and integration testing for ALE IDocs, which were generated and executed successfully. Worked on a conversion routine to add field constraints in the segment for ALE IDocs through the mandatory field LZONE, with additional data and classification linking.
* Checked and verified partner details in the presentation server file.
* Cross-verified with the client that the extracted data migrated to the standard field using a BDC report and merged it into a custom table through the field pointers concept.
01/2009
-
12/2012
Data Engineer
Enum Informatics Private Limited
Project: ETL for Retail domain client
In this big data project we extracted data from different sources such as MySQL and Oracle and loaded it into AWS S3. From there we performed the transformation using AWS Glue and dropped the data into the processed layer. The processed-layer data residing in the data lake was published to end users via Athena views.
Activities / responsibilities:
* Analyzed the different source systems and determined how to extract data from each.
* Transformed and loaded the data into S3 with the help of Apache Spark (a brief extraction sketch follows below).
* Coordinated with the BI team to provide the data required for reporting.
* Designed and developed complex data pipelines for the customer.
* Wrote production-level code for logging and for querying shipping sales.
* Constructed ETL jobs to capture productivity and data quality checks.
Technologies: Python, Spark, AWS Glue, S3, Athena
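A minimal sketch of the relational-to-S3 extraction described above: pull a table from MySQL over JDBC with Spark, land it in the processed layer on S3, and expose it through Athena views. The connection details, table name, and bucket are placeholders, not the client's.

```python
# Sketch of the JDBC extraction step: MySQL -> Spark -> S3 processed layer -> Athena.
# Connection details, table, and bucket are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-to-s3-sketch").getOrCreate()

orders = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://example-host:3306/retail")
          .option("dbtable", "orders")
          .option("user", "etl_user")
          .option("password", "example-password")  # in practice, fetched from a secrets store
          .option("driver", "com.mysql.cj.jdbc.Driver")
          .load())

# Land in the processed layer; Athena views (over a Glue catalog table) sit on top.
orders.write.mode("overwrite").parquet("s3://example-processed/retail/orders/")
```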
08/2012
-
11/2012
Testing Engineer
Infosys Pvt. Ltd.
Project: Finacle testing for banking domain client
Infosys Finacle solutions address core banking, liquidity management, wealth management, analytics, and artificial intelligence, helping financial institutions drive business excellence.
The Infosys Finacle universal banking solution builds on the success of Finacle to deliver powerful benefits to global banks. The solution enables faster launch of new products and services, helping banks realize a 55 percent return on core banking transformation investments and an average 33 percent improvement in time to market. As banks aim to reinvent their business and navigate the current challenges in the macroeconomic environment, Finacle 11E promises a simplified approach to banking transformation.
Roles & Responsibilities Undertaken
* Cross-verified with the client that the extracted data migrated to the standard field using a BDC report and merged it into a custom table through the field pointers concept.
* Created test cases in Excel sheets and wrote test scenarios for database testing using queries on the data in MySQL Server (a brief sketch follows below).
* Performed DDL/DML operations on backend databases and triggered stored procedures during database design.
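A brief pytest sketch of the kind of database test scenario listed above. An in-memory SQLite database stands in for MySQL Server so the example is self-contained; the schema, data, and assertions are hypothetical.

```python
# Pytest sketch of database test scenarios: SQLite stands in for MySQL Server here,
# and the table, columns, and values are illustrative only.
import sqlite3
import pytest

@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)",
                     [(1, 1000.0), (2, 250.5)])
    conn.commit()
    yield conn
    conn.close()

def test_no_negative_balances(db):
    # DML / data-quality check: no account may carry a negative balance.
    bad = db.execute("SELECT COUNT(*) FROM accounts WHERE balance < 0").fetchone()[0]
    assert bad == 0

def test_primary_key_enforced(db):
    # DDL check: inserting a duplicate primary key must fail.
    with pytest.raises(sqlite3.IntegrityError):
        db.execute("INSERT INTO accounts (id, balance) VALUES (1, 10.0)")
```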
Local Availability
Only available for remote work