Design, implementation, and testing of an ETL (data ingestion) pipeline that moved relational data from AWS S3 into AWS Redshift and back out to AWS S3 for a data lake. In a team of two developers and a product owner, we built the ingestion pipelines with Python, SQLAlchemy, and Boto3 on top of AWS Step Functions, AWS Lambda, and AWS ECS; they ran every 10 minutes, ingesting roughly 500 MB of diverse relational data per run for 35 production factories around the globe.
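As a rough illustration of the ingestion step (not the project's actual code), a Lambda handler invoked by Step Functions could issue a Redshift COPY from S3 through SQLAlchemy. The environment variables, table names, bucket paths, and IAM role below are hypothetical placeholders.

```python
import os
import sqlalchemy as sa

# Placeholder configuration -- in the real pipeline these values would come
# from the Step Functions input and the Lambda environment.
REDSHIFT_URL = os.environ["REDSHIFT_URL"]        # e.g. "redshift+psycopg2://user:pass@host:5439/db"
IAM_ROLE_ARN = os.environ["REDSHIFT_COPY_ROLE"]  # IAM role Redshift assumes to read from S3


def handler(event, context):
    """Load one batch of relational extracts from S3 into a Redshift staging table."""
    s3_prefix = event["s3_prefix"]        # e.g. "s3://factory-extracts/plant-17/2020-01-01T10-00/"
    target_table = event["target_table"]  # e.g. "staging.orders"

    engine = sa.create_engine(REDSHIFT_URL)
    copy_sql = f"""
        COPY {target_table}
        FROM '{s3_prefix}'
        IAM_ROLE '{IAM_ROLE_ARN}'
        FORMAT AS CSV GZIP
    """
    # engine.begin() wraps the COPY in a transaction and commits on success.
    with engine.begin() as conn:
        conn.execute(sa.text(copy_sql))
    return {"loaded": target_table, "source": s3_prefix}
```

In practice Step Functions would fan out one such invocation per extracted table and handle retries; long-running loads would be pushed to an ECS task instead of Lambda.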
Other parts of the solution ran data transformations and finally moved the data back to S3 in Parquet format, creating a data lake available to end users through Athena and MicroStrategy.
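A minimal sketch of that "back to S3 as Parquet" step, assuming a Redshift UNLOAD is used to write partitioned Parquet files that Athena can query. The table, bucket prefix, partition columns, and IAM role are illustrative, not the project's real names.

```python
import sqlalchemy as sa


def unload_to_datalake(engine: sa.engine.Engine, table: str, lake_prefix: str, iam_role: str) -> None:
    """Export a transformed Redshift table to S3 as partitioned Parquet for the data lake."""
    unload_sql = f"""
        UNLOAD ('SELECT * FROM {table}')
        TO '{lake_prefix}'
        IAM_ROLE '{iam_role}'
        FORMAT AS PARQUET
        PARTITION BY (factory_id, ingestion_date)
        ALLOWOVERWRITE
    """
    with engine.begin() as conn:
        conn.execute(sa.text(unload_sql))
```

Once the Parquet files land under the lake prefix, an external table (or a Glue crawler) makes them queryable from Athena and, through Athena, from MicroStrategy.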
For maintenance we used AWS EMR to process data, clean up Parquet files, fix data partitions, and move data to other regions or buckets.
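As a hedged sketch of that kind of maintenance job, a small PySpark script on EMR could compact many small Parquet files and rewrite the partition layout; the bucket paths and partition column are placeholders.

```python
from pyspark.sql import SparkSession


def compact_parquet(source: str, destination: str, partition_col: str = "ingestion_date") -> None:
    """Read a Parquet prefix, regroup rows by partition, and rewrite fewer, larger files."""
    spark = SparkSession.builder.appName("parquet-maintenance").getOrCreate()
    df = spark.read.parquet(source)
    (
        df.repartition(partition_col)   # one shuffle so each partition writes larger files
          .write.mode("overwrite")
          .partitionBy(partition_col)
          .parquet(destination)
    )
    spark.stop()


if __name__ == "__main__":
    # Hypothetical paths: compact the raw orders prefix into a curated prefix.
    compact_parquet("s3://example-datalake/raw/orders/", "s3://example-datalake/curated/orders/")
```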