ETL developer

Remote

Keywords

Extract Transform Load (ETL), Apache Hadoop, RESTful API, Data Transformation, Application Programming Interfaces (APIs), Apache Spark, Java (Programming Language), Big Data, Software Documentation, Databases, Data Dictionary, Data Mining, Data Warehousing, Apache Hive, Python (Programming Language), Scala (Programming Language), PySpark, Software Version Control, Programming Languages

Description

Our client requires a German-speaking ETL developer for a 4-month rolling contract, working fully remotely.

• Proven experience in ETL development, especially with tools and technologies such as Apache Spark, Hive, and Hadoop.
• Very good knowledge of programming languages such as Python, Java or Scala.
• Good knowledge of REST API integration, including authentication methods.
• Experience with data transformation and cleansing techniques.
• Familiarity with database and data warehousing concepts.
• Knowledge of version control systems and code management best practices.
• Experience with big data technologies.

Frameworks:

Python:
• asyncio
• requests
• pyspark
• pandas
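As a rough illustration of the REST API integration and token-based authentication mentioned above, a sketch using the requests library listed in the frameworks might look like the following. The endpoint URL and the "items"/"next" pagination fields are placeholders, since the posting does not specify the API's schema:

```python
import requests

API_BASE = "https://example.invalid/api"  # placeholder; the real endpoint is not named in the posting


def make_session(token: str) -> requests.Session:
    """Build a session that sends the bearer token on every request."""
    session = requests.Session()
    session.headers.update({
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    })
    return session


def fetch_all(session: requests.Session, path: str):
    """Follow cursor-style pagination until the API returns no next page.

    The 'items' and 'next' fields are assumptions about the response shape.
    """
    url = f"{API_BASE}{path}"
    while url:
        resp = session.get(url, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        yield from payload.get("items", [])
        url = payload.get("next")
```

In a real pipeline the token would come from a secrets store and be refreshed on expiry rather than passed in directly.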

We are looking for an experienced ETL (Extract, Transform, Load) developer to join our team and take on the task of
implementing an ETL pipeline that extracts rollout/blocking status info from a REST API and loads it into our Hadoop cluster. The successful candidate will
play a pivotal role in ensuring an efficient and reliable flow of data for our analysis and reporting needs.
Tasks:
1. Data extraction: Implement the extraction process to collect rollout/blocking status data from a REST API. This includes understanding the API schema as well as authentication.
2. Data transformation: Perform data transformation, cleansing, and enrichment to prepare the data for storage in our
Hadoop cluster.
3. Load: Design and implement the ETL process to ensure that the data is stored
efficiently and accurately on the Hadoop cluster.
4. API Integration: Develop and maintain the connection to the REST API, including authentication token management,
change monitoring, and error handling.
5. Performance optimization: Continuously optimize the ETL pipeline for efficiency and scalability, taking into account the
increase in data volume and peak loads.
6. Testing and validation: Create and execute tests to ensure data accuracy and reliability, and validate data integrity throughout the ETL process.
7. Documentation: Create detailed documentation for the ETL pipeline, including code comments, data dictionaries, and
user manuals.
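The transformation and cleansing step (task 2) can be sketched with pandas, one of the frameworks named above. The record shape and field names ("id", "status") are illustrative assumptions, not the client's actual schema:

```python
import pandas as pd


def transform(records: list[dict]) -> pd.DataFrame:
    """Cleanse raw rollout/blocking status records before loading to Hadoop.

    Field names are hypothetical; the real API schema is not specified.
    """
    df = pd.DataFrame(records)
    df = df.dropna(subset=["id"])                        # drop records without an id
    df["status"] = df["status"].str.strip().str.lower()  # normalize status values
    df = df.drop_duplicates(subset=["id"], keep="last")  # keep the latest record per id
    return df


raw = [
    {"id": "a1", "status": " Rolled-Out "},
    {"id": None, "status": "blocked"},
    {"id": "a1", "status": "BLOCKED"},
]
clean = transform(raw)  # one row remains: id "a1", status "blocked"
```

The cleansed frame would then be handed to pyspark for the load step, e.g. written as Parquet into the Hadoop cluster.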

Python:
• pydantic

Scala:
• sttp
• akka-http
• spark



Start date
11.2023
Workload
100% (5 days per week)
Duration
4 months
From
iBSC ltd
Published at
30.10.2023
Contact person:
Lahcene Yacef
Project ID:
2674433
Industry
IT
Contract type
Freelance
Workplace
100 % remote