Description
Our client requires a German-speaking ETL developer for a 4-month rolling contract, working fully remotely.
Requirements:
• Proven experience in ETL development, especially with tools and technologies such as Apache Spark, Hive, and Hadoop.
• Strong knowledge of programming languages such as Python, Java, or Scala.
• Good knowledge of REST API integration, including authentication methods.
• Experience with data transformation and cleansing techniques.
• Familiarity with database and data warehousing concepts.
• Knowledge of version control systems and code management best practices.
• Experience with big data technologies.
Frameworks:
Python:
• asyncio
• requests
• pyspark
• pandas
We are looking for an experienced ETL (Extract, Transform, Load) developer to join our team and take on the task of
implementing an ETL pipeline that extracts rollout/blocking status info from a REST API and loads it into our Hadoop cluster. The successful candidate will
play a pivotal role in ensuring an efficient and reliable flow of data for our analysis and reporting needs.
Tasks:
1. Data extraction: Implement the extraction process to collect rollout/blocking status data from a REST API. This includes understanding the API schema as well as handling authentication.
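A minimal sketch of the extraction step, assuming a bearer-token-protected, paginated endpoint; the base URL, path, and `results` field are placeholders, since the actual API schema is not specified in this posting:

```python
import requests

API_BASE = "https://example.internal/api/v1"  # placeholder, real endpoint TBD


def page_url(base, page, page_size=500):
    """Build the URL for one page of the (hypothetical) status endpoint."""
    return f"{base}/rollout-status?page={page}&page_size={page_size}"


def fetch_all_status(token, base=API_BASE):
    """Walk the paginated endpoint until an empty page is returned."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"
    records, page = [], 1
    while True:
        resp = session.get(page_url(base, page), timeout=30)
        resp.raise_for_status()  # surface HTTP errors early
        batch = resp.json().get("results", [])
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records
```

Using a `requests.Session` keeps the auth header and connection pooling in one place; the exact pagination scheme (page numbers vs. cursors) would need to be confirmed against the API documentation.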
2. Data transformation: Perform data transformation, cleansing, and enrichment to prepare the data for storage in our
Hadoop cluster.
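The transformation step could look like the following pandas sketch; the column names (`device_id`, `status`, `updated_at`) are illustrative assumptions, not the real API schema:

```python
import pandas as pd


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Cleanse and enrich raw rollout/blocking records (illustrative columns)."""
    df = raw.copy()
    # Drop exact duplicates that can occur when extraction pages overlap.
    df = df.drop_duplicates(subset=["device_id", "updated_at"])
    # Normalise the status field: strip stray whitespace, lower-case it.
    df["status"] = df["status"].str.strip().str.lower()
    # Parse timestamps into timezone-aware datetimes for consistent storage.
    df["updated_at"] = pd.to_datetime(df["updated_at"], utc=True)
    # Enrichment example: a boolean flag for blocked rollouts.
    df["is_blocked"] = df["status"].eq("blocked")
    return df
```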
3. Data loading: Design and implement the load process to ensure that the data is stored efficiently and accurately on the Hadoop cluster.
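The load step might be sketched with pyspark as below; the HDFS base path, the staging input, and the daily-partition layout are assumptions for illustration:

```python
from datetime import date


def output_path(run_date: date, base="hdfs:///data/rollout_status"):
    """Partitioned HDFS target path, one directory per ingestion day (assumed layout)."""
    return f"{base}/ingest_date={run_date.isoformat()}"


def load_to_hadoop(df, run_date: date):
    """Write a Spark DataFrame as Parquet into the cluster."""
    (df.write
       .mode("overwrite")  # overwrite the day's partition so re-runs are idempotent
       .parquet(output_path(run_date)))


if __name__ == "__main__":
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("rollout-status-load").getOrCreate()
    sdf = spark.read.json("staging/rollout_status.json")  # placeholder input
    load_to_hadoop(sdf, date.today())
```

Writing one Parquet partition per ingestion date keeps daily re-runs idempotent and lets Hive or Spark prune partitions when querying a date range.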
4. API Integration: Develop and maintain the connection to the REST API, including authentication token management,
change monitoring, and error handling.
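Token management could be handled with a small cache like the sketch below; the actual auth flow (OAuth client credentials, API key exchange, etc.) is API-specific, so the token fetcher is left as a caller-supplied function:

```python
import time


class TokenManager:
    """Caches a bearer token and refreshes it shortly before expiry.

    `fetch_token` is a caller-supplied function returning
    (token, lifetime_seconds); the real auth flow depends on the API.
    """

    def __init__(self, fetch_token, refresh_margin=60):
        self._fetch = fetch_token
        self._margin = refresh_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Refresh when the token is missing or within the safety margin of expiry.
        if self._token is None or time.time() >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = time.time() + lifetime
        return self._token
```

The safety margin refreshes the token slightly early, so a request never goes out with a token that expires mid-flight.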
5. Performance optimization: Continuously optimize the ETL pipeline for efficiency and scalability, taking into account the
increase in data volume and peak loads.
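One small example of volume-aware tuning: sizing the Spark partition count from the record count rather than hard-coding it. The target-per-partition and minimum values are illustrative defaults, not prescribed by the client:

```python
def suggest_partitions(total_records, target_per_partition=1_000_000, minimum=8):
    """Heuristic partition count that scales with data volume.

    Keeps partitions near a target size so small runs don't create
    thousands of tiny files and large runs don't overload single tasks.
    """
    needed = -(-total_records // target_per_partition)  # ceiling division
    return max(minimum, needed)
```

This would be applied as `df.repartition(suggest_partitions(df.count()))` before writing, trading one extra count against a healthier file layout on HDFS.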
6. Test and Validate: Create and execute tests to ensure data accuracy and reliability, and validate
data integrity throughout the ETL process.
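Record-level validation can be expressed as simple checks that run at each ETL stage; the required fields and the set of valid statuses below are assumptions standing in for the real API schema:

```python
REQUIRED_FIELDS = ("device_id", "status", "updated_at")
VALID_STATUSES = {"rolled_out", "blocked", "pending"}  # assumed value set


def validate_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing field: {field}")
    status = record.get("status")
    if status is not None and status not in VALID_STATUSES:
        problems.append(f"unknown status: {status!r}")
    return problems
```

Running such checks on a sample after extraction and again after transformation makes it easy to pinpoint which stage corrupted or dropped data.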
7. Documentation: Create detailed documentation for the ETL pipeline, including code comments, data dictionaries, and
user manuals.
Additional frameworks:
Python:
• pydantic
Scala:
• sttp
• akka-http
• spark