Description
Tasks:
• Design, build, test, and deploy cutting-edge solutions at scale, impacting millions of customers worldwide and driving value from data.
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery and machine learning algorithms, re-designing infrastructure for greater scalability and speed.
• Identify the right open-source tools to deliver product features through research, POCs/pilots, and engagement with various open-source communities.
• Interact with engineering teams across geographies to leverage expertise and contribute to the tech community.
• Engage with Project Management and Business to drive the agenda, set your priorities and deliver solutions by leveraging the power of data.
• Assemble large and complex data sets that meet functional and non-functional business requirements.
• Build and optimize data pipelines, architectures, and data sets.
• Build and optimize machine learning workflows to deliver actionable insights to the business.
• Ability to work on multiple assignments and communicate with stakeholders with minimal supervision.
• Flexibility to support project issues during weekends and non-working hours.
• A self-starter mindset, lifelong learning, and curiosity are also key qualities that define successful candidates.
Work Experience:
• 2 to 8 years of experience in software development / data and machine learning engineering, with a minimum of 2 years of experience in Big Data engineering.
• Experience working with the Big Data ecosystem (AWS cloud-native tooling, HDFS, MapReduce, Hive, Tez, Spark, Oozie, Sqoop, Kafka, and NoSQL databases).
• Hands-on experience in at least one of the programming languages Python (preferred), PySpark, Java, or Scala, along with a non-procedural language such as SQL.
• Experience working with Hive LLAP and/or Spark LLAP is a plus.
• Experience in creating data and machine learning workflows that orchestrate Java, Hive, Spark, SSH, and shell actions.
• Experience in Linux/Unix shell scripting.
• Experience working with large data sets in distributed computing environments, performing massively parallel processing to build analytics pipelines.
• Experience in building Python modules and packages as reusable components.
• Experience in handling workflow failures in running data and machine learning pipelines.
• Troubleshooting application issues by analysing the logs.
• Experience in implementing performance optimization techniques in HQL, Spark applications, Python, Scala, and Sqoop jobs, as well as machine learning hyper-parameter tuning.
• Should be a thought leader as well as a good team player.
• Knowledge of data privacy and security, and the ability to develop related practices together with the respective business functions, such as IT Security and the Data Privacy Office.
Skills / Profile:
SQL
Python
PySpark
Spark
Sqoop
Oozie
Hadoop framework
Hive
Knowledge of AWS components is a benefit