Data Engineer (m/f/d)

Darmstadt  ‐ Onsite

Description

Description / Tasks:

• Design, build, test and deploy cutting-edge solutions at scale that impact millions of customers worldwide and drive value from data.

• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery and machine learning algorithms, re-designing infrastructure for greater scalability and speed.

• Identify the right open-source tools to deliver product features through research, POCs/pilots and interaction with various open-source communities.

• Interact with engineering teams across geographies to leverage expertise and contribute to the tech community.

• Engage with Project Management and Business to drive the agenda, set your priorities and deliver solutions by leveraging the power of data.

• Assemble large and complex data sets that meet functional / non-functional business requirements

• Build and optimize data pipelines, architectures and data sets (see the sketch after this list).

• Build and optimize machine learning workflows to deliver actionable insights to the business.

• Ability to work on multiple assignments and communicate with stakeholders with minimal supervision.

• Flexibility to support project issues during weekends and non-working hours.

• Being a self-starter, a lifelong learner and curious are key qualities of successful candidates.
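For illustration, a minimal sketch of the kind of batch data pipeline this role involves, assuming a PySpark environment; all paths, column names and the job name below are hypothetical placeholders, not part of the actual project:

```python
# Minimal PySpark sketch: assemble a large raw data set into a curated,
# partitioned output that downstream ML workflows can consume.
# Paths, columns and the app name are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-orders-pipeline")   # hypothetical job name
    .getOrCreate()
)

# Read a large raw data set (placeholder path).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Clean and aggregate into a data set that meets the business requirement.
daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("revenue"))
)

# Write partitioned output for scalability and fast downstream reads.
daily_revenue.write.mode("overwrite").partitionBy("order_date") \
    .parquet("s3://example-bucket/curated/daily_revenue/")
```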



Work Experience:

• 2 to 8 years of experience in software development / data and machine learning engineering, including at least 2 years of experience in big data engineering.

• Experience working with the big data ecosystem (AWS cloud-native tooling, HDFS, MapReduce, Hive, Tez, Spark, Oozie, Sqoop, Kafka and any NoSQL database).

• Hands-on experience in at least one of the following programming languages: Python (preferred), PySpark, Java or Scala, along with a non-procedural language such as SQL.

• Experience working with LLAP and/or Spark LLAP is a plus.

• Experience in creating data and machine learning workflows that orchestrate Java, Hive, Spark, SSH and shell actions.

• Experience in Linux/Unix shell scripting.

• Experience working with large data sets in distributed computing environments, using massively parallel processing to build analytics pipelines.

• Experience in building Python modules and packages as reusable components (see the sketch after this list).

• Experience handling workflow failures in running data and machine learning pipelines.

• Experience troubleshooting application issues by analysing logs.

• Experience implementing performance optimization techniques in HQL, Spark applications, Python, Scala and Sqoop jobs, as well as machine learning hyper-parameter tuning.

• Should be a thought leader as well as a good team player.

• Knowledge of data privacy and security, and the ability to develop practices in these areas together with the respective business functions, such as IT Security and the Data Privacy Office.
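For illustration, a minimal sketch of a reusable Python component of the kind referenced above: a packaged retry wrapper that logs pipeline step failures so they can be analysed afterwards. The module, function and parameter names are hypothetical, not taken from the project:

```python
# Reusable helper sketch: retry a pipeline step and log each failure
# before re-raising, so failed workflows can be diagnosed from the logs.
import logging
import time
from functools import wraps

logger = logging.getLogger("pipeline")

def retry(max_attempts=3, delay_seconds=60):
    """Retry a pipeline step, logging every failure before re-raising."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    logger.exception("step %s failed (attempt %d/%d)",
                                     func.__name__, attempt, max_attempts)
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay_seconds)
        return wrapper
    return decorator

@retry(max_attempts=2, delay_seconds=5)
def load_daily_revenue():
    # Placeholder for a real Hive/Spark load step.
    ...
```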


Skills / Profile:

SQL
Python
PySpark
Spark
Sqoop
Oozie
Hadoop framework
Hive
Knowledge of AWS components is a benefit
Start date: 11.2021
Duration: 8 months (extension possible)
From: GULP Information Services GmbH
Published at: 15.10.2021
Contact person: Alexandra Müller
Project ID: 2229725
Contract type: Freelance