Description
Tasks:
• Design, build, test, and deploy cutting-edge solutions at scale, impacting millions of customers worldwide and driving value from data.
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery and machine learning algorithms, re-designing infrastructure for greater scalability and speed.
• Identify the right open-source tools to deliver product features through research, POCs/pilots, and engagement with various open-source communities.
• Interact with engineering teams across geographies to leverage expertise and contribute to the tech community.
• Engage with Project Management and Business to drive the agenda, set your priorities and deliver solutions by leveraging the power of data.
• Assemble large and complex data sets that meet functional and non-functional business requirements.
• Build and optimize data pipelines, architectures, and data sets.
• Build and optimize machine learning workflows to deliver actionable insights to the business.
• Ability to work on multiple assignments and communicate with stakeholders with minimal supervision.
• Flexibility to support project issues during weekends and non-working hours.
• A self-starter mindset, lifelong learning, and curiosity are also key qualities that define successful candidates.
Work Experience:
• 2 to 8 years of experience in software development / data and machine learning engineering, with a minimum of 2 years of experience in Big Data engineering.
• Experience working with the Big Data ecosystem (AWS cloud-native tooling, HDFS, MapReduce, Hive, Tez, Spark, Oozie, Sqoop, Kafka, and NoSQL databases).
• Hands-on experience in at least one of the programming languages Python (preferred), PySpark, Java, or Scala, along with a non-procedural language such as SQL.
• Experience working with Hive LLAP and/or Spark LLAP is a plus.
• Experience in creating data and machine learning workflows that orchestrate Java, Hive, Spark, SSH, and shell actions.
• Experience in Linux/Unix shell scripting.
• Experience working with large data sets in distributed computing environments, performing massively parallel processing to build analytics pipelines.
• Experience in building Python modules and packages as reusable components.
• Experience in handling workflow failures in running data and machine learning pipelines.
• Troubleshooting application issues by analysing the logs.
• Experience in implementing performance optimization techniques in HQL, Spark applications, Python, Scala, and Sqoop jobs, as well as machine learning hyper-parameter tuning.
• Should be a thought leader as well as a good team player.
• Knowledge of data privacy and security, and the ability to develop related practices together with the respective business functions, such as IT Security and the Data Privacy Office.
Skills / Profile:
SQL
Python
PySpark
Spark
Sqoop
Oozie
Hadoop framework
Hive
Knowledge of AWS components is a benefit