Senior Data Engineer - (Hadoop - Scala, Spark, Python)

Brussels – Onsite

Description

Required skills:

  • Experience analysing and building data pipelines and data architectures, developing ETL/ELT processes, and processing structured and unstructured data
  • Proven experience working with data stored in RDBMSs, and experience with or a good understanding of NoSQL databases
  • Ability to write performant Scala code and SQL statements
  • Ability to design solutions that are fit for purpose while keeping options open for future needs
  • Ability to analyze data, identify issues (e.g. gaps, inconsistencies) and troubleshoot them
  • A true agile mindset: capable of and willing to take on tasks outside their core competencies to help the team
  • Experience in working with customers to identify and clarify requirements
  • Strong verbal and written communication skills, good customer relationship skills
  • Strong interest in the financial industry and related data.

Will be considered as assets:

  • Knowledge of Python and Spark
  • Understanding of the Hadoop ecosystem including Hadoop file formats like Parquet and ORC
  • Experience with open-source technologies used in data analytics, such as Spark, Pig, Hive, HBase and Kafka
  • Ability to write MapReduce & Spark jobs
  • Knowledge of Cloudera
  • Knowledge of IBM Mainframe
  • Knowledge of agile development methods such as Scrum.

Job description:

  • Identify the most appropriate data sources to use for a given purpose and understand their structures and contents, in collaboration with subject matter experts.
  • Extract structured and unstructured data from the source systems (relational databases, data warehouses, document repositories, file systems, etc.), prepare that data (cleanse, restructure, aggregate, etc.) and load it into Hadoop.
  • Actively support the reporting teams in the data exploration and data preparation phases.
  • Implement data quality controls and, where data quality issues are detected, liaise with the data supplier for joint root cause analysis.
  • Be able to autonomously design data pipelines, develop them and prepare the launch activities
  • Properly document your code and share and transfer your knowledge with the rest of the team, to ensure a smooth transition into maintenance and support of production applications.
  • Liaise with IT infrastructure teams to address infrastructure issues and to ensure that the components and software used on the platform are all consistent.
Start date: 01/10/2020
Duration: 12 months
From: Base 3
Published at: 08.08.2020
Project ID: 1954878
Contract type: Freelance