Senior Data Engineer - (Hadoop - Scala, Spark, Python)

Brussels – Onsite

Description

Required skills:

  • Experience analysing and building data pipelines and data architectures, developing ETL/ELT processes, and processing structured and unstructured data
  • Proven experience working with data stored in RDBMSs, and experience with or a good understanding of NoSQL databases
  • Ability to write performant Scala code and SQL statements
  • Ability to design solutions that are fit for purpose while keeping options open for future needs
  • Ability to analyze data, identify issues (e.g. gaps, inconsistencies) and troubleshoot them
  • A true agile mindset: capable of and willing to take on tasks outside their core competencies to help the team
  • Experience in working with customers to identify and clarify requirements
  • Strong verbal and written communication skills, good customer relationship skills
  • Strong interest in the financial industry and related data.

Will be considered as assets:

  • Knowledge of Python and Spark
  • Understanding of the Hadoop ecosystem including Hadoop file formats like Parquet and ORC
  • Experience with open-source technologies used in data analytics, such as Spark, Pig, Hive, HBase and Kafka
  • Ability to write MapReduce & Spark jobs
  • Knowledge of Cloudera
  • Knowledge of IBM Mainframe
  • Knowledge of agile development methods such as Scrum.

Job description:

  • Identify the most appropriate data sources to use for a given purpose and understand their structures and contents, in collaboration with subject matter experts.
  • Extract structured and unstructured data from the source systems (relational databases, data warehouses, document repositories, file systems, etc.), prepare that data (cleanse, restructure, aggregate, etc.) and load it into Hadoop.
  • Actively support the reporting teams in the data exploration and data preparation phases.
  • Implement data quality controls and, where data quality issues are detected, liaise with the data supplier for joint root cause analysis.
  • Be able to autonomously design data pipelines, develop them and prepare the launch activities
  • Properly document your code and share and transfer your knowledge with the rest of the team, to ensure a smooth transition into maintenance and support of production applications.
  • Liaise with IT infrastructure teams to address infrastructure issues and to ensure that the components and software used on the platform are all consistent.
Start date: 01/10/2020
Duration: 12 months
From: Base 3
Published at: 08.08.2020
Project ID: 1954878
Contract type: Freelance