Description
We require an experienced technical architect with expertise in high-volume data processing across RDBMS, big data, and cloud technologies.
This includes Spark, Hadoop, and SQL, ideally with familiarity with big data technologies on GCP (BigQuery, GCS, etc.). Experience building out data quality checks would be beneficial.
Some key tasks:
- Design of big data storage architectures, including partitioning schemes, flattening, and tagging
- Projection of data through a query engine such as Hive/Impala
- Design for handling small data, including data corrections and data exceptions
Mandatory skills:
- Java, Spark, Hadoop, Hive, Impala
- Big data serialization formats (Avro, Parquet, etc.)
- Ability to build prototypes and work with development teams through to realisation
Desirable skills:
- RDBMS (Postgres and Oracle preferred)
- Data modelling (ideally with a background in data warehousing)
- GCP data technologies (GCS, BigQuery, etc.)
- Knowledge of performance tuning and scalability
- An understanding of algorithmic complexity concepts, including time complexity and space complexity