Data Architect - Remote - Cloud/PySpark/Java or Scala

GB ‐ Remote

Description

A Data Architect with cloud (ideally GCP) and PySpark experience is required for a 6-month contract with a leading financial services organisation based in London. You will architect, design, estimate, develop and deploy cutting-edge software products and services that leverage large-scale data ingestion, processing, storage and querying, with in-stream and batch analytics across cloud and on-premise environments.

THIS ROLE IS FULLY REMOTE AND INSIDE IR35

Experience:

  • Extensive experience with data-related technologies, including knowledge of Big Data architecture patterns and cloud services (AWS/Azure/GCP)
  • GCP experience is desirable (BigQuery, Pub/Sub, Spanner)
  • Experience delivering end-to-end Big Data solutions on-premise and/or in the cloud
  • Knowledge of the pros and cons of various database technologies, such as relational, NoSQL, MPP and columnar databases
  • Expertise in the Hadoop ecosystem with one or more distributions, such as Cloudera and cloud-specific distributions
  • Proficiency in Java and Scala programming languages
  • Python experience
  • Expertise in one or more NoSQL databases (MongoDB, Cassandra, HBase, DynamoDB, Bigtable, etc.)
  • Experience with one or more big data ingestion tools (Sqoop, Flume, NiFi, etc.) and distributed messaging and ingestion frameworks (Kafka, Pulsar, Pub/Sub, etc.)
  • Expertise with at least one distributed data processing framework, e.g. Spark (Core, Streaming, SQL), Storm or Flink (a minimal PySpark sketch follows this list)
  • Knowledge of flexible, scalable data models addressing a wide variety of consumption patterns, including random and sequential access, together with the necessary optimisations such as bucketing, aggregation and sharding
  • Knowledge of performance tuning, optimisation and scaling of solutions from a storage/processing standpoint
  • Experience building DevOps pipelines for data solutions, including automated testing
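
For illustration only (not part of the original posting): a minimal PySpark sketch of the kind of batch and streaming processing described above. All paths, the Kafka broker address, the topic name and the column names are hypothetical, and the streaming read assumes the spark-sql-kafka connector is on the classpath.

    # Hypothetical sketch: batch aggregation plus a structured-streaming read from Kafka.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ingestion-sketch").getOrCreate()

    # Batch: read a (hypothetical) Parquet dataset, aggregate, write the result back out.
    trades = spark.read.parquet("s3://example-bucket/trades/")
    daily_totals = (
        trades.groupBy("trade_date", "instrument_id")
              .agg(F.sum("notional").alias("total_notional"))
    )
    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/daily_totals/")

    # Streaming: consume a hypothetical Kafka topic and append raw payloads to a Parquet sink.
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "trade-events")
             .load()
    )
    (
        events.selectExpr("CAST(value AS STRING) AS payload")
              .writeStream.format("parquet")
              .option("path", "s3://example-bucket/raw_events/")
              .option("checkpointLocation", "s3://example-bucket/checkpoints/raw_events/")
              .start()
    )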

Desirable:

  • Knowledge of containerisation, orchestration and Kubernetes
  • An understanding of how to set up Big Data cluster security (authorisation/authentication, security for data at rest and data in transit)
  • A basic understanding of how to set up and manage monitoring and alerting for Big Data clusters
  • Experience with orchestration tools such as Oozie, Airflow, Control-M or similar (an illustrative Airflow sketch follows this list)
  • Experience with MPP-style query engines like Impala, Presto, Athena, etc.
  • Knowledge of multi-dimensional modelling, such as star schema, snowflake schema, and normalised and de-normalised models
  • Exposure to data governance, catalog, lineage and associated tools would be an added advantage
  • A certification in one or more cloud platforms or big data technologies
  • Active participation in the data engineering thought community (e.g. blogs, keynote sessions, POVs/POCs, hackathons)
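
For illustration only (not part of the original posting): a minimal Airflow DAG sketch of the style of orchestration mentioned above. The DAG id, schedule and task callables are hypothetical placeholders.

    # Hypothetical sketch of a daily orchestration DAG (Airflow 2.x style).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # placeholder: land raw data from a source system
        pass

    def transform():
        # placeholder: trigger a Spark job or SQL transformation
        pass

    with DAG(
        dag_id="daily_ingestion",          # hypothetical DAG id
        start_date=datetime(2021, 4, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task     # extract runs before transform
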
Start date: ASAP
Duration: 6 months
From: Strike IT Services
Published at: 11.04.2021
Project ID: 2087811
Contract type: Freelance