Description
Role: Data Engineer
Location: Remote
IR35: Inside
Rate: £/day (umbrella)
Duration: 110 days with possible extension
CV Deadline: 7th May
Minimum Requirement:
- Experience with Databricks Delta Lake: ingesting, transforming, and loading Delta tables across the Bronze, Silver, and Gold zones
- Experience of ETL processing in Databricks using PySpark
- Experience/knowledge of Python
- Experience in an NHS/healthcare data environment
The Role:
- Deliver a Data Vault proof of concept: solution, platform, and security architecture, plus data flows
- Build, test, and promote data ingestion pipelines using Databricks
- Build, test, and promote metadata-driven data pipelines using Databricks to load the Data Vault according to the defined model
- Build, test, and promote metadata-driven data pipelines using Databricks to read from the Data Vault and load the Data Mart, applying the defined aggregations, enrichment, transformations, data quality rules, and data lineage
- Orchestrate data pipelines using Airflow/AWS Lambda
- Document low-level designs as per the defined standards
The Data Vault proof of concept will be coordinated with NHS Digital's DigiTrials team. It will initially include the Hospital Episode Statistics and Personal Demographics Service data sets, with Medicines and Healthcare products Regulatory Agency and other data sets added once the data flow is defined.
Skills Required:
- Experience of ETL processing in Databricks using PySpark
- Experience with AWS S3 storage, Lambda, and DynamoDB
- Experience with Databricks Delta Lake: ingesting, transforming, and loading Delta tables across the Bronze, Silver, and Gold zones
- Experience/knowledge of Airflow
- Experience/knowledge of Python
- Experience/knowledge of metadata catalogues; AWS Glue/Collibra preferred
To apply for this role, please submit your latest CV or contact Aspect Resources.