Updated 03/20/2026


100% available
Senior Data Engineer | Azure, Databricks, Python, PySpark, SQL | Data Architecture & Cost Estimation
Skopje, North Macedonia
Remote only
MSc, Intelligent Information Systems (Applied NLP on Big Data)
About me
Senior Data Engineer (5.5 years of experience) specializing in Azure, Databricks, and PySpark. Built production lakehouse platforms processing 1-5 TB/day across 20+ factories. Expertise in cost-conscious architecture, metadata-driven frameworks, and schema drift detection. Available for B2B contracts from North Macedonia, invoiced via SEPA.
Apache Airflow, Amazon Web Services, Amazon S3, Architecture, Microsoft Azure, Telecommunications, Billing, Cloud Computing, Code Generation, Code Review, Data Architecture, Data Dictionary, Information Engineering, Extract Transform Load (ETL), Data Modeling
**Cloud Data Engineering (Azure & AWS)**
- Azure Databricks, Microsoft Fabric, Azure Data Factory, ADLS Gen2, Azure Data Explorer (ADX), Event Hub, Key Vault, Unity Catalog
- AWS S3, Glue, Glue Catalog, Iceberg, Athena, CloudFormation, EventBridge, CloudWatch
- Architected a serverless lakehouse ingesting high-frequency time-series data (13-inverter commercial solar site)
- Designed a near-real-time IoT architecture for 300+ sensor types producing second-level data
**Spark & Lakehouse Technologies**
- PySpark (5.5 years production experience), Spark SQL, Delta Lake, Autoloader
- Change Data Capture (CDC), Slowly Changing Dimensions (SCD Type 1/2)
- Schema enforcement, schema evolution, schema drift detection for heterogeneous file formats (CSV, Parquet, Avro, ORC)
- Built ingestion pipelines processing 1-5 TB/day of global factory telemetry across 20+ manufacturing sites
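Schema drift detection, as listed above, comes down to comparing the schema a pipeline expects with the schema it actually receives. The following is a minimal, framework-agnostic sketch under stated assumptions: schemas are plain `{column_name: type_name}` dictionaries, names and type strings are compared case-insensitively, and `detect_schema_drift` and `SchemaDrift` are hypothetical names, not part of any library or of the production framework described here.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDrift:
    """Differences between an expected schema and the schema observed in a new file."""
    added: dict[str, str] = field(default_factory=dict)    # columns present only in the new file
    removed: dict[str, str] = field(default_factory=dict)  # expected columns missing from the new file
    retyped: dict[str, tuple[str, str]] = field(default_factory=dict)  # column -> (expected, observed)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def detect_schema_drift(expected: dict[str, str], observed: dict[str, str]) -> SchemaDrift:
    """Compare two {column_name: type_name} mappings (hypothetical representation)."""
    # Lower-case names and type strings so "Temp_C" and "temp_c" count as the same column.
    exp = {k.lower(): v.lower() for k, v in expected.items()}
    obs = {k.lower(): v.lower() for k, v in observed.items()}
    drift = SchemaDrift()
    for col, typ in obs.items():
        if col not in exp:
            drift.added[col] = typ
        elif exp[col] != typ:
            drift.retyped[col] = (exp[col], typ)
    for col, typ in exp.items():
        if col not in obs:
            drift.removed[col] = typ
    return drift


if __name__ == "__main__":
    expected = {"device_id": "string", "ts": "timestamp", "temp_c": "double"}
    observed = {"device_id": "string", "ts": "string", "Temp_C": "double", "humidity": "double"}
    drift = detect_schema_drift(expected, observed)
    print(drift.has_drift)  # True
    print(drift.added)      # {'humidity': 'double'}
    print(drift.retyped)    # {'ts': ('timestamp', 'string')}
    print(drift.removed)    # {}
```

In PySpark, a comparable mapping could be built with `dict(df.dtypes)`, since `DataFrame.dtypes` returns `(column, type)` pairs. What happens once drift is found (fail the load, quarantine the file, or evolve the target schema) is a policy decision this sketch leaves open.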
**Data Modeling & Architecture**
- Data Vault 2.0 (Hubs, Links, Satellites, Point-in-Time tables, Bridge tables)
- Dimensional modeling (facts, dimensions, data dictionaries, data contracts)
- Medallion Architecture (Bronze-Silver-Gold layers)
- Metadata-driven frameworks with dynamic SQL code generation
**Orchestration & ETL/ELT**
- Apache Airflow, Apache NiFi (batch and streaming ingestion with state management)
- Azure Data Factory, SSIS
- GitLab CI/CD, Azure DevOps pipelines
- Designed reusable PySpark components for CDC, SCD1/2, and schema enforcement adopted across 50+ internal projects
**Cost Optimization & Financial Planning**
- Cloud cost governance: identifying billing anomalies (idle OpenSearch clusters, unused resources)
- Negotiated AWS service credits to recover budget from early-stage misconfigurations
- Produced comprehensive MVP and roadmap cost estimates, mapping architectural components to specific Azure/AWS SKUs
**Enterprise & Startup Experience**
- Extended a shared data platform framework at Robert Bosch, used by 50+ projects
- Implemented a centralized Device Directory lakehouse for 20+ factories, integrating ADF and Databricks with Azure Cognitive Search
- Stabilized an inherited vendor lakehouse for a national telecom (15-30 sources, ~1 TB/day)
- Built a metadata-driven SCD framework supporting multi-source ingestion (MS SQL, Oracle, PostgreSQL)
**Technical Leadership**
- Led delivery with junior engineers, establishing code reviews and Python standards
- Translated ambiguous stakeholder requirements into concrete data contracts, signal dictionaries, and repeatable runbooks
- Provided production support for analytics, reporting, and ML use cases
Languages
English: Native speaker
Project history
Senior Data Engineer (Consultant), Avenga | Client: V-Sailing | Jan 2026 - Feb 2026
Designed a near-real-time Azure architecture for ingestion, analytics, and product features (seconds-to-minutes latency, 300+ sensor types producing data every second), selecting optimized Event Hub patterns and Azure Data Explorer (ADX) data models.
Produced comprehensive MVP and roadmap cost and effort estimates, providing stakeholder visibility by mapping architectural components to specific Azure SKUs and scaling assumptions.
Established early data contracts via a signal dictionary and question bank (units, calibration status, sampling rate, expected ranges), translating ambiguous telemetry needs into a repeatable runbook with documented prerequisites.
Led delivery with junior engineers, establishing code reviews, Python standards, and CloudFormation-based IaC templates for repeatable dev/prod environments.
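A signal dictionary like the one described for this engagement can double as a machine-checkable data contract. The following is a hypothetical sketch only: `SignalSpec`, `check_reading`, `sampling_gaps`, the 2x gap tolerance, and the example wind-speed signal are illustrative assumptions, not details of the actual client deliverable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalSpec:
    """One hypothetical signal-dictionary entry: unit, calibration, sampling rate, expected range."""
    name: str
    unit: str
    calibrated: bool
    sampling_rate_hz: float
    min_value: float
    max_value: float


def check_reading(spec: SignalSpec, unit: str, value: float) -> list[str]:
    """Return the contract violations for a single reading (an empty list means it passes)."""
    issues = []
    if not spec.calibrated:
        issues.append(f"{spec.name}: sensor not calibrated")
    if unit != spec.unit:  # exact string match on the declared unit
        issues.append(f"{spec.name}: unit {unit!r} != expected {spec.unit!r}")
    if not spec.min_value <= value <= spec.max_value:
        issues.append(f"{spec.name}: {value} outside [{spec.min_value}, {spec.max_value}]")
    return issues


def sampling_gaps(spec: SignalSpec, timestamps_s: list[float], tolerance: float = 2.0) -> int:
    """Count gaps between consecutive timestamps longer than `tolerance` x the expected period."""
    period = 1.0 / spec.sampling_rate_hz
    return sum(1 for a, b in zip(timestamps_s, timestamps_s[1:]) if b - a > tolerance * period)


if __name__ == "__main__":
    wind = SignalSpec("wind_speed", "m/s", True, 1.0, 0.0, 60.0)
    print(check_reading(wind, "knots", 75.0))              # unit and range violations
    print(sampling_gaps(wind, [0.0, 1.0, 2.0, 5.0, 6.0]))  # 1
```

Keeping the contract as data rather than hard-coded checks means new sensor types only need a new dictionary entry, which fits the repeatable-runbook goal described above.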
Senior Data Engineer, Avenga | Client: Makedonski Telekom | Oct 2025 - Dec 2025
Stabilized an inherited vendor lakehouse for a national telecom (15-30 sources, ~1 TB/day), establishing rerun-safe processing and predictable recovery protocols.
Implemented Data Vault 2.0 processing in Spark for consistent historization, applying hash keys, SCD2 logic, soft deletes, and Point-in-Time (PIT) patterns; hardened NiFi ingestion (batch and streaming) by implementing state management and schema evolution handling.
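The hash-key, SCD2, and soft-delete patterns mentioned above can be illustrated in plain Python. This is a minimal sketch under assumptions, not the Spark implementation used on the project: MD5 over trimmed, upper-cased, `||`-delimited values is one common Data Vault 2.0 hashing convention (SHA-1 or SHA-256 are also used), the source is assumed to deliver full snapshots so that missing keys can be treated as deletions, and `hash_key`, `apply_scd2`, and the row layout are hypothetical. PIT tables are not shown.

```python
import hashlib
from datetime import datetime

DELIM = "||"  # assumed delimiter between concatenated values


def hash_key(*parts: object) -> str:
    """Data Vault-style hash: trim and upper-case each part, join with DELIM, then MD5."""
    normalised = DELIM.join("" if p is None else str(p).strip().upper() for p in parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def apply_scd2(current: list[dict], snapshot: dict[str, dict], load_ts: datetime) -> list[dict]:
    """
    current:  satellite rows with keys hk, hashdiff, attrs, valid_from, valid_to, is_deleted
    snapshot: full source extract as {business_key: {attribute: value}}
    Returns the new satellite state; history is kept and open rows have valid_to=None.
    """
    result = [dict(r) for r in current]
    open_rows = {r["hk"]: r for r in result if r["valid_to"] is None}
    seen = set()
    for bk, attrs in snapshot.items():
        hk = hash_key(bk)
        seen.add(hk)
        diff = hash_key(*(attrs[k] for k in sorted(attrs)))  # hashdiff over attribute values
        row = open_rows.get(hk)
        if row is not None and row["hashdiff"] == diff and not row["is_deleted"]:
            continue  # unchanged: keep the open version as-is
        if row is not None:
            row["valid_to"] = load_ts  # end-date the previous version
        result.append({"hk": hk, "hashdiff": diff, "attrs": attrs,
                       "valid_from": load_ts, "valid_to": None, "is_deleted": False})
    # Soft delete: key still open in the satellite but absent from the full snapshot.
    for hk, row in open_rows.items():
        if hk not in seen and not row["is_deleted"]:
            row["valid_to"] = load_ts
            result.append({**row, "valid_from": load_ts, "valid_to": None, "is_deleted": True})
    return result
```

Strict Data Vault 2.0 satellites are often insert-only, with end dates derived in views; this sketch writes `valid_to` in place only to keep the SCD2 behaviour visible. Because reruns with the same snapshot hit the "unchanged" branch, the merge is also idempotent, which is the property rerun-safe processing relies on.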