03/20/2026 updated

**** ******** ****
100 % available

Senior Data Engineer | Azure, Databricks, Python, PySpark, SQL | Data Architecture & Cost Estimation

Skopje, Macedonia
Only remote
MSc, Intelligent Information Systems (Applied NLP on Big Data)

Profile attachments

20260308-Filip_Markoski_CV_UK_Final.PDF

About me

Senior Data Engineer (5.5 YOE) specializing in Azure, Databricks, and PySpark. Built production lakehouse platforms processing 1-5 TB/day across 20+ factories. Expertise in cost-conscious architecture, metadata-driven frameworks, and schema drift detection. B2B via SEPA from North Macedonia.

Apache Airflow, Amazon Web Services, Amazon S3, Architecture, Microsoft Azure, Telecommunications, Billing, Cloud Computing, Code Generation, Code Review, Data Architecture, Data Dictionary, Information Engineering, Extract Transform Load (ETL), Data Modeling, Data Sharing, Data Vault Modeling, Dimensional Modeling, Financial Planning, Governance, Apache Hive, Python (Programming Language), Language Translation, PostgreSQL, Metadata, Microsoft SQL Servers, Oracle Applications, Production Support, Telemetry, Cost Modelling, Azure Data Lake, Requirements Analysis, Satellites, Stock Keeping Unit, SQL Databases, SQL Server Integration Services, Data Streaming, Technical Management, Parquet, Azure Data Factory, Apache Spark, Cost Optimisation, Internet of Things (IoT), Change Data Capture, Cloudformation, Servicebus, Microsoft Fabric, Data Lake, PySpark, Gitlab-ci, Avro, Apache Nifi, Region Management, Cloudwatch, Serverless Computing, Databricks

**Cloud Data Engineering (Azure & AWS)**
- Azure Databricks, Microsoft Fabric, Azure Data Factory, ADLS Gen2, Azure Data Explorer (ADX), Event Hub, Key Vault, Unity Catalog
- AWS S3, Glue, Glue Catalog, Iceberg, Athena, CloudFormation, EventBridge, CloudWatch
- Architected serverless lakehouse ingesting high-frequency time-series data (13-inverter commercial solar site)
- Designed near-real-time IoT architecture for 300+ sensor types producing second-level data

**Spark & Lakehouse Technologies**
- PySpark (5.5 years production experience), Spark SQL, Delta Lake, Autoloader
- Change Data Capture (CDC), Slowly Changing Dimensions (SCD Type 1/2)
- Schema enforcement, schema evolution, schema drift detection for heterogeneous file formats (CSV, Parquet, Avro, ORC)
- Built ingestion pipelines processing 1-5 TB/day of global factory telemetry across 20+ manufacturing sites

**Data Modeling & Architecture**
- Data Vault 2.0 (Hubs, Links, Satellites, Point-in-Time tables, Bridge tables)
- Dimensional modeling (facts, dimensions, data dictionaries, data contracts)
- Medallion Architecture (Bronze-Silver-Gold layers)
- Metadata-driven frameworks with dynamic SQL code generation

**Orchestration & ETL/ELT**
- Apache Airflow, Apache NiFi (batch and streaming ingestion with state management)
- Azure Data Factory, SSIS
- GitLab CI/CD, Azure DevOps pipelines
- Designed reusable PySpark components for CDC, SCD1/2, and schema enforcement adopted across 50+ internal projects

**Cost Optimization & Financial Planning**
- Cloud cost governance: identifying billing anomalies (idle OpenSearch clusters, unused resources)
- Negotiated AWS service credits to recover budget from early-stage misconfigurations
- Comprehensive MVP and roadmap cost estimates mapping architectural components to specific Azure/AWS SKUs

**Enterprise & Startup Experience**
- Extended shared data platform framework at Robert Bosch used by 50+ projects
- Implemented centralized Device Directory lakehouse for 20+ factories integrating ADF and Databricks with Azure Cognitive Search
- Stabilized inherited vendor lakehouse for national telecom (15-30 sources, ~1 TB/day)
- Built metadata-driven SCD framework supporting multi-source ingestion (MS SQL, Oracle, PostgreSQL)

**Technical Leadership**
- Led delivery with junior engineers, establishing code reviews and Python standards
- Translated ambiguous stakeholder requirements into concrete data contracts, signal dictionaries, and repeatable runbooks
- Production support for analytics, reporting, and ML use cases


Languages

English: Native speaker

Project history

Senior Data Engineer (Consultant)

V-Sailing

Other

250-500 team member

Senior Data Engineer (Consultant), Avenga | Client: V-Sailing | Jan 2026 - Feb 2026
Designed a near-real-time Azure ingestion, analytics and product architecture (seconds to minutes latency, 300+ sensor types producing data every second) by selecting optimized Event Hub patterns and ADX modeling.
Produced comprehensive MVP and roadmap cost and effort estimates, providing stakeholder visibility by mapping architectural components to specific Azure SKUs and scaling assumptions.
Established early data contracts via a signal dictionary and question bank (units, calibration status, sampling rate, expected ranges), translating ambiguous telemetry needs into a repeatable runbook with documented prerequisites.

Senior Data Engineer

NDA (AWS Serverless Lakehouse)
Led delivery with junior engineers, establishing code reviews, Python standards, and CloudFormation-based IaC templates for repeatable dev/prod environments

Senior Data Engineer

Avenga

Telecommunications

500-1000 team member

Senior Data Engineer, Avenga | Client: Makedonski Telekom | Oct 2025 - Dec 2025
Stabilized an inherited vendor lakehouse for a national telecom (15-30 sources, ~1 TB/day), establishing rerun-safe processing and predictable recovery protocols.
Implemented Data Vault 2.0 processing in Spark for consistent historization, applying hash keys, SCD2 logic, soft deletes, and Point-in-Time (PIT) patterns; hardened NiFi ingestion (batch and streaming) by implementing state management and schema evolution handling.

Senior Data Engineer

Robert Bosch Power Tools

Industry & Mechanical Engineering

500-1000 team member

Senior Data Engineer, Avenga | Client: Robert Bosch (Data Hub & Lakehouse) | Jul 2022 - Jul 2025
Extended the Bosch PT Data Hub framework (used by 50+ projects) with reusable Delta Lake ingestion and transformation patterns, standardizing downstream delivery across teams while enforcing budget limits via Azure Cost Management.
Implemented a centralized Device Directory lakehouse for 20+ factories, integrating ADF and Databricks to serve asset metadata via Azure Cognitive Search.
Engineered Databricks Autoloader pipelines for global factory telemetry (~1-5 TB/day), establishing a structured Bronze layer with lineage and batch metadata for traceability and ingestion tracking.
Delivered reusable PySpark components to standardize CDC, SCD1/2, and schema enforcement, replacing one-off notebooks with tested, reusable modules.

Senior Data Engineer

Confidential Client (Enterprise Data Platform)

Internet & IT

Data Engineer, Confidential Client (Enterprise Data Platform) | Sep 2020 - Mar 2022
Designed metadata-driven SCD framework (Type 1/2) in Databricks supporting multi-source ingestion (MS SQL, Oracle, PostgreSQL) with templated dynamic merge logic.
Built schema inference and drift detection modules for heterogeneous file types (CSV, Parquet, Avro, ORC) enabling automated validation and controlled propagation across Bronze-Silver-Gold layers.
Delivered dimensional EDW with composite key validation, relationship inference, and ADF-orchestrated ETL pipelines for analytics-ready datasets.

Data Engineer and Data Scientist

HomeTrust (Canadian Bank)

Banking & Financial Services

500-1000 team member

Engineered a multi-million-row analytical dataset and built regression and classification models to identify self-employed mortgage prospects and credit card cross-sell targets.
