
Robert Pohnke


Last update: 02.03.2022

Big Data & Cloud Architect

Graduation: B.Sc. Computer Science, Warsaw University
Remote projects, negotiable.
Languages: German (Limited professional) | English (Full Professional) | Polish (Native or Bilingual)




Experienced Big Data Architect working as a freelancer since 2015 for large European customers including Schaeffler, Nordea, BNP Paribas, E.ON, Adidas, Essity, DXC and Allianz.

Extremely hands-on and cross-functional across the Big Data, Cloud (Azure & AWS), Data Engineering, DevOps and MLOps space. Has built and led teams of developers, with experience in greenfield projects, architecture design and delivery.

Certified in Microsoft Azure & Spark. Programming languages: Python, Java, Scala. More details in the resume. References available upon request.

Project history

07/2021 - 10/2021
Big Data Consultant
- responsible for transforming the harmonized data layer into a consumption layer for the Fleet project;
- responsible for data mapping, ETL and the Fleet portal back-end;
- technologies used: PostgreSQL, Azure (AKS, Key Vault, Blob Storage), Kafka.

01/2021 - 05/2021
Big Data Consultant
Publicis Groupe
- troubleshot performance problems with Azure Databricks notebooks and tuned cluster settings;
- implemented historical data preprocessing procedures using Azure Functions that transformed over 6 TB of data;
- introduced Databricks SQL Analytics as a reporting layer for BI;
- technologies used: Azure (Data Lake Storage, Data Factory, Key Vault, Databricks, Functions).

08/2018 - 01/2021
Big Data Architect
- part of a team designing and implementing a data lake based on Azure to ingest data from production plants, reporting systems and other internal sources;
- industrialized Python and R machine learning models and deployed them to Azure Databricks (Spark 2.4), Data Science VM, ML Workspace, AKS and Azure Batch;
- introduced guidelines for data scientists working in Python and R, set up Jupyter notebooks, created dockerized Flask + scikit-learn and R Shiny environments;
- created CI/CD Jenkins pipelines to AKS, Databricks and Azure Batch;
- implemented near real-time messaging ETL and CDC pipelines in Data Factory;
- technologies used: Azure (Data Lake Storage, Data Factory, Batch, AKS, ACR, Key Vault, SQL DW, Databricks, Machine Learning Workspace, Data Science VM, ARM, IoT Hub, VNet), NATS, Cloudbreak, Python 2.7, 3.6, Jupyter.

04/2020 - 12/2020
Big Data Consultant
- created CI/CD pipelines to enable ML notebook versioning and deployment in Databricks and ML Workspace;
- improved the ML workflow using ML Workspace;
- added ML model tracking in MLflow and Azure ML Workspace;
- participated in migration efforts from Data Lake V1 to Data Lake V2 based on ADLS Gen2;
- implemented ETL pipelines in Data Factory;
- technologies used: Azure (DevOps, Databricks, Machine Learning Workspace, ADLS, Data Factory, Key Vault), Python.

04/2019 - 12/2019
DevOps Specialist
- designed and implemented a framework for testing and rolling out schema updates to Exasol tables across multiple projects and environments;
- participated in architecture discussions and code reviews;
- introduced guidelines for database developers working in Python;
- created dockerized Exasol environments and CI/CD Jenkins pipelines;
- implemented convention and regression tests using pytest;
- technologies used: Bash, Python 3.6 (pytest, pylint, black, sqlparse, pydocker, pyexasol), Makefile, Jenkins, Exasol.
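The convention-test idea above can be sketched as a plain pytest-style check (a minimal illustration only; the UPPER_SNAKE_CASE rule, the regex and the function names are assumptions, not the project's actual conventions):

```python
import re

# Hypothetical naming convention for Exasol DDL scripts:
# every CREATE TABLE is schema-qualified and uses UPPER_SNAKE_CASE.
TABLE_NAME_RE = re.compile(r"CREATE\s+TABLE\s+([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)")

def table_names(ddl: str):
    """Extract (schema, table) pairs from a DDL script."""
    return TABLE_NAME_RE.findall(ddl)

def is_upper_snake(name: str) -> bool:
    """Convention check: identifier is UPPER_SNAKE_CASE."""
    return re.fullmatch(r"[A-Z][A-Z0-9_]*", name) is not None

# pytest collects functions named test_*; bare asserts are its idiom.
def test_tables_follow_naming_convention():
    ddl = "CREATE TABLE STAGE.CUSTOMER_ORDERS (ID INT);"
    for schema, table in table_names(ddl):
        assert is_upper_snake(schema) and is_upper_snake(table)
```

Checks like this run identically under pytest in CI and against dockerized Exasol environments, which is what makes them cheap to roll out across projects.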

11/2017 - 08/2018
Data Lake Architect
BNP Paribas
- involved in the architecture and implementation of a centralized data lake for all sensitive enterprise data in the bank;
- developed Ansible playbooks for cluster creation with Ambari;
- created Bash scripts for automation and task scheduling;
- configured and secured HDP clusters;
- technologies used: Confluent Kafka 1.0, HDP 2.6 (HDFS, YARN, Zookeeper, Atlas, Ranger, Hive, HBase), Spark 2.1, Scala, Ansible.

02/2018 - 04/2018
Big Data Consultant
Daimler (DXC)
- part of a team responsible for the architecture and implementation of a data lake storing sensor data from autonomous vehicles;
- developed a ROS bag format Hadoop file reader;
- participated in code reviews;
- technologies used: MapR, Java 1.7, MapReduce, Spark.

01/2017 - 10/2017
Big Data Consultant
- part of a team responsible for the architecture and implementation of a data lake storing transaction and account history;
- developed analytics Spark jobs that produced reports for a suite of mainframes;
- developed ETL pipelines in Spark, Flume and Oozie to ingest data into the data lake;
- configured and secured clusters, implemented business-critical and resilient Oozie workflows in production;
- technologies used: Kafka 0.10, CDH 5.9, Spark 1.6 (SQL, Streaming), Flume, HBase, Hive, HDFS, Oozie, Zookeeper.

06/2016 - 12/2016
Big Data Architect
- responsible for the architecture and implementation of ETL and ML pipelines in Spark;
- ingested sensor data from power plant assets via Kafka into OpenTSDB;
- performed data quality checks and missing data imputation;
- responsible for distributed training and serving of ML models to generate real-time forecasts for wind park power output;
- technologies used: Kafka 0.8, Spark 1.6 (Streaming, MLlib, SQL), Cloudera Hadoop, OpenTSDB, HDFS, Oozie, Scala, Python (scikit-learn).
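The missing-data imputation mentioned above can be illustrated as a gap-limited forward fill (a library-free sketch; the actual pipelines ran in Spark, and `forward_fill` and the `max_gap` cutoff are assumed names and parameters):

```python
from typing import List, Optional

def forward_fill(series: List[Optional[float]], max_gap: int = 3) -> List[Optional[float]]:
    """Impute missing sensor readings (None) by carrying the last observed
    value forward, but only across gaps of up to max_gap samples; longer
    outages stay missing so they can be flagged by data quality checks."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    gap = 0
    for value in series:
        if value is not None:
            last, gap = value, 0       # fresh observation resets the gap
            filled.append(value)
        else:
            gap += 1                   # inside a gap: reuse last value if allowed
            filled.append(last if last is not None and gap <= max_gap else None)
    return filled
```

Bounding the fill window keeps short sensor dropouts from distorting a power-output forecast while still surfacing longer outages.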

11/2015 - 05/2016
Data Engineer
- responsible for re-writing social media data acquisition pipelines using Spark Streaming, Kafka and AWS (Athena);
- responsible for re-writing the analytics platform using Spark 1.5 and Python 2.7;
- worked remotely in a distributed team leveraging Agile methodology and Slack.

10/2015 - 03/2016
Big Data Engineer
- designed and implemented a proof of concept for a machine learning model matching data from medical institutions with internal drug databases to generate sales reports for pharmaceutical companies;
- technologies used: Spark 1.5 (ML, SQL), Scala, AWS (S3, EC2, Redshift), Docker.
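The matching idea can be sketched with simple fuzzy string similarity (illustrative only; the project used Spark ML, and `best_match`, the catalog entries and the 0.8 threshold are assumptions):

```python
from difflib import SequenceMatcher

def best_match(name: str, catalog: list, threshold: float = 0.8):
    """Return the catalog entry most similar to `name`, or None if no
    candidate clears the similarity threshold. Case-insensitive, so
    inconsistent capitalization in source systems does not hurt matching."""
    def score(candidate: str) -> float:
        return SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
    candidate = max(catalog, key=score)
    return candidate if score(candidate) >= threshold else None
```

A threshold-based matcher like this tolerates typos in institution-supplied drug names while refusing to link records that merely share a few characters.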

07/2014 - 10/2015
Director of Engineering Services
- mostly involved as a solution architect;
- designed the platform's REST APIs and prepared technical documentation;
- developed the platform's modules (Scala, Spark 1.3, Hadoop, Akka, ScalaTest);
- set out programming guidelines for the development team;
- set up a Jenkins CI/CD environment and automated ML model deployment to AWS;
- conducted code reviews and technical interviews;
- acquired new clients, meeting key stakeholders and delivering presentations/demos worldwide;
- organized and led machine learning and Spark training sessions in San Francisco, New York and London;
- promoted the company at conferences (Hadoop Summit, Spark Summit, Strata + Hadoop World) and through lectures to STEM students.

06/2013 - 08/2013
Software Engineer (intern)
Goldman Sachs
- performed memory profiling of a strategic clearing engine and accurately identified the cause of high memory consumption (VisualVM, JProfiler);
- successfully implemented, tested and launched a lazy-load cache (Java 1.7);
- performed JVM tuning, reducing memory consumption by 85% and cutting processing time by 15%.
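The lazy-load cache pattern above can be sketched as follows (in Python rather than the Java 1.7 used in the project; `LazyCache` and its loader are illustrative names, not the engine's actual code):

```python
class LazyCache:
    """Minimal lazy-load cache: a value is computed on first access via a
    loader function and reused afterwards, so an expensive lookup (e.g. a
    database or reference-data fetch) happens at most once per key."""

    def __init__(self, loader):
        self._loader = loader
        self._store = {}

    def get(self, key):
        if key not in self._store:           # cache miss: load and memoize
            self._store[key] = self._loader(key)
        return self._store[key]              # cache hit: no loader call
```

Deferring the load until first use is what distinguishes this from eager pre-population; only keys the engine actually touches consume memory.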

05/2013 - 06/2013
Software Engineer
- developed the IBM Netezza Analytics data warehouse;
- implemented scalable machine learning algorithms and parallel matrix computation modules (C++, PL/SQL);
- developed automated functional and performance test suites (C++).

06/2011 - 09/2011
Software Engineer (intern)
- developed an end-to-end solution for automating ledger deployments on UBS' private cloud (RPM, Bash, vSphere suite);
- created a virtual appliance deployable via a self-provisioning platform and launched UBS' first app on 3rd-party infrastructure (vCloud Director, ServiceMesh);
- replaced legacy object storage and messaging servers with a distributed cache (Java 1.6, JMS, Hibernate, GigaSpaces).
