● Created from scratch a big data solution that processed around 900 million spatio-temporal records per day
● Built the cloud infrastructure and pipelines to support the solution, using Azure as the cloud provider: HBase cluster, Spark cluster, Kafka cluster, Azure Blob Storage, Data Factory, Databricks
● Used GeoMesa as the spatio-temporal framework (chosen after evaluating alternative solutions); see the query sketch after this list
● Built Spark ingestion pipelines from different sources (raw files, Kafka, Parquet) and database synchronizations with MongoDB, ArangoDB and Elasticsearch; see the streaming ingestion sketch after this list
● Complex queries and analytics using Spark, Spark ML and Kafka
● Built a knowledge graph using Spark and ArangoDB; see the graph-loading sketch after this list
● Image and text analysis pipelines using ML, Spark and Kafka; see the text-classification sketch after this list
● PoCs with other graph databases: Neo4j, JanusGraph and Gaffer
● Created custom Docker infrastructure to support development across all components
● Back-end and front-end development to support PoCs or small production changes: Python, TypeScript, Node.js, Spring Boot, JavaScript and Angular
● Monitoring using Prometheus, Grafana and Jaeger; see the metrics sketch after this list
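
A minimal sketch of the kind of spatio-temporal query GeoMesa enables from Spark, assuming the geomesa_pyspark bindings and an HBase-backed catalog. The datastore parameters (hbase.zookeepers, hbase.catalog), the feature type "positions" and its columns are illustrative placeholders, not the production values.

```python
from pyspark.sql import SparkSession
import geomesa_pyspark  # GeoMesa PySpark bindings; must match the GeoMesa Spark runtime jar

spark = SparkSession.builder.appName("geomesa-query").getOrCreate()
geomesa_pyspark.init_sql(spark)  # register the st_* spatial functions for Spark SQL

# Hypothetical GeoMesa HBase datastore parameters.
ds_params = {
    "hbase.zookeepers": "zk1:2181,zk2:2181",
    "hbase.catalog": "geomesa_catalog",
}

# Load one feature type as a DataFrame and expose it to Spark SQL.
positions = (spark.read.format("geomesa")
             .options(**ds_params)
             .option("geomesa.feature", "positions")
             .load())
positions.createOrReplaceTempView("positions")

# Spatio-temporal filter: bounding box plus time window.
spark.sql("""
    SELECT device_id, dtg, geom
    FROM positions
    WHERE st_contains(st_makeBBOX(-74.3, 40.5, -73.7, 40.9), geom)
      AND dtg BETWEEN '2019-01-01' AND '2019-01-02'
""").show()
```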
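A minimal sketch of one of the ingestion pipelines, assuming Spark Structured Streaming reading JSON records from a Kafka topic and landing them as Parquet on Azure Blob Storage. The broker address, topic name, record schema and paths are illustrative placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-ingestion").getOrCreate()

# Hypothetical schema for one spatio-temporal record.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("lat", DoubleType()),
    StructField("lon", DoubleType()),
    StructField("ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")  # placeholder broker
       .option("subscribe", "positions")                 # placeholder topic
       .load())

# Kafka values arrive as bytes; decode and parse the JSON payload.
records = (raw.select(from_json(col("value").cast("string"), schema).alias("r"))
              .select("r.*"))

# Land the stream as Parquet; the Blob Storage container is a placeholder.
query = (records.writeStream
         .format("parquet")
         .option("path", "wasbs://data@account.blob.core.windows.net/positions/")
         .option("checkpointLocation", "/checkpoints/positions")
         .start())
query.awaitTermination()
```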
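A minimal sketch of how entities and relations extracted by a Spark job can be loaded into an ArangoDB graph, using the python-arango driver. Connection details, collection names and the sample edge are illustrative; in practice each DataFrame partition would run this driver code via foreachPartition.

```python
from arango import ArangoClient  # python-arango driver

# Hypothetical connection details.
client = ArangoClient(hosts="http://arangodb:8529")
db = client.db("kg", username="root", password="secret")

# Create the graph structure once: entity vertices and typed relations.
if not db.has_graph("knowledge_graph"):
    graph = db.create_graph("knowledge_graph")
    graph.create_vertex_collection("entities")
    graph.create_edge_definition(
        edge_collection="relations",
        from_vertex_collections=["entities"],
        to_vertex_collections=["entities"],
    )
else:
    graph = db.graph("knowledge_graph")

entities = graph.vertex_collection("entities")
relations = graph.edge_collection("relations")

# Edges produced by a Spark job, e.g. (subject, predicate, object) rows.
# Inlined here for illustration only.
for subj, pred, obj in [("device_1", "seen_at", "zone_42")]:
    for key in (subj, obj):
        if not entities.has(key):
            entities.insert({"_key": key})
    relations.insert({
        "_from": f"entities/{subj}",
        "_to": f"entities/{obj}",
        "predicate": pred,
    })
```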
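A minimal sketch of a Spark ML text-classification pipeline of the kind used in the text analysis work; the sample rows, labels and pipeline stages are illustrative only.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("text-analysis").getOrCreate()

# Illustrative labelled text; in practice this came from the ingestion pipelines.
train = spark.createDataFrame(
    [("streaming job failed with timeout", 1.0),
     ("nightly batch completed successfully", 0.0)],
    ["text", "label"],
)

# Tokenize, vectorize with TF-IDF, then fit a simple classifier.
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="tf"),
    IDF(inputCol="tf", outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])

model = pipeline.fit(train)
model.transform(train).select("text", "prediction").show(truncate=False)
```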
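A minimal sketch of the application-side metrics exposed for Prometheus to scrape (and then charted in Grafana), using the prometheus_client library; the metric names, label and port are illustrative placeholders.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names, not the production ones.
RECORDS_INGESTED = Counter(
    "records_ingested_total", "Spatio-temporal records ingested", ["source"]
)
INGEST_LATENCY = Histogram(
    "ingest_batch_latency_seconds", "Time spent processing one batch"
)

def process_batch(source: str) -> None:
    # Record latency and throughput for each processed batch.
    with INGEST_LATENCY.time():
        time.sleep(random.uniform(0.01, 0.05))  # stand-in for real work
        RECORDS_INGESTED.labels(source=source).inc(1000)

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics on port 8000
    while True:
        process_batch("kafka")
```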