Keywords
Apache
Java
Android
Cassandra
JS
Python
Django
Jetty
Jersey
JavaScript ES6
Node.js
MySQL
Apache Hadoop
MongoDB
Elasticsearch
CouchDB
memcached
Redis
HTML5
d3.js
React
jQuery
Docker
Amazon AWS ecosystem
S3
Google App Engine
EC2
Linux server
Heroku
Express.js
NumPy
Data Scientist
scikit-learn
Feature engineering
machine learning
Kibana
Tableau
ELK
Amazon QuickSight
Amazon Web Services (AWS)
Amazon S3
AWS Glue
Spark
MapReduce
Apache Kafka
NiFi
Apache Spark
Cloudera stack
Clustering
Information Retrieval
Skills
Backend: Python, Django, Java, Jersey, Jetty, JavaScript ES6, Node.js, Express, Android, Cloudera/Hortonworks Stack, Apache NiFi, Apache Kafka, Apache Spark
Databases: MySQL, PostgreSQL, Apache HBase, Apache Hadoop, MongoDB, CouchDB, Elasticsearch, ELK-Stack, Cassandra, Memcached, Redis
Frontend: HTML5, React, JavaScript ES6, jQuery, d3.js, Android, Kibana, Tableau, Amazon QuickSight
Deployment: Docker, Amazon AWS ecosystem, EC2, S3, Google App Engine, Heroku, Apache, Linux Server
Data Science: NumPy, scikit-learn, SciPy, pandas, machine learning, feature engineering, natural language processing, clustering
Miscellaneous: Git, SVN, Jira, Confluence, Slack, Google Analytics, IoT
Project history
07/2019 - 09/2020
Big Data Solution Architect & Information Retrieval Specialist
DAX30 group (Ludwigshafen am Rhein)
(>10,000 employees)
Automotive and vehicle construction
Conceptualization, architecture, and development of a scalable big data solution (an R&D data lake use case) for mass indexing of file contents using state-of-the-art natural language processing and machine learning algorithms on the Cloudera Hadoop stack (HDP and HDF), Palantir Foundry, and a Kubernetes deployment in Microsoft Azure.
Technologies used
- NiFi and MiNiFi for ETL
- Apache Spark Processing (Java, Scala & Python)
- Apache HBase
- Elasticsearch Stack
- Django Backend
- React Frontend
- Raw text extraction from various file types
- Language dependent indexing
- Clustering Approaches (including Latent Dirichlet Allocation, Latent Semantic Indexing, doc2vec)
- Parse unstructured data into structured data
- Named Entity Recognition (including chemical entities) using Neural Networks
- Entity Linking (Distant Knowledge)
- Molecular Substructure Search
Local Availability
Available worldwide