Profile image of Aziz Ajrir, Senior Data Scientist, NLP Specialist from Gambetta

Aziz Ajrir


Last update: 06.09.2022

Senior Data Scientist, NLP Specialist

Company: Lialab
Graduation: Master Data science
Languages: German (Elementary) | English (Limited professional) | French (Native or Bilingual)




Knime, metadata, NLP, web application, API, Python, PySpark, Azure blob, Azure Databricks, Git, GitHub, Scrum, Jira, Dash, Flask, phishing, Machine Learning, Random Forest, Jupyter, Teradata, SQL, Scikit-Learn, Pandas, object detection, Deep Learning, OpenCV, Computer Vision, algorithm, GPU, OCR, Tesseract, Azure, Deeplearning, Autoencoders, Tensorflow, Keras, social networks, Sentiment analysis, Jupyter Notebook, data analysis, Data quality, Power BI, Linux, Netezza

Project history

02/2019 - 07/2021
Data Scientist
1. Stakes: Several million euros in losses from fraud through the addition of
fraudulent IBANs, mainly as a result of phishing.
2. Objectives: Development of a Machine Learning solution to identify risky
IBANs using customer information, IBAN history, banking transactions and connection
data. The business expected 1) a decrease in the false positive rate and 2) an increase
in coverage.
3. Approach adopted: The problem was framed as a binary classification. A Random
Forest model was retained as the best-performing option.
4. Result and contribution of the project: Coverage rose from 20% to 32% and the
false positive rate fell from 40% to only 5%, which met the expectations of the business.
The solution previously in place was based on traditional IT rules and had deteriorated
over time. Our solution is based on data science and more consistent featurization,
with data updating and automated monitoring of the performance score to avoid possible
degradation in detection.
5. Deployment: The solution was developed in Python and made available to the
client as a web application built with the Flask framework. The graphical
interface lets the customer intuitively consult the detection results, with a daily
data refresh.
6. The team consisted of one consultant, working fully independently on this project,
and a project manager.
7. Technologies used: Python, Jupyter, Teradata, SQL, Dash, Flask, Scikit-Learn, Pandas.
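
As an illustration of the two indicators tracked in this project, here is a minimal sketch in plain Python. It assumes that "coverage" means the share of actual fraudulent IBANs that are flagged (recall) and that the "false positive rate" means the share of flagged IBANs that turn out to be legitimate; the profile does not state the exact definitions used. The function names and the counts are hypothetical and are not project data.

```python
def coverage(true_positives: int, false_negatives: int) -> float:
    """Share of actual fraudulent IBANs that were flagged (assumed definition)."""
    total_fraud = true_positives + false_negatives
    return true_positives / total_fraud if total_fraud else 0.0


def false_alert_share(true_positives: int, false_positives: int) -> float:
    """Share of flagged IBANs that were legitimate (assumed definition)."""
    flagged = true_positives + false_positives
    return false_positives / flagged if flagged else 0.0


# Hypothetical confusion counts, chosen only to make the arithmetic easy to follow.
tp, fp, fn = 30, 10, 70

print(f"coverage: {coverage(tp, fn):.2f}")                   # 30 / (30 + 70)
print(f"false alert share: {false_alert_share(tp, fp):.2f}")  # 10 / (30 + 10)
```

Tracking both values on each data refresh is one way an automated performance check could flag detection degradation.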

* Video analysis. Confidential R&D project
1. Objective: The project consists of analyzing the visual content of videos and
extracting the information that allows abnormal behavior to be identified.
2. Tasks performed: Automatic anonymization of faces in the video, motion and
object detection, estimation of the time spent by a person in front of a target,
tracking, etc. The confidentiality of the project requires us not to give more details on the
3. Models and technologies used: Pre-trained Deep Learning models were adopted
for this need, namely YOLOv3, MobileNet-SSD and ResSSD, together with OpenCV, the
reference Python library for Computer Vision. The algorithm was implemented in a GPU environment.
4. The team consisted of one consultant, working fully independently on this project, and a
project manager.

* Robotization of the analysis of credit files
1. Project objective: Design of an automation solution for processing loan application files.
2. The team: 2 consultants and a project manager
3. My tasks:
a. Automated scraping of financial reports from the institutions' websites.
b. Extraction of structured data using OCR.
4. Models and technologies used: Python, Beautiful Soup, OCR, Tesseract, OpenCV.

* Analysis of Check Images (R&D project):
1. Objective: Development of a model that automatically analyzes check scans to
decide whether or not to lift the cashing reserve on check remittances placed in reserve.
2. The team was made up of two consultants and a project manager.
3. My tasks:
- Recognition and extraction of handwriting on checks.
- Creation of a dataset of images of handwritten numbers and words
- Similarity tests with target images
4. Models and technologies used: Python, Azure, OCR, Deep Learning, Autoencoders,
OpenCV, TensorFlow, Keras.

* E-reputation (NLP project)
1. Objective: Identification and analysis of online content referring to BPCE and its
2. The team was made up of a consultant, 2 trainees and a project manager.
3. Tasks:
- Web scraping of target content from press and social media sites
- Exploration and analysis of text content
- Sentiment analysis and criticism detection
- Categorization of content into topics
4. Technologies used: Python, Azure Databricks, Jupyter Notebook, NLP

09/2018 - 12/2018
Data Scientist
Air Liquide
* Project: Wide industry data analysis
a. Project Team : 2 Project Managers / 2 Senior Data Scientists / 1 Junior Data Scientist
b. Accomplished tasks :
1. Collect the customer's needs and develop use cases
2. Data quality & Data management
3. Design of Machine Learning models to predict the behavior of certain Indicators
(e.g. Cost of maintaining a site)
4. Segmentation of manufacturing sites to compare their indicators
5. Power BI Dashboard creation & presentation
c. technical environment : Python, Jupyter, Pandas, Scikit-Learn, Git, Linux, Scrum

01/2018 - 07/2018
Data Scientist
* Customer Churn Analysis

a. Project Team : 1 Project Manager / 2 Data Scientists
b. Accomplished tasks :
1. Define the notions of Customer and Customer churn
2. Data exploration and preparation
3. Creation of features to characterize the churn and modeling
4. Organization of workshops with business experts
5. Industrialize the process of identifying and dealing with customer churn
c. technical environment : Python, Pandas, Scikit-Learn, Linux, Knime, Netezza

09/2015 - 11/2017
Data Scientist
ST Microelectronics
