NV
available

Last update: 10.09.2019

Machine learning engineer / architect

Graduation: not provided
Languages: Arabic (Native or Bilingual) | English (Native or Bilingual)

Attachments

Nick Vintila resume 201901.pdf

Skills

Nick Vintila
original (best read): 
https://drive.google.com/file/d/1oOdS28RUgoobrIRlGLVeY82z6cQbLcyZ/view 
  • Hands-on, full stack software architect & engineer, computer scientist, machine learning engineer, domain & data analyst, researcher, entrepreneur, lead
  • Extensive industry experience building complex custom software for multiple complex business domains
  • 5-year BSc in Computer Science, Mathematics, Machine Learning and Statistics
nick@semanticbeeng.com 
https://www.linkedin.com/in/semanticbeeng 
https://twitter.com/semanticbeeng
Key Value Proposition - beyond "data science" to "machine learning engineering"
  1. Base premise:
    1. The experimental data science paradigm is being challenged by complexity in enterprise settings 
    2. Source: https://leon.bottou.org/slides/2challenges/2challenges.pdf
  2. (definition) Experimental data science 
    1. Exploratory data discovery & analysis
    2. Not secure
      1. May violate regulations even if only done in DEV-ENV
      2. Forbids sharing across corporate boundaries which defeats business needs to develop partnerships
    3. Uncharted waters on the way to PROD-ENV
    4. Development mostly in notebooks, scripts, etc
    5. Ad-hoc choice of tools
    6. Batch, "one-off" analyses only, executed manually
    7. Relatively small data, manual sampling, often without proper treatment of "time"
    8. Models trained offline
    9. No "team process", continuous integration, etc
    10. No means to reproduce analysis results or software releases
    11. End results are reports, diagrams, etc
  3. (definition) Machine learning engineering
    1. End results are enterprise data science applications - aka #DataProducts
    2. Supported by #DataManagement with business #DataGovernance for "business understanding"  of data from day 1 of the project and after release of the product
    3. Use of realistic, multi-dimensional data sets that may not fit in RAM or on a single machine; logically infinite
    4. Proper treatment of data #DistributionDrift (#DatasetShift), #CovariateShift and #ConceptDrift  - both detection and adaptation
    5. Use of managed data sets : data lakes, feature stores
    6. Models trained incrementally
      1. the algorithms / analyses are incremental (online normalization, etc)
      2. or trained offline periodically and refreshed / deployed every few hours or days, as suitable
    7. Use of complex orchestration between the core analysis and the decision layer, model monitoring and other application logic and business processes, some involving human interactions
    8. Integrated with the software engineering process
    9. Light but strict architecture decision management
    10. Strict management of technology dependencies
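Item 3.4 above names detection of #DistributionDrift as a core machine learning engineering concern. As one illustration of what "detection" can mean in practice, here is a minimal sketch of the Population Stability Index (PSI), a common drift signal that compares a feature's current distribution against the one seen at training time. The data is synthetic and the thresholds in the note below are rules of thumb, not the author's production setup:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between a reference (training-time)
    sample and a current sample of one feature."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover out-of-range values
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)       # avoid log(0) on empty bins
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)             # training-time distribution
stable = rng.normal(0.0, 1.0, 5000)                # same distribution, new sample
shifted = rng.normal(1.0, 1.0, 5000)               # mean shifted by 1 sigma

psi_stable = psi(reference, stable)
psi_shifted = psi(reference, shifted)
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; these cutoffs, like the bin count, should be calibrated per feature.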
Have developed mission-critical, custom distributed business applications for over twenty years in different business domains.
Over the past five years have masterminded and built two data science platforms, evolving an approach, vision, methodology, reusable design decisions and solutions, a structured knowledge base, and reusable solution components.
  • https://www.yields.io 
    • "model validation and real-time monitoring on an enterprise-wide scale"
    • masterminded the overall product, developed large parts of the code and led teams
  • https://www.datawatch.com/in-action/angoss/ 
    • helped re-engineer a proprietary data science platform onto modern technologies and architecture.
To address the base premise above, the biggest hurdles to streamlining the #DataProduct development approach are:
  1. Managing business understanding of data
    1. with business and product management #DataCitizen-s
    2. across the project lifecycle
    3. to achieve self service business capability in the final #DataProduct
    4. including ability to do #ModelValidation, audit against regulation (GDPR), etc
  2. Achieving proper #ModelManagement in production in context of #NonStationaryLearning (#DistributionDrift / #DatasetShift, #CovariateShift and #ConceptDrift)
  3. Managing incidental code complexity due to ad-hoc combination of languages and technologies but also less than mature programming paradigms & styles
  4. Achieving effective cross-language and cross platform integration ( #DataFabric, #ProgrammingModel, etc)
  5. Combine data analyses with traditional enterprise technologies (microservices, process managers, etc)
The ongoing actions to address these challenges are:
  1. Perform systematic, hands-on research into technologies, designs, algorithms, etc
  2. Maintain a large wiki with structured, reusable knowledge which allows me to refine with every iteration and apply to new projects
  3. Maintain a network of connections from both industry and academia (UberEng, Lightbend, KTH University, language design experts and more)
  4. Seek to apply 
    1. category theory for cross language / cross platform treatment of data schema and lifting of "data munging" to a higher level for better "business understanding" in the implementation
    2. process management for orchestration between data analyses and business processes elsewhere in the #DataProduct.
    3. incremental, non-stationary machine learning algorithms.
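Point 4.3 above concerns incremental, non-stationary algorithms, and the machine learning engineering definition earlier mentioned online normalization as an example. A minimal sketch of that idea, using Welford's algorithm (my choice of illustration, not necessarily the author's implementation): mean and variance are updated one observation at a time, so standardization works on a stream without holding the data set in RAM.

```python
import math

class OnlineNormalizer:
    """Streaming standardization via Welford's algorithm.
    A single-feature sketch; per-feature state, windowing and
    drift handling are deliberately left out."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self._m2 / self.n) if self.n > 1 else 1.0

    def normalize(self, x):
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0

norm = OnlineNormalizer()
for value in [2.0, 4.0, 6.0, 8.0]:
    norm.update(value)
# after four updates: running mean 5.0, population std sqrt(5)
```

The same update rule extends to exponentially decayed statistics, which is one standard way to make normalization track a drifting distribution.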

Key Value Proposition - domain specific architecture
  • There is a growing number of technologies available, each with its own implicit requirements and design choices
  • They provide reusable capabilities which often overlap and conflict
  • There is an objective and material difficulty in bridging between the functionality of generic software and business domain specific needs
  • One of the problems many software development efforts face is the constant friction introduced by translation between two technical vocabularies, that of the business domain on the one hand and that of the developers on the other.
    source: https://www.agilealliance.org/glossary/ubiquitous-language
  • Trying to apply generic technologies to specific business needs without managing the "gap" between them is very wasteful and expensive at best and risks failure at worst
  • Crafting sustainable and affordable custom software business solutions 
    • requires careful and principled matching between specific business domain and the technology domains of the various technologies
    • requires a detailed architecture process, expert knowledge in the business domain, data science expertise and extensive engineering to combine them all and operationalize
    • must involve restricting use of technologies to reduce entropy, choosing between overlapping capabilities and resolving architecture mismatches
I offer the reusable knowledge, designs, methodology and technologies to create state of the art custom business domain specific software that fits.
Have successfully applied the approach to large scale development in diverse business domains, drawing on:
  • experience on large projects
  • extensive research and state of the art methods
  • command of technologies from Lightbend, ThoughtWorks, UberEng, Databricks, etc

How it relates to key stakeholders:
✔ To client
  1. Help create a data-driven organization & processes that scale
  2. Help create mission critical enterprise applications / platforms
✔ To management consulting companies & client internal product management
  1. Complement your business strategy on a client project with advanced architecture and software development expertise
  2. Mastermind the solution design, development and operationalization
  3. Reduce the risk of building an engineering team disconnected from business realities, by using a knowledge-centric SDLC
  4. Boost execution and reduce burden of developing a productive team
  5. Ensure the team grows well suited to the concrete solution architecture and has the maturity required to build it
✔ To internal engineering team
  1. Ease access to the business domain knowledge required to build so that you have more time to build instead of being distracted in meetings
  2. Partner with you to help manage overall solution architecture through disciplined architecture and design decisions
  3. Streamline the SDLC so you waste less time dealing with poorly developed solutions by external parties 
✔ To recruiting & HR management firms
  1. Act as a partner, instead of an isolated resource
  2. Provide insights needed to build a team that actually suits the product being built
  3. Help unify the team under well understood goals & objectives to reduce team storming and better partner with the domain experts
  4. Reuse what works thus getting leverage with multiple clients
✔ To technology vendors
  1. Build solutions on client projects using deep knowledge of your tools and the client's "needs behind the requirements"
  2. Act genuinely motivated by the value proposition of the technology you offer
  3. Go way beyond a sales mindset
  4. Aim to reduce your sales cycle and consulting efforts
  5. Use a knowledge-centric organization mindset & techniques to make communication very effective
  6. Provide learning & visibility from client projects that might influence product direction
Details of the approach
The key pillars of the approach are below.
For each project we apply combinations of these techniques as appropriate.
  1. Data science applications
    • Not just the experimental data science part of the process.
    • But also the production applications, platforms, deployment, etc
    • And capabilities like: data lake, feature store, data & model management, business understanding, incrementality (data streaming, windowing), separation of model from business strategy.
  2. Machine learning engineering
    • Not just machine learning model development 
    • But also the application deployment, model serving, incremental re-training, distributed & federated training, concept drift detection and adaptation, prequential model evaluation, etc.
    • And the platform mindset, beyond the "data pipeline", to reach scalability in both functionality and at runtime.
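Pillar 2 above names prequential model evaluation. The idea: on a data stream, each example is first used to test the current model and only then to train on it, so accuracy is always measured on data the model has never seen. A minimal sketch, using synthetic data and an online logistic regression of my own choosing (an illustration of the technique, not the author's setup):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 2))                 # synthetic feature stream
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.1               # online logistic regression
correct = 0
for xi, yi in zip(X, y):
    p = 1.0 / (1.0 + np.exp(-(xi @ w + b)))    # 1. predict on the new example
    correct += int((p > 0.5) == yi)            # 2. score BEFORE training on it
    g = p - yi                                 # 3. only then take an SGD step
    w -= lr * g * xi
    b -= lr * g

prequential_accuracy = correct / len(X)
```

Because every prediction precedes the corresponding training step, the running accuracy also degrades visibly when the stream's distribution drifts, which is what makes prequential evaluation useful for the drift monitoring described above.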
  3. Advanced, type level, functional programming to manage complexity
    • Not just a mix of general programming languages - Scala, Python, Java, Rust, R
    • Or general frameworks - Tensorflow, Pytorch, etc
    • But also a paradigm for effective interoperability, through a disciplined #ProgrammingModel and #SystemsProgramming, so that algorithms are reusable and run on a shared runtime.
  4. Domain Driven Design - powerful for data science applications as well, through immersion in the business domain.
    • Not just pipelines and data lakes 
    • But also a domain specific approach to managing business complexity in data and code
    • With semantic data mapping in order to bring data under business management.
▶ Distributed software full stack architect & engineer
Have built complex enterprise software for a number of industries, in close partnership with business and other developers.
Last four years all remote projects:
  • data science platform (Canada)
  • large e-commerce implementation (Dubai)
  • data science platform for risk management (Belgium)
  • security platform (Netherlands)
The resume says plenty about this, so I will not replicate it here; I hope to have the chance to discuss it in detail.
Over the last 18 years have created quality software by designing in detail across the entire code base and stack, using hundreds of technologies from DevOps to UI.
▶ Domain & data analyst
Have an inclination and proven effectiveness for tackling complex business domains through critical thinking, discipline and knowledge management. This extends my natural abilities with technology and benefits the overall SDLC for both business and technical stakeholders.
For data science applications this means applying data management at several levels:
  • semantic data management
    1. best practice : bring data under management from a "business understanding" perspective
    2. best practice : Develop business catalogs to clarify business terminology 
  • logical & physical raw data management (#DataOps, #MLOps)
    • best practice : Federate first, copy later, maybe never
    • Develop data lake as a place of reference for data under analysis.
    • Develop feature store to continuously extract features from raw data for model training and model serving
      • All data management is done
      • for productionization - beyond the experimental data science mindset
      • with state of the art technologies and
      • advanced, proven designs evolved over four years of designing and building solutions
      • to reduce waste and total cost of ownership
      • to streamline operations
      • in a way that is reproducible to build both development & production environments (#DevOps)
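The list above names a feature store serving both model training and model serving. The correctness property that makes a feature store more than a table of values is point-in-time-correct reads: a training query "as of" time t must never see feature values written after t, or future information leaks into the training set. A toy, in-memory sketch of that property (all names here are hypothetical, for illustration only; real stores add entity joins, TTLs and an online/offline split):

```python
from bisect import bisect_right
from collections import defaultdict

class MiniFeatureStore:
    """Toy in-memory feature store illustrating point-in-time reads."""

    def __init__(self):
        # (entity_id, feature_name) -> time-ordered [(timestamp, value)]
        self._rows = defaultdict(list)

    def write(self, entity_id, feature, timestamp, value):
        rows = self._rows[(entity_id, feature)]
        rows.append((timestamp, value))
        rows.sort()                      # keep each series time-ordered

    def read_as_of(self, entity_id, feature, as_of):
        """Latest value written at or before `as_of`; None if nothing
        existed yet - never a value from the future."""
        rows = self._rows[(entity_id, feature)]
        i = bisect_right(rows, (as_of, float("inf")))
        return rows[i - 1][1] if i else None

store = MiniFeatureStore()
store.write("user-1", "avg_spend_30d", timestamp=10, value=42.0)
store.write("user-1", "avg_spend_30d", timestamp=20, value=55.0)
```

Reading as of time 15 returns 42.0 and reading as of time 5 returns None, even though a later value (55.0) exists in the store.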
I trust this ability will be of great value to your work.
▶ Machine learning engineer
My background is a five-year BSc in math, computer science, statistics and machine learning.
My ML practice focuses on time series, text and graph data.
By "applying engineering to ML" I mean going into the details of the algorithms and the data with the mind of a distributed software developer, in order to scale them and integrate them into applications.
I seek to apply advanced functional programming for more powerful and scalable abstractions on top of technology (Akka Actors, Akka Streams, Kafka, etc). 
See some thoughts here
  1. http://pchiusano.blogspot.ro/2010/01/actors-are-not-good-concurrency-model.html?showComment=1473166161907#c8759443976737604755
  2. https://twitter.com/semanticbeeng/status/914101334688792576
  3. https://twitter.com/semanticbeeng/status/904360996155904001
  4. https://twitter.com/semanticbeeng/status/901345659365752832
  5. https://twitter.com/semanticbeeng/status/858966500455133188
The models and algorithms I have focused on lately are HMMs, word embeddings, loopy belief propagation, PGMs and Bayesian networks.
Passionate explorer of the comparison between frequentist and Bayesian statistics (far from mastery).
Able to apply elements of functional programming to the domain of quantitative finance.
See thoughts here
  1. https://twitter.com/semanticbeeng/status/901448876296728576
Comfortable working with Mathematica to study models and advance my knowledge of statistics.
▶ Researcher
Have elaborate research plans covering 20K Evernote notes, a large wiki with digested knowledge, and hundreds of topics and plans.
Passionate about
  • mapping meaning of data
  • cross programming language and cross technology interoperability
  • advanced, type level functional programming (Scala)
  • and its application to data science to create clean code close to the algorithms
  • machine learning engineering 
  • domain specific languages
  • probabilistic programming
  • probabilistic machine learning models
  • concept drift and adaptation
  • distributed software design
Conclusion
I trust that this comprehensive & multi-dimensional knowledge could give you a unique edge in the pursuit of your goals.
I hope to have a chance to discuss your goals in detail and what I might be able to contribute.

Project history

Local Availability

Only available in these countries: Netherlands