NV
available

Last update: 10.09.2019

Machine learning engineer / architect

Graduation: not provided
Languages: Arabic (Native or Bilingual) | English (Native or Bilingual)

Attachments

Nick Vintila resume 201901.pdf

Skills

Nick Vintila
original (best read): 
https://drive.google.com/file/d/1oOdS28RUgoobrIRlGLVeY82z6cQbLcyZ/view 
  • Hands-on, full stack software architect & engineer, computer scientist, machine learning engineer, domain & data analyst, researcher, entrepreneur, lead
  • Extensive industry experience building complex custom software for multiple complex business domains
  • 5-year BSc in Computer Science, Mathematics, Machine Learning and Statistics
nick@semanticbeeng.com 
https://www.linkedin.com/in/semanticbeeng 
https://twitter.com/semanticbeeng
Key Value Proposition - beyond "data science" to "machine learning engineering"
  1. Base premise:
    1. The experimental data science paradigm is being challenged by complexity in enterprise settings 
    2. Source: https://leon.bottou.org/slides/2challenges/2challenges.pdf
  2. (definition) Experimental data science 
    1. Exploratory data discovery & analysis
    2. Not secure
      1. May violate regulations even if only done in DEV-ENV
      2. Forbids sharing across corporate boundaries which defeats business needs to develop partnerships
    3. Uncharted waters on the way to PROD-ENV
    4. Development mostly in notebooks, scripts, etc
    5. Ad-hoc choice of tools
    6. Batch, "one-off" analyses only, executed manually
    7. Relatively small data, manual sampling, often without proper treatment of "time"
    8. Models trained offline
    9. No "team process", continuous integration, etc
    10. No means to reproduce analysis results or software releases
    11. End results are reports, diagrams, etc
  3. (definition) Machine learning engineering
    1. End results are enterprise data science applications - aka #DataProducts
    2. Supported by #DataManagement with business #DataGovernance for "business understanding"  of data from day 1 of the project and after release of the product
    3. Use of realistic, multi-dimensional data sets that may not fit in RAM or on a single machine; logically infinite
    4. Proper treatment of data #DistributionDrift (#DatasetShift), #CovariateShift and #ConceptDrift  - both detection and adaptation
    5. Use of managed data sets : data lakes, feature stores
    6. Models trained incrementally
      1. the algorithms / analyses are incremental (online normalization, etc)
      2. or trained offline periodically and refreshed / deployed every few hours or days, as suitable
    7. Use of complex orchestration between the core analysis and the decision layer, model monitoring and other application logic and business processes, some involving human interactions
    8. Integrated with the software engineering process
    9. Light but strict architecture decision management
    10. Strict management of technology dependencies
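Item 3.4 above names detection of #DistributionDrift as a core machine learning engineering concern. As one illustration of what "detection" can mean in practice, here is a minimal sketch of the Population Stability Index (PSI), a common drift signal that compares a feature's current distribution against the one seen at training time. The data is synthetic and the thresholds in the note below are rules of thumb, not the author's production setup:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between a reference (training-time)
    sample and a current sample of one feature."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # cover out-of-range values
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)       # avoid log(0) on empty bins
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)             # training-time distribution
stable = rng.normal(0.0, 1.0, 5000)                # same distribution, new sample
shifted = rng.normal(1.0, 1.0, 5000)               # mean shifted by 1 sigma

psi_stable = psi(reference, stable)
psi_shifted = psi(reference, shifted)
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift; these cutoffs, like the bin count, should be calibrated per feature.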
Have developed mission-critical, custom distributed business applications for over twenty years in different business domains.
Over the past five years have masterminded and built two data science platforms, evolving an approach, vision, methodology, reusable design decisions and solutions, a structured knowledge base, and reusable solution components.
  • https://www.yields.io 
    • "model validation and real-time monitoring on an enterprise-wide scale"
    • masterminded the overall product, developed large parts of the code and led teams
  • https://www.datawatch.com/in-action/angoss/ 
    • helped re-engineer a proprietary data science platform onto modern technologies and architecture.
To address the base premise above, the biggest hurdles to streamlining the #DataProduct development approach are:
  1. Managing business understanding of data
    1. with business and product management #DataCitizen-s
    2. across the project lifecycle
    3. to achieve self service business capability in the final #DataProduct
    4. including ability to do #ModelValidation, audit against regulation (GDPR), etc
  2. Achieving proper #ModelManagement in production in context of #NonStationaryLearning (#DistributionDrift / #DatasetShift, #CovariateShift and #ConceptDrift)
  3. Managing incidental code complexity due to ad-hoc combination of languages and technologies but also less than mature programming paradigms & styles
  4. Achieving effective cross-language and cross platform integration ( #DataFabric, #ProgrammingModel, etc)
  5. Combine data analyses with traditional enterprise technologies (microservices, process managers, etc)
The ongoing actions to address these challenges are:
  1. Perform systematic, hands-on research into technologies, designs, algorithms, etc
  2. Maintain a large wiki with structured, reusable knowledge which allows me to refine with every iteration and apply to new projects
  3. Maintain a network of connections from both industry and academia (UberEng, Lightbend, KTH University, language design experts and more)
  4. Seek to apply 
    1. category theory for cross language / cross platform treatment of data schema and lifting of "data munging" to a higher level for better "business understanding" in the implementation
    2. process management for orchestration between data analyses and business processes elsewhere in the #DataProduct.
    3. incremental, non-stationary machine learning algorithms.
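Point 4.3 above concerns incremental, non-stationary algorithms, and the machine learning engineering definition earlier mentioned online normalization as an example. A minimal sketch of that idea, using Welford's algorithm (my choice of illustration, not necessarily the author's implementation): mean and variance are updated one observation at a time, so standardization works on a stream without holding the data set in RAM.

```python
import math

class OnlineNormalizer:
    """Streaming standardization via Welford's algorithm.
    A single-feature sketch; per-feature state, windowing and
    drift handling are deliberately left out."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self._m2 / self.n) if self.n > 1 else 1.0

    def normalize(self, x):
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0

norm = OnlineNormalizer()
for value in [2.0, 4.0, 6.0, 8.0]:
    norm.update(value)
# after four updates: running mean 5.0, population std sqrt(5)
```

The same update rule extends to exponentially decayed statistics, which is one standard way to make normalization track a drifting distribution.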

Key Value Proposition - domain specific architecture
  • There is a growing number of technologies available, each with its own implicit requirements and design choices
  • They provide reusable capabilities which often overlap and conflict
  • There is an objective and material difficulty in bridging between the functionality of generic software and business domain specific needs
  • One of the problems many software development efforts face is the constant friction introduced by translation between two technical vocabularies, that of the business domain on the one hand and that of the developers on the other.
    source: https://www.agilealliance.org/glossary/ubiquitous-language
  • Trying to apply generic technologies to specific business needs without managing the "gap" between them is very wasteful and expensive at best and risks failure at worst
  • Crafting sustainable and affordable custom software business solutions 
    • requires careful and principled matching between specific business domain and the technology domains of the various technologies
    • requires a detailed architecture process, expert knowledge in the business domain, data science expertise and extensive engineering to combine them all and operationalize
    • must involve restricting use of technologies to reduce entropy, choosing between overlapping capabilities and resolving architecture mismatches
I offer the reusable knowledge, designs, methodology and technologies to create state of the art custom business domain specific software that fits.
Have successfully applied the approach to large scale development in diverse business domains, drawing on:
  • experience on large projects
  • extensive research and state of the art methods
  • command of technologies from Lightbend, ThoughtWorks, UberEng, Databricks, etc

How it relates to key stakeholders:
✔ To client
  1. Help create a data-driven organization & processes that scale
  2. Help create mission critical enterprise applications / platforms
✔ To management consulting companies & client internal product management
  1. Complement your business strategy on a client project with advanced architecture and software development expertise
  2. Mastermind the solution design, development and operationalization
  3. Reduce the risk of building an engineering team disconnected from business realities, by using a knowledge-centric SDLC
  4. Boost execution and reduce burden of developing a productive team
  5. Ensure the team grows well suited to the concrete solution architecture and has the maturity required to build it
✔ To internal engineering team
  1. Ease access to the business domain knowledge required to build so that you have more time to build instead of being distracted in meetings
  2. Partner with you to help manage overall solution architecture through disciplined architecture and design decisions
  3. Streamline the SDLC so you waste less time dealing with poorly developed solutions by external parties 
✔ To recruiting & HR management firms
  1. Act as a partner, instead of an isolated resource
  2. Provide insights needed to build a team that actually suits the product being built
  3. Help unify the team under well understood goals & objectives to reduce team storming and better partner with the domain experts
  4. Reuse what works thus getting leverage with multiple clients
✔ To technology vendors
  1. Build solutions on client projects using deep knowledge of your tools and the client's "needs behind the requirements"
  2. Act genuinely motivated by the value proposition of the technology you offer
  3. Go way beyond a sales mindset
  4. Aim to reduce your sales cycle and consulting efforts
  5. Use a knowledge-centric organization mindset & techniques to make communication very effective
  6. Provide learning & visibility from client projects that might influence product direction
Details of the approach
The key pillars of the approach are below.
For each project we apply combinations of these techniques as appropriate.
  1. Data science applications
    • Not just the experimental data science part of the process.
    • But also the production applications, platforms, deployment, etc
    • And capabilities like: data lake, feature store, data & model management, business understanding, incrementality (data streaming, windowing), separation of model from business strategy.
  2. Machine learning engineering
    • Not just machine learning model development 
    • But also the application deployment, model serving, incremental re-training, distributed & federated training, concept drift detection and adaptation, prequential model evaluation, etc.
    • And the platform mindset, beyond the "data pipeline", to reach scalability in both functionality and at runtime.
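Pillar 2 above names prequential model evaluation. The idea: on a data stream, each example is first used to test the current model and only then to train on it, so accuracy is always measured on data the model has never seen. A minimal sketch, using synthetic data and an online logistic regression of my own choosing (an illustration of the technique, not the author's setup):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 2))                 # synthetic feature stream
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.1               # online logistic regression
correct = 0
for xi, yi in zip(X, y):
    p = 1.0 / (1.0 + np.exp(-(xi @ w + b)))    # 1. predict on the new example
    correct += int((p > 0.5) == yi)            # 2. score BEFORE training on it
    g = p - yi                                 # 3. only then take an SGD step
    w -= lr * g * xi
    b -= lr * g

prequential_accuracy = correct / len(X)
```

Because every prediction precedes the corresponding training step, the running accuracy also degrades visibly when the stream's distribution drifts, which is what makes prequential evaluation useful for the drift monitoring described above.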
  3. Advanced, type level, functional programming to manage complexity
    • Not just a mix of general programming languages - Scala, Python, Java, Rust, R
    • Or general frameworks - Tensorflow, Pytorch, etc
    • But also a paradigm for effective interoperability, through a disciplined #ProgrammingModel and #SystemsProgramming, so that algorithms are reusable and run on a shared runtime.
  4. Domain Driven Design - powerful for data science applications as well, through immersion in the business domain.
    • Not just pipelines and data lakes 
    • But also a domain specific approach to managing business complexity in data and code
    • With semantic data mapping in order to bring data under business management.
▶ Distributed software full stack architect & engineer
Have built complex enterprise software for a number of industries, in close partnership with business and other developers.
Last four years all remote projects:
  • data science platform (Canada)
  • large e-commerce implementation (Dubai)
  • data science platform for risk management (Belgium)
  • security platform (Netherlands)
The resume says plenty about this, so I will not replicate it here; I hope to have the chance to discuss it in detail.
Over the last 18 years have created quality software by designing in detail across the entire code base and stack, using hundreds of technologies from DevOps to UI.
▶ Domain & data analyst
Have an inclination and proven effectiveness for tackling complex business domains through critical thinking, discipline and knowledge management. This extends my natural abilities with technology and benefits the overall SDLC for both business and technical stakeholders.
For data science applications this means applying data management at several levels:
  • semantic data management
    1. best practice : bring data under management from a "business understanding" perspective
    2. best practice : Develop business catalogs to clarify business terminology 
  • logical & physical raw data management (#DataOps, #MLOps)
    • best practice : Federate first, copy later, maybe never
    • Develop data lake as a place of reference for data under analysis.
    • Develop feature store to continuously extract features from raw data for model training and model serving
      • All data management is done
      • for productionization - beyond the experimental data science mindset
      • with state of the art technologies and
      • advanced, proven designs evolved over four years of designing and building solutions
      • to reduce waste and total cost of ownership
      • to streamline operations
      • in a way that is reproducible to build both development & production environments (#DevOps)
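The list above names a feature store serving both model training and model serving. The correctness property that makes a feature store more than a table of values is point-in-time-correct reads: a training query "as of" time t must never see feature values written after t, or future information leaks into the training set. A toy, in-memory sketch of that property (all names here are hypothetical, for illustration only; real stores add entity joins, TTLs and an online/offline split):

```python
from bisect import bisect_right
from collections import defaultdict

class MiniFeatureStore:
    """Toy in-memory feature store illustrating point-in-time reads."""

    def __init__(self):
        # (entity_id, feature_name) -> time-ordered [(timestamp, value)]
        self._rows = defaultdict(list)

    def write(self, entity_id, feature, timestamp, value):
        rows = self._rows[(entity_id, feature)]
        rows.append((timestamp, value))
        rows.sort()                      # keep each series time-ordered

    def read_as_of(self, entity_id, feature, as_of):
        """Latest value written at or before `as_of`; None if nothing
        existed yet - never a value from the future."""
        rows = self._rows[(entity_id, feature)]
        i = bisect_right(rows, (as_of, float("inf")))
        return rows[i - 1][1] if i else None

store = MiniFeatureStore()
store.write("user-1", "avg_spend_30d", timestamp=10, value=42.0)
store.write("user-1", "avg_spend_30d", timestamp=20, value=55.0)
```

Reading as of time 15 returns 42.0 and reading as of time 5 returns None, even though a later value (55.0) exists in the store.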
I trust this ability will be of great value to your work.
▶ Machine learning engineer
My background is a five-year BSc in math, computer science, statistics and machine learning.
My ML practice focuses on time series, text and graph data.
By "applying engineering to ML" I mean going into the details of the algorithms and the data with the mind of a distributed software developer, in order to scale them and integrate them into applications.
I seek to apply advanced functional programming for more powerful and scalable abstractions on top of technology (Akka Actors, Akka Streams, Kafka, etc). 
See some thoughts here
  1. http://pchiusano.blogspot.ro/2010/01/actors-are-not-good-concurrency-model.html?showComment=1473166161907#c8759443976737604755
  2. https://twitter.com/semanticbeeng/status/914101334688792576
  3. https://twitter.com/semanticbeeng/status/904360996155904001
  4. https://twitter.com/semanticbeeng/status/901345659365752832
  5. https://twitter.com/semanticbeeng/status/858966500455133188
The models and algorithms I have focused on lately are HMMs, word embeddings, loopy belief propagation, PGMs and Bayesian networks.
Passionate explorer of the comparison between frequentist and Bayesian statistics (far from mastery).
Able to apply elements of functional programming to the domain of quantitative finance.
See thoughts here
  1. https://twitter.com/semanticbeeng/status/901448876296728576
Comfortable working with Mathematica to study models and advance my knowledge of statistics.
▶ Researcher
Have elaborate research plans covering 20K Evernote notes, a large wiki with digested knowledge, and hundreds of topics and plans.
Passionate about
  • mapping meaning of data
  • cross programming language and cross technology interoperability
  • advanced, type level functional programming (Scala)
  • and its application to data science to create clean code close to the algorithms
  • machine learning engineering 
  • domain specific languages
  • probabilistic programming
  • probabilistic machine learning models
  • concept drift and adaptation
  • distributed software design
Conclusion
I trust that this comprehensive & multi-dimensional knowledge could give you a unique edge in the pursuit of your goals.
I hope to have a chance to discuss your goals in detail and what I might be able to contribute.

Project history

Local Availability

Only available in these countries: Netherlands