What is MLOps?

Aug 25, 2022 · 5 min read

"MLOps" stands for Machine Learning Operations and is a discipline that aims to accelerate the adoption of artificial intelligence, machine learning, and data science into software systems.

A Brief History

The terms artificial intelligence, machine learning, and data science (collectively referred to as "ML" throughout the rest of this article) started gaining popularity around 2008 and exploded into mania with the 2012 article, Data Scientist: The Sexiest Job of the 21st Century.

At that time, there was limited awareness of how best to do ML in production at scale. These disciplines are highly specialized and can be challenging to deploy in software systems. It wasn't until 2019 that the term "machine learning operations" (MLOps) started to become a topic of discussion. Since then, tools and best practices have kept evolving as the field has matured.

My mock-up of this illustration was inspired by ml-ops.org

What is MLOps?

The abbreviation "MLOps" stands for "Machine Learning Operations." The term "machine learning" itself is often used synonymously with artificial intelligence and data science, even though the three are distinct from one another.

MLOps is a separate discipline that aims to improve and accelerate the development and adoption of ML. Its role spans the domains of data engineering, model engineering, and software engineering, and it aspires to do for ML what DevOps has done for software development. MLOps is still evolving and is being further defined each day. In the meantime, several groups are helping to catalog and define MLOps, with pages like ml-ops.org, mlops.community, and the Linux Foundation's SIG - MLOps being great resources to learn more.

With the emergence of MLOps, there has been an increased importance placed on:

Speeding up the development of ML models and systems.
Automating the deployments of models into production.
Treating models as micro-services/components in software systems.
Model serving, performance monitoring, and logging.
Keeping data, models, and software all versioned and in sync with one another.
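To make the last point a little more concrete, here is a minimal sketch of one way to keep data, models, and code in sync: record a content hash of each asset in a single manifest and check it later. This is only an illustration under my own assumptions. The function names (fingerprint, build_manifest, in_sync), the sample bytes, and the code version string are all hypothetical, and real projects typically lean on dedicated data and model versioning tools rather than hand-rolled hashes.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a short, stable content hash for a blob of bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(data: bytes, model: bytes, code_version: str) -> dict:
    """Tie one data snapshot, one model artifact, and one code version together."""
    return {
        "data_hash": fingerprint(data),
        "model_hash": fingerprint(model),
        "code_version": code_version,
    }


def in_sync(manifest: dict, data: bytes, model: bytes, code_version: str) -> bool:
    """Check that the current assets still match what the manifest recorded."""
    return manifest == build_manifest(data, model, code_version)


# Hypothetical stand-ins for a real dataset file and a serialized model.
training_data = b"feature_a,label\n1.0,0\n2.0,1\n"
model_artifact = b"serialized-model-bytes"

manifest = build_manifest(training_data, model_artifact, code_version="abc1234")
print(json.dumps(manifest, indent=2))

# Changing the data without retraining breaks the link to the recorded model.
changed_data = training_data + b"3.0,1\n"
print(in_sync(manifest, training_data, model_artifact, "abc1234"))  # True
print(in_sync(manifest, changed_data, model_artifact, "abc1234"))   # False
```

The point of the sketch is the shape of the idea rather than the mechanism: if any one of the three assets changes, the mismatch is detectable, which is what lets a pipeline decide that a model needs retraining or redeploying.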

How do you do MLOps?

"MLOps [is a] language-, framework-, platform-, and infrastructure-agnostic practice. MLOps should follow a 'convention over configuration' implementation." - ml-ops.org

There isn't just one way of doing MLOps, but there is a set of best practices and tools that are beginning to take form. In this section, we'll start from a high-level perspective on MLOps and step further into its applications.

At a high level, ml-ops.org advocates that the overall approach of MLOps resembles a three-phased infinity symbol (shown below). MLOps increases the speed at which ML projects travel through these broad phases, from a project's initial design, to model development, all the way through its ongoing operations.

Illustrated by ml-ops.org

We can apply this high-level view of MLOps by combining it with the Cross-Industry Standard Process for Machine Learning with Quality Assurance (aka CRISP-ML(Q)). The CRISP-ML(Q) process model is a proposed reference for developing robust ML solutions. By nature, the development of these solutions is circular in its ideation and progression, as beautifully illustrated by Visenger. The MLOps phases described above encompass each step of the CRISP-ML(Q) process model.

Illustrated by ml-ops.org

Taking one final step closer, the MLOps “Stack” can be mapped onto the previous diagram to highlight the areas of the CRISP-ML(Q) process where MLOps is uniquely poised to aid in the development of ML systems.

Illustrated by ml-ops.org

“An optimal MLOps experience is one where Machine Learning assets are treated consistently with all other software assets within a CI/CD environment. Machine Learning models can be deployed alongside the services that wrap them and the services that consume them as part of a unified release process.” - Special Interest Group (SIG) - MLOps

Using the above image as a blueprint for creating ML systems is a winning strategy. However, doing so requires the collaboration of many disciplines and practitioners. Organizations need the combined talents of business stakeholders, product leaders, data engineers, data scientists, machine learning engineers, and now MLOps engineers to create ML systems that work and last.

MLOps can aid this collaboration by offering a set of tools and practices throughout the CRISP-ML(Q) process model, such as:

Giving input at the beginning stages of ML projects to identify the (1) value proposition of the desired system and the ongoing participation of MLOps.
Developing strategies on the available (2) data sources and data versioning to be used.
Offering tools and infrastructure to allow quick and easy (3) data analysis and experiment tracking.
Creating and managing (4) feature stores where preprocessed input data can be stored and consumed for model training and serving.
Integrating (5) code repositories with (6) CI/CD pipelines to automate and extend the testing and validation of models. More details on these continuous methods can be found here.
Building tools and processes to (7) host model registries and version model artifacts for already-trained ML models.
Automating the (8) deployment of models to target environments and the quality thresholds needed to do so.
Facilitating triggers and consistent uptime for (9) model predictions relative to service level agreements.
Hosting the (10) monitoring of data, models, and applications used in ML systems.
Setting up a (11) metadata store for ongoing tracking of model names, parameters, training data, test data, and metric results for retrained models.
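Two of the items above, the quality thresholds for (8) deployment and the (11) metadata store, lend themselves to a small sketch. The code below is a toy, in-memory illustration and not any particular tool's API. The class names (RunRecord, MetadataStore), the passes_quality_gate function, the "churn-model" name, the threshold values, and the metric numbers are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Metadata for one training run (field names are illustrative)."""
    model_name: str
    params: dict
    training_data: str
    test_data: str
    metrics: dict


@dataclass
class MetadataStore:
    """In-memory stand-in for a real metadata store."""
    runs: list = field(default_factory=list)

    def log(self, record: RunRecord) -> None:
        """Append a run so retrained models can be compared over time."""
        self.runs.append(record)

    def history(self, model_name: str) -> list:
        """Return every logged run for one model, oldest first."""
        return [r for r in self.runs if r.model_name == model_name]


def passes_quality_gate(metrics: dict, thresholds: dict) -> bool:
    """A run may deploy only if every required metric meets its minimum.

    A metric that is missing from the run counts as a failure.
    """
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


# Hypothetical thresholds that a team might agree on before automating deploys.
THRESHOLDS = {"accuracy": 0.90, "recall": 0.80}

store = MetadataStore()
store.log(RunRecord("churn-model", {"max_depth": 4}, "train-v1", "test-v1",
                    {"accuracy": 0.93, "recall": 0.75}))
store.log(RunRecord("churn-model", {"max_depth": 6}, "train-v2", "test-v1",
                    {"accuracy": 0.94, "recall": 0.86}))

for run in store.history("churn-model"):
    decision = "deploy" if passes_quality_gate(run.metrics, THRESHOLDS) else "hold"
    print(run.training_data, run.metrics, "->", decision)
```

In this made-up history, the first run is held back because its recall falls below the threshold, while the retrained run clears both minimums. In practice the same check would usually run as a step in a CI/CD pipeline, with the records kept in a persistent store rather than a Python list.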

The Future of MLOps

The main barrier to adopting ML systems has been the significant cost (e.g., time, talent, and infrastructure) of building them and the complexity of maintaining them well enough to secure a return on investment. On top of that, there is an abundance of Hidden Technical Debt in Machine Learning Systems. These costs are where MLOps comes in: it aims to hasten the development of these systems, improve their quality, make them more maintainable for practitioners, and increase their visibility to end-users and stakeholders to ensure success.

MLOps is a still-evolving discipline that will be further defined and refined in the coming years. Tune in here at jacoblyman.com for ongoing articles and updates on the MLOps space.

A series of posts will be coming out that will be based on experiential learning and research, like the following:

Choosing Your MLOps Stack
