The Cheesy Analogy of MLflow and Kubeflow

Byron Allen

Published in

Cognizant Servian

8 min read · Nov 6, 2019

This article assumes some background knowledge in data science and machine learning.

Hmmm… this spaghetti could use some cheese…

MLflow. Kubeflow.

Both are open-source projects. Both are supported by major players in the data analytics industry: MLflow is a Databricks project and Kubeflow is backed largely by Google. Both tools are touted as the best thing since sliced bread when it comes to tracking ML experiments and supporting the production ML lifecycle. Both end in ‘flow’.

That said, what specifically are the key differences? Moreover, not every data scientist or data-driven organization is at the same phase of the ML journey, so how can they better understand where these two tools fit into their flow?

By way of analogy, it’s a bit like comparing a block of cheddar to cacio e pepe, which is a highly glorified mac and cheese made with the fancy stuff — parmesan and pecorino romano.

Both cheddar and cacio e pepe are delectable dairy-based eats that hit that special spot in your stomach, especially if you’re lactose intolerant... But of course, they are different.

MLflow is a single Python package that covers some key steps in model management. Kubeflow is a combination of open-source libraries that depends on a Kubernetes cluster to provide a computing environment for ML model development and production tools.

Each has its place.

Comparison Across the Value Chain

I discussed a ProductionML value chain in a previous article titled Scaling the Wall Between Data Scientist and Data Engineer. The diagram below outlines the primary activities of that value chain. Taking this as our foundation, we can compare how Kubeflow and MLflow affect each primary activity.

ProductionML Value Chain Primary Activities

Collaborate

Both tools enable parameter, artifact, and model tracking to increase transparency and therefore the ability to collaborate in a team setting.

MLflow allows users to develop locally and track runs in a remote archive through a simple logging process — perfect for exploratory data analysis (EDA) and suitable for development work as well.
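To make that logging process concrete, here is a minimal sketch, assuming MLflow is installed and a tracking store is reachable. The helper names `flatten_params` and `log_run`, the experiment name, and the tracking URI are illustrative assumptions rather than anything prescribed by MLflow or by this article.

```python
from typing import Any, Dict


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into dotted keys, since MLflow params are flat key/value pairs."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_run(tracking_uri: str, experiment: str,
            config: Dict[str, Any], metrics: Dict[str, float]) -> str:
    """Log one run to a local or remote tracking store and return its run ID."""
    import mlflow  # imported lazily so flatten_params works without MLflow installed

    mlflow.set_tracking_uri(tracking_uri)  # a remote server URL or a local folder
    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        mlflow.log_params(flatten_params(config))
        mlflow.log_metrics(metrics)
        return run.info.run_id


# Hypothetical usage (server address and values are placeholders):
# log_run("http://tracking.example.com:5000", "churn-eda",
#         {"model": {"max_depth": 4}}, {"auc": 0.91})
```

The same few calls work from a notebook or a script, which is what keeps the barrier to tracking low during EDA.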

The same develop-locally, track-remotely workflow is technically possible via Kubeflow Metadata, but the setup requires a level of DevOps savvy that many data science teams don’t have. Alternatively, Kubeflow provides a notebook server hosted within its Kubernetes environment. This could be seen as an expensive route to EDA, albeit a very useful one for creating a more locked-down EDA environment. It could also be valuable for developing production jobs, though it is not strictly required.

As a final point in this section, the Project format in MLflow offers an enhanced ability for data scientists to share and try others’ projects with minimal hassle. It enables users to collaborate in a way that focuses on experimentation with a model. Kubeflow doesn’t appear to have the same capability.
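As a rough illustration of why sharing a Project takes so little effort, the sketch below assembles the `mlflow run` command a colleague could use to try someone else’s project with their own parameters. The repository URL, the parameter names, and the helper `build_mlflow_run_command` are hypothetical; only the `mlflow run` CLI and its `-e` and `-P` options come from MLflow itself.

```python
import shlex
from typing import Any, Dict, List, Optional


def build_mlflow_run_command(project_uri: str,
                             params: Dict[str, Any],
                             entry_point: Optional[str] = None) -> List[str]:
    """Build an `mlflow run` argument list for a project directory or Git URI."""
    cmd = ["mlflow", "run", project_uri]
    if entry_point:
        cmd += ["-e", entry_point]          # choose a non-default entry point
    for name, value in sorted(params.items()):
        cmd += ["-P", f"{name}={value}"]    # one -P flag per parameter override
    return cmd


# Placeholder repository and parameters, for illustration only.
cmd = build_mlflow_run_command(
    "https://github.com/example-org/example-project",
    {"alpha": 0.5, "l1_ratio": 0.1},
)
print(shlex.join(cmd))
# With MLflow installed, subprocess.run(cmd, check=True) would execute the project.
```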

Data Science

Both tools are enablers of data science and experimentation. Arguably, due to ease of collaboration and the enhanced Project format, MLflow might be the better enabler. It is a lighter tool with a laser-sharp focus on tracking and archiving, which makes it more flexible and usable in a greater number of situations. In a team or as an individual ML practitioner, MLflow brings value to the ML model development process with little fuss.

Data Pipeline Management

By design, Kubeflow sits on top of Kubernetes, which means components like hyperparameter tuning and pipelines, along with auto-scaling nodes, are available to run and scale up a data and ML pipeline. That said, Kubeflow pipelines capture the ‘last mile’ of the data pipeline. Prior steps in the data pipeline are completed by BigQuery, Dataproc, or containerized scripts. This might change a little if Feast, an open-source feature store, is fully adopted into Kubeflow. Fundamentally, Kubeflow is an orchestration tool.
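To illustrate what orchestration means here, the toy sketch below declares pipeline steps with their dependencies and derives a valid execution order. It deliberately uses only the Python standard library and is not the Kubeflow Pipelines SDK; the step names are hypothetical, with the first step standing in for upstream work done in a warehouse such as BigQuery.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Each step maps to the set of steps that must finish before it can start.
PIPELINE: Dict[str, Set[str]] = {
    "extract_from_warehouse": set(),
    "build_features": {"extract_from_warehouse"},
    "tune_hyperparameters": {"build_features"},
    "train_model": {"build_features", "tune_hyperparameters"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(PIPELINE)
print(" -> ".join(order))
```

An orchestrator such as Kubeflow adds scheduling, containers, retries, and scaling on top of this basic idea of a dependency graph.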

Here is where MLflow is going to fall short. By itself, MLflow doesn’t dramatically improve this activity. Sure, it can track some vital information, but it doesn’t fundamentally facilitate data modeling or feature engineering. Databricks has other products that would address this, but they are not necessarily all open-source.

Model Management

Both offer the ability to archive metrics, parameters, and artifacts that might be critical to model management.

Kubeflow’s ability to retain and visually highlight the entirety of data and ML model pipelines, at-rest and in-flight, is very handy. Moreover, I love the scheduling capability, which is yet another example of Kubeflow’s orientation towards orchestration. MLflow doesn’t offer this, although other components in the Databricks ecosystem might.

Previously, I would have said there is no substantial governance mechanism in either product. However, MLflow launched a new feature called Model Registry in October 2019. Databricks says, “[It] builds on MLflow’s existing capabilities to provide organizations with one central place to share ML models, collaborate on moving them from experimentation to testing and production, and implement approval and governance workflows.”

While I’d like to see more automation or triggers built on source data distributions, the Model Registry feature is a great step in the right direction. The fundamental objective of the model management stage is to bring a level of governance to ML model development. MLflow’s Model Registry inches closest to this objective.
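A small sketch of what such a governance step could look like: an explicit approval rule, followed by registering a model version and moving it to a stage. The rule `should_promote`, its threshold, and the helper `register_and_stage` are assumptions made for illustration. The stage-based call shown is the API the Model Registry launched with; later MLflow releases have moved toward aliases, so check the version you run.

```python
def should_promote(candidate_auc: float, production_auc: float,
                   min_gain: float = 0.01) -> bool:
    """Hypothetical approval rule: promote only if the candidate beats production by min_gain."""
    return candidate_auc - production_auc >= min_gain


def register_and_stage(model_uri: str, name: str, stage: str = "Staging") -> str:
    """Register a logged model under `name` and move the new version to `stage`."""
    import mlflow  # imported lazily so the approval rule works without MLflow installed
    from mlflow.tracking import MlflowClient

    version = mlflow.register_model(model_uri, name).version
    MlflowClient().transition_model_version_stage(
        name=name, version=version, stage=stage
    )
    return version


# Hypothetical usage (run ID and model name are placeholders):
# if should_promote(candidate_auc=0.93, production_auc=0.90):
#     register_and_stage("runs:/<run_id>/model", "churn-classifier")
```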

Hosting

Both enable serving models at an API endpoint.

Kubeflow offers a collection of serving components along with the serving infrastructure via the Kubernetes cluster it sits on top of.

In contrast, MLflow’s offer covers the essentials: a REST API endpoint that requires a server, along with the ability to promote models to cloud environments such as Amazon SageMaker and Azure ML. The MLflow REST API endpoint might be useful if you don’t want to use a cloud vendor’s API endpoint, but the underlying server might need further development if that API is called frequently or at high volume. In short, MLflow makes it far easier than Kubeflow to promote models to API endpoints on various cloud vendors; Kubeflow can do this, but only with more development effort.
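For a sense of what calling that REST endpoint involves, here is a hedged sketch that builds a scoring request for a model served locally, for example with `mlflow models serve`. The host, port, and feature columns are placeholders, and the `dataframe_split` body assumes an MLflow 2.x scoring server; the 1.x releases current when this article was written expected a different payload format.

```python
import json
import urllib.request
from typing import Any, List


def build_invocation_request(base_url: str, columns: List[str],
                             rows: List[List[Any]]) -> urllib.request.Request:
    """Build a POST request for the /invocations endpoint of an MLflow model server."""
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address and features, for illustration only.
request = build_invocation_request(
    "http://127.0.0.1:5000", ["age", "tenure"], [[42, 3], [35, 7]]
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs without a server.
```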

Again, the difference is, largely, actual orchestration versus a tool to be leveraged by orchestration jobs.

My Experience

When I talk about Kubeflow, it’s worth noting that my experience with it comes from two periods: recently, with the streamlined deployment via Google Cloud Platform, and over a year ago, when it was unpleasant, to put it politely. When I first used it, I promptly dropped it in favor of MLflow for its maturity and ease of use. It’s great to see that Kubeflow has progressed and is evolving rapidly. It holds so much potential.

That said, from an ease-of-use perspective, Kubeflow doesn’t feel mature enough, particularly for such a complex system. Moreover, it assumes a lot of competency with Kubernetes and/or containers, which frankly is great if you have it and disappointing if you don’t, and not every data science team will. Kubeflow is a tool for a grin-and-bear-it intermediate or truly advanced team of ML engineers. It smacks of the Hadoop ecosystem, which leaves a sarcastic smirk on the face of anyone who has had the privilege of dealing with that ‘federation’ of components.

In contrast, MLflow is a lightweight, narrowly defined package, which makes it resilient and versatile. It simply fits in where needed. As the comparison across the primary activities shows, MLflow doesn’t do as much as Kubeflow. Rather, it needs to be coupled with other components to accomplish a similar scope, which is quite easy to do with other cloud components or via Databricks.

A Simple Pattern

The pattern I’ve used before relies on well-established tools — BigQuery, Jenkins, AI Platform ML Engine, and MLflow. I discuss this pattern in the ‘Scaling the Wall’ article I mentioned. That said, I used a rather complex-looking diagram that belies the simplicity of the pattern, so I’ll sum it up here with a new diagram.

This pattern effectively fills the orchestration role that is seen in Kubeflow and removes the Kubernetes overhead. There are certain GUI aspects that might be missing, but it provides an appropriate entryway into the world of production ML. It hits the mark on all the primary activities, adapts easily to changing requirements, and provides a quick vehicle for learning the process of production ML.

By minimizing technical debt through the use of well-established tools and serverless Google Cloud offerings, the pattern is far easier for a data science and engineering team to build upon. I believe this pattern is valuable as an initial stepping stone. It can help organizations better understand what they value in systems like this, which can require a high degree of customization. It could easily be a gateway to Kubeflow, particularly when that platform matures even more.

Bon Appétit?

MLflow:

Easy collaboration in a remote or local environment for individuals or teams
Simplified tracking for ML models means faster time to insights and value
Launch of Model Registry enhances governance and the core proposition of model management
No native support for feature engineering, data pipeline development, and pipeline orchestration, meaning it requires other components
Essentials for hosting are available, with an easy ability to push to cloud vendor API endpoints
Enables quick development of a foundational ProductionML pattern with Databricks, obviously, or with established cloud components

Kubeflow:

Knowledge of Kubernetes is required, yet it provides the team with direct control over the infrastructure used for EDA, development, and production environments
Collaboration and general setup require a level of skill and effort that slows time to insights and value
Highly promising scalability and hyperparameter tuning
Strong scheduling and orchestration capabilities
Wide range of hosting components
Future prospect of feature engineering support

So, what do you want? A block of cheddar for some mac and cheese or maybe a humble sandwich, or cacio e pepe? One is versatile, great for beginners and even useful for experts. Much can be built with it. The other assumes a level of specialization and maturity of skills to implement it.

Each has its plate…

Kubeflow is sophisticated. If you have a team with all the right skills, it might be for you. However, the pattern I demonstrate highlights the simplicity and robustness of MLflow, which are characteristics that make it flexible and durable.

Source: Pexels — yes, I know they are not Cheddar but they look cool, no?

Hungry yet?