Kubeflow in 2024: Major Improvements and Integrations (Kubeflow 1.9)

Wajeeh Ul Hassan
4 min read · Sep 5, 2024

Kubeflow has come a long way since its inception. In the world of open source, the possibilities are endless. Kubeflow started as an ambitious project to create a scalable, cloud-agnostic platform for machine learning on Kubernetes that gives users flexibility.

Kubernetes Is Scary But Why Kubernetes?

Kubernetes itself scares people away; even many DevOps and software engineers don’t know it. But to build a scalable, modern platform in 2024, you can hardly escape Kubernetes. The alternative is managed cloud services, which bring vendor lock-in, less flexibility, and higher cost after a certain point. With those services there is no easy way to migrate your workload from one cloud to another.

Kubernetes is excellent for stateless applications, scheduled jobs, and even DAGs, which is everything we need to build data pipelines and machine learning applications. Even stateful workloads run well on Kubernetes nowadays.

Is Kubeflow Abandonware?

No. Vertex AI, the main MLOps platform on GCP (Google Cloud Platform), builds on Kubeflow: Vertex AI Pipelines runs pipelines written with the Kubeflow Pipelines SDK.

Tools In Kubeflow

In the early days Kubeflow shipped with many tools and, since it was only a start, many of them were not well integrated. Kubeflow had to go through an evolution phase in which a lot needed to change. Over time new tools were introduced and older ones were retired. It’s good to see that Kubeflow now has a clearer picture of what the platform needs to become. It is now focused on fewer tools with better integration of core components, and that has been working well.

Improvements In Kubeflow

Kubernetes can be wild: so many tools can be installed on it nowadays, but managing and integrating them is difficult.

Kubeflow took this into consideration and made the Kubeflow Dashboard flexible enough to integrate different tools. This makes things easier for data scientists, who can focus on building machine learning models and on analytics work.

Kubeflow Pipelines has an improved user interface, multi-user isolation, an SDK and compiler, and pipeline caching for faster pipeline execution.
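
To make the caching idea concrete, here is a toy sketch in plain Python. It is not how Kubeflow Pipelines implements caching internally; it only illustrates the general principle that a step whose definition and inputs are unchanged can reuse a stored output instead of running again. The names `cache_key`, `run_step` and `preprocess` are made up for this example.

```python
import hashlib
import json

_cache = {}  # cache key -> stored step output (in-memory, illustration only)


def cache_key(step_name, image, inputs):
    """Derive a deterministic key from everything that defines a step run."""
    payload = json.dumps(
        {"step": step_name, "image": image, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(step_name, image, inputs, fn):
    """Run fn(**inputs) unless an identical run is already cached.

    Returns a tuple (output, cache_hit).
    """
    key = cache_key(step_name, image, inputs)
    if key in _cache:
        return _cache[key], True
    output = fn(**inputs)
    _cache[key] = output
    return output, False


def preprocess(rows, scale):
    # Stand-in for an expensive pipeline step.
    return [r * scale for r in rows]


# Same step, same inputs: the second call is served from the cache.
first = run_step("preprocess", "python:3.11", {"rows": [1, 2, 3], "scale": 2}, preprocess)
second = run_step("preprocess", "python:3.11", {"rows": [1, 2, 3], "scale": 2}, preprocess)
# A changed input produces a different key, so the step runs again.
changed = run_step("preprocess", "python:3.11", {"rows": [1, 2, 3], "scale": 3}, preprocess)
print(first, second, changed)
```

The point of keying on both the step definition and its inputs is that editing either one invalidates the cached result, while re-running an unchanged pipeline skips the expensive work.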

For CI/CD, Tekton and Argo CD can be used; Argo CD also enables GitOps.

KFServing is now KServe. KServe also supports the gRPC protocol and can be extended for explainability: it ships with some explainers, and you can build a custom one as well. KServe also has better support for canary deployments, making it easier to roll out updates and test new models.
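
As a rough illustration of a canary rollout, the sketch below builds a KServe `InferenceService` manifest in Python and prints it as JSON, which `kubectl apply -f` accepts. It assumes the `serving.kserve.io/v1beta1` API and its `canaryTrafficPercent` field on the predictor; the service name, namespace and storage URI are placeholders, and the helper `canary_inference_service` is invented for this example.

```python
import json


def canary_inference_service(name, namespace, storage_uri, canary_percent):
    """Build an InferenceService manifest that sends part of the traffic
    to the newest revision (the one pointing at storage_uri)."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                # Share of requests routed to the new revision; the rest
                # keeps going to the previously rolled-out revision.
                "canaryTrafficPercent": canary_percent,
                "model": {
                    "modelFormat": {"name": "sklearn"},
                    "storageUri": storage_uri,
                },
            }
        },
    }


# Placeholder values, not taken from a real cluster.
manifest = canary_inference_service(
    name="sklearn-iris",
    namespace="kubeflow-user-example-com",
    storage_uri="gs://example-bucket/models/iris/v2",
    canary_percent=10,
)
print(json.dumps(manifest, indent=2))
```

Once the new model looks healthy, raising the percentage step by step (and finally removing the field) promotes it to take all traffic.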

Kubernetes is not the only difficult part; Istio is difficult too. Kubeflow took this into consideration as well and made it easier to get started with Istio. Istio itself has also evolved, and using EnvoyFilters for authorization is now discouraged.

Kubeflow Notebooks has added support for VS Code, so both VS Code and Jupyter Notebook are now supported in Kubeflow.

Integration with Prometheus and Grafana has been improved, giving better observability of Kubeflow components.

With the introduction of the kfp-kubernetes Python library, users can author Kubeflow Pipelines that use Kubernetes-specific features.

Katib now has improved support for early stopping rules, enabling efficient use of computational resources.
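
To show what an early stopping rule does, here is a small plain-Python sketch of a median stopping rule, the kind of rule Katib offers. It is a simplified illustration, not Katib's code: a running trial is stopped when its best objective so far is worse than the median of the running averages of completed trials at the same step. The function name `should_stop` and the accuracy numbers are invented for this example.

```python
from statistics import median


def should_stop(trial_values, completed_trials, maximize=True):
    """Median stopping rule (simplified).

    trial_values: objective values reported so far by the running trial.
    completed_trials: list of full objective histories of finished trials.
    """
    step = len(trial_values)
    if step == 0:
        return False
    # Running average of each completed trial over its first `step` reports.
    averages = [
        sum(history[:step]) / step
        for history in completed_trials
        if len(history) >= step
    ]
    if not averages:
        return False  # nothing to compare against yet
    threshold = median(averages)
    if maximize:
        return max(trial_values) < threshold
    return min(trial_values) > threshold


# Made-up validation accuracies of three finished trials.
completed = [
    [0.60, 0.70, 0.80],
    [0.55, 0.65, 0.75],
    [0.50, 0.60, 0.70],
]

weak_trial = should_stop([0.40, 0.45], completed)    # clearly below the median
strong_trial = should_stop([0.58, 0.72], completed)  # above the median
print(weak_trial, strong_trial)
```

Stopping the weak trial after two reports frees its CPU or GPU for more promising hyperparameter combinations, which is where the resource savings come from.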

Kubeflow has improved its integration with Kubernetes Role-Based Access Control (RBAC), allowing more secure and flexible user management as well as fine-grained access control over different Kubeflow components and resources.
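
As an illustration of what this looks like at the Kubernetes level, the sketch below generates a `RoleBinding` that grants a user edit or view rights in a profile namespace. It assumes the `kubeflow-edit` and `kubeflow-view` ClusterRoles that a standard Kubeflow installation provides; the naming scheme and annotations follow a common convention but should be checked against your own deployment, and a real setup typically needs a matching Istio AuthorizationPolicy as well. The helper `contributor_role_binding`, the namespace and the email address are made up for this example.

```python
import json


def contributor_role_binding(namespace, user_email, role="edit"):
    """Build a RoleBinding that gives user_email the given role in namespace."""
    if role not in ("edit", "view"):
        raise ValueError("role must be 'edit' or 'view'")
    # Kubernetes object names cannot contain '@' or uppercase letters.
    safe = user_email.replace("@", "-").replace(".", "-").lower()
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {
            "name": f"user-{safe}-clusterrole-{role}",
            "namespace": namespace,
            "annotations": {"role": role, "user": user_email},
        },
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": f"kubeflow-{role}",  # assumed to exist in the cluster
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "User",
                "name": user_email,
            }
        ],
    }


# Placeholder namespace and user.
binding = contributor_role_binding("team-a", "alice@example.com", role="view")
print(json.dumps(binding, indent=2))
```

Because the binding is scoped to one namespace, the same user can have view access in one team's profile and edit access in another.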

Data Management

If you are using Kubernetes, it’s not difficult to set up your own open data lakehouse (https://cloud.google.com/discover/what-is-a-data-lakehouse?hl=en) as well, which lets you manage data in a DataOps fashion.

We can also use tools like Pachyderm and DVC for data versioning.

MLOps with DataOps gives you everything you need for a machine learning infrastructure.

Documentation

Unfortunately, due to the plethora of changes, the documentation is outdated, and if you get stuck it can eat up a lot of time. The documentation needs to improve.

Conclusion

Kubeflow has made excellent progress: Istio is easier than before, tools are better integrated, and KServe is more advanced. RBAC and Katib have also improved a lot. The Kubeflow Dashboard can make things easier for data scientists and other team members. Combining MLOps and DataOps can give you a powerful platform for managing the machine learning lifecycle.

There is no better way of doing machine learning on Kubernetes than Kubeflow; otherwise you would be building your own Kubeflow on Kubernetes and reinventing the wheel, which does not make sense. If Kubeflow were closed source, building your own machine learning platform might make sense, but it isn’t, and it’s highly flexible too.

Finally, I would like to introduce the course I created, which teaches MLOps with Kubeflow and the basics of DataOps with open source tools. The course is Hands On MLOps With Kubeflow, find it here.

You can start learning MLOps with Kubeflow and DataOps right from your local machine. It will take at least 2 months to finish this course. It teaches the basics of Kubernetes and machine learning, with hands-on examples of DataOps and Kubeflow. In this course I have used Kubeflow version 1.9.rc.2 and all the latest tools, so you don’t get caught in outdated documentation either, saving you months or even years of effort.

Thank you
