Productivization of AI: rise of MLOps

Uroš Lipovšek
10 min read · Feb 10, 2021

ML and deep learning used to be stuck in the prototyping phase while the technology was still evolving and AI talent was scarce. As AI programs at universities around the world become a main attraction for students, as ML engineers become part of every tech company's personnel, and as software tools for AI become strategic products of the cloud giants, we are seeing a transition where ML and deep learning become a standard part of products and internal tools that can provide a competitive edge in many industries. To be fair, AI is also deep into the hype cycle, where we see many companies falsely claim to be AI-driven.

This dynamic created demand for a new discipline: MLOps. As companies race to deliver the latest state-of-the-art NLP and computer vision models, they are slowly realizing that productivization of ML and deep learning models presents new challenges. We are seeing ever-increasing demand for devops engineers, since they provide direct business value and fit well with agile methodology. Devops can enable smoother movement of software from development to production, better observability, and increased availability, but let's not forget that devops is about culture and tools are secondary. MLOps is devops' sibling, adding ML and deep learning to the already complex world of devops.

Hardware

Devops engineers are mainly concerned with cost minimization, but for deep learning the choice of hardware plays a deciding role in performance, so its impact on cost is even more noticeable. Comparing AI-specific hardware is hard, since final performance depends not only on the hardware itself but also on the software tools used.

Cloud

NVIDIA is the biggest player in the GPU market, fueled by gaming, AI, and crypto. It also invests heavily in development and targets the AI market with specialized chips:

  • T4 is a cost-efficient choice for inference and can be cheaper for models that are not among the largest
  • V100 was the powerhouse used for training deep learning models and for deploying models that just won't fit on a T4
  • A100 is the newest member of NVIDIA's AI product line, with significant improvements such as GPU virtualization, which can bring serverless computing to the GPU world, and better support for quantization, a technique for lowering a model's inference cost by changing the precision of its parameters (for example from 64-bit floating point to 8-bit integer) while minimizing the loss of prediction accuracy with methods such as fake quantization nodes during training (a minimal quantization sketch follows this list)
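
To make quantization concrete, here is a minimal post-training quantization sketch using TensorFlow Lite. The SavedModel path is a placeholder, and this is just one of several possible quantization routes (TensorRT, discussed below, is the NVIDIA-specific one):

    import tensorflow as tf

    # Placeholder path to an exported SavedModel.
    converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/")

    # Ask the converter to quantize weights (e.g. float32 -> int8 where
    # possible), shrinking the model and lowering inference cost.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model_quantized.tflite", "wb") as f:
        f.write(tflite_model)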

There are also other companies that help users get the most out of their NVIDIA GPUs, such as Lambda Labs. We have also seen the emergence of workstations and industry-grade server racks built specifically for AI, such as DGX, which can be well utilized by deep learning research labs. NVIDIA also supports the cloud space through NGC, where we can find container images for GPU workloads, from minimal images with only the base software up to images bundling specific models or NVIDIA products.

We can also observe Google joining the race early with its proprietary TPU technology, which is designed specifically for AI workloads. They seem to be doing an excellent job with TPU development, but they are not the only cloud provider offering in-house AI chips: AWS is working hard to make Inferentia the best AI inference chip on the market. Since Inferentia chips do not support CUDA drivers and the "classical deep learning software stack" but rather a replacement called the Neuron SDK, they do not yet support all the models that can run on NVIDIA GPUs. Google also offers some of its GPU compute power for free through Google Colab. Paperspace offers GPU as a service with a pre-configured development environment, and vast.ai is an interesting attempt at a GPU rental marketplace.

We are also seeing other technology, such as FPGA chips by Xilinx, making an entrance, but they have a hard time getting any significant traction.

Moving to the edge

Edge computing focused on deep learning is becoming as important as the cloud, since it can lower latency, enable compliance with regulations, and bring additional opportunities. In the age of DIY and open source software, edge computing is also becoming a tech equivalent of 3D printing, since everyone with a bit of coding skill can create something useful. Edge computing is also a critical enabler for self-driving cars, robots, and drones.

As with GPUs, NVIDIA is leading the charge with its Jetson product line. Apart from industry-ready products such as Xavier, it also offers cost-efficient options for tech enthusiasts like the Nano, which can be used for DIY deep learning projects or for solving real business problems with less compute-intensive models.

Google entered this market with the Coral Edge TPU, a direct competitor to the NVIDIA Jetson line. There is also Intel with its Neural Compute Stick, which has trouble competing against the Jetson line and the Coral TPU.

Other cloud giants like AWS and Azure are not attacking the edge computing market by trying to outperform NVIDIA and Google; they have instead invested their efforts in making the integration between cloud and edge seamless, with AWS Greengrass and Azure IoT Edge. Although AWS offers the DeepLens product, it chose to partner with NVIDIA for its next similar venture, AWS Panorama, which adds a user-friendly SDK on top of NVIDIA Jetson.

GPU benchmarking has always been a challenge, since performance depends not only on the hardware used but also on the software running the deep learning models. As mentioned with AWS Inferentia, drivers and software can also limit which models can be used, but the team is working hand in hand with the community to remove as many limitations as possible. We are also observing the emergence of standardized benchmarks for measuring deep learning hardware performance, the most prominent being MLPerf.

We can also see some companies, like Tesla, developing their own deep learning hardware, and I'm eager to see how this will play out. I think this can lower costs and increase performance in the long term, but it is a huge risk in the short to mid term (2–5 years), as developing hardware and hardware-specific software is extremely challenging, and incumbents like NVIDIA have years of experience and teams working full time on edge computing products.

Software

Deep learning ideally runs on GPUs or other specialized hardware in order to parallelize its matrix operations. NVIDIA is famous for developing CUDA, which enables software to communicate with the GPU, and cuDNN, which extends CUDA for running deep learning (a quick check that both are visible to the frameworks follows the list below). The big tech companies have developed their own deep learning software:

  • Google was an early mover with TensorFlow and later developed TensorFlow Lite, which optimizes models for inference and makes them suitable for deployment in resource-constrained environments like edge and mobile. Google also developed JAX, which has some specifics with regard to bundling CUDA and cuDNN. To rival PyTorch (below), it added TensorFlow eager execution. Google also helped push models to production with TensorFlow Serving, which turns a TensorFlow model into a REST API. The community's main critique of TensorFlow was its API design, but the UX was improved in v2.0, when Google made the framework more Keras-like.
  • Facebook developed PyTorch, which first dominated the academic sphere and is now moving into production systems (see pages 13 and 14). Facebook also made moving models to production easier with TorchServe (exposing models as REST APIs), in cooperation with AWS. PyTorch Lightning can significantly improve the experience, speed, and quality of developing and deploying models.
  • Microsoft (along with Intel, Baidu, and a few others) is supporting MXNet and ONNX, which promises to port models between deep learning frameworks without loss of performance.
  • Some universities have also developed their own libraries, such as Berkeley (Caffe) and Montreal (Theano).
  • NVIDIA developed TensorRT (along with the complementary Graph Surgeon tool), which optimizes models through techniques such as quantization, improving their performance on NVIDIA GPUs and consequently lowering costs.
  • AWS contributes the above-mentioned Neuron SDK.
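
Before reaching for any of these frameworks, it is worth verifying that CUDA and cuDNN are actually visible to them. A quick sanity check, assuming TensorFlow 2.x and PyTorch are installed:

    import tensorflow as tf
    import torch

    # If the CUDA stack is installed correctly, both frameworks see the GPU.
    print("TensorFlow sees GPUs:", tf.config.list_physical_devices("GPU"))
    print("PyTorch sees CUDA:", torch.cuda.is_available())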

As TensorFlow Serving and TorchServe illustrate, deep learning software is no longer designed with the sole objective of training neural networks; it is also meant to serve those models in production and expose them to colleagues in other teams.
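
As a sketch of what "model as a REST API" means in practice, here is how a TensorFlow Serving container is queried; the host, model name, and input are placeholders, while 8501 is TF Serving's default REST port:

    import json
    import requests

    # Placeholder endpoint: a TensorFlow Serving container serving a model
    # named "my_model" on its default REST port.
    payload = {"instances": [[1.0, 2.0, 5.0]]}
    response = requests.post(
        "http://localhost:8501/v1/models/my_model:predict",
        data=json.dumps(payload),
    )
    print(response.json()["predictions"])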

The developer-led landscape is worth around 40 billion dollars, but with the advent of Software 2.0 we can be sure to see a comparable MLOps-led landscape emerge in this decade. It is worth mentioning that since deep learning both consumes and produces large amounts of data, it is closely connected with big data tools like Snowflake or Kafka, and vice versa, since some big data tools are starting to use AI.

Open source plays an important role in deep learning and is becoming a criterion for accepting papers in the midst of the reproducibility crisis. Free tools like Papers with Code empower individual AI enthusiasts to build their own models that can be close to the state of the art.

AI as a service

Models

Cloud giants already offer models as a service, such as AWS Rekognition or Azure Cognitive Services. It is also worth mentioning that these companies employ some of the best AI scientists, and Google even acquired DeepMind, an AI research company whose goal is AGI. DeepMind rose to prominence with AlphaGo and has a long history of exceeding human player performance with deep learning. With David Silver leading the charge, they extensively use reinforcement learning to solve interesting challenges for Google, like optimizing cooling in data centers or minimizing battery usage on Android. DeepMind carries debt that it cannot clear even with increasing revenue, but it seems Google is well aware that it is a long-term investment. Strong financial backing enables DeepMind to hire top talent and focus on fundamental research that can pay off in the future. But let's not forget that Alphabet already lost patience with a similar company it acquired for its robotic future: Boston Dynamics, which later finally reached synergy through economies of scale with Hyundai. DeepMind looks like a better fit than Boston Dynamics did, since its fundamental research requires expensive hardware and thus drives massive GCP usage, and it has already solved real challenges for Google, like data center cooling and health services, along with the groundbreaking protein folding problem.

OpenAI, backed by Microsoft among others, is similar to DeepMind in its prioritization of fundamental research and the high cost of highly skilled researchers, some of whom make more than a million dollars per year. To offset losses, OpenAI opened a subsidiary structured as a capped-profit company, and it created the state-of-the-art NLP model GPT-3, which has an estimated 175 billion parameters and a training cost upwards of 4.6 million dollars. Due to its size, GPT-3 probably cannot run on consumer-grade hardware. OpenAI offers GPT-3 through a paid REST API, and this is the first time a state-of-the-art model is not open sourced but only available as a service, without a comparable open source alternative for now. This is creating an effect similar to the cloud: startups can build products rapidly since they don't have to spend time researching and training NLP models, just as a cloud-based company outsources its infrastructure work to a cloud provider, which is the premise on which AWS was built (access to world-class infrastructure from every laptop, instead of hiring lots of engineers and waiting for them to set everything up).
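
As a sketch of what consuming such a model-as-a-service looks like, here is a minimal call to OpenAI's completion API as it existed around the time of writing; the API key, engine name, and prompt are illustrative:

    import openai

    # The key and engine name below are illustrative placeholders.
    openai.api_key = "YOUR_API_KEY"

    response = openai.Completion.create(
        engine="davinci",   # a GPT-3 engine
        prompt="MLOps is",
        max_tokens=32,
    )
    print(response.choices[0].text)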

Infrastructure

We are seeing not only models as a service, but also tools for building custom models and data labeling offered as managed services. AWS SageMaker is a great example of a service that makes developing models a bit easier and contains a set of tools for data labeling (even point clouds for labeling 3D data).
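
As a rough illustration of the managed-training workflow, a SageMaker training job can be launched from a few lines of Python; the entry point script, role ARN, instance type, and S3 path below are hypothetical placeholders:

    from sagemaker.pytorch import PyTorch

    # All identifiers below (script, role, bucket) are placeholders.
    estimator = PyTorch(
        entry_point="train.py",
        role="arn:aws:iam::123456789012:role/SageMakerRole",
        instance_count=1,
        instance_type="ml.p3.2xlarge",  # a V100-backed instance
        framework_version="1.6.0",
        py_version="py3",
    )
    estimator.fit({"training": "s3://my-bucket/training-data"})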

On the other side we can see open source tools like Kubeflow (supported by Google and even offered as a managed service), Cortex, BentoML, and others. In my opinion, companies sometimes require extremely low latency or other specifics that call for a custom platform, and as with some big data platforms it is hard to satisfy all demand with managed services (SaaS); still, SaaS remains useful for fast iteration and quick MVPs, which is important for startups in AI.

GitHub stars can be a proxy for popularity or awareness. The number of new monthly project stars is on the left and the cumulative number of stars is on the right side of the picture. This is a plot of three popular MLOps tools, with KFServing being a minor product, as it is part of Kubeflow. We can see there is significant interest in MLOps tooling, especially if we take into account that it is relatively new. Code is available on my GitHub.

We need data, lots of data

Data labeling

The data collection and labeling market is valued at about 8 billion dollars. This is not unexpected, since labeled data is the "fuel for deep learning". There are also interesting technologies like active learning, which reduce human engagement in the labeling process. As mentioned in "AI as a service", there are also SaaS and open source solutions for labeling tabular, visual, and audio data.
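
A minimal sketch of the core active learning idea, uncertainty sampling: ask a human to label only the samples the model is least confident about (the batch size here is an arbitrary choice):

    import numpy as np

    def select_for_labeling(probs: np.ndarray, k: int = 10) -> np.ndarray:
        """Return indices of the k samples the model is least sure about.

        probs: (n_samples, n_classes) predicted class probabilities.
        """
        # A low maximum class probability means the model is uncertain,
        # so a human label for that sample is most valuable.
        uncertainty = 1.0 - probs.max(axis=1)
        return np.argsort(uncertainty)[-k:]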

It’s not necessary to label real data since synthetic data is becoming reality. In this interesting clash of AI and VR we can create synthetic data with VR tools like Unity which has several advantages since we can generate more data which is labeled by design and we can hit interesting angles that are hard to find in reality.

Zero-shot learning is an interesting and relatively new approach that makes models more "data-efficient".

Synthesis of vision and language

Combining vision with language is interesting, as OpenAI demonstrated with CLIP and DALL·E. What is even more interesting is that computer vision advanced by porting an originally NLP idea (transformers), in the form of the DETR model developed by Facebook AI Research (FAIR).
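
For a taste of how such a model is consumed, here is a minimal zero-shot classification sketch with the open source CLIP package; the image path and candidate labels are illustrative:

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Hypothetical inputs: any local image and candidate labels will do.
    image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
    text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)

    print(probs)  # probability per label, with no task-specific training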

Versioning

Collaboration is important everywhere. In software engineering we have an extra component to collaborate on (code), and companies like GitHub (the behemoth of open source software, acquired by Microsoft) and GitLab (a one-stop shop for devops with a 6 billion dollar valuation) are transforming how software is developed, helped by increased interest in devops (gitops) practices. GitLab is interesting since it started as regular open source software and later transformed to an open-core model where users pay only for additional (non-core) features, one of many emerging pricing models for open source software. It is also embracing new trends by being a fully remote company and by being transparent even about business KPIs.

ML and deep learning add complexity, since in addition to code we have to version data and models. An interesting tool here is MLflow, open sourced by Databricks, a company worth almost 30 billion dollars, but there are new entrants like Weights & Biases, which raised 15 million dollars.
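
A minimal sketch of experiment tracking with MLflow's Python API; the parameters, metric, and artifact path below are illustrative:

    import mlflow

    # Everything logged below is recorded per run, keeping experiments
    # comparable and reproducible.
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_param("batch_size", 64)

        # ... training happens here ...

        mlflow.log_metric("val_accuracy", 0.93)
        mlflow.log_artifact("model.pt")  # hypothetical saved model file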

Versioning of code opened interesting new opportunities like infrastructure-as-code, which produced interesting companies like HashiCorp, valued upwards of 5 billion dollars. These technologies enable software to be built cheaper, faster, and with higher quality. AI is opening a new frontier in model and data versioning, which is becoming as important as code versioning and is enabling interesting new processes like experiment orchestration (Polyaxon).

GitHub stars can be a proxy for popularity or awareness. The number of new monthly project stars is on the left and the cumulative number of stars is on the right side of the picture. DVC is completely open source, while MLflow is offered as SaaS and is not fully open sourced. We can see there is significant interest in MLOps tooling, especially if we take into account that it is relatively new. Code is available on my GitHub.

Regulations

Personal data protection is one of the main concerns in any kind of data processing, and since a lot has been said about it, I won't focus on that. What I'd like to mention is explainability, which is important from a regulatory perspective and for understanding deep learning models a bit better, beyond just hyperparameter and architecture tuning that relies on the huge computing power of the cloud giants. There is a growing body of academic literature and a growing number of companies tackling explainability.
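
As one concrete example (my pick, not named above), the open source SHAP library attributes a model's predictions to its input features; the model and dataset below are placeholders:

    import shap
    from sklearn.ensemble import RandomForestClassifier

    # Placeholder model and data; any tree model works with TreeExplainer.
    X, y = shap.datasets.adult()
    model = RandomForestClassifier(n_estimators=100).fit(X, y)

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Visualize which features drive the model's predictions overall.
    shap.summary_plot(shap_values, X)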
