MLOps Scaling Machine Learning Lifecycle in an Industrial Setting

Authors: Yizhen Zhao, Adam S. Z. Belloum, Gonçalo Maia da Costa, Zhiming Zhao


Machine learning has evolved from an area of academic research into a real-world applied field. This change comes with challenges: gaps and differences exist between common practices in academic environments and those in production environments. Following continuous integration, development and delivery practices in software engineering, similar trends have emerged in machine learning (ML) systems under the name MLOps. In this paper, we propose a framework that streamlines the ML lifecycle in an industrial setting and introduces best practices that facilitate it. The framework can be used as a template and customized to implement various machine learning experiments. It is modular and can be recomposed to suit various use cases (e.g. data versioning, remote training on the Cloud). The framework inherits practices from DevOps and introduces others that are unique to machine learning systems (e.g. data versioning). Our MLOps practices automate the entire machine learning lifecycle and bridge the gap between development and operations.

Keywords: Cloud computing, continuous development, data versioning, DevOps, industrial setting, MLOps, machine learning.
