MLOps Scaling Machine Learning Lifecycle in an Industrial Setting
Authors: Yizhen Zhao, Adam S. Z. Belloum, Gonçalo Maia da Costa, Zhiming Zhao
Machine learning has evolved from an area of academic research into an applied, real-world field. This change brings challenges: gaps and differences exist between common practices in academic environments and those in production environments. Following the continuous integration, development and delivery practices of software engineering, a similar trend has emerged for machine learning (ML) systems, called MLOps. In this paper we propose a framework that streamlines the ML lifecycle in an industrial setting and introduces best practices that facilitate it. The framework can be used as a template and customized to implement various machine learning experiments. It is modular and can be recomposed to fit various use cases (e.g. data versioning, remote training on the cloud). The framework inherits practices from DevOps and introduces others that are unique to machine learning systems (e.g. data versioning). Our MLOps practices automate the entire machine learning lifecycle and bridge the gap between development and operation.
Keywords: Cloud computing, continuous development, data versioning, DevOps, industrial setting, MLOps, machine learning.