MLOps Scaling Machine Learning Lifecycle in an Industrial Setting
Authors: Yizhen Zhao, Adam S. Z. Belloum, Gonçalo Maia da Costa, Zhiming Zhao
Abstract:
Machine learning has evolved from an area of academic research into a real-world applied field. This change comes with challenges: gaps and differences exist between common practices in academic environments and those in production environments. Following continuous integration, development, and delivery practices in software engineering, similar trends have emerged in machine learning (ML) systems under the name MLOps. In this paper, we propose a framework that streamlines the ML lifecycle in an industrial setting and introduces best practices that facilitate it. The framework can be used as a template and customized to implement various machine learning experiments. It is modular and can be recomposed to fit various use cases (e.g., data versioning, remote training on the cloud). The framework inherits practices from DevOps and introduces others that are unique to machine learning systems (e.g., data versioning). Our MLOps practices automate the entire machine learning lifecycle and bridge the gap between development and operations.
Keywords: Cloud computing, continuous development, data versioning, DevOps, industrial setting, MLOps, machine learning.