Azure MLOps Challenge Blog: Part 1

If you are interested in learning how to apply DevOps principles and practices to machine learning, you might want to check out the Azure MLOps Challenge. This challenge is a hands-on learning experience that guides you through the creation and deployment of a machine learning model using Azure Machine Learning and GitHub Actions. In this blog post, I will give you an overview of the main objectives of each challenge and the solution I created to meet the success criteria. 

Unlike a regular lab that guides you through each step, a challenge lab only provides some vague objectives and hints. You have to rely on your own abilities and ingenuity (and GitHub Copilot 🤣) to solve the problem. This allows you to create your own unique solution and feel the thrill of discovery and the reward of creating something new.

Join me for a series of blogs where I document my solution to each challenge, while sprinkling in a little Azure magic to take each module's solution to the next level and showcase some Azure and GitHub best practices. My focus will be on the Ops side of MLOps, so I won't be concentrating on the training/evaluation of models, related Python code, or other topics that would be considered the realm of Data Scientists.

What is Azure MLOps? 

Before we dive into the details, let's first define what MLOps is and why it matters. MLOps stands for Machine Learning Operations, and it is a set of practices that aim to improve the quality, reliability, and efficiency of machine learning workflows. MLOps borrows concepts and tools from DevOps, such as version control, continuous integration, continuous delivery, testing, monitoring, and feedback loops, and adapts them to the specific needs and challenges of machine learning.

Azure MLOps is a solution that enables you to implement MLOps on the Azure cloud platform. It leverages Azure Machine Learning, a fully managed service that provides end-to-end capabilities for building, training, deploying, and managing machine learning models. It also integrates with GitHub Actions, a workflow automation tool that allows you to create custom pipelines for your machine learning projects. With Azure MLOps, you can streamline your machine learning lifecycle, from data preparation to model deployment and monitoring, while ensuring reproducibility, traceability, and collaboration. 

The Challenge 

The challenge consists of seven modules that cover different aspects of the machine learning lifecycle: 

  • Module 0: Convert a notebook to production code 
  • Module 1: Create an Azure Machine Learning job 
  • Module 2: Trigger the Azure Machine Learning job with GitHub Actions 
  • Module 3: Trigger GitHub Actions with feature-based development 
  • Module 4: Work with linting and unit testing 
  • Module 5: Work with environments 
  • Module 6: Deploy and test the model 

I'll be adding an eighth module to produce a functioning web app deployed to Azure App Service that will consume the model via an Azure ML Endpoint. This will be a great opportunity to showcase the power of Azure Machine Learning, GitHub Actions and Azure:

  • Module 7: Deploy a functioning App

In each module, you will work with several technologies that are essential to implementing MLOps. Here is a brief description of each resource and its role in the challenge: 

Workspace: A workspace is the top-level resource for Azure Machine Learning. It is a cloud-based environment that contains all the assets and resources related to your machine learning projects, such as datasets, experiments, models, endpoints, compute targets, etc. You can create and manage workspaces using the Azure portal, the Azure Machine Learning studio, or the Azure Machine Learning CLI. A workspace also provides access to other Azure services, such as Azure Key Vault, Azure Storage, Azure Monitoring/Diagnostics, etc., that enable you to secure, store, monitor, and manage your machine learning assets. In Module 1, you will create a workspace for your challenge project. 
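As a quick illustration, here is a minimal sketch of connecting to an existing workspace with the Azure Machine Learning Python SDK v2; the subscription, resource group, and workspace names are placeholders, not values from the challenge.

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Authenticate and attach to an existing workspace (IDs below are placeholders)
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Confirm the connection by reading back the workspace details
ws = ml_client.workspaces.get("<workspace-name>")
print(ws.name, ws.location)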

Dataset: A dataset is an abstraction that represents a data source in Azure Machine Learning. It can be created from various sources, such as local files, cloud storage, web URLs, SQL databases, etc. A dataset can be registered in a workspace for easy access and reuse across different experiments and pipelines. You can also use datasets to track data lineage and provenance. 
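For example, registering a folder of CSV files as a data asset with the SDK v2 might look like the sketch below; the asset name and local path are assumptions, and ml_client is the client from the workspace sketch above.

from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Register a local folder of CSVs as a versioned data asset (name and path are assumptions)
diabetes_data = Data(
    name="diabetes-dev-folder",
    path="./experimentation/data",  # uploaded to the workspace's default datastore on registration
    type=AssetTypes.URI_FOLDER,
    description="Diabetes training CSVs",
)
ml_client.data.create_or_update(diabetes_data)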

Model: A model is a file or a folder that contains the code and/or data that defines a machine learning model in Azure Machine Learning. A model can be registered in a workspace for versioning and management. You can also use models to create deployments that expose your model as a web service or an IoT module running as a container.
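As a hedged example, registering an MLflow-format model produced by a training job could look like this with the SDK v2; the model name and job output path are placeholders.

from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Register the model artifact a completed job wrote to its outputs (job name is a placeholder)
model = Model(
    name="diabetes-classifier",
    path="azureml://jobs/<job-name>/outputs/artifacts/paths/model/",
    type=AssetTypes.MLFLOW_MODEL,
    description="Logistic regression model for diabetes prediction",
)
ml_client.models.create_or_update(model)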

Endpoint: An endpoint is a resource that represents a deployed model in Azure Machine Learning. An endpoint consists of two components: an inference configuration and a deployment configuration. The inference configuration defines how to run the model code and what dependencies are needed. The deployment configuration defines where and how to deploy the model, such as the compute target (e.g., Azure Container Instance, Azure Kubernetes Service), the scaling options (e.g., number of instances), the authentication options (e.g., key or token), etc. You can create and manage endpoints using the Azure Machine Learning studio or the CLI. 
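To illustrate with the SDK v2, which wraps these concepts into managed online endpoints, creating an endpoint and a deployment for a registered MLflow model might look roughly like this; the endpoint name, model reference, and VM size are assumptions.

from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Create the endpoint (the stable URL and auth boundary)
endpoint = ManagedOnlineEndpoint(name="diabetes-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create a deployment behind the endpoint that serves a registered MLflow model
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="diabetes-endpoint",
    model="azureml:diabetes-classifier:1",  # registered model reference (assumption)
    instance_type="Standard_DS2_v2",        # VM size is an assumption
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route all traffic to the new deployment
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()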

Pipeline: An Azure Machine Learning pipeline (not to be confused with an Azure DevOps pipeline) is a workflow that defines a sequence of steps to perform a complex machine learning task in Azure Machine Learning. Each step can run a script or a module on a specified compute target and produce outputs that can be consumed by subsequent steps. A pipeline can be published in a workspace for reuse and scheduling. You can also trigger your pipelines from GitHub Actions workflows based on events such as code commits or pull requests. In Module 6, you will create a pipeline that automates the model training and deployment process; there's a minimal SDK sketch after the list below. A pipeline has several advantages over running scripts or notebooks manually, such as: 

  • Reproducibility: A pipeline ensures that each step is executed with the same configuration and inputs, regardless of who runs it or when it runs. This eliminates the risk of human errors or inconsistencies that might affect the results.
  • Modularity: A pipeline allows you to break down a complex task into smaller and simpler steps that can be reused and combined in different ways. This makes it easier to maintain, debug, and update your code.
  • Scalability: A pipeline enables you to run each step on a different compute target that suits its requirements, such as CPU, GPU, memory, etc. This allows you to optimize the performance and cost of your machine learning workflow.
  • Parallelism: A pipeline lets you run multiple steps in parallel if they do not depend on each other’s outputs. This can speed up the execution time and improve the efficiency of your machine learning workflow.
  • Portability: A pipeline can be exported as a JSON file that contains all the information needed to run it on another workspace or environment. This makes it easy to share and collaborate on your machine learning projects.
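Here's the minimal pipeline sketch mentioned above, using the SDK v2 @pipeline decorator. The component YAML files, their input/output names, and the compute cluster name are all assumptions for illustration and aren't part of the challenge repo.

from azure.ai.ml import Input, load_component
from azure.ai.ml.dsl import pipeline

# Load two hypothetical component definitions from YAML (paths are assumptions)
prep_data = load_component(source="./components/prep_data.yml")
train_model = load_component(source="./components/train_model.yml")

@pipeline(default_compute="aml-cluster")  # compute cluster name is an assumption
def diabetes_pipeline(raw_data):
    # Each step runs as its own job; outputs flow to the next step
    prep_step = prep_data(input_data=raw_data)
    train_step = train_model(training_data=prep_step.outputs.output_data)
    return {"model_output": train_step.outputs.model_output}

# Submit the pipeline as a job, reading from a registered data asset (name/version are assumptions)
pipeline_job = diabetes_pipeline(
    raw_data=Input(type="uri_folder", path="azureml:diabetes-dev-folder:1")
)
ml_client.jobs.create_or_update(pipeline_job, experiment_name="diabetes-pipeline")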

Preface

The tone of this and future blogs in the series will be highly technical, and all configuration will be completed via code. I also don't claim that this is the best or only way to complete the challenge.

To follow along, I recommend you have a basic understanding of the following: 

  • Core Azure resources like subscriptions, resource groups, and role-based access control 
  • Azure identity resources like Service Principals and Managed Identities 
  • Azure security solutions like Key Vault and App Configuration 
  • Version control and the CI/CD workflow 
  • Machine Learning fundamentals (how models are trained, deployed and consumed) and Azure Machine Learning Studio 
  • Python and Azure SDK basics, like authentication and initializing clients.

Challenge 0: Convert a notebook to production code 

Now we've got the preamble out of the way, let's get to work. 🚀 

The objectives of the first challenge are as follows:

  • Clean nonessential code. 
  • Convert your code to Python scripts. 
  • Use functions in your scripts. 
  • Use parameters in your scripts. 

We’ll start by cloning the repo locally. 

git clone https://github.com/MicrosoftLearning/mslearn-mlops.git 

You can review the Jupyter notebook in the experimentation folder. The notebook is well commented and should be easy to follow. The notebook trains a basic regression model to predict diabetes based on several features. The notebook is based on the following tutorial: https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-train-models-with-aml 

Clean nonessential code 

As per the instructions, we'll start by adding a split_data function to train.py. You can use the train_test_split function from the sklearn.model_selection library. You can find more examples in the following tutorial: https://learn.microsoft.com/en-gb/azure/machine-learning/how-to-convert-ml-experiment-to-production 

Add this function underneath the comment “# TO DO: add function to split data”. Make sure to also import train_test_split at the top of your script. 

from sklearn.model_selection import train_test_split

def split_data(df, test_size=0.2): 
    X = df.drop("Diabetic", axis=1) 
    y = df["Diabetic"] 
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size) 
    return X_train, X_test, y_train, y_test 

Next, we'll enable autologging with MLflow. We'll install the mlflow library, then call mlflow.sklearn.autolog() before training the model to automatically log parameters, metrics, and artifacts for the scikit-learn model. The model is logged automatically when the fit() method is called on the estimator. 

The complete code should look similar to this: 

# Import libraries 
import argparse 
import glob 
import os 
import pandas as pd 

from sklearn.linear_model import LogisticRegression 
from sklearn.model_selection import train_test_split 
from mlflow.sklearn import autolog 

# define functions 
def main(args): 

    # enable autologging
    autolog() 

    # read data 
    df = get_csvs_df(args.training_data) 

    # split data 
    X_train, X_test, y_train, y_test = split_data(df) 

    # train model 
    train_model(args.reg_rate, X_train, X_test, y_train, y_test) 

def get_csvs_df(path): 
    if not os.path.exists(path): 
        raise RuntimeError(f"Cannot use non-existent path provided: {path}") 
    csv_files = glob.glob(f"{path}/*.csv") 
    if not csv_files: 
        raise RuntimeError(f"No CSV files found in provided data path: {path}") 
    return pd.concat((pd.read_csv(f) for f in csv_files), sort=False) 

# our new function
def split_data(df, test_size=0.2): 
    X = df.drop("Diabetic", axis=1) 
    y = df["Diabetic"] 
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size) 
    return X_train, X_test, y_train, y_test 

def train_model(reg_rate, X_train, X_test, y_train, y_test): 
    # train model 
    LogisticRegression(C=1 / reg_rate, solver="liblinear").fit(X_train, y_train) 

def parse_args(): 
    # setup arg parser 
    parser = argparse.ArgumentParser() 

    # add arguments 
    parser.add_argument("--training_data", dest="training_data", type=str) 
    parser.add_argument("--reg_rate", dest="reg_rate", type=float, default=0.01) 

    # parse args 
    args = parser.parse_args() 

    # return args 
    return args 
 
# run script 
if __name__ == "__main__": 
    # add space in logs 
    print("\n\n") 
    print("*" * 60) 

    # parse args 
    args = parse_args() 

    # run main function 
    main(args) 


    # add space in logs 
    print("*" * 60) 
    print("\n\n") 

mlflow is already included in requirements.txt; however, the version is very out of date and includes known critical vulnerabilities. These are the versions I used: 

pytest==7.4.0 
mlflow==2.6.0 
pandas==2.0.3 
scikit-learn==1.3.0 

The completed train.py script performs the following tasks: 

  1. It reads in training data from a specified directory containing CSV files using the get_csvs_df function. 
  2. It splits the data into training and testing sets using the split_data function. 
  3. It trains a logistic regression model using the train_model function with a specified regularization rate. 
  4. It enables autologging with MLflow to automatically log parameters, metrics, and artifacts for the scikit-learn model. 
  5. The script takes in two command-line arguments: --training_data, which specifies the path to the directory containing the training data, and --reg_rate, which specifies the regularization rate to use when training the logistic regression model. 
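For example, assuming the repo's folder layout (your paths may differ), the script can be run locally like this:

python src/model/train.py --training_data experimentation/data --reg_rate 0.01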

Conclusion

In this blog post, I have given you a brief introduction to the main objectives that you will encounter in the Azure MLOps Challenge, an overview of Azure Machine Learning and the solution to Challenge 0. I hope this helps you to get started with the challenge and to learn more about how to apply MLOps principles and practices to your machine learning projects using Azure Machine Learning and GitHub Actions.  

Stay tuned for part two of this blog series, where we'll tackle Challenge 1, build an Azure Machine Learning job in YAML, and run it with the Azure Machine Learning CLI v2.

If you are interested in trying the challenge yourself, you can find more information and instructions on the challenge website: https://microsoftlearning.github.io/mslearn-mlops/ 

Happy learning and good luck! 🚀 
