
Azure Machine Learning

Azure ML workspace, Designer, SDK v2, managed endpoints, platform comparison, and choosing a cloud provider

~45 min

Azure Machine Learning is Microsoft's enterprise ML platform, deeply integrated with the Azure ecosystem and Microsoft's enterprise tools (Active Directory, Power BI, Azure DevOps). It is particularly strong for organizations already invested in the Microsoft stack.

Azure ML's Enterprise Strength

Azure ML is designed for enterprise governance and compliance. Its integration with Azure Active Directory, role-based access control (RBAC), private endpoints, and managed virtual networks makes it the platform of choice for highly regulated industries (healthcare, finance, government) that need strict data residency and access controls.

Azure ML Workspace

The workspace is the top-level resource — a centralized place to manage all ML artifacts:

  • Compute: Training clusters, compute instances, inference clusters
  • Data: Datasets, datastores, data assets
  • Models: Model registry with versioning
  • Environments: Reproducible Python/Docker environments
  • Pipelines: ML workflow definitions
  • Endpoints: Deployed model endpoints
Azure ML Designer

A drag-and-drop visual interface for building ML pipelines, with no code required:

  • Connect data preprocessing, feature engineering, and model training components
  • Real-time visualization of data flow
  • Useful for citizen data scientists and rapid prototyping
  • Can export pipelines to Python code
Azure ML SDK v2

The latest Python SDK provides a clean, declarative API:

    python
    # Azure ML SDK v2: Training a model
    from azure.ai.ml import MLClient, command, Input
    from azure.ai.ml.entities import Environment, AmlCompute
    from azure.identity import DefaultAzureCredential

    # --- Connect to workspace ---
    ml_client = MLClient(
        credential=DefaultAzureCredential(),
        subscription_id="your-subscription-id",
        resource_group_name="your-rg",
        workspace_name="your-workspace",
    )

    # --- Step 1: Create a compute cluster ---
    compute = AmlCompute(
        name="gpu-cluster",
        type="amlcompute",
        size="Standard_NC6s_v3",     # GPU VM (the older Standard_NC6 size is retired)
        min_instances=0,             # Scale to zero when idle
        max_instances=4,
        idle_time_before_scale_down=120,
    )
    ml_client.compute.begin_create_or_update(compute).result()

    # --- Step 2: Define the training environment ---
    env = Environment(
        name="pytorch-training",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04",
        conda_file="conda.yml",
    )

    # --- Step 3: Define and submit the training job ---
    job = command(
        code="./src",
        command="python train.py --epochs 50 --lr 0.001 --data ${{inputs.training_data}}",
        inputs={
            "training_data": Input(
                type="uri_folder",
                path="azureml://datastores/workspaceblobstore/paths/data/train",
            ),
        },
        environment=env,
        compute="gpu-cluster",
        display_name="fraud-detection-training",
        experiment_name="fraud-detection",
    )

    returned_job = ml_client.jobs.create_or_update(job)
    print(f"Job submitted: {returned_job.name}")
    print(f"Studio URL: {returned_job.studio_url}")
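The conda.yml referenced by the environment definition above lists the Python dependencies to layer on top of the base Docker image. A minimal sketch; the package choices and versions here are illustrative assumptions, not prescribed by Azure ML:

```yaml
# conda.yml: hypothetical dependencies for the pytorch-training environment
name: pytorch-training
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - torch            # the training framework used by train.py
      - pandas
      - scikit-learn
      - azureml-mlflow   # optional: log metrics back to the workspace
```

Azure ML builds this conda environment into the image once and caches it, so subsequent jobs that reuse the environment start without a rebuild.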

Managed Online Endpoints

Azure ML provides managed endpoints for real-time inference with blue-green deployments and traffic splitting:

    python
    # Deploy a model to a managed online endpoint
    from azure.ai.ml.entities import (
        ManagedOnlineEndpoint,
        ManagedOnlineDeployment,
        Model,
        CodeConfiguration,
    )

    # --- Create endpoint ---
    endpoint = ManagedOnlineEndpoint(
        name="fraud-endpoint",
        auth_mode="key",
    )
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    # --- Create deployment ---
    blue_deployment = ManagedOnlineDeployment(
        name="blue",
        endpoint_name="fraud-endpoint",
        model=Model(path="./model"),
        code_configuration=CodeConfiguration(
            code="./score",
            scoring_script="score.py",
        ),
        environment=env,             # custom scoring scripts require an environment
        instance_type="Standard_DS3_v2",
        instance_count=2,
    )
    ml_client.online_deployments.begin_create_or_update(
        blue_deployment
    ).result()

    # --- Route 100% traffic to blue deployment ---
    endpoint.traffic = {"blue": 100}
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    # --- Test the endpoint ---
    result = ml_client.online_endpoints.invoke(
        endpoint_name="fraud-endpoint",
        request_file="sample_request.json",
    )
    print(result)
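The scoring_script named in the CodeConfiguration must define two functions: init(), run once when the deployment container starts, and run(), invoked per request with the raw JSON body. A minimal sketch; the placeholder predictor stands in for a real model load, which would normally deserialize the registered model from the AZUREML_MODEL_DIR directory:

```python
# score.py: hypothetical minimal scoring script for the deployment above
import json
import os

model = None


def init():
    """Runs once at container startup: load the model into a global."""
    global model
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")  # set by Azure ML at runtime
    # A real script would deserialize the registered model here, e.g.:
    #   model = joblib.load(os.path.join(model_dir, "model", "model.pkl"))
    # Placeholder predictor so this sketch is self-contained:
    model = lambda rows: [0.0 for _ in rows]


def run(raw_data: str) -> str:
    """Runs per request: parse the JSON body, score, and return JSON."""
    rows = json.loads(raw_data)["data"]
    predictions = model(rows)
    return json.dumps({"predictions": predictions})
```

Under this sketch, the sample_request.json passed to invoke above would carry a top-level "data" array of feature rows.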

Platform Comparison: AWS vs GCP vs Azure

| Feature           | AWS SageMaker       | Google Vertex AI     | Azure ML                     |
| ----------------- | ------------------- | -------------------- | ---------------------------- |
| AutoML            | Autopilot           | AutoML (best)        | Automated ML                 |
| No-code UI        | Canvas              | Console              | Designer                     |
| Foundation models | JumpStart           | Model Garden (best)  | Model Catalog                |
| Custom training   | Any container       | Any container        | Any container                |
| Pipelines         | SageMaker Pipelines | KFP-based            | Azure ML Pipelines           |
| Feature Store     | SageMaker FS        | Vertex FS (best)     | Managed Feature Store        |
| MLOps             | Strong              | Strong               | Strongest (Azure DevOps)     |
| Enterprise        | Good                | Good                 | Best (AD, RBAC, compliance)  |
| Cost              | Most flexible       | Competitive          | Most predictable             |
| Ecosystem         | Largest             | Best AI/ML research  | Best Microsoft integration   |

Choosing a Cloud Provider

Choose AWS SageMaker when:

  • You are already on AWS
  • You need the broadest selection of ML instance types
  • You want the most mature and flexible platform
  • Cost optimization is critical (spot instances, savings plans)

Choose Google Vertex AI when:

  • You need best-in-class AutoML
  • You want to use Google's foundation models (Gemini)
  • Your team uses TensorFlow extensively
  • You need strong big-data integration (BigQuery, Dataflow)

Choose Azure ML when:

  • Your organization is invested in the Microsoft ecosystem
  • You need the strongest enterprise governance and compliance
  • Your team uses Azure DevOps for CI/CD
  • You are in a regulated industry (healthcare, finance, government)

Multi-Cloud Strategy

Many large organizations use multiple cloud providers. To avoid lock-in, use open-source tools (MLflow, Kubeflow, ONNX) for the core ML workflow and cloud-specific services only where they provide significant value (e.g., Vertex AI AutoML, SageMaker Spot Instances, Azure AD integration).
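One way to keep the core workflow portable is a thin seam between your training code and whichever platform launches it: the core logic depends only on a small interface, and each provider gets one adapter behind it. A hypothetical stdlib-only sketch; the class and method names are illustrative, not any vendor's API:

```python
from typing import Protocol


class JobRunner(Protocol):
    """Minimal seam: anything that can run a training command."""

    def submit(self, command: str) -> str: ...


class LocalRunner:
    """Adapter for local runs. Hypothetical AzureMLRunner, SageMakerRunner,
    or VertexRunner classes would wrap each provider's SDK behind the same
    submit() signature."""

    def submit(self, command: str) -> str:
        return f"local job started: {command}"


def launch_training(runner: JobRunner, epochs: int) -> str:
    # Core workflow depends only on the JobRunner interface, never on a
    # cloud SDK, so switching providers means swapping the adapter.
    return runner.submit(f"python train.py --epochs {epochs}")
```

With this shape, moving from one cloud to another (or running locally for debugging) changes one constructor call, not the training workflow itself.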