
Using Azure AD with the Azure Databricks API

In the past, the Azure Databricks API required a Personal Access Token (PAT), which had to be generated manually in the Azure Databricks UI. This complicates DevOps scenarios, where provisioning should be fully automated. A new feature, currently in preview, allows using Azure AD to authenticate with the API. You can use it in two ways:

  • Use Azure AD to authenticate each Azure Databricks REST API call.
  • Use Azure AD to create a PAT token, and then use this PAT token with the Databricks REST API. Note that there is a quota limit of 600 active tokens.
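To make the difference between the two approaches concrete, here is a minimal Python sketch of the request headers each one uses. The header names are taken from the cURL calls in this post. The function names are hypothetical, and the tokens and workspace resource ID are placeholders that you would obtain as described in the options below.

```python
from typing import Dict


def aad_headers(aad_token: str, management_token: str,
                workspace_resource_id: str) -> Dict[str, str]:
    """Headers for calling the Databricks REST API directly with Azure AD tokens.

    aad_token: token issued for the global Azure Databricks application
    management_token: token issued for https://management.core.windows.net/
    workspace_resource_id: full Azure resource ID of the Databricks workspace
    """
    return {
        "Authorization": f"Bearer {aad_token}",
        "X-Databricks-Azure-SP-Management-Token": management_token,
        "X-Databricks-Azure-Workspace-Resource-Id": workspace_resource_id,
    }


def pat_headers(pat: str) -> Dict[str, str]:
    """Headers for calling the Databricks REST API with a Personal Access Token."""
    return {"Authorization": f"Bearer {pat}"}
```

With the first approach, every call carries Azure AD tokens, which expire after a limited time and must be fetched again. With the second, a single header is enough, but the PAT lives for its configured lifetime and counts toward the token quota.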

See further down for an option using Python, and Part 2 for Terraform.

Ensure that the identity you authenticate with (your user account or service principal) has Contributor permissions on the Databricks workspace resource.

Option 1 – using Azure CLI

The easiest way is to use the Azure CLI. Log in to Azure with a user account or service principal that has Contributor permissions on the workspace.

# Change these values
RESOURCE_GROUP=my-resource-group
DATABRICKS_WORKSPACE=my-databricks-workspace

wsId=$(az resource show \
  --resource-type Microsoft.Databricks/workspaces \
  -g "$RESOURCE_GROUP" \
  -n "$DATABRICKS_WORKSPACE" \
  --query id -o tsv)

# Get a token for the global Databricks application.
# The resource name is fixed and never changes.
token_response=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)
token=$(jq .accessToken -r <<< "$token_response")

# Get a token for the Azure management API
token_response=$(az account get-access-token --resource https://management.core.windows.net/)
azToken=$(jq .accessToken -r <<< "$token_response")

# Use both tokens in Databricks API call
curl -sf https://northeurope.azuredatabricks.net/api/2.0/clusters/list \
  -H "Authorization: Bearer $token" \
  -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
  -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId"

# You can also generate a PAT token. Note the quota limit of 600 tokens.
api_response=$(curl -sf https://northeurope.azuredatabricks.net/api/2.0/token/create \
  -H "Authorization: Bearer $token" \
  -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
  -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId" \
  -d '{ "lifetime_seconds": 100, "comment": "this is an example token" }')
export DATABRICKS_TOKEN=$(jq .token_value -r <<< "$api_response")

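The Databricks CLI reads its configuration from the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, so also set DATABRICKS_HOST to the workspace URL (here, https://northeurope.azuredatabricks.net) before running it.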
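The token/create call above can also be made from Python. The following is a sketch using only the standard library; build_create_token_request, send and export_pat are hypothetical helper names, and headers is assumed to hold the three Azure AD headers used in the cURL call. Note that changing os.environ only affects the current process and the processes it starts.

```python
import json
import os
import urllib.request
from typing import Dict, MutableMapping


def build_create_token_request(host: str, headers: Dict[str, str],
                               lifetime_seconds: int,
                               comment: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Databricks token/create endpoint."""
    body = json.dumps({"lifetime_seconds": lifetime_seconds, "comment": comment})
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/token/create",
        data=body.encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def export_pat(api_response: dict, host: str,
               env: MutableMapping[str, str] = os.environ) -> None:
    """Store the new PAT and host under the variable names the Databricks CLI reads."""
    env["DATABRICKS_TOKEN"] = api_response["token_value"]
    env["DATABRICKS_HOST"] = host


# Example (needs real Azure AD tokens, so not executed here):
#   host = "https://northeurope.azuredatabricks.net"
#   req = build_create_token_request(host, headers, 100, "this is an example token")
#   export_pat(send(req), host)
```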

Option 2 – using cURL

If you do not wish to use the Azure CLI, you can also use REST queries with cURL directly.

# Change these values.
# Use a Client ID with Contributor permissions
#   on the Databricks workspace.
tenantId=my-tenant-id
SUBSCRIPTION_ID=my-subscription-id
RESOURCE_GROUP=my-resource-group
DATABRICKS_WORKSPACE=my-databricks-workspace
CLIENT_ID=my-client-id
CLIENT_SECRET=my-client-secret

# Full Azure resource ID of the workspace, built without calling the Azure CLI
wsId="/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Databricks/workspaces/$DATABRICKS_WORKSPACE"

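# Request an Azure AD token for resource $1 with the client credentials grant.
# The token endpoint expects a POST; --data-urlencode escapes special characters in the secret.
getToken () {
  token_response=$(curl -sf -X POST \
    "https://login.microsoftonline.com/$tenantId/oauth2/token" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=$CLIENT_ID" \
    --data-urlencode "client_secret=$CLIENT_SECRET" \
    --data-urlencode "resource=$1"
  )
  jq .access_token -r <<< "$token_response"
}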

# Get a token for the global Databricks application. This value is fixed and never changes.
token=$(getToken 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)

# Get a token for the Azure management API
azToken=$(getToken https://management.core.windows.net/)

# Use both tokens in Databricks API call
curl -sf https://northeurope.azuredatabricks.net/api/2.0/clusters/list \
  -H "Authorization: Bearer $token" \
  -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
  -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId"

# You can also generate a PAT token. Note the quota limit of 600 tokens.
curl -sf https://northeurope.azuredatabricks.net/api/2.0/token/create \
  -H "Authorization: Bearer $token" \
  -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
  -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId" \
  -d '{ "lifetime_seconds": 100, "comment": "this is an example token" }'
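The getToken function above can be sketched in Python with only the standard library. This is an illustration under the same assumptions as the cURL version (Azure AD v1 token endpoint, client credentials grant); build_token_request and get_token are hypothetical names, and the resource values are the ones used in the snippets above.

```python
import json
import urllib.parse
import urllib.request

# Resources requested in the snippets above.
DATABRICKS_RESOURCE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
MANAGEMENT_RESOURCE = "https://management.core.windows.net/"


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        resource: str) -> urllib.request.Request:
    """Build (but do not send) an Azure AD v1 client-credentials token request."""
    # urlencode escapes characters such as '&' or '+' that can appear in client secrets.
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,
    }).encode("ascii")
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def get_token(tenant_id: str, client_id: str, client_secret: str,
              resource: str) -> str:
    """Send the token request and return the access_token field of the JSON response."""
    request = build_token_request(tenant_id, client_id, client_secret, resource)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


# Example (needs a real service principal, so not executed here):
#   token = get_token(TENANT_ID, CLIENT_ID, CLIENT_SECRET, DATABRICKS_RESOURCE)
#   az_token = get_token(TENANT_ID, CLIENT_ID, CLIENT_SECRET, MANAGEMENT_RESOURCE)
```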

Option 3 – Using Python

I have published a Python module to easily interact with Databricks with PAT tokens or AAD: https://pypi.org/project/databricks-client/

If you want to implement the logic yourself, the easiest way is to use the azure-common package, whose azure.common.credentials module gives Python access to Azure CLI credentials.

    import requests
    from azure.common.credentials import get_azure_cli_credentials
    resource_group = "MY_RESOURCE_GROUP"
    databricks_workspace = "MY_WORKSPACE"
    dbricks_location = "northeurope"

    credentials, subscription_id = get_azure_cli_credentials()
    dbricks_api = f"https://{dbricks_location}.azuredatabricks.net/api/2.0"
    # Get a token for the global Databricks application. This value is fixed and never changes.
    adbToken = credentials.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d").token
    # Get a token for the Azure management API
    azToken = credentials.get_token("https://management.core.windows.net/").token
    dbricks_auth = {
        "Authorization": f"Bearer {adbToken}",
        "X-Databricks-Azure-SP-Management-Token": azToken,
        "X-Databricks-Azure-Workspace-Resource-Id": (
            f"/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Databricks"
            f"/workspaces/{databricks_workspace}")}
    response = requests.get(f"{dbricks_api}/instance-pools/list", headers=dbricks_auth)
    response.raise_for_status()
    print(response.json())
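Newer Azure SDKs obtain credentials through the azure-identity package rather than azure-common. As a hedged alternative (not the author's code), the same headers can be built from any credential object that exposes azure-core's get_token(*scopes) method, such as azure.identity.AzureCliCredential. The function names are hypothetical, and the scope strings are an assumption: with the v2 scope format, each resource from the snippet above is requested with "/.default" appended.

```python
from typing import Any, Dict

# Assumption: AzureCliCredential strips "/.default", so these scopes map back to
# the resources used in the snippets above ("2ff814a6-..." and
# "https://management.core.windows.net/", including its trailing slash).
DATABRICKS_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"
MANAGEMENT_SCOPE = "https://management.core.windows.net//.default"


def workspace_resource_id(subscription_id: str, resource_group: str,
                          workspace: str) -> str:
    """Full Azure resource ID of a Databricks workspace."""
    return (f"/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Databricks"
            f"/workspaces/{workspace}")


def databricks_auth_headers(credential: Any, resource_id: str) -> Dict[str, str]:
    """Build Azure AD headers from any object with get_token(*scopes) -> token with .token."""
    adb_token = credential.get_token(DATABRICKS_SCOPE).token
    az_token = credential.get_token(MANAGEMENT_SCOPE).token
    return {
        "Authorization": f"Bearer {adb_token}",
        "X-Databricks-Azure-SP-Management-Token": az_token,
        "X-Databricks-Azure-Workspace-Resource-Id": resource_id,
    }


# Usage (assumes `pip install azure-identity requests` and a prior `az login`):
#   import requests
#   from azure.identity import AzureCliCredential
#   rid = workspace_resource_id("MY_SUBSCRIPTION_ID", "MY_RESOURCE_GROUP", "MY_WORKSPACE")
#   headers = databricks_auth_headers(AzureCliCredential(), rid)
#   requests.get(f"{dbricks_api}/instance-pools/list", headers=headers).json()
```

Accepting the credential as a parameter keeps the header logic independent of how the identity is obtained, so a service principal credential can be passed in place of the Azure CLI one.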

See Part 2, Provisioning Azure Databricks and PAT tokens with Terraform, for a Terraform template which fully automates the provisioning process.

Alexandre Gattiker
Software Engineer at Microsoft, Data & AI, open source fan
https://cloudarchitected.com
