Until recently, the Azure Databricks API required a Personal Access Token (PAT), which had to be generated manually in the UI. This complicates DevOps scenarios. A new feature in preview allows using Azure AD to authenticate with the API. You can use it in two ways:
- Use Azure AD to authenticate each Azure Databricks REST API call.
- Use Azure AD to create a PAT token, and then use this PAT token with the Databricks REST API. Note that there is a quota limit of 600 active tokens.
See further down for options using Python or Terraform.
Ensure your service principal has Contributor permissions on the Databricks workspace resource.
Option 1 – using Azure CLI
The easiest way is to use Azure CLI. Log in to Azure with a user account or service principal that has Contributor permissions on the workspace.
    # Change these values
    RESOURCE_GROUP=my-resource-group
    DATABRICKS_WORKSPACE=my-databricks-workspace
    # Base URL of the workspace; adjust the region to match yours
    export DATABRICKS_HOST=https://northeurope.azuredatabricks.net

    tenantId=$(az account show --query tenantId -o tsv)
    wsId=$(az resource show \
      --resource-type Microsoft.Databricks/workspaces \
      -g "$RESOURCE_GROUP" \
      -n "$DATABRICKS_WORKSPACE" \
      --query id -o tsv)

    # Get a token for the global Databricks application.
    # The resource name is fixed and never changes.
    token_response=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)
    token=$(jq .accessToken -r <<< "$token_response")

    # Get a token for the Azure management API
    token_response=$(az account get-access-token --resource https://management.core.windows.net/)
    azToken=$(jq .accessToken -r <<< "$token_response")

    # Use both tokens in Databricks API call
    curl -sf "$DATABRICKS_HOST/api/2.0/clusters/list" \
      -H "Authorization: Bearer $token" \
      -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
      -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId"

    # You can also generate a PAT token. Note the quota limit of 600 tokens.
    api_response=$(curl -sf "$DATABRICKS_HOST/api/2.0/token/create" \
      -H "Authorization: Bearer $token" \
      -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
      -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId" \
      -d '{ "lifetime_seconds": 100, "comment": "this is an example token" }')
    export DATABRICKS_TOKEN=$(jq .token_value -r <<< "$api_response")
Databricks CLI will use the DATABRICKS_TOKEN and DATABRICKS_HOST environment variables as configuration.
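The same two environment variables can also drive your own scripts. The following is a minimal sketch, not part of the Databricks CLI: the helper names `build_pat_request` and `call_databricks` are made up for illustration, and it assumes DATABRICKS_HOST holds the workspace base URL (for example https://northeurope.azuredatabricks.net) and DATABRICKS_TOKEN holds a PAT created as shown above.

```python
import json
import os
import urllib.request


def build_pat_request(path, env=None):
    """Build (but do not send) a GET request authenticated with a PAT.

    Hypothetical helper: reads DATABRICKS_HOST and DATABRICKS_TOKEN
    from the given mapping, or from the process environment.
    """
    env = os.environ if env is None else env
    host = env["DATABRICKS_HOST"].rstrip("/")
    token = env["DATABRICKS_TOKEN"]
    return urllib.request.Request(
        f"{host}/api/2.0/{path.lstrip('/')}",
        # A PAT is sent as a plain bearer token; the two extra
        # X-Databricks-Azure-* headers are not needed in this case.
        headers={"Authorization": f"Bearer {token}"},
    )


def call_databricks(path, env=None):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_pat_request(path, env)) as response:
        return json.load(response)
```

With both variables exported, `call_databricks("clusters/list")` should return the same JSON as the first cURL call above. Keep in mind that the example token only lives for 100 seconds.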
Option 2 – using cURL
If you do not wish to use the Azure CLI, you can also use REST queries with cURL directly.
    # Change these values.
    # Use a Client ID with Contributor permissions
    # on the Databricks workspace.
    RESOURCE_GROUP=my-resource-group
    DATABRICKS_WORKSPACE=my-databricks-workspace
    CLIENT_ID=my-client-id
    CLIENT_SECRET=my-client-secret

    # These two lookups still use the Azure CLI for convenience.
    # Hard-code the tenant ID and workspace resource ID to avoid it entirely.
    tenantId=$(az account show --query tenantId -o tsv)
    wsId=$(az resource show \
      --resource-type Microsoft.Databricks/workspaces \
      -g "$RESOURCE_GROUP" \
      -n "$DATABRICKS_WORKSPACE" \
      --query id -o tsv)

    # Request a token for the resource given as $1, using the
    # OAuth 2.0 client credentials grant. Token endpoints expect a POST;
    # --data-urlencode keeps special characters in the values intact.
    getToken () {
      local token_response
      token_response=$(curl -s -X POST \
        "https://login.microsoftonline.com/$tenantId/oauth2/token" \
        -H 'Content-Type: application/x-www-form-urlencoded' \
        -d "grant_type=client_credentials" \
        -d "client_id=$CLIENT_ID" \
        --data-urlencode "resource=$1" \
        --data-urlencode "client_secret=$CLIENT_SECRET")
      jq .access_token -r <<< "$token_response"
    }

    # Get a token for the global Databricks application.
    # This value is fixed and never changes.
    token=$(getToken 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)

    # Get a token for the Azure management API
    azToken=$(getToken https://management.core.windows.net/)

    # Use both tokens in Databricks API call
    curl -sf https://northeurope.azuredatabricks.net/api/2.0/clusters/list \
      -H "Authorization: Bearer $token" \
      -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
      -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId"

    # You can also generate a PAT token. Note the quota limit of 600 tokens.
    curl -sf https://northeurope.azuredatabricks.net/api/2.0/token/create \
      -H "Authorization: Bearer $token" \
      -H "X-Databricks-Azure-SP-Management-Token:$azToken" \
      -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId" \
      -d '{ "lifetime_seconds": 100, "comment": "this is an example token" }'
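If neither the Azure CLI nor cURL is available, the same token request can be made from Python with only the standard library. This is a rough sketch under a few assumptions: the function names `build_token_request` and `get_token` are invented for this example, the tenant ID, client ID and secret are supplied by you, and the form is sent as a POST, which is what OAuth 2.0 token endpoints expect.

```python
import json
import urllib.parse
import urllib.request

# Fixed resource ID of the global Databricks application (from the listing above)
DATABRICKS_RESOURCE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
# Resource URL of the Azure management API
MANAGEMENT_RESOURCE = "https://management.core.windows.net/"


def build_token_request(tenant_id, client_id, client_secret, resource):
    """Build (but do not send) a client-credentials token request."""
    # urlencode escapes special characters in the secret and resource URL
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,
    }).encode("ascii")
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def get_token(tenant_id, client_id, client_secret, resource):
    """Send the request and return the access token string."""
    request = build_token_request(tenant_id, client_id, client_secret, resource)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]
```

Calling `get_token` once with `DATABRICKS_RESOURCE` and once with `MANAGEMENT_RESOURCE` yields the two values that the shell listing stores in `token` and `azToken`.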
Option 3 – using Python
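I have published a Python module that makes it easy to call the Databricks API with either PAT tokens or AAD authentication: https://pypi.org/project/databricks-client/
If you want to implement the logic yourself, the easiest way is to reuse your Azure CLI credentials from Python. The get_azure_cli_credentials helper used below is imported from azure.common, which is distributed in the azure-common package (not azure-core) and relies on azure-cli-core being installed.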
    import requests
    from azure.common.credentials import get_azure_cli_credentials

    # Change these values
    resource_group = "MY_RESOURCE_GROUP"
    databricks_workspace = "MY_WORKSPACE"
    dbricks_location = "northeurope"

    # Reuse the credentials of the current Azure CLI login
    credentials, subscription_id = get_azure_cli_credentials()

    dbricks_api = f"https://{dbricks_location}.azuredatabricks.net/api/2.0"

    # Get a token for the global Databricks application.
    # This value is fixed and never changes.
    adbToken = credentials.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d").token

    # Get a token for the Azure management API
    azToken = credentials.get_token("https://management.core.windows.net/").token

    # Headers required on every Databricks API call
    dbricks_auth = {
        "Authorization": f"Bearer {adbToken}",
        "X-Databricks-Azure-SP-Management-Token": azToken,
        "X-Databricks-Azure-Workspace-Resource-Id": (
            f"/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Databricks"
            f"/workspaces/{databricks_workspace}"),
    }

    # List the instance pools; fail loudly on an HTTP error
    response = requests.get(f"{dbricks_api}/instance-pools/list", headers=dbricks_auth)
    response.raise_for_status()
    print(response.json())
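The Python listing above only reads from the API. As a sketch of the second approach (creating a PAT token from the AAD-authenticated session), the snippet below mirrors the token/create call from the shell examples. The function names `build_create_token_request` and `create_pat` are hypothetical; `api_base` and `auth_headers` are assumed to be the `dbricks_api` and `dbricks_auth` values built in the listing above.

```python
import json
import urllib.request


def build_create_token_request(api_base, auth_headers,
                               lifetime_seconds=100,
                               comment="this is an example token"):
    """Build (but do not send) a POST request to the token/create endpoint."""
    payload = {"lifetime_seconds": lifetime_seconds, "comment": comment}
    return urllib.request.Request(
        f"{api_base}/token/create",
        data=json.dumps(payload).encode("utf-8"),
        # Keep the AAD headers and declare the JSON body
        headers={**auth_headers, "Content-Type": "application/json"},
        method="POST",
    )


def create_pat(api_base, auth_headers, **options):
    """Send the request and return the new PAT (the token_value field)."""
    request = build_create_token_request(api_base, auth_headers, **options)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["token_value"]
```

For example, `create_pat(dbricks_api, dbricks_auth, lifetime_seconds=3600)` would request a token valid for one hour. Each token created this way counts towards the quota of 600 active tokens.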
See Part 2, Provisioning Azure Databricks and PAT tokens with Terraform, for a Terraform template which fully automates the provisioning process.