Azure Pipelines offers a variety of extensions for integrating Terraform: an official one from Microsoft, which has a number of limitations and has not received any updates in years, and a very popular one developed by my colleague Charles Zipp, which offers great functionality, though not all enterprises wish to install community extensions for Azure […]
Articles
Managing Terraform outputs in Azure Pipelines
You can use Terraform as a single source of configuration for multiple pipelines. This enables you to centralize configuration across your project, such as your naming strategy for resources. After running terraform apply, the Terraform state (usually a blob in Azure Storage) contains the values of your defined Terraform outputs. In your output.tf: The Azure […]
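As a minimal sketch of the idea (not the article's full walkthrough), a later pipeline step can read the applied outputs with terraform output -json and republish them as pipeline variables; the step below is illustrative and assumes the Terraform CLI is on the PATH:

```python
# Hypothetical sketch: surface Terraform outputs as Azure Pipelines variables.
# Assumes `terraform apply` has already run against the shared state.
import json
import subprocess

# `terraform output -json` returns {"name": {"value": ..., "type": ..., "sensitive": ...}}
raw = subprocess.check_output(["terraform", "output", "-json"])
outputs = json.loads(raw)

for name, meta in outputs.items():
    # The ##vso logging command makes each value available to later pipeline steps.
    print(f"##vso[task.setvariable variable={name};isOutput=true]{meta['value']}")
```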
Provisioning Azure Databricks and PAT tokens with Terraform
See Part 1, Using Azure AD With The Azure Databricks API, for a background on the Azure AD authentication mechanism for Databricks. Here we show how to bootstrap the provisioning of an Azure Databricks workspace and generate a PAT token that can be used by downstream applications. Create a script generate-pat-token.sh with the following content. […]
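For flavor, here is a hedged Python sketch of the API call such a script wraps: creating a PAT via the Databricks token API, authenticated with the Azure AD token from Part 1. The workspace URL and resource ID below are placeholders:

```python
# Hedged sketch of the call behind generate-pat-token.sh: create a Databricks
# PAT using an Azure AD access token (see Part 1 for obtaining the token).
import requests

WORKSPACE_URL = "https://westeurope.azuredatabricks.net"  # placeholder region
AAD_TOKEN = "<azure-ad-access-token>"                     # from Part 1's flow
WORKSPACE_RESOURCE_ID = (
    "/subscriptions/<sub>/resourceGroups/<rg>"
    "/providers/Microsoft.Databricks/workspaces/<name>"
)

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/token/create",
    headers={
        "Authorization": f"Bearer {AAD_TOKEN}",
        # Identifies the workspace when authenticating with an AAD token.
        "X-Databricks-Azure-Workspace-Resource-Id": WORKSPACE_RESOURCE_ID,
    },
    json={"lifetime_seconds": 3600, "comment": "pipeline-generated PAT"},
)
resp.raise_for_status()
pat = resp.json()["token_value"]  # hand this off to downstream applications
```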
Unit testing Databricks notebooks
A simple way to unit test notebooks is to write the logic in a notebook that accepts parameterized inputs, and a separate test notebook that contains assertions. The sample project https://github.com/algattik/databricks-unit-tests/ contains two demonstration notebooks: the normalize_orders notebook processes a list of Orders and a list of OrderDetails into a joined list, taking into account […]
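The pattern looks roughly like the sketch below: a simplified, Spark-free stand-in for the repo's notebooks (the name mirrors normalize_orders, but the data and logic are invented for illustration):

```python
# Simplified stand-in for the pattern; the repo's notebooks use Spark DataFrames.

def normalize_orders(orders, order_details):
    """Join each detail line with its parent order on order_id."""
    by_id = {o["order_id"]: o for o in orders}
    return [
        {**by_id[d["order_id"]], **d}
        for d in order_details
        if d["order_id"] in by_id  # drop detail rows without a matching order
    ]

# The "test notebook": parameterized inputs plus plain assertions.
orders = [{"order_id": 1, "customer": "contoso"}]
details = [{"order_id": 1, "sku": "A-42", "qty": 3},
           {"order_id": 2, "sku": "B-7", "qty": 1}]  # orphan detail row

result = normalize_orders(orders, details)
assert len(result) == 1
assert result[0]["customer"] == "contoso" and result[0]["qty"] == 3
print("all assertions passed")
```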
Exploring stream processing with Flink on Kubernetes
(updated 2019-11-18 with streaming-at-scale repository link) Apache Flink is a popular engine for distributed stream processing. In contrast to Spark Structured Streaming, which processes streams as micro-batches, Flink is a pure streaming engine where messages are processed one at a time. Running Flink in a modern cloud deployment on Azure poses some challenges. Flink can […]
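To illustrate the per-message model, here is a tiny PyFlink snippet (purely illustrative and not from the article, which targets Flink's Java runtime on Kubernetes):

```python
# Minimal PyFlink DataStream sketch: each element flows through the pipeline
# individually rather than in micro-batches.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# A bounded collection stands in for a real source such as Event Hubs or Kafka.
events = env.from_collection(["device-1,21.5", "device-2,19.0", "device-1,22.1"])

# Each record is parsed and forwarded as soon as it arrives, one at a time.
parsed = events.map(lambda line: tuple(line.split(",")))
parsed.print()

env.execute("per-event-processing-sketch")
```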
Using the TensorFlow Object Detection API on Azure Databricks
The easiest way to train an Object Detection model is to use the Azure Custom Vision cognitive service. That said, the Custom Vision service is optimized to quickly recognize major differences between images, which means it can be trained with small datasets, but is not optimized for detecting subtle differences in images (for example, detecting […]
PaaS integration testing with Azure DevOps
Using Azure DevOps pipelines, we can easily spin up test environments to run various sorts of integration tests on PaaS resources. Azure DevOps allows powerful scripting and orchestration using familiar CLI commands, and is very useful for automatically spinning up entire environments using Infrastructure as Code without manual intervention. Sample project In this example, we looked at […]
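An individual test in such a suite can be as small as the sketch below (hypothetical, assuming the provisioning step exports the deployed endpoint in a SERVICE_ENDPOINT variable):

```python
# Illustrative integration test, run with pytest after the pipeline has
# provisioned the environment and exported its endpoint.
import os

import requests

def test_service_responds():
    # SERVICE_ENDPOINT is assumed to be set by the provisioning pipeline step.
    endpoint = os.environ["SERVICE_ENDPOINT"]
    resp = requests.get(f"{endpoint}/health", timeout=30)
    assert resp.status_code == 200
```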
DevOps in Azure with Databricks and Data Factory
Building simple deployment pipelines to synchronize Databricks notebooks across environments is easy, and such a pipeline could fit the needs of small teams working on simple projects. Yet a more sophisticated application includes other types of resources that need to be provisioned in concert and securely connected, such as Data Factory pipelines, storage accounts, and […]
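One building block of such a pipeline, sketched under assumptions (the host, token, and paths are placeholders, and a real pipeline would also deploy the Data Factory ARM template), is pushing a notebook to the target workspace through the Databricks Workspace API:

```python
# Hedged sketch: import a notebook into a target workspace via the
# Databricks Workspace API as one step of a deployment pipeline.
import base64

import requests

HOST = "https://westeurope.azuredatabricks.net"  # placeholder
TOKEN = "<databricks-pat>"

with open("notebooks/etl.py", "rb") as f:  # placeholder notebook path
    content = base64.b64encode(f.read()).decode()

resp = requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": "/Production/etl",
        "language": "PYTHON",
        "format": "SOURCE",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()
```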
Embedding Power BI content with a Service Principal
Until now, embedding Power BI reports or dashboards into a web application or automating processes with the Power BI API required a master account. A master account is an actual Power BI account with a username and password that the embedding app uses to connect to the Power BI API. The master account that is […]
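With a service principal, the embedding app instead authenticates with the client-credentials flow; here is a minimal sketch using MSAL (the IDs are placeholders, and the service principal must be allowed API access in the Power BI tenant settings):

```python
# Minimal sketch: call the Power BI REST API as a service principal
# rather than as a master account with a username and password.
import msal
import requests

app = msal.ConfidentialClientApplication(
    client_id="<app-id>",
    client_credential="<app-secret>",
    authority="https://login.microsoftonline.com/<tenant-id>",
)
token = app.acquire_token_for_client(
    scopes=["https://analysis.windows.net/powerbi/api/.default"]
)

# Example call: list the workspaces the service principal can access.
resp = requests.get(
    "https://api.powerbi.com/v1.0/myorg/groups",
    headers={"Authorization": f"Bearer {token['access_token']}"},
)
resp.raise_for_status()
print(resp.json())
```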
Managing cost in development subscriptions
The cloud allows development and test teams to be very agile, as they can spin up the resources they need in a matter of minutes, whether for quick prototyping, learning, or scalability tests. That can come with headaches if costs are left to spiral out of control. We will investigate what reactive and proactive controls can […]
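As one example of a reactive control (an illustrative sketch, not necessarily the article's approach), a scheduled job can tear down development resource groups whose expiry tag has passed:

```python
# Hypothetical cleanup job: delete dev resource groups whose "expiry" tag
# date has passed. The tag name and date format are assumptions.
import datetime
import json
import subprocess

groups = json.loads(subprocess.check_output(
    ["az", "group", "list", "--query", "[?tags.expiry]", "--output", "json"]
))

today = datetime.date.today()
for group in groups:
    expiry = datetime.date.fromisoformat(group["tags"]["expiry"])
    if expiry < today:
        # --no-wait makes deletions fire-and-forget; --yes skips the prompt.
        subprocess.run(["az", "group", "delete", "--name", group["name"],
                        "--yes", "--no-wait"], check=True)
```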