
Managing Terraform outputs in Azure Pipelines

You can use Terraform as a single source of configuration for multiple pipelines. This enables you to centralize configuration across your project, such as your naming strategy for resources. After running terraform apply, the Terraform state (usually a blob in Azure Storage) contains the values of your defined Terraform outputs. In your output.tf: The Azure […]
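To make this concrete, here is a minimal sketch (not the article's actual script) of a pipeline step that turns Terraform outputs into Azure Pipelines variables: it parses terraform output -json and re-emits each value with the ##vso[task.setvariable] logging command. The output names shown are illustrative.

```python
import json
import subprocess

# Read all outputs recorded in the Terraform state as JSON,
# equivalent to running `terraform output -json` in the working directory.
raw = subprocess.check_output(["terraform", "output", "-json"], text=True)
outputs = json.loads(raw)

# Re-emit each output as an Azure Pipelines variable so later steps
# can consume it as $(output_name), e.g. $(resource_group_name).
for name, entry in outputs.items():
    print(f"##vso[task.setvariable variable={name}]{entry['value']}")
```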


Unit testing Databricks notebooks

A simple way to unit test notebooks is to write the logic in a notebook that accepts parameterized inputs, and to write assertions in a separate test notebook. The sample project https://github.com/algattik/databricks-unit-tests/ contains two demonstration notebooks: the normalize_orders notebook processes a list of Orders and a list of OrderDetails into a joined list, taking into account […]
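As an illustration of the pattern (the column names and join key below are made up for the example, not taken from the sample repository), the logic under test can live in a function such as normalize_orders, while the test notebook builds small input DataFrames and asserts on the joined result:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def normalize_orders(orders_df, order_details_df):
    # Join each order with its detail lines on the order id.
    return orders_df.join(order_details_df, on="order_id", how="inner")

# Test notebook: build tiny input DataFrames and assert on the output.
orders = spark.createDataFrame([(1, "alice"), (2, "bob")], ["order_id", "customer"])
details = spark.createDataFrame(
    [(1, "widget", 3), (2, "gadget", 1)], ["order_id", "product", "quantity"]
)

result = normalize_orders(orders, details)

assert result.count() == 2
assert set(result.columns) == {"order_id", "customer", "product", "quantity"}
```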


Exploring stream processing with Flink on Kubernetes

(Updated 2019-11-18 with the streaming-at-scale repository link.) Apache Flink is a popular engine for distributed stream processing. In contrast to Spark Structured Streaming, which processes streams as micro-batches, Flink is a pure streaming engine where messages are processed one at a time. Running Flink in a modern cloud deployment on Azure poses some challenges. Flink can […]
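To illustrate the per-message model, here is a minimal PyFlink DataStream sketch (a toy in-memory source stands in for the event hub or Kafka topic a real deployment would read from):

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Toy bounded source; a real deployment would read from Kafka or Event Hubs.
ds = env.from_collection(["order-1", "order-2", "order-3"])

# Each record is handed to the function as it arrives, one message at a
# time, rather than being grouped into a micro-batch first.
ds.map(lambda msg: msg.upper()).print()

env.execute("flink-streaming-sketch")
```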


DevOps in Azure with Databricks and Data Factory

Building simple deployment pipelines to synchronize Databricks notebooks across environments is easy, and such a pipeline could fit the needs of small teams working on simple projects. Yet a more sophisticated application includes other types of resources that need to be provisioned in concert and securely connected, such as Data Factory pipelines, storage accounts and […]
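For the notebook-synchronization part alone, a deployment step can push notebooks into the target workspace through the Databricks Workspace API. A minimal sketch, assuming the workspace URL and token are supplied as pipeline secrets and the notebook path is a placeholder:

```python
import base64
import os

import requests

# Placeholders: in a pipeline these come from secret variables or a service connection.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<id>.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]

# Read the notebook source and base64-encode it as the API expects.
with open("notebooks/normalize_orders.py", "rb") as f:
    content = base64.b64encode(f.read()).decode("ascii")

# Import (overwriting if present) the notebook into the target workspace.
resp = requests.post(
    f"{host}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "/Shared/normalize_orders",
        "format": "SOURCE",
        "language": "PYTHON",
        "content": content,
        "overwrite": True,
    },
)
resp.raise_for_status()
```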