Articles Tutorial

Unit testing Databricks notebooks

A simple way to unit test notebooks is to write the logic in a notebook that accepts parameterized inputs, and a separate test notebook that contains assertions. The sample project https://github.com/algattik/databricks-unit-tests/ contains two demonstration notebooks: The normalize_orders notebook processes a list of Orders and a list of OrderDetails into a joined list, taking into account […]

Articles

Exploring stream processing with Flink on Kubernetes

(updated 2019-11-18 with streaming-at-scale repository link) Apache Flink is a popular engine for distributed stream processing. In contrast to Spark Structured Streaming which processes streams as microbatches, Flink is a pure streaming engine where messages are processed one at a time. Running Flink in a modern cloud deployment on Azure poses some challenges. Flink can […]

Articles Featured Highlighted

DevOps in Azure with Databricks and Data Factory

Building simple deployment pipelines to synchronize Databricks notebooks across environments is easy, and such a pipeline could fit the needs of small teams working on simple projects. Yet, a more sophisticated application includes other types of resources that need to be provisioned in concert and securely connected, such as Data Factory pipeline, storage accounts and […]