Architecture Archives - Cloud Architected

Architecture

Data Lineage in Azure Databricks with Spline

April 14 | April 23 | Comments (4)

The Spline open-source project can be used to automatically capture data lineage information from Spark jobs, and provide an interactive GUI to search and visualize data lineage information. We provide an Azure DevOps template project that automates the deployment of an end-to-end demo project in your environment, using Azure Databricks, Cosmos DB and Azure App […]

Monitoring and Logging in Azure Databricks with Azure Log Analytics and Grafana

April 12 | July 21 | Comments (2)

Connecting Azure Databricks with Log Analytics allows monitoring and tracing each layer within Spark workloads, including the performance and resource usage on the host and JVM, as well as Spark metrics and application-level logging. You can easily test this integration end-to-end by following the accompanying tutorial on Monitoring Azure Databricks with Azure Log Analytics and […]

Event-based analytical data processing with Azure Databricks

March 20 | April 23 | Comments (0)

Cloud-native streaming architecture. Overview: Modern data analytics architectures should embrace the high flexibility required for today’s business environment, where the only certainty for every enterprise is that the ability to harness explosive volumes of data in real time is emerging as a key source of competitive advantage. Fortunately, cloud platforms allow high scalability and […]

Data-level security in Azure Databricks

March 18 | April 30 | Comments (1)

This is part 2 of our series on Databricks security, following Network Isolation for Azure Databricks. The simplest way to provide data-level security in Azure Databricks is to use fixed account keys or service principals for accessing data in Blob storage or Data Lake Storage. This grants every user of a Databricks cluster access to […]

Securing access to shared metastore with Azure Databricks

March 16 | April 23 | Comments (1)

Uses for an external metastore: Every Azure Databricks deployment has a central Hive metastore accessible by all clusters to persist table metadata, including table and column names as well as storage location. By default, the metastore is managed by Azure in the shared Databricks control plane. Instead of using the default, you have the option […]

Network Isolation for Azure Databricks

February 27 | June 5 | Comments (7)

For the highest level of security in an Azure Databricks deployment, clusters can be deployed in a custom Virtual Network. With the default setup, inbound traffic is locked down, but outbound traffic is unrestricted for ease of use. The network can be configured to restrict outbound traffic. For data science and exploratory environments, it is […]