Using Azure DevOps pipelines, we can easily spin up test environments to run various kinds of integration tests on PaaS resources. Azure DevOps allows powerful scripting and orchestration using familiar CLI commands, and is very useful for automatically spinning up entire environments using Infrastructure as Code, without manual intervention.
In this example, we look at the open-source project Spline (a Data Lineage Tracking and Visualization tool for Apache Spark). Data lineage and governance are a priority topic in many enterprises, and together with my colleague Arvind Shyamsundar, we wanted to evaluate the complexity and benefits of integrating this project into Spark environments.
The Spline project has several components:
- A library that runs on Spark and captures data lineage information
- A persistence layer that stores the data on MongoDB, HDFS or Atlas
- A Web UI application that visualizes the stored data lineages (supports MongoDB)
We wanted to verify that the MongoDB backend could be replaced with Cosmos DB. As a fully managed, elastic, SLA-backed service, Cosmos DB is much easier to manage and integrate than MongoDB running on Virtual Machines. Since Cosmos DB provides wire protocol compatibility with MongoDB, we expected to be able to plug in Cosmos DB without any changes to Spline’s code or libraries.
Since Spline is under active development, we decided to automate testing upfront so that we would be able to test various branches and configurations and retest periodically in the future, rather than testing on a manually deployed environment. Also, we wanted to make the build result publicly available in order to easily share test failures and diagnostic information with the Spline development team.
You can use this pattern to test any software, yours or third-party, in combination with required services and with different configurations.
The pipeline definition script is azure-pipelines.yml. We first define some variables, including the resource group in which we will create Cosmos DB and a name prefix for the Cosmos DB instance. The instance name must be unique, so we generate a new name for each build to avoid any side effects between builds. We also define the Git branch of the Spline project that we want to test.
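As a rough illustration of generating a unique name per build, here is a minimal sketch in POSIX shell. The helper name `make_instance_name` is hypothetical and not part of the original scripts. It assumes the build ID is available to the script (Azure Pipelines exposes `Build.BuildId` to scripts as the `BUILD_BUILDID` environment variable) and that Cosmos DB account names are limited to lowercase letters, digits and hyphens, up to 44 characters.

```shell
#!/bin/sh
# Hypothetical helper (not from the original scripts): derive a Cosmos DB
# account name from a prefix and a build ID.
# Assumption: account names allow only lowercase letters, digits and
# hyphens, with a maximum length of 44 characters.
make_instance_name() (
  prefix=$1
  build_id=$2
  printf '%s-%s' "$prefix" "$build_id" |
    tr '[:upper:]' '[:lower:]' |   # force lowercase
    tr -c 'a-z0-9-' '-' |          # replace any other character with '-'
    cut -c1-44                     # truncate to the assumed length limit
)

# Example use inside a build script (BUILD_BUILDID is set by the agent):
#   COSMOSDB_INSTANCE=$(make_instance_name "$COSMOSDB_NAME_PREFIX" "$BUILD_BUILDID")
```

Because the build ID changes on every run, two builds never target the same instance, even if an earlier cleanup failed.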
We then define the type of virtual machine on which the build will run. This is a Microsoft-managed VM pool.
The build tasks first create a Cosmos DB instance. Behind the scenes, the task uses a service principal to access an Azure subscription. To set up this connection, navigate to Project settings > Service connections. Create a new service connection of type Azure Resource Manager. Name the connection ARMConnection. Select the subscription. Leave the resource group blank and click OK.
- task: AzureCLI@1
  displayName: Create Azure resources
The next tasks run Maven tasks to build the Spline source code and run its unit tests against Cosmos DB, then publish the Spline artifacts (Web UI) into a downloadable artifact.
- task: Maven@3
  displayName: Build Spline
- task: Maven@3
  displayName: Run Spline tests
- task: CopyFiles@2
  displayName: Copy web app to artifacts directory
- task: PublishBuildArtifacts@1
  displayName: Publish web app artifacts
The last step of the build pipeline deletes the Cosmos DB instance. It is configured to run always so that Cosmos DB will be deleted even if tests fail.
- task: AzureCLI@1
  displayName: Delete Azure resources
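In Azure Pipelines, a step is typically made to run regardless of earlier failures by setting `condition: always()` on it; the exact setting used in this pipeline is not shown here, so treat that as an assumption. The same "always clean up" idea can be sketched in plain shell, which is handy when running the scripts locally. The function `run_with_cleanup` and its three arguments are hypothetical names for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a setup command and a test command, then run
# the cleanup command no matter what happened, and finally report the
# status of the setup/test phase.
# $1 = setup command, $2 = test command, $3 = cleanup command
run_with_cleanup() (
  status=0
  # Tests only run if setup succeeded; any failure is remembered.
  { "$1" && "$2"; } || status=$?
  # Cleanup always runs; its own failure does not hide the test result.
  "$3" || true
  exit "$status"
)

# Example (the three names are placeholders for your own functions/scripts):
#   run_with_cleanup ./before-build.sh ./run-tests.sh ./after-build.sh
```

Returning the original failure status after cleanup keeps the build red when tests fail, while still avoiding a leftover billable resource.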
The first script run by the pipeline, before-build.sh, creates a Cosmos DB instance with the configuration required by Spline. Spline requires the MongoDB API and uses the aggregation pipeline feature. As that feature is in preview, it must be explicitly enabled. The script then retrieves the Cosmos DB connection string (in MongoDB format) and passes it to the pipeline using the special ##vso syntax. The Maven test task in turn passes the variable to the unit test process.
# Create a Cosmos DB database.
az cosmosdb create -g $RESOURCE_GROUP -n $COSMOSDB_INSTANCE --kind MongoDB --capabilities EnableAggregationPipeline -o table
# Get the connection string (in mongodb:// format) to the Cosmos DB account.
# The connection string contains the account key.
# Example connection string:
COSMOSDB_CONN_STRING=$(az cosmosdb list-connection-strings -g "$RESOURCE_GROUP" -n "$COSMOSDB_INSTANCE" --query 'connectionStrings[0].connectionString' -o tsv)
# Set job variable from script, to be used by Maven task
echo "##vso[task.setvariable variable=COSMOSDB_CONN_STRING]$COSMOSDB_CONN_STRING"
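Since the connection string contains the account key, it may be worth marking the variable as secret so that the agent masks it in build logs, which matters when build results are public. The sketch below wraps the `##vso[task.setvariable]` logging command in a small function; the function name `set_pipeline_variable` is hypothetical. One caveat to verify for your setup: secret variables are not automatically exposed as environment variables to later tasks, so they usually need to be mapped explicitly in the task that consumes them.

```shell
#!/bin/sh
# Hypothetical helper: emit an Azure Pipelines logging command that sets
# a job variable. The agent reads these lines from the script's stdout.
# $1 = variable name, $2 = value, $3 = "secret" to have the value masked
set_pipeline_variable() {
  if [ "${3:-}" = "secret" ]; then
    printf '##vso[task.setvariable variable=%s;issecret=true]%s\n' "$1" "$2"
  else
    printf '##vso[task.setvariable variable=%s]%s\n' "$1" "$2"
  fi
}

# Example:
#   set_pipeline_variable COSMOSDB_CONN_STRING "$COSMOSDB_CONN_STRING" secret
```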
The final script, after-build.sh, deletes the Cosmos DB database, so that no costs are incurred after the end of the tests.
# Delete the test Cosmos DB database.
az cosmosdb delete -g $RESOURCE_GROUP -n $COSMOSDB_INSTANCE
Our first run of the build pipeline ended with test failures.
An investigation showed that the issue was due to a difference in regular expression implementations between native MongoDB and Cosmos DB. Spline issues database searches using the “\Q…\E” syntax to mark text that should be matched literally. That syntax is not supported by Cosmos DB, whose regular expression engine follows the ECMAScript standard. Our workaround was to disable that regular expression quoting in a custom build of Spline.
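To make the incompatibility concrete: one portable alternative to “\Q…\E” quoting is to backslash-escape each regular expression metacharacter in the search text, which yields a pattern that ECMAScript-style engines also accept. The sketch below shows that technique in POSIX shell. It is only an illustration of the idea, not the change made to Spline, and the function name `escape_regex` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: backslash-escape regex metacharacters so that the
# text is matched literally without relying on \Q...\E quoting.
escape_regex() (
  rest=$1
  out=
  while [ -n "$rest" ]; do
    tail=${rest#?}          # everything after the first character
    ch=${rest%"$tail"}      # the first character itself
    case $ch in
      \\|'.'|'^'|'$'|'*'|'+'|'?'|'('|')'|'['|']'|'{'|'}'|'|')
        out="$out\\$ch" ;;  # metacharacter: prefix with a backslash
      *)
        out="$out$ch" ;;    # ordinary character: copy unchanged
    esac
    rest=$tail
  done
  printf '%s\n' "$out"
)

# Example: escape_regex 'job (v1.0)' prints: job \(v1\.0\)
```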
Using Azure DevOps pipelines as a framework for deploying and testing resources is lightweight and powerful, even when testing third-party software. This approach can be used for simple integration tests, as well as for more complex scenarios such as load and performance testing.
Now that we have a verified custom build, let us spin up an entire environment with Spark and Spline running in an Azure App Service! (blog post coming soon)