Downloading a collection of JARs transitively

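Sometimes you need to download a collection of JAR files along with all of their transitive dependencies, for example to supply them to a PySpark job. One way to do this is with Maven: declare the JARs you need as dependencies in a minimal pom.xml such as this one: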

<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
  http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>spark-dependencies</artifactId>
  <version>0.0.1</version>
  <dependencies>
    <dependency>
      <groupId>com.microsoft.azure</groupId>
      <artifactId>azure-eventhubs-spark_2.11</artifactId>
      <version>2.3.13</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>
  </dependencies>
</project>

Then run the copy-dependencies goal of the Maven dependency plugin:

$ mvn -f ./pom.xml dependency:copy-dependencies

This downloads the specified JARs and their transitive dependencies into the target/dependency folder:

$ ls target/dependency/

azure-eventhubs-2.3.2.jar             lz4-java-1.4.1.jar                    scala-java8-compat_2.11-0.9.0.jar     snappy-java-1.1.7.1.jar               unused-1.0.0.jar
azure-eventhubs-spark_2.11-2.3.13.jar proton-j-0.31.0.jar                   scala-library-2.11.12.jar             spark-sql-kafka-0-10_2.11-2.4.3.jar
kafka-clients-2.0.0.jar               qpid-proton-j-extensions-1.2.0.jar    slf4j-api-1.7.25.jar                  spark-tags_2.11-2.4.3.jar
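To hand these JARs to a PySpark job, spark-submit's --jars option takes a comma-separated list of JAR paths. The helper below is a minimal sketch for building that list from the download folder. The function name jar_list is hypothetical, and it assumes a POSIX shell with the JARs in target/dependency unless another directory is passed.

```shell
# Hypothetical helper: print every *.jar in a directory (default:
# target/dependency) as one comma-separated string, the format that
# spark-submit's --jars option expects.
jar_list() {
  dir=${1:-target/dependency}
  out=
  for jar in "$dir"/*.jar; do
    # If nothing matched, the shell leaves the pattern unexpanded; skip it.
    [ -e "$jar" ] || continue
    # Append with a comma separator, except before the first entry.
    out="${out:+$out,}$jar"
  done
  printf '%s\n' "$out"
}
```

With the function defined, the JARs can be passed to a job like this, where my_job.py stands in for your own PySpark script:

$ spark-submit --jars "$(jar_list target/dependency)" my_job.py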


Software Engineer at Microsoft, Data & AI, open source fan