Sometimes you need to download a collection of JAR files along with their dependencies, for example to provide them to a PySpark job. One way to do this is to use Maven with a minimal pom.xml that lists the JARs as dependencies:
<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>spark-dependencies</artifactId>
  <version>0.0.1</version>
  <dependencies>
    <dependency>
      <groupId>com.microsoft.azure</groupId>
      <artifactId>azure-eventhubs-spark_2.11</artifactId>
      <version>2.3.13</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>
  </dependencies>
</project>
Running this command:
$ mvn -f ./pom.xml dependency:copy-dependencies
downloads the specified JARs and their transitive dependencies into the target/dependency folder:
$ ls target/dependency/
azure-eventhubs-2.3.2.jar              lz4-java-1.4.1.jar                  scala-java8-compat_2.11-0.9.0.jar  snappy-java-1.1.7.1.jar              unused-1.0.0.jar
azure-eventhubs-spark_2.11-2.3.13.jar  proton-j-0.31.0.jar                 scala-library-2.11.12.jar          spark-sql-kafka-0-10_2.11-2.4.3.jar
kafka-clients-2.0.0.jar                qpid-proton-j-extensions-1.2.0.jar  slf4j-api-1.7.25.jar               spark-tags_2.11-2.4.3.jar
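
To hand the downloaded JARs to a PySpark job, one option is to join their paths into the comma-separated list that Spark's `spark.jars` setting expects. The following is a minimal sketch, not a definitive setup: the helper names `jars_argument` and `build_session` and the application name `example-app` are hypothetical, and it assumes the script runs from the directory that contains `target/dependency`.

```python
from pathlib import Path


def jars_argument(dependency_dir="target/dependency"):
    """Return the JARs in dependency_dir as one comma-separated string."""
    # Sort so the resulting string is stable between runs.
    jars = sorted(Path(dependency_dir).glob("*.jar"))
    return ",".join(str(jar) for jar in jars)


def build_session(app_name="example-app", dependency_dir="target/dependency"):
    """Create a SparkSession with the downloaded JARs listed in spark.jars."""
    # Imported here so that jars_argument() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars", jars_argument(dependency_dir))
        .getOrCreate()
    )
```

The string returned by `jars_argument()` uses the same comma-separated format as the `--jars` option of `spark-submit`, so it can be used there as well.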