
Accessing Azure Data Lake Storage Gen2 from clients

Azure Data Lake Storage Gen2 (ADLS Gen2) can be accessed out of the box from the command line or from applications running on HDInsight or Databricks. If you are developing an application on another platform, you can use the ABFS driver that ships with Apache Hadoop as of release 3.2.0, either from the command line or as a Java SDK.

Using the Hadoop File System Shell

To access ADLS Gen2 from the command line, download and unpack the Hadoop 3.2.0 tar.gz from https://hadoop.apache.org/release/3.2.0.html.

From the unpacked directory (with JAVA_HOME pointing to your Java installation), you can now run the hdfs command in a Bash shell:

export HADOOP_OPTIONAL_TOOLS=hadoop-azure

bin/hdfs dfs -Dfs.azure.account.key.ACCOUNTNAME.dfs.core.windows.net=ACCOUNTKEY \
-ls abfss://FILESYSTEM@ACCOUNTNAME.dfs.core.windows.net/

All the usual file system shell commands, such as -ls, -cp and -get, work the same way.

Note: passing the storage key on the command line is insecure, as other users on the machine can see it in the process list. You can put the storage key in etc/hadoop/core-site.xml instead.

Using the Java SDK

Including the Hadoop libraries

Add to your pom.xml:

<properties>
    <hadoop.version>3.2.0</hadoop.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-azure</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>

Using a Storage Account Key

Use the following Java code:

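import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Point the default file system at the ADLS Gen2 file system and supply the account key
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "abfss://FILESYSTEM@ACCOUNTNAME.dfs.core.windows.net/");
conf.set("fs.azure.account.key.ACCOUNTNAME.dfs.core.windows.net", "ACCOUNTKEY");

// List the root of the file system (FileSystem.get and listStatus throw IOException)
FileSystem fs = FileSystem.get(conf);
FileStatus[] files = fs.listStatus(new Path("/"));
for (FileStatus f : files) {
    System.out.println(f);
}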
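Both the abfss URI and the account-key property name embed the storage account name, so it can help to derive them in one place. The following minimal sketch uses only the JDK to build the same two settings, reading the key from an environment variable instead of hard-coding it in source. The class name AbfsSettings and the variable name AZURE_STORAGE_KEY are assumptions for illustration, not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): derives the ABFS settings shown above
// from the account name, the file system (container) name and an account key.
final class AbfsSettings {
    private AbfsSettings() {}

    // abfss://<filesystem>@<account>.dfs.core.windows.net/ (ABFS over TLS)
    static String uri(String fileSystem, String account) {
        return "abfss://" + fileSystem + "@" + account + ".dfs.core.windows.net/";
    }

    // fs.azure.account.key.<account>.dfs.core.windows.net
    static String accountKeyProperty(String account) {
        return "fs.azure.account.key." + account + ".dfs.core.windows.net";
    }

    // Returns the name/value pairs to pass to conf.set(...). The key comes from the
    // supplied environment map (e.g. System.getenv()); AZURE_STORAGE_KEY is an assumed name.
    static Map<String, String> forAccountKey(String fileSystem, String account,
                                             Map<String, String> env) {
        String key = env.get("AZURE_STORAGE_KEY");
        if (key == null || key.isEmpty()) {
            throw new IllegalStateException("AZURE_STORAGE_KEY is not set");
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("fs.defaultFS", uri(fileSystem, account));
        settings.put(accountKeyProperty(account), key);
        return settings;
    }

    public static void main(String[] args) {
        // A dummy key stands in for System.getenv() here
        Map<String, String> settings = forAccountKey("myfilesystem", "myaccount",
                Map.of("AZURE_STORAGE_KEY", "dummy-key"));
        // Print only the property names, never the key itself
        settings.keySet().forEach(System.out::println);
    }
}
```

In a Hadoop application, each entry would then be copied into the Configuration with conf.set(name, value), passing System.getenv() as the environment map.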

Using a Service principal

Use the same code as above, but replace the account key setting with the following configuration:

// CLIENT_ID, CLIENT_SECRET and TENANT_ID identify your service principal and its tenant
conf.set("fs.defaultFS", "abfss://FILESYSTEM@ACCOUNTNAME.dfs.core.windows.net/");
conf.set("fs.azure.account.auth.type", "OAuth");
conf.set("fs.azure.account.oauth.provider.type",
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
conf.set("fs.azure.account.oauth2.client.id", CLIENT_ID);
conf.set("fs.azure.account.oauth2.client.secret", CLIENT_SECRET);
conf.set("fs.azure.account.oauth2.client.endpoint",
        "https://login.microsoftonline.com/" + TENANT_ID + "/oauth2/token");

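With ClientCredsTokenProvider, the driver obtains its token from the configured endpoint using the OAuth 2.0 client-credentials grant. As a rough illustration of what travels over the wire, and assuming the token is requested for the https://storage.azure.com/ resource, the sketch below builds that endpoint and the form-encoded request body with the JDK alone. It is not Hadoop's own code; the class and method names are hypothetical, and it makes no network call.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of an OAuth 2.0 client-credentials token request to the
// Azure AD v1 endpoint; for illustration only, not the Hadoop implementation.
final class ClientCredentialsRequest {
    private ClientCredentialsRequest() {}

    // Same format as the fs.azure.account.oauth2.client.endpoint value above
    static String tokenEndpoint(String tenantId) {
        return "https://login.microsoftonline.com/" + tenantId + "/oauth2/token";
    }

    // application/x-www-form-urlencoded body that would be POSTed to the endpoint
    static String formBody(String clientId, String clientSecret) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("grant_type", "client_credentials");
        params.put("client_id", clientId);
        params.put("client_secret", clientSecret);
        // Azure AD v1 endpoints name the target API with the "resource" parameter
        params.put("resource", "https://storage.azure.com/");
        return params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+
    }

    public static void main(String[] args) {
        System.out.println(tokenEndpoint("my-tenant-id"));
        System.out.println(formBody("my-client-id", "my-secret"));
    }
}
```

Seeing the request this way can help when troubleshooting: a wrong tenant shows up in the endpoint, while a wrong client ID or secret is rejected by Azure AD before the storage account is ever contacted.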
Using OAuth2 and the user’s identity

Also add to your pom.xml:

<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>adal4j</artifactId>
    <version>1.6.3</version>
</dependency>

Use the following Java code:

import com.microsoft.aad.adal4j.AuthenticationResult;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

private static final String AUTHORITY = "https://login.microsoftonline.com/common/";
private static final String RESOURCE = "https://storage.azure.com/";
// Well-known client ID for Azure CLI. Ideally, you should register your own
// application and use its client ID here.
private static final String CLIENT_ID = "04b07795-8ddb-461a-bbee-02f9e1bf7b46";
// Tenant ID - in the Azure Portal: Azure Active Directory > Properties > Directory ID.
// Replace this value with the ID of your own tenant.
private static final String TENANT_ID = "72f988bf-86f1-41af-91ab-2d7cd011db47";

// Inside a method: sign the user in, then hand the refresh token to the ABFS driver
AuthenticationResult accessToken = getAccessTokenUsingDeviceCodeFlow();
String refreshToken = accessToken.getRefreshToken();

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "abfss://FILESYSTEM@ACCOUNTNAME.dfs.core.windows.net/");
conf.set("fs.azure.account.auth.type", "OAuth");
conf.set("fs.azure.account.oauth.provider.type",
        "org.apache.hadoop.fs.azurebfs.oauth2.RefreshTokenBasedTokenProvider");
conf.set("fs.azure.account.oauth2.client.id", CLIENT_ID);
conf.set("fs.azure.account.oauth2.client.endpoint",
        "https://login.microsoftonline.com/" + TENANT_ID + "/oauth2/token");
conf.set("fs.azure.account.oauth2.refresh.token", refreshToken);

// List the root of the file system as the signed-in user
FileSystem fs = FileSystem.get(conf);
FileStatus[] files = fs.listStatus(new Path("/"));
for (FileStatus f : files) {
    System.out.println(f);
}

This code will show a message in the console similar to:

To sign in, use a web browser to open the page https://microsoft.com/devicelogin
and enter the code DYB61DF1G to authenticate.

The code waits for the user to sign in on the device login page for as long as the device code is valid (15 minutes by default). If your application is not console-based, you will need another way to display this message to the user.

import com.microsoft.aad.adal4j.AuthenticationContext;
import com.microsoft.aad.adal4j.AuthenticationResult;
import com.microsoft.aad.adal4j.DeviceCode;
import java.net.MalformedURLException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

private AuthenticationResult getAccessTokenUsingDeviceCodeFlow() {
    ExecutorService service = Executors.newFixedThreadPool(1);
    ExecutionException lastError = null;
    try {
        AuthenticationContext context =
                new AuthenticationContext(AUTHORITY, true, service);

        // Request a device code and show the sign-in instructions to the user
        Future<DeviceCode> future =
                context.acquireDeviceCode(CLIENT_ID, RESOURCE, null);
        DeviceCode deviceCode = future.get();
        long expiration = System.currentTimeMillis()
                + deviceCode.getExpiresIn() * 1000;
        System.out.println(deviceCode.getMessage());

        // Poll until the user has signed in or the device code has expired
        while (System.currentTimeMillis() < expiration) {
            try {
                Future<AuthenticationResult> futureResult =
                        context.acquireTokenByDeviceCode(deviceCode, null);
                return futureResult.get();
            } catch (ExecutionException ee) {
                // e.g. the user has not completed sign-in yet; retry shortly
                lastError = ee;
                Thread.sleep(1000);
            }
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
    } catch (ExecutionException | MalformedURLException e) {
        throw new RuntimeException(e);
    } finally {
        service.shutdown();
    }
    throw new RuntimeException("Authentication result not received", lastError);
}
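The waiting logic in getAccessTokenUsingDeviceCodeFlow() does not depend on adal4j, and separating it out makes the "not console-based" case easier: the sign-in message goes to a callback instead of System.out. The following is a minimal JDK-only sketch of that show-then-poll-until-expiry loop. DeviceCodePoller and poll are hypothetical names, and the clock is passed in as a LongSupplier purely so the loop can be tested.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical helper (not part of adal4j): shows the sign-in message once, then
// calls `attempt` until it succeeds or the device code expires.
final class DeviceCodePoller {
    private DeviceCodePoller() {}

    static <T> T poll(String message, long expiresInSeconds, long intervalMillis,
                      Consumer<String> display, Callable<T> attempt,
                      LongSupplier clock) {
        long deadline = clock.getAsLong() + expiresInSeconds * 1000;
        display.accept(message); // console, dialog box, web page, ...
        Exception lastError = null;
        while (clock.getAsLong() < deadline) {
            try {
                return attempt.call();
            } catch (Exception e) {
                lastError = e; // e.g. the user has not finished signing in yet
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for sign-in", e);
            }
        }
        throw new IllegalStateException("Device code expired before sign-in", lastError);
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated token request that fails twice before succeeding
        String token = poll("To sign in, open https://microsoft.com/devicelogin",
                60, 100, System.out::println,
                () -> {
                    if (++attempts[0] < 3) {
                        throw new Exception("authorization pending");
                    }
                    return "token";
                },
                System::currentTimeMillis);
        System.out.println("Got " + token + " after " + attempts[0] + " attempts");
    }
}
```

In the adal4j version, `attempt` would wrap context.acquireTokenByDeviceCode(deviceCode, null).get(), and `expiresInSeconds` would come from deviceCode.getExpiresIn().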
Alexandre Gattiker
Software Engineer at Microsoft, Data & AI, open source fan
https://cloudarchitected.com
