How to use Keras2 flow_from_directory() with Azure blob storage

Preamble

This tutorial explains every step in detail. I know the struggle of assumed knowledge too well.

  • Create folder for blob on the DSVM with: mkdir ~/mycontainer
  • Mount blob into DSVM with:
sudo blobfuse ~/mycontainer --tmp-path=/mnt/resource/blobfusetmp --config-file=./fuse_connection.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 -o allow_other
  • Use Keras flow_from_directory() with the path pointing to the mounted blob
  • Save models into the same path via callback function

Introduction

In this tutorial I want to show how to connect your Azure Blob storage to your Azure DSVM so that you can use it from a Jupyter notebook with, e.g., Keras2.
I struggled to set up this connection so that I could use my uploaded images with the Keras ImageDataGenerator() function. Therefore, I want to show step by step how I achieved it. There might be better and more elegant ways. If you have one, please let me know.

Prerequisite

  • An Azure account. If you do not have one, follow these steps
  • An Azure storage account. If you do not have one, follow these steps
  • Images on your blob. I like the Microsoft Azure Storage Explorer for adding and distributing files on Azure storage accounts. It is a visual file explorer linked to your Azure account. Later we want to use the ImageDataGenerator() function, so please upload the files in the folder structure it supports. Check this Medium post. EDIT: Be aware that you will find the Azure storage account (image: futter) in the Microsoft Azure Storage Explorer, but you still have to create a Blob container (image: haende) into which the data will be uploaded. A Blob container can be created with a simple right click on the storage account. (The names in the pictures below differ from those in the description, as this is an edit.)
  • An Azure Data Science Virtual Machine (DSVM). If you do not have one, follow these steps. Once you have created the DSVM, you have to set up a Network Security Group (NSG). Follow this guide to create a new one. Then assign the NSG to your DSVM, otherwise you will not be able to connect to the DSVM with a Jupyter Notebook. See image below:

Why Blob or Standard V2?

As we are aiming to use the ImageDataGenerator() function, we will work with images. Blob/V2 are generic cloud storage types that can hold your images. You cannot “do” anything with the data there, such as sorting or altering it, but they can store a large number of images close to your DSVM. This is very useful when you are working on an image classification problem with more data than fits into RAM, or with continuously changing data. Using Blob/V2 instead of your own HDD or another cloud storage reduces latency, because the Blob and the DSVM are in the same network and you avoid the bottleneck of uploading through your internet provider.

Connect to your DSVM

IMPORTANT: A RUNNING DSVM CREATES COSTS. Just quitting, e.g., your Jupyter Notebook does not stop your DSVM. It has to be stopped in the Azure Portal!

Install Blobfuse

Now we will follow this guide from Microsoft to install blobfuse.

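Check which Ubuntu version your DSVM runs:
lsb_release -a
Install blobfuse:
sudo apt-get install blobfuse
Create a temporary path for blobfuse and give your user ownership of it:
sudo mkdir /mnt/resource/blobfusetmp -p
sudo chown <youruser> /mnt/resource/blobfusetmp
Create a file named fuse_connection.cfg containing your storage account credentials:
accountName myaccount
accountKey storageaccesskey
containerName mycontainer
Create an empty folder to mount the blob into:
mkdir ~/mycontainer
Mount the blob:
sudo blobfuse ~/mycontainer --tmp-path=/mnt/resource/blobfusetmp --config-file=./fuse_connection.cfg -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 -o allow_other
Each time you stop and start your DSVM in the Azure portal, you will have to mount the blob again.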
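
Because the mount disappears after a restart, it can help to check from the notebook whether the blob is currently mounted before pointing Keras at it. The following is a minimal sketch, assuming the mount point ~/mycontainer used above; the function names are my own, and I assume that os.path.ismount() recognises a blobfuse mount like any other mount point.

```python
import os

# Assumption: the mount point used in the steps above.
MOUNT_POINT = os.path.expanduser("~/mycontainer")


def blob_is_mounted(mount_point=MOUNT_POINT):
    """Return True if something is mounted at mount_point."""
    return os.path.ismount(mount_point)


def describe_mount(mount_point=MOUNT_POINT):
    """Return a short, human-readable status line for the mount point."""
    if not os.path.isdir(mount_point):
        return f"{mount_point} does not exist; create it with mkdir first"
    if not blob_is_mounted(mount_point):
        return f"{mount_point} exists but nothing is mounted; run the blobfuse command again"
    return f"{mount_point} is mounted"


print(describe_mount())
```

If the status says nothing is mounted, rerun the blobfuse command in a terminal on the DSVM.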

Where to find account credentials

The accountName is the name of your storage account as shown in the Azure portal. Go to Storage accounts, choose the storage account you want to mount, and open the Access keys tab, where you will find the storage account name and the key.
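
Once you have the account name and key, the fuse_connection.cfg file can also be written from Python instead of by hand. This is only a sketch: the helper name write_fuse_config is my own, and the values in the usage comment are the placeholders from this tutorial, not real credentials. Since the file holds the account key, the sketch also restricts it to the current user, which seems a sensible precaution.

```python
import os
import stat


def write_fuse_config(path, account_name, account_key, container_name):
    """Write a blobfuse config file and restrict it to the current user."""
    lines = [
        f"accountName {account_name}",
        f"accountKey {account_key}",
        f"containerName {container_name}",
    ]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    # The file contains the account key, so only the owner may read or write it.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path


# Usage with the placeholder values from this tutorial:
# write_fuse_config("fuse_connection.cfg", "myaccount", "storageaccesskey", "mycontainer")
```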

Now use Jupyter Notebook with the mounted blob drive

Open your Azure Notebooks account and configure which machine the notebook should run on. The default is the free tier, but the free tier is not configured for, e.g., machine/deep learning. A DSVM is more powerful, can have a GPU, and has most machine/deep learning libraries pre-installed.

IMPORTANT: A RUNNING DSVM CREATES COSTS. Just quitting, e.g., your Jupyter Notebook does not stop your DSVM. It has to be stopped in the Azure Portal!

Now let's use the blob in the Jupyter notebook with Keras2
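
Before handing the mounted folder to flow_from_directory(), it can be worth checking that the images sit in one subfolder per class, as the function expects. The sketch below counts the image files in each class folder. The extension list and the "train" subfolder in the usage comment are assumptions of mine, and only files directly inside each class folder are counted.

```python
import os

# Assumption: these extensions cover the images in your container.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def count_images_per_class(data_dir):
    """Map each class subfolder of data_dir to its number of image files."""
    counts = {}
    for entry in sorted(os.listdir(data_dir)):
        class_dir = os.path.join(data_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # files directly in data_dir do not belong to a class
        counts[entry] = sum(
            1
            for name in os.listdir(class_dir)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Usage, assuming a hypothetical "train" folder inside the mounted container:
# print(count_images_per_class(os.path.expanduser("~/mycontainer/train")))
```

An empty result, or a class with zero images, usually means the files were uploaded into the wrong level of the folder structure.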

The following code can be found on Git:

mkdir ~/models
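
As a rough illustration of how the pieces fit together, here is a minimal sketch that reads images from the mounted blob with flow_from_directory() and saves models through a ModelCheckpoint callback. It is not the notebook code referred to above. It assumes the Keras 2 import paths, a hypothetical "train" subfolder in the container, the ~/models folder, and an already compiled model; the helper names, image size, batch size and filename pattern are placeholders of mine.

```python
import os

# Assumptions: the mount point from this tutorial, a hypothetical "train"
# subfolder inside the container, and the ~/models folder.
TRAIN_DIR = os.path.expanduser("~/mycontainer/train")
MODELS_DIR = os.path.expanduser("~/models")


def checkpoint_filepath(models_dir=MODELS_DIR):
    """Filename pattern; Keras fills in the epoch number when saving."""
    return os.path.join(models_dir, "model-{epoch:02d}.h5")


def make_train_generator(train_dir=TRAIN_DIR, target_size=(224, 224), batch_size=32):
    """Create a generator that reads image batches from the mounted blob."""
    # Imported here so the path helpers also work where Keras is not installed.
    from keras.preprocessing.image import ImageDataGenerator  # Keras 2 import path

    datagen = ImageDataGenerator(rescale=1.0 / 255)
    return datagen.flow_from_directory(
        train_dir,
        target_size=target_size,
        batch_size=batch_size,
        class_mode="categorical",
    )


def make_checkpoint(models_dir=MODELS_DIR):
    """Create a callback that saves the model whenever the training loss improves."""
    from keras.callbacks import ModelCheckpoint

    return ModelCheckpoint(
        checkpoint_filepath(models_dir), monitor="loss", save_best_only=True
    )


# Usage with an already compiled Keras model (not defined here):
# train_gen = make_train_generator()
# model.fit_generator(train_gen, epochs=10, callbacks=[make_checkpoint()])
```

To save the models into the blob instead, pass a folder inside ~/mycontainer as models_dir.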

Hi, I am a carpenter and electrical engineer with over 10 years of experience in signal processing, machine learning, and deep learning. linkedin.com/in/jan-werth