Google Cloud provides several Deep Learning images that come pre-configured with key ML frameworks and tools and can be run out of the box. We will write a bash script that sets up such an instance with TensorFlow and GPU support.
Note that the resulting instance will have many more packages pre-installed.
I will assume the Cloud SDK is already installed and set up on your system.
#!/usr/bin/env bash
export IMAGE_FAMILY='tf-latest-cu92'
export ZONE='us-central1-c'
export INSTANCE_NAME='tf-n1-highmem-2-k80-count-1'
export INSTANCE_TYPE='n1-highmem-2'

gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=$IMAGE_FAMILY \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator='type=nvidia-tesla-k80,count=1' \
  --machine-type=$INSTANCE_TYPE \
  --boot-disk-size=50GB \
  --metadata='install-nvidia-driver=True'
INSTANCE_NAME will be the name of your VM; it will also be displayed on the Google Cloud Platform. I chose a name that reminds me of the chosen specifications. You can pick any name you like, as long as no other VM/instance has the same name.
--zone: You can pick whichever zone suits you best from the following list. However, note that not every zone provides GPUs, and some zones provide only a subset of all available GPU types. For more info on which GPUs are available in which zones, have a look here. Also note that prices may vary from zone to zone. I picked us-central1-c simply because it provides all GPU types and is among the cheapest zones.
--image-family: tf-latest-cu92 installs the latest TensorFlow GPU version. You can also specify a particular version: tf-VERSION-cu92 (e.g. tf-1-8-cu92). Other possible images are listed here.
--image-project: this must be deeplearning-platform-release
--maintenance-policy: must be TERMINATE
--accelerator: here we specify the GPU type to use. The format is 'type=TYPE,count=COUNT'. More info regarding available types here.
--machine-type: here we specify the CPU and memory. There are standard machine types and high-memory machine types. All possible types can be found here.
--metadata: here we specify via install-nvidia-driver=True that the NVIDIA driver should be installed. With this flag it may take up to 5 minutes until the VM is fully provisioned.
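Before settling on a zone, GPU type, and machine type, you can query what is actually available directly from the Cloud SDK. A small sketch using the standard gcloud listing commands; the zone filter is just the one chosen above:

```shell
# List the GPU (accelerator) types offered in the chosen zone.
gcloud compute accelerator-types list --filter="zone:us-central1-c"

# List the machine types (CPU/memory combinations) in the same zone.
gcloud compute machine-types list --filter="zone:us-central1-c"
```

Dropping the --filter flag lists the resources for all zones, which is handy for comparing availability across regions.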
We give the script the name create-gcloud-tf-instance.sh, place it under ~/bin (you might need to create this directory, since it does not exist by default), and add the following lines to our .bash_profile:
# Add bin directory to path
export PATH=~/bin:"$PATH"
After restarting the terminal (or running source .bash_profile) we need to update the file permissions accordingly. This can be done from within the terminal via:
$ chmod 700 ~/bin/create-gcloud-tf-instance.sh
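To confirm that the shell actually resolves the new command from the PATH entry we just added, a quick sanity check:

```shell
# Should print the resolved path, e.g. ~/bin/create-gcloud-tf-instance.sh
command -v create-gcloud-tf-instance.sh

# Verify the executable bit is set (the leading -rwx------).
ls -l ~/bin/create-gcloud-tf-instance.sh
```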
Since we just added the bin folder to our path, we can now execute our new command just like any other built-in command (tab completion should also work here):
$ create-gcloud-tf-instance.sh
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/deeprl-1/zones/us-central1-c/instances/blog].
NAME                         ZONE           MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
tf-n1-highmem-2-k80-count-1  us-central1-c  n1-highmem-2               10.128.0.3   220.127.116.11  RUNNING

Updates are available for some Cloud SDK components.  To install them, please run:
  $ gcloud components update
Opening your cloud console in the browser https://console.cloud.google.com/, you will find your newly created instance.
The instance can be stopped from within the terminal via the gcloud SDK command
$ gcloud compute instances stop tf-n1-highmem-2-k80-count-1 --zone=us-central1-c
Stopping instance(s) tf-n1-highmem-2-k80-count-1...⠏
Analogously, replacing stop with start will boot your instance.
In order to use JupyterLab in the browser on your local PC, we will create an SSH tunnel that forwards port 8080 of your gcloud VM to port 8080 on your local PC:
$ gcloud compute ssh tf-n1-highmem-2-k80-count-1 --zone=us-central1-c -- -L 8080:localhost:8080
Enter passphrase for key '/Users/apoehlmann/.ssh/google_compute_engine':
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
======================================
Welcome to the Google Deep Learning VM
======================================
Based on: Debian GNU/Linux 9.5 (stretch) (GNU/Linux 4.9.0-8-amd64 x86_64)

Resources:
 * Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
 * Google Cloud Documentation: https://cloud.google.com/deep-learning-vm
 * Google Group: https://groups.google.com/forum/#!forum/google-dl-platform

TensorFlow comes pre-installed with this image. To install TensorFlow binaries in a
virtualenv (or conda env), please use the binaries that are pre-built for this image.
You can find the binaries at /opt/deeplearning/binaries/tensorflow/
Note that public TensorFlow binaries may not work with this image.

Linux tf-n1-highmem-2-k80-count-1 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
apoehlmann@tf-n1-highmem-2-k80-count-1:~$
Note: the '--' argument must be specified between the gcloud-specific args on the left and the SSH_ARGS on the right. For more info, check the docs.
In your browser, you can now open JupyterLab via http://localhost:8080. It will automatically redirect to your VM's JupyterLab.
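If you want to confirm that the NVIDIA driver was actually installed (remember the install-nvidia-driver=True metadata flag from the creation script), you can run nvidia-smi on the VM over SSH without opening an interactive session. A sketch using gcloud's --command flag:

```shell
# Run nvidia-smi remotely; it should print the driver version
# and list the Tesla K80 we attached to the instance.
gcloud compute ssh tf-n1-highmem-2-k80-count-1 \
  --zone=us-central1-c \
  --command='nvidia-smi'
```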