Google Cloud provides several Deep Learning images that come pre-configured with key ML frameworks and tools and can be run out of the box. We will write a bash script that sets up such an instance with TensorFlow and GPU support.
Note that the resulting instance will have many more packages pre-installed.
I will assume the Cloud SDK is already installed and set up on your system.
#!/usr/bin/env bash
export IMAGE_FAMILY='tf-latest-cu92'
export ZONE='us-central1-c'
export INSTANCE_NAME='tf-n1-highmem-2-k80-count-1'
export INSTANCE_TYPE='n1-highmem-2'

gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=$IMAGE_FAMILY \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator='type=nvidia-tesla-k80,count=1' \
  --machine-type=$INSTANCE_TYPE \
  --boot-disk-size=50GB \
  --metadata='install-nvidia-driver=True'
INSTANCE_NAME will be the name of your VM; it will also be displayed on the Google Cloud Platform. I chose a name that reminds me of the chosen specifications. You can pick any name you like, as long as no other VM/instance has the same name.
--zone: You can pick whichever zone suits you best from the following list. However, note that not every zone provides GPUs, and some zones provide only a subset of all available GPU types. For more info on which GPUs are available in which zones, have a look here. Also note that prices may vary from zone to zone. I picked us-central1-c simply because it provides all GPU types and is among the cheapest zones.
--image-family: tf-latest-cu92 installs the latest TensorFlow GPU version. You can also specify a particular version: tf-VERSION-cu92 (e.g. tf-1-8-cu92). Other possible images are listed here.
--image-project: this must be deeplearning-platform-release
--maintenance-policy: must be TERMINATE
--accelerator: here we specify the GPU type to use. The format is 'type=TYPE,count=COUNT'. More info regarding available types here.
--machine-type: here we specify the CPU and memory. There are standard machine types and high-memory machine types. All possible types can be found here.
--metadata: here we specify via install-nvidia-driver=True that the NVIDIA driver should be installed. With this flag it may take up to 5 minutes until the VM is fully provisioned.
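Before settling on a zone, GPU type, and machine type, you can query what is actually available directly from the Cloud SDK. A small sketch using the standard gcloud listing commands; the zone filter is just the one chosen above:

```shell
# List the GPU (accelerator) types offered in the chosen zone.
gcloud compute accelerator-types list --filter="zone:us-central1-c"

# List the machine types (CPU/memory combinations) in the same zone.
gcloud compute machine-types list --filter="zone:us-central1-c"
```

Dropping the --filter flag lists the resources for all zones, which is handy for comparing availability across regions.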
We give the script the name create-gcloud-tf-instance.sh, place it under ~/bin (you might need to create this directory, since it does not exist by default), and add the following lines to our .bash_profile:
# Add bin directory to path
export PATH=~/bin:"$PATH"
After restarting the terminal (or running source .bash_profile) we need to update the file permissions accordingly. This can be done from within the terminal via:
$ chmod 700 ~/bin/create-gcloud-tf-instance.sh
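To confirm that the shell actually resolves the new command from the PATH entry we just added, a quick sanity check:

```shell
# Should print the resolved path, e.g. ~/bin/create-gcloud-tf-instance.sh
command -v create-gcloud-tf-instance.sh

# Verify the executable bit is set (the leading -rwx------).
ls -l ~/bin/create-gcloud-tf-instance.sh
```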
Since we just added the bin folder to our path, we can now execute our new command just like any other built-in command (tab completion should also work here):
$ create-gcloud-tf-instance.sh
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/deeprl-1/zones/us-central1-c/instances/blog].
NAME                         ZONE           MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
tf-n1-highmem-2-k80-count-1  us-central1-c  n1-highmem-2               10.128.0.3   220.127.116.11  RUNNING

Updates are available for some Cloud SDK components.  To install them, please run:
  $ gcloud components update
Opening your cloud console in the browser https://console.cloud.google.com/, you will find your newly created instance.
The instance can be stopped from within the terminal via the gcloud SDK command
$ gcloud compute instances stop tf-n1-highmem-2-k80-count-1 --zone=us-central1-c
Stopping instance(s) tf-n1-highmem-2-k80-count-1...⠏
Analogously, replacing stop with start will boot your instance.
In order to use JupyterLab in the browser on your local PC, we will create an SSH tunnel that forwards port 8080 of your gcloud VM to port 8080 on your local PC:
$ gcloud compute ssh tf-n1-highmem-2-k80-count-1 --zone=us-central1-c -- -L 8080:localhost:8080
Enter passphrase for key '/Users/apoehlmann/.ssh/google_compute_engine':
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
======================================
Welcome to the Google Deep Learning VM
======================================
Based on: Debian GNU/Linux 9.5 (stretch) (GNU/Linux 4.9.0-8-amd64 x86_64)

Resources:
 * Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
 * Google Cloud Documentation: https://cloud.google.com/deep-learning-vm
 * Google Group: https://groups.google.com/forum/#!forum/google-dl-platform

TensorFlow comes pre-installed with this image. To install TensorFlow binaries in a
virtualenv (or conda env), please use the binaries that are pre-built for this image.
You can find the binaries at /opt/deeplearning/binaries/tensorflow/
Note that public TensorFlow binaries may not work with this image.

Linux tf-n1-highmem-2-k80-count-1 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
apoehlmann@tf-n1-highmem-2-k80-count-1:~$
Note: the '--' argument must be specified between the gcloud-specific args on the left and the SSH_ARGS on the right. For more info, check the docs.
In your browser, you can now open JupyterLab via http://localhost:8080. It will automatically redirect to your VM's JupyterLab.
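If you want to confirm that the NVIDIA driver was actually installed (remember the install-nvidia-driver=True metadata flag from the creation script), you can run nvidia-smi on the VM over SSH without opening an interactive session. A sketch using gcloud's --command flag:

```shell
# Run nvidia-smi remotely; it should print the driver version
# and list the Tesla K80 we attached to the instance.
gcloud compute ssh tf-n1-highmem-2-k80-count-1 \
  --zone=us-central1-c \
  --command='nvidia-smi'
```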