Google Cloud provides several Deep Learning images that come pre-configured with key ML frameworks and tools and can be run out of the box. We will write a bash script that sets up such an instance with TensorFlow and GPU support.
Note that the resulting instance will have many more packages pre-installed.
I will assume the Cloud SDK is already installed and set up on your system.
#!/usr/bin/env bash
export IMAGE_FAMILY='tf-latest-cu92'
export ZONE='us-central1-c'
export INSTANCE_NAME='tf-n1-highmem-2-k80-count-1'
export INSTANCE_TYPE='n1-highmem-2'

gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=$IMAGE_FAMILY \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator='type=nvidia-tesla-k80,count=1' \
  --machine-type=$INSTANCE_TYPE \
  --boot-disk-size=50GB \
  --metadata='install-nvidia-driver=True'
INSTANCE_NAME will be the name of your VM; it will also be displayed on the Google Cloud Platform. I chose a name that reminds me of the chosen specifications. You can pick any name you like, as long as no other VM/instance has the same name.
--zone: You can pick whichever zone suits you best from the following list. However, note that not every zone provides GPUs, and some zones provide only a subset of all available GPU types. For more info on which GPUs are available in which zones, have a look here. Also note that prices may vary from zone to zone. I picked us-central1-c simply because it provides all GPU types and is among the cheapest zones.
--image-family: tf-latest-cu92 installs the latest TensorFlow GPU version. You can also specify a particular version: tf-VERSION-cu92 (e.g. tf-1-8-cu92). Other possible images are listed here.
--image-project: this must be deeplearning-platform-release
--maintenance-policy: must be TERMINATE
--accelerator: here we specify the GPU type to use. The format is 'type=TYPE,count=COUNT'. More info regarding available types here.
--machine-type: here we specify the CPU and memory. There are standard machine types and high-memory machine types. All possible types can be found here.
--metadata: here we specify via install-nvidia-driver=True that the NVIDIA driver should be installed. With this flag it may take up to 5 minutes until the VM is fully provisioned.
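Before settling on a zone, GPU type, and machine type, you can query what is actually available directly from the Cloud SDK. A small sketch using the standard gcloud listing commands; the zone filter is just the one chosen above:

```shell
# List the GPU (accelerator) types offered in the chosen zone.
gcloud compute accelerator-types list --filter="zone:us-central1-c"

# List the machine types (CPU/memory combinations) in the same zone.
gcloud compute machine-types list --filter="zone:us-central1-c"
```

Dropping the --filter flag lists the resources for all zones, which is handy for comparing availability across regions.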
We give the script the name create-gcloud-tf-instance.sh, place it under ~/bin (you might need to create this directory, since it does not exist by default), and add the following lines to our .bash_profile:
# Add bin directory to path
export PATH=~/bin:"$PATH"
After restarting the terminal (or running source .bash_profile) we need to update the file permissions accordingly. This can be done from within the terminal via:
$ chmod 700 ~/bin/create-gcloud-tf-instance.sh
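To confirm that the shell actually resolves the new command from the PATH entry we just added, a quick sanity check:

```shell
# Should print the resolved path, e.g. ~/bin/create-gcloud-tf-instance.sh
command -v create-gcloud-tf-instance.sh

# Verify the executable bit is set (the leading -rwx------).
ls -l ~/bin/create-gcloud-tf-instance.sh
```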
Since we just added the bin folder to our path, we can now execute our new command just like any other built-in command (tab completion should also work here):
$ create-gcloud-tf-instance.sh
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/deeprl-1/zones/us-central1-c/instances/blog].
NAME                         ZONE           MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
tf-n1-highmem-2-k80-count-1  us-central1-c  n1-highmem-2               10.128.0.3   220.127.116.11  RUNNING

Updates are available for some Cloud SDK components.  To install them, please run:
  $ gcloud components update
Opening your cloud console in the browser https://console.cloud.google.com/, you will find your newly created instance.
The instance can be stopped from within the terminal via the gcloud SDK command
$ gcloud compute instances stop tf-n1-highmem-2-k80-count-1 --zone=us-central1-c
Stopping instance(s) tf-n1-highmem-2-k80-count-1...⠏
Analogously, replacing stop with start will boot your instance.
In order to use JupyterLab in the browser on your local PC, we will create an SSH tunnel that forwards port 8080 of your gcloud VM to port 8080 on your local PC:
$ gcloud compute ssh tf-n1-highmem-2-k80-count-1 --zone=us-central1-c -- -L 8080:localhost:8080
Enter passphrase for key '/Users/apoehlmann/.ssh/google_compute_engine':
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
======================================
Welcome to the Google Deep Learning VM
======================================
Based on: Debian GNU/Linux 9.5 (stretch) (GNU/Linux 4.9.0-8-amd64 x86_64)

Resources:
 * Google Deep Learning Platform StackOverflow: https://stackoverflow.com/questions/tagged/google-dl-platform
 * Google Cloud Documentation: https://cloud.google.com/deep-learning-vm
 * Google Group: https://groups.google.com/forum/#!forum/google-dl-platform

TensorFlow comes pre-installed with this image. To install TensorFlow binaries in a
virtualenv (or conda env), please use the binaries that are pre-built for this image.
You can find the binaries at /opt/deeplearning/binaries/tensorflow/
Note that public TensorFlow binaries may not work with this image.

Linux tf-n1-highmem-2-k80-count-1 4.9.0-8-amd64 #1 SMP Debian 4.9.110-3+deb9u4 (2018-08-21) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
apoehlmann@tf-n1-highmem-2-k80-count-1:~$
Note: the '--' argument must be specified between the gcloud-specific args on the left and the SSH_ARGS on the right. For more info, check the docs.
In your browser, you can now open JupyterLab via http://localhost:8080. It will automatically redirect to your VM's JupyterLab.
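If you want to confirm that the NVIDIA driver was actually installed (remember the install-nvidia-driver=True metadata flag from the creation script), you can run nvidia-smi on the VM over SSH without opening an interactive session. A sketch using gcloud's --command flag:

```shell
# Run nvidia-smi remotely; it should print the driver version
# and list the Tesla K80 we attached to the instance.
gcloud compute ssh tf-n1-highmem-2-k80-count-1 \
  --zone=us-central1-c \
  --command='nvidia-smi'
```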