SigOpt Orchestrate is currently in alpha. Please contact us if you would like more information.

Orchestrate an HPO Experiment

In this part of the docs, we will walk through how to execute an HPO experiment on a Kubernetes cluster using SigOpt Orchestrate. Before you begin, SigOpt Orchestrate should be connected to a Kubernetes cluster of your choice.

Set Up

If you haven't connected to a cluster yet, you can launch a cluster on AWS, connect to an existing Kubernetes cluster, or connect to an existing, shared K8s cluster.

Then, test whether you are connected to a cluster with SigOpt Orchestrate by running:

orchestrate cluster test

SigOpt Orchestrate will output:

Successfully connected to kubernetes cluster: tiny-cluster

If you're using a custom Kubernetes cluster, you will need to install plugins to get the controller image working:

orchestrate cluster install-plugins

SigOpt Orchestrate expects all of the files for your model to be located in the same directory. Create an example directory (mkdir) and change into it (cd):

mkdir example && cd example

Then, auto-generate templates for a Dockerfile and a SigOpt Orchestrate configuration YAML file:

orchestrate init

Next, you will create some files and put them in this example directory.

Dockerfile: Define your model environment

For this tutorial, we'll be using a very simple Dockerfile. For instructions on how to specify more requirements, see our guide on Dockerfiles. Copy and paste the following snippet into the autogenerated file named Dockerfile.

FROM orchestrate/python-3.9:0.9.2

RUN pip install --no-cache-dir scipy==1.1.0
RUN pip install --no-cache-dir scikit-learn==0.19.2
RUN pip install --no-cache-dir numpy==1.15.0

COPY . /orchestrate
WORKDIR /orchestrate

Define a Model

This code defines a simple SGDClassifier model that measures accuracy classifying labels for the Iris flower dataset. Copy and paste the snippet below into a file titled model.py. Note that the snippet below uses SigOpt's Runs to track model attributes.

# model.py

# SGDClassifier example written to run with SigOpt Orchestrate

# You'll use the SigOpt Training Runs API to communicate with SigOpt Orchestrate
# while your model is running on the cluster.
import sigopt


# These packages need to be installed in order to run your model.
# In this tutorial, they are installed by the Dockerfile defined above.
from sklearn import datasets
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
import numpy


# Here, we're using the standard Iris flower dataset:
# https://en.wikipedia.org/wiki/Iris_flower_data_set
def load_data():
  iris = datasets.load_iris()
  return (iris.data, iris.target)


# SigOpt Orchestrate handles the interaction with the SigOpt API.
# Each time this file is executed on the cluster, SigOpt Orchestrate
# will automatically create a Suggestion and populate new
# hyperparameter assignments.
def evaluate_model(X, y):
  # SigOpt Training Runs reads new assignments for all of the parameters
  # that you define in your experiment configuration file. If you did not
  # define a parameter in your experiment configuration file, this function
  # will fall back to the provided default value.
  classifier = SGDClassifier(
    loss=sigopt.get_parameter('loss', default='log'),
    penalty=sigopt.get_parameter('penalty', default='elasticnet'),
    alpha=10**sigopt.get_parameter('log_alpha', default=-4),
    l1_ratio=sigopt.get_parameter('l1_ratio', default=0.15),
    max_iter=sigopt.get_parameter('max_iter', default=1000),
    tol=sigopt.get_parameter('tol', default=0.001),
  )
  cv_accuracies = cross_val_score(classifier, X, y, cv=5)
  return (numpy.mean(cv_accuracies), numpy.std(cv_accuracies))


# Each execution of model.py represents one evaluation of your model.
# When this file is run, it loads data, evaluates the model using the
# assigned hyperparameters, and logs the resulting metric to SigOpt.
if __name__ == "__main__":
  (X, y) = load_data()
  (mean, std) = evaluate_model(X=X, y=y)
  print('Accuracy: {} +/- {}'.format(mean, std))
  sigopt.log_metric('accuracy', mean, std)

Notes on implementing your model

When your model runs on a node in the cluster, it can use all of the CPUs on that node via multithreading. This is good for performance if your model is the only process running on the node, but in many cases it will need to share those CPUs with other processes (e.g. other model runs). For this reason, it is a good idea to limit the number of threads your model's libraries can create so that it is in line with the cpu value specified in your resources_per_model. This varies by library, but some common ones are listed below:

Numpy

Threads spawned by Numpy can be configured with environment variables, which can be set in your Dockerfile:

ENV MKL_NUM_THREADS=N
ENV NUMEXPR_NUM_THREADS=N
ENV OMP_NUM_THREADS=N
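
If you would rather set these limits from Python than from the Dockerfile, a minimal sketch (assuming the variables are set before numpy is imported, and that one thread matches the cpu request in your resources_per_model) looks like:

import os

# These must be set before numpy (and its BLAS backend) is imported.
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'

import numpy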

Tensorflow/Keras

Can be configured in the Tensorflow module, see: https://www.tensorflow.org/api_docs/python/tf/config/threading
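
For example, a minimal sketch (assuming TensorFlow 2.x, and that these calls happen before any operations are executed) that caps both thread pools at one thread:

import tensorflow as tf

# Limit threads used within a single op and across independent ops.
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)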

PyTorch

Can be configured in the PyTorch module, see: https://pytorch.org/docs/stable/generated/torch.set_num_threads.html
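
For example, a minimal sketch that limits PyTorch's CPU thread usage to one thread:

import torch

# Limit intra-op parallelism on CPU.
torch.set_num_threads(1)
# Optionally limit inter-op parallelism as well
# (must be called before any inter-op parallel work starts).
torch.set_num_interop_threads(1)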

Create a SigOpt Orchestrate Configuration File

Here's a sample SigOpt Orchestrate configuration file that defines an HPO experiment for the model.py above, running each model on at most one CPU.

Create a file named orchestrate-hpo.yml and copy and paste the following into it.

resources_per_model:
  requests:
    cpu: 0.5
    memory: 512Mi
  limits:
    cpu: 1
    memory: 512Mi
# We don't need any GPUs for this example, so we'll leave this commented out
#  gpus: 1


# Choose a descriptive name for your model
name: Orchestrate SGD Classifier (python)

# Here, we run the model
run: python model.py

# Here, we define optimization details
optimization:
  # Every experiment needs at least one named metric.
  metrics:
    - name: accuracy

  # Parameters that are defined here are available to SigOpt Orchestrate
  parameters:
    - name: l1_ratio
      type: double
      bounds:
        min: 0
        max: 1.0
    - name: log_alpha
      type: double
      bounds:
        min: -5
        max: 2

  # Our example cluster has two machines, so we have enough compute power
  # to execute two models in parallel.
  parallel_bandwidth: 2

  # We want to evaluate our model on sixty different sets of hyperparameters
  observation_budget: 60

# SigOpt Orchestrate creates a container for your model. Since we're using an AWS
# cluster, it's easy to securely store the model in the Amazon Elastic Container Registry.
# Choose a descriptive and unique name for each new experiment configuration file.
image: orchestrate/sgd-classifier
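
Note how the parameters defined here line up with the sigopt.get_parameter calls in model.py. In particular, log_alpha is exponentiated before being passed to SGDClassifier, so the bounds of -5 to 2 correspond to alpha values between 1e-5 and 100:

# From model.py: alpha = 10 ** log_alpha
print(10 ** -5)  # 1e-05  (lower bound of log_alpha)
print(10 ** 2)   # 100    (upper bound of log_alpha)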

Execute

So far, SigOpt Orchestrate is connected to your cluster, the Dockerfile defines your model requirements, and you've updated the SigOpt Orchestrate configuration file. SigOpt Orchestrate can now execute an HPO experiment on your cluster.

orchestrate optimize -f orchestrate-hpo.yml

If you'd like to test your HPO experiment locally before executing on your cluster, you can run the following:

orchestrate optimize-local -f orchestrate-hpo.yml

Monitor

You can monitor the status of SigOpt Orchestrate Experiments from the command line using the run name or the Experiment ID.

orchestrate status experiment/999999
experiment/999999:
	Experiment Name: hoco
	5.0 / 64.0 Observation budget
	5 Observation(s) failed
	Run Name            	Pod phase      	Status         	Link
	run-25ne1woa        	Succeeded      	failed         	https://app.sigopt.com/run/49950
	run-2lkc1ppa        	Succeeded      	failed         	https://app.sigopt.com/run/49975
	run-zggujklx        	Succeeded      	failed         	https://app.sigopt.com/run/49980
	run-zhc9c5q0        	Succeeded      	failed         	https://app.sigopt.com/run/49967
	run-zydkuibj        	Succeeded      	failed         	https://app.sigopt.com/run/49966
	Follow logs: orchestrate kubectl logs -ltype=run,experiment=999999 --max-log-requests=4 -f
	View more at: https://app.sigopt.com/experiment/999999

The status will include a command that you can run in your terminal to follow the logs as they are generated by your code.

Monitor progress in the web app

You can monitor experiment progress at https://app.sigopt.com/experiment/[id].

The History tab, https://app.sigopt.com/experiment/[id]/history, shows a complete table of training runs created in the experiment. The State column displays the current state of each training run.

Stop

You can stop your HPO Experiment at any point while it is running. The following command stops and deletes the HPO Experiment on the cluster; all in-progress Training Runs will be terminated.

orchestrate stop --experiment [experiment-id]