Large-Scale MOO Experiments with SHARK – Oracle Grid Engine

This post explains how to conduct large-scale multi-objective optimization (MOO) experiments with the SHARK machine learning library on clusters running Oracle Grid Engine.

An experiment consists of three phases:

  1. front approximation
  2. performance indicator calculation
  3. result accumulation and statistics calculation

In this post, I’m going to focus on the first phase.

Front Approximation

In this phase, we record the Pareto front approximations generated by applying multiple multi-objective evolutionary algorithms (MOEAs) to a set of objective functions.

Here, I assume that we want to evaluate the (µ+1)-MO-CMA-ES relying on the hypervolume indicator on the DTLZ suite of benchmark functions. A ready-to-use command-line application implementing the MO-CMA-ES is bundled with the default Shark installation. The executable is configured via command-line arguments, which can be listed by passing --help:

  --objectiveFunction arg
  --seed arg (=1)
  --storageInterval arg (=100)
  --searchSpaceDimension arg (=10)
  --maxNoEvaluations arg (=50000)
  --timeLimit arg (=1000)
  --fitnessLimit arg (=1e-10)
  --resultDir arg (=.)
  --algorithmConfigFile arg
  --objectiveSpaceDimension arg (=2)

That is, to run the MO-CMA-ES on DTLZ2 with 3 objectives and terminate after 50000 objective function evaluations, the following call is required:

  SteadyStateMOCMAMain --objectiveFunction=DTLZ2 --objectiveSpaceDimension=3 --maxNoEvaluations=50000 

Note that we do not specify the RNG seed explicitly but rely on the default value 1.
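
If a run should not depend on the default, the seed can be passed explicitly with --seed. The following is a minimal sketch: the helper name mocma_cmd is hypothetical, and it only prints the command line (a dry run) so that the call can be inspected before anything is executed. It uses only options that appear in the --help listing above.

```shell
# Hypothetical helper: print the command line for one MO-CMA-ES run.
# $1: objective function, $2: number of objectives, $3: RNG seed
mocma_cmd() {
  printf '%s\n' "SteadyStateMOCMAMain --objectiveFunction=$1 --objectiveSpaceDimension=$2 --maxNoEvaluations=50000 --seed $3"
}

# Dry run: show the calls for three different seeds.
for seed in 1 2 3; do
  mocma_cmd DTLZ2 3 "$seed"
done
```

Printing the command first is a cheap way to catch a mistyped option before the same call is handed to the cluster.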

For the scenario considered here, we want to run several independent trials of one specific MOEA on one specific objective function in parallel. To this end, we rely on the array job feature of the grid engine and submit an array of 25 independent trials with the following command:

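  qsub -N 'DTLZ2_3' -t 1-25 run_trial.sh DTLZ2 /globally/known/path 3

Here, run_trial.sh is a placeholder name for the job script, which receives the three trailing arguments and is defined as follows: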

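# Grid engine directives: run the job with bash and discard its standard output.
#$ -S /bin/bash
#$ -o /dev/null

# $1: objective function, $2: result directory, $3: number of objectives
SteadyStateMOCMAMain --seed "$SGE_TASK_ID" --resultDir="$2" --objectiveFunction="$1" --objectiveSpaceDimension="$3"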

In summary, the script runs the algorithm and sets the seed to the value of the environment variable $SGE_TASK_ID. The grid engine sets this variable to the index of the task within the array job (1 to 25 here), so every trial gets a distinct seed and the trials are independent. There is one more thing to note: the result directory needs to be accessible from every node of the cluster. Normally, your DevOps team provides a scratch environment that is accessible from every computing node.
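
Because the seed comes straight from $SGE_TASK_ID, the script quietly depends on being submitted as an array job. The qsub man page documents that the variable is set to the string "undefined" for non-array jobs. The following is a minimal sketch of two guards that could be added to the job script. The helper names resolve_seed and check_result_dir are hypothetical and belong neither to Shark nor to the grid engine, and the fallback seed of 1 simply mirrors the executable's default.

```shell
# Hypothetical helper: print $SGE_TASK_ID if it consists only of digits,
# otherwise print the default seed 1 (e.g. for a non-array job or a local run).
resolve_seed() {
  case "${SGE_TASK_ID:-}" in
    ''|*[!0-9]*) echo 1 ;;
    *) echo "$SGE_TASK_ID" ;;
  esac
}

# Hypothetical helper: succeed only if the result directory exists and is
# writable on the node that executes the script.
check_result_dir() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Possible use inside the job script (sketch):
#   check_result_dir "$2" || { echo "result dir not usable: $2" >&2; exit 1; }
#   SteadyStateMOCMAMain --seed "$(resolve_seed)" --resultDir="$2" --objectiveFunction="$1" --objectiveSpaceDimension="$3"
echo "seed that would be used: $(resolve_seed)"
```

Note that the directory check only covers the node the task happens to run on. It will not tell you whether the path is mounted everywhere, but it turns a silently lost result into an early failure.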

That’s it. Wait a few minutes until the experiment completes and stay tuned for the second post that explains how to evaluate the quality of the Pareto-front approximations.

