MPI workflow runner —beta

The MPI workflow runner is a second way of executing workflows using a cluster:
  • high throughput mode (using DRMAA for example): 1 job = 1 cluster job
  • low throughput mode with the MPI workflow runner: 1 workflow = 1 cluster job
_images/sw_high_throughput.png _images/sw_low_throughput.png
This mode makes it possible:
  • to run workflows on clusters with limited number of job per user.
  • to run workflows on clusters where it is not possible to run a server process.
  • to run workflows if you only have a simple MPI installation.
  • to remove the time interval between jobs execution which is due to the submission of each job to the cluster management system.

This mode is for advenced users who know how to use a cluster. In this mode the workflow control (submission, stop and resart) via the python API is disabled. The DRMS commands such as qsub and qdel for PBS for example, must be used instead. However, the monitoring tools (GUI and Python API) remains the same.

Requirements:

  • Python version 2.7 or more
  • MPI4py

Installation:

This mode use pure Python code, you do not need to compile anything: c.f. Soma-workflow main page for installation.

Configuration:

Required items:

  • DATABASE_FILE
  • TRANSFERED_FILES_DIR
  • SCHEDULER_TYPE

Note

SCHEDULER_TYPE must be set to “mpi”

Example:

[Titan_MPI]

DATABASE_FILE        = path_mpi_runner_config/soma_workflow.db
TRANSFERED_FILES_DIR = path_mpi_runner_config/transfered_files
SCHEDULER_TYPE       = mpi

# optional logging
SERVER_LOG_FILE   = path/logs/log_server
SERVER_LOG_FORMAT = %(asctime)s => line %(lineno)s: %(message)s
SERVER_LOG_LEVEL  = ERROR
ENGINE_LOG_DIR    = path/logs/
ENGINE_LOG_FORMAT = %(asctime)s => line %(lineno)s: %(message)s
ENGINE_LOG_LEVEL  = ERROR

Optionally, to monitor the workflow execution from a remote machine, the following lines can be used to configure soma-workflow on that machine:

[Titan_MPI]
# remote access information
CLUSTER_ADDRESS     = titan.mylab.fr
SUBMITTING_MACHINES = titan0

# optional
LOGIN = my_login_on_titan

MPI_workflow_runner options:

$ python -m soma_workflow.MPI_workflow_runner --help
Usage: MPI_workflow_runner.py [options]

Options:
  -h, --help            show this help message and exit
  --workflow=WORKFLOW_FILE
                        The workflow to run.
  -r WF_ID_TO_RESTART, --restart=WF_ID_TO_RESTART
                        The workflow id to restart
  --nb_attempt_per_job=NB_ATTEMPT_PER_JOB
                        A job can be restarted several time if it fails. This
                        option specify the number of attempt per job. By
                        default, the jobs are not restarted.

Example using PBS:

Create a submission script for PBS (example run_workflow.sh):

#!/bin/bash
#PBS -N my_workflow
#PBS -j oe
#PBS -l walltime=10:00:00
#PBS -l nodes=3:ppn=8
#PBS -q long

time mpirun python -m soma_workflow.MPI_workflow_runner Titan_MPI --workflow $HOME/my_workflow_file

In the example, the workflow will run on 8 cores of 3 nodes (that is 8*3 cpus). It will be submitted to the “long” queue, and will last at most 10 hours.

Use the following command to submit the script to the cluster, and thus start the workflow execution:

$ qsub run_workflow.sh

You will get an cluster id for the MPI job you have submitted (112108 for example). Use the following command line to kill the MPI job and thus stop workflow execution:

$ qdel 112108

Each workflow has an id in Soma-workflow. This id is displayed in the GUI or can be recovered using the Python API (example 520). Use the following submission script to restart the workflow:

#!/bin/bash
#PBS -N my_workflow
#PBS -j oe
#PBS -l walltime=10:00:00
#PBS -l nodes=3:ppn=8
#PBS -q long

time mpirun python -m soma_workflow.MPI_workflow_runner Titan_MPI --restart 520

As before, use “qsub” to submit the workflow and possibly “qdel” to stop it.

Current limitations:

  • The workflow must contain jobs using only one CPU
  • It is safer to run only one MPI workflow runner at the same time
  • As in the following example, some workflows might result of a waste of CPU time.
_images/sw_low_throughput_wasted_cpus.png