XML 2.0 specification
=====================

Table of contents
-----------------

-  `Processes <#processes>`__
-  `Parameter types <#parameter-types>`__
-  `Process roles <#process-roles>`__
-  `Association between a Python function and an XML
   string <#association-between-a-python-function-and-an-xml-string>`__
-  `Processes examples <#processes-examples>`__
-  `Pipelines <#pipelines>`__
-  `The <doc> element <#the-doc-element>`__
-  `The <process> element <#the-process-element>`__
-  `The <switch> element <#the-switch-element>`__
-  `The <optional_output_switch> element <#the-optional-output-switch-element>`__
-  `The <link> element <#the-link-element>`__
-  `The <processes_selection> element <#the-processes-selection-element>`__
-  `The <pipeline_steps> element <#the-pipeline-steps-element>`__
-  `The <gui> element <#the-gui-element>`__
-  `Pipeline example <#pipeline-example>`__
-  `API <#api>`__
-  `XML validation <#xml-validation>`__

Processes
---------

The XML process specification makes it possible to use a standard Python
function and to associate it with an XML string that enables the
creation of a ``Process`` instance. This XML string will define the type
and behaviour of function parameters and return value(s).

In order to create a ``Process`` instance for a function it is necessary
to get some information about each parameter of the function and about
the return value. This information about parameters is defined in an XML
string with the exception of the **default values** of the parameters
that are extracted from the function definition.
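
As an illustration of how default values can be read from the function
definition rather than from the XML, here is a minimal sketch based on
the standard ``inspect`` module. The helper ``parameter_defaults`` and
the function ``scale`` are hypothetical names used for this example
only; this is not Capsul's actual implementation.

.. code:: python

    import inspect


    def parameter_defaults(function):
        """Return {name: default} for the parameters that have a default."""
        signature = inspect.signature(function)
        return {
            name: parameter.default
            for name, parameter in signature.parameters.items()
            if parameter.default is not inspect.Parameter.empty
        }


    # Hypothetical function: "value" is mandatory, the others are optional.
    def scale(value, factor=2, offset=None):
        return value * factor


    defaults = parameter_defaults(scale)
    print(defaults)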

The process XML string contains a single ``<process>`` element, which
may carry some global properties for the process. ``<process>`` may
have the following attributes:

-  *capsul\_xml* (optional): version of the Capsul XML specification
   this process definition is compatible with. If omitted, the process
   definition is supposed to be compatible with the latest Capsul XML
   specification available.
-  *role* (optional): A role that is attached to the process. See
   "Process roles" below.

In the ``<process>`` element, one can find one ``<input>`` element
per parameter of the function. If the process produces one or several
outputs, it must use a ``<return>`` element. If ``<return>`` is not
defined, the value returned by the Python function is ignored and cannot
be used in pipelines. For a single output, the Python function must
directly return the value, and the value name (an output value must
always have a name), type and documentation must be given in the
attributes of the ``<return>`` element (see below). Here is an example
of a process defined as a function returning a value:

.. code:: python

    from capsul.process.xml import xml_process

    @xml_process('''
    <process capsul_xml="2.0">
        <input name="a" type="int" doc="An integer"/>
        <input name="b" type="int" doc="Another integer"/>
        <return name="addition" type="int" doc="a + b"/>
    </process>
    ''')
    def add(a, b):
        return a + b

If the process needs to return several values, they must be declared with
``<output>`` elements located between ``<return>`` and
``</return>``. The function must return the output values either in a
list or in a dictionary. If it is a list the order of the ``<output>``
elements is used to match the values in the list and the process
parameter names. If it is a dictionary, each key must correspond to a
``name`` attribute in an ``<output>`` element. For instance:

.. code:: python

    from capsul.process.xml import xml_process

    @xml_process('''
    <process capsul_xml="2.0">
        <input name="a" type="int" doc="An integer"/>
        <input name="b" type="int" doc="Another integer"/>
        <return>
            <output name="quotient" type="int" doc="Quotient of a / b"/>
            <output name="remainder" type="int" doc="Remainder of a / b"/>
        </return>
    </process>
    ''')
    def divide(a, b):
        # Floor division (//) is consistent with the % operator.
        return {
            'quotient': a // b,
            'remainder': a % b,
        }
        # From the process point of view, it would be equivalent to
        # use the following code:
        # return [a // b, a % b]
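
The matching between returned values and ``<output>`` elements
described above can be sketched with the standard library only. The
helpers ``output_names`` and ``named_outputs`` are hypothetical and only
illustrate the rule (list order versus dictionary keys); they are not
part of Capsul.

.. code:: python

    import xml.etree.ElementTree as ET

    PROCESS_XML = '''
    <process capsul_xml="2.0">
        <input name="a" type="int" doc="An integer"/>
        <input name="b" type="int" doc="Another integer"/>
        <return>
            <output name="quotient" type="int" doc="Quotient of a / b"/>
            <output name="remainder" type="int" doc="Remainder of a / b"/>
        </return>
    </process>
    '''


    def output_names(process_xml):
        """List the <output> names inside <return>, in document order."""
        root = ET.fromstring(process_xml)
        return [output.get('name') for output in root.findall('./return/output')]


    def named_outputs(process_xml, returned):
        """Map a returned list or dictionary to the declared output names."""
        names = output_names(process_xml)
        if isinstance(returned, dict):
            # Dictionary: every key must be a declared output name.
            unknown = sorted(set(returned) - set(names))
            if unknown:
                raise ValueError('undeclared outputs: %s' % unknown)
            return dict(returned)
        # List: values are matched to <output> elements by position.
        if len(returned) != len(names):
            raise ValueError('expected %d values, got %d'
                             % (len(names), len(returned)))
        return dict(zip(names, returned))


    from_list = named_outputs(PROCESS_XML, [3, 1])
    from_dict = named_outputs(PROCESS_XML, {'quotient': 3, 'remainder': 1})
    print(from_list == from_dict)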

``<input>``, ``<output>`` and ``<return>`` (for a single return value
with no child elements) accept the following attributes:

-  *name*: the name of the function parameter
-  *type*: the type of the parameter. See possible parameter types
   below.
-  *allowed\_extensions*: for ``file`` type, list of possible file
   extensions.
-  *doc* (optional): the documentation of the parameter

The three elements differ as follows:

-  ``<input>`` is straightforward: it is always an input parameter.
-  ``<output>`` is normally an output parameter, except in some cases
   when it is a file: an output file may have its filename specified as
   input (the filename is not generated by the process). In this case an
   additional attribute *input\_filename* names the function parameter
   used to specify the filename. This parameter has the type ``File``
   and is marked as output, but is actually an input to the processing
   function.
-  ``<return>`` is an output which is returned by the processing
   function. For a single ``<return>`` it is very similar to
   ``<output>`` but only one ``<return>`` element is allowed in a
   process. The process should return a single value.

Parameter types
~~~~~~~~~~~~~~~

For ``<input>``, ``<output>`` and ``<return>`` elements, the ``type``
attribute can have the following values:

-  **int**
-  **float**
-  **string**
-  **unicode**
-  **file**
-  **directory**
-  **enum** : when this type is used, there must be a ``values``
   attribute that contains a Python literal representing a list of
   possible values for the parameter.
-  **list_int**
-  **list_float**
-  **list_string**
-  **list_unicode**
-  **list_file**
-  **list_directory**

When a parameter accepts multiple types, they must be separated by a
``|``. For instance a parameter accepting either a file or a list of
files would use ``type="file|list_file"``.
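
As a rough illustration, the ``type`` and ``values`` attributes could be
interpreted as follows. This is a sketch under assumptions: the helper
``parse_parameter`` is hypothetical, and ``ast.literal_eval`` is used
here as one safe way to read a Python literal, which may differ from
what Capsul does internally.

.. code:: python

    import ast
    import xml.etree.ElementTree as ET

    KNOWN_TYPES = {
        'int', 'float', 'string', 'unicode', 'file', 'directory', 'enum',
        'list_int', 'list_float', 'list_string', 'list_unicode',
        'list_file', 'list_directory',
    }


    def parse_parameter(element):
        """Return (accepted types, enum values or None) for one element."""
        types = element.get('type').split('|')
        unknown = [name for name in types if name not in KNOWN_TYPES]
        if unknown:
            raise ValueError('unknown parameter type(s): %s' % unknown)
        values = None
        if 'enum' in types:
            # The values attribute holds a Python literal representing a list.
            values = ast.literal_eval(element.get('values'))
        return types, values


    method = ET.fromstring(
        '''<input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']"/>''')
    images = ET.fromstring('<input name="images" type="file|list_file"/>')

    print(parse_parameter(method))
    print(parse_parameter(images))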

Process roles
~~~~~~~~~~~~~

The role of a process gives information about the expected execution
context. It can be used to decide whether a process should be executed
in a given context or not. The role can also be used to propose a
specific GUI for the process. For instance the role ``"viewer"``
indicates that the execution of the process will display something to
the user. There is no need to execute such a process on a remote
computer that is disconnected from the user environment.

The possible process roles are:

-  ``viewer``: the process is used to display something to the user.
   It cannot be executed outside the user graphical environment. A
   viewer is not supposed to be blocking. It should terminate
   immediately and let the view live independently of the rest of the
   process. If blocking is required, use the ``dialog`` role.
-  ``dialog``: a dialog is used to show something to the user and
   wait for a user action before ending its execution. Like a
   ``viewer``, it cannot be executed outside the user graphical
   environment. The expected user action can be as simple as clicking on
   a single "ok" button; in that case, the process should have no
   output. But it can be a complete form whose result must be returned
   via the process output parameter(s).

Association between a Python function and an XML string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two ways to perform the association between the function and
the XML. The recommended method is to use a decorator to explicitly
define the XML string associated with the function. Here is an example:

.. code:: python

    from capsul.process.xml import xml_process

    @xml_process('''
    <process capsul_xml="2.0">
        <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
        <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']" doc="Method for thresholding."/>
        <input name="threshold" type="float" doc="Threshold value."/>
        <output name="output_image" input_filename="output_location" type="file"
         doc="If set, define the output file name. Otherwise, the name is generated using a 'threshold_' prefix on the input file name."/>
    </process>
    ''')
    def threshold(input_image, method='gt', threshold=0, output_location=None):
        pass

It is also possible to put the XML in the docstring of the function.
However, this method is not recommended and should be avoided if
possible. Example:

.. code:: python

    def threshold(input_image, method='gt', threshold=0, output_location=None):
        '''
        <process capsul_xml="2.0">
            <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
            <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']" doc="Method for thresholding."/>
            <input name="threshold" type="float" doc="Threshold value."/>
            <output name="output_image" input_filename="output_location" type="file"
              doc="If set, define the output file name. Otherwise, the name is generated using a 'threshold_' prefix on the input file name."/>
        </process>
        '''
        pass

Processes examples
~~~~~~~~~~~~~~~~~~

.. code:: python

    from capsul.process.xml import xml_process

    @xml_process('''
    <process capsul_xml="2.0">
        <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
        <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']"
         doc="Method for thresholding."/>
        <input name="threshold" type="float" doc="Threshold value."/>
        <output name="output_image" input_filename="output_image" type="file" doc="Output file name."/>
    </process>
    ''')
    def threshold(input_image, output_image, method='gt', threshold=0):
        pass

    @xml_process('''
    <process capsul_xml="2.0">
        <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
        <input name="mask" type="file" doc="Path of mask binary image."/>
        <output name="output_image" input_filename="output_location" type="file" doc="Output file name."/>
    </process>
    ''')
    def mask(input_image, mask, output_location=None):
        pass

Pipelines
---------

An XML pipeline is an XML document containing a single
``<pipeline>`` element that may contain some global properties for
the pipeline. Since a pipeline is also a process, the ``<pipeline>``
element may contain the same attributes as the ``<process>`` element
(see above).

An XML pipeline contains a series of processes that are defined by
``<process>`` elements. The inputs and outputs of processes are
connected by links that are defined in ``<link>`` elements. A pipeline may
allow a user to select one group of processes among a series of process
groups. The processes that are not selected are disabled (they will not
be executed) whereas the selected processes are enabled. The
``<processes_selection>`` element is used to define a set of
selectable process groups.

The ``<doc>`` element
~~~~~~~~~~~~~~~~~~~~~

This element has no attributes and contains the documentation of the
process in a `Sphinx <http://www.sphinx-doc.org>`__ compatible format.

The ``<process>`` element
~~~~~~~~~~~~~~~~~~~~~~~~~

A ``<process>`` element adds a new process instance to the pipeline.
This instance is given a **name** that can be used in other XML elements
to reference it. The process instance is referencing a **module** which
is the function that is called when the instance is run. The
``<process>`` element can have the following attributes:

-  **name**: a string that can be used to reference the process instance.
   This must be a valid Python variable name. It should use the variable
   naming convention of Python's PEP 8.
-  **module**: a valid Capsul process identifier. This is typically a
   fully qualified (i.e. containing the absolute Python module dotted
   path) Python object name. But any string value accepted by
   ``capsul.api.get_process_instance()`` can be used.
-  **role** (optional): sets the role of the process instance (see
   "Process roles" above). If a role has been defined on the process
   module, it is ignored and replaced by the one declared in the
   pipeline. It is possible to use an empty string to force the process
   instance in the pipeline to have no role.
-  **iteration** (optional): when this attribute is used, the process
   instance will be an iteration process. The ``iteration`` attribute
   contains a comma-separated list of parameter names (for instance
   ``"input1,input2,output1"``). This list indicates the process
   parameter names on which the iteration will be performed. For each of
   these parameters, the actual type of the process instance parameter
   will be replaced by a list whose elements must have the process
   parameter type.
-  **enabled** (optional): used to explicitly mark a node as disabled
   (value: "false")
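
The effect of the ``iteration`` attribute on parameter types can be
sketched as follows. The helper ``iterated_parameter_types`` is
hypothetical, and writing the resulting list type with a ``list_``
prefix is an assumption borrowed from the type names listed in
"Parameter types"; Capsul may represent iterated parameters
differently.

.. code:: python

    def iterated_parameter_types(parameter_types, iteration):
        """Return the parameter types of the iteration process.

        parameter_types maps each parameter name to its Capsul XML type.
        iteration is the raw value of the iteration attribute.
        """
        # Split the comma-separated list and ignore surrounding spaces.
        iterated = [name.strip() for name in iteration.split(',')
                    if name.strip()]
        unknown = [name for name in iterated if name not in parameter_types]
        if unknown:
            raise ValueError('unknown iteration parameter(s): %s' % unknown)
        # Iterated parameters become lists; the others keep their type.
        return {
            name: ('list_' + type_name) if name in iterated else type_name
            for name, type_name in parameter_types.items()
        }


    types = iterated_parameter_types(
        {'input1': 'file', 'threshold': 'float', 'output1': 'file'},
        'input1,output1',
    )
    print(types)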

The ``<process>`` element can contain the following elements:

``<set>``
^^^^^^^^^

The ``<set>`` element is used to set a fixed value to a parameter. It
has exactly two attributes:

-  **name**: the name of the parameter
-  **value**: the value of the parameter expressed as a Python literal.
   The use of a Python literal format enables the representation of
   structured values such as lists. Some examples of values:

   -  integer: ``<set name="x" value="42"/>``
   -  float: ``<set name="x" value="4.2"/>``
   -  string: ``<set name="x" value="'a value'"/>``
   -  None (i.e. JSON null): ``<set name="x" value="None"/>``
   -  list: ``<set name="x" value="['one', 'two', 'three']"/>``

When a value is set on a parameter, it becomes an optional parameter.
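
A short sketch of how ``<set>`` values could be decoded is given below.
It assumes that the ``value`` attribute is read with
``ast.literal_eval``; the helper ``fixed_values`` and the module name
``my_package.threshold`` are hypothetical.

.. code:: python

    import ast
    import xml.etree.ElementTree as ET

    PROCESS_XML = '''
    <process name="threshold_gt_1" module="my_package.threshold">
        <set name="threshold" value="1"/>
        <set name="method" value="'gt'"/>
        <set name="labels" value="['one', 'two', 'three']"/>
        <set name="output_location" value="None"/>
    </process>
    '''


    def fixed_values(process_element):
        """Return {parameter name: Python value} for all <set> children."""
        return {
            item.get('name'): ast.literal_eval(item.get('value'))
            for item in process_element.findall('set')
        }


    values = fixed_values(ET.fromstring(PROCESS_XML))
    print(values)

Note that a string must carry its own quotes inside the attribute
(``value="'gt'"``): a bare ``value="gt"`` is not a valid Python literal
and ``ast.literal_eval`` would raise ``ValueError``.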

``<nipype>``
^^^^^^^^^^^^

Capsul can use Nipype interfaces as process modules. These interfaces
use ``traits`` types that have some parameters that need to be set in
some contexts. The Nipype-specific ``<nipype>`` element contains a
``name`` attribute to identify a process parameter. For more information
about these parameters, see `Nipype interface
specification <http://www.mit.edu/~satra/nipype-nightly/devel/interface_specs.html>`__.
The following attributes can be used to customize Nipype ``traits``:

-  **usedefault**: can be set to ``"true"`` or ``"false"``. Omitting the
   attribute is equivalent to ``"false"``.
-  **copyfile**: can be set to ``"true"`` or ``"false"``. Omitting the
   attribute is equivalent to ``"false"``. If the special value
   ``"discard"`` is used, the Nipype interface ``copyfile`` parameter
   will be set to ``True`` but the copied file will be deleted when the
   process terminates. This prevents software such as SPM from modifying
   the input image: only the original image is kept at the end of the
   execution (the modified copy is deleted).

The ``<switch>`` element
~~~~~~~~~~~~~~~~~~~~~~~~

Represents switch nodes. Switches may eventually be replaced by process
selection if it proves to fulfill all the needs, but for now "old-style"
switches still exist, and are the only ones which can be saved.

Attributes:

-  **name**: node name in the pipeline (as in process elements)
-  **switch\_value** (optional): value of the "switch" parameter: name
   of the active input
-  **enabled** (optional): as in process elements

Children:

``<input>``
^^^^^^^^^^^

Input name for the switch. Input plugs will be a combination of
input/output names ``<input>_switch_<output>``

Attributes:

-  **name**
-  **optional** (optional) ``"true"`` or ``"false"``

``<output>``
^^^^^^^^^^^^

Output plug for the switch.

Attributes:

-  **name**
-  **optional** (optional)

The ``<optional_output_switch>`` element
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Represents a specific switch node which makes it possible to have
optional output files in the pipeline parameters, while keeping them
available for temporary values inside the pipeline if they are left
undefined.

Attributes:

-  **name**: node name in the pipeline (as in process elements)
-  **enabled** (optional): as in process elements

Children:

``<input>``
^^^^^^^^^^^

Input name for the switch. Input plugs will be a combination of
input/output names ``<input>_switch_<output>``. In an optional output
switch, only one input is allowed.

Attributes:

-  **name**
-  **optional** (optional) ``"true"`` or ``"false"``

``<output>``
^^^^^^^^^^^^

Output plug for the switch. Only one output is allowed.

Attributes:

-  **name**

The ``<link>`` element
~~~~~~~~~~~~~~~~~~~~~~

This element adds a link between an input parameter of a process and an
output parameter of another process. It can also be used to "export" a
process parameter. Exporting a process parameter means making it visible
in the parameters of the pipeline. Unlike the default ``Pipeline``
behaviour in Capsul's API, a pipeline defined in Capsul XML 2.0 does not
automatically export the unconnected parameters of its processes. The
``<link>`` element contains no child elements and accepts the following
attributes (**source** and **dest** are mandatory):

-  **source**: the parameter where the link starts.
-  **dest**: the parameter where the link ends.
-  **weak\_link** (optional): ``"true"`` or ``"false"``

The value of the **source** and **dest** attributes can be either a
single identifier (e.g. ``"parameter_name"``) or two identifiers
separated by a dot (e.g. ``"process_name.parameter_name"``). A single
identifier corresponds to a pipeline parameter, whereas two identifiers
identify a process parameter: they must correspond to the name of a
process and the name of one parameter of this process.
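
The two forms of link endpoints can be told apart with a few lines of
Python. The helpers ``parse_endpoint`` and ``parse_link`` below are
hypothetical, and treating an omitted ``weak_link`` as ``"false"`` is an
assumption that the specification does not state.

.. code:: python

    import xml.etree.ElementTree as ET


    def parse_endpoint(value):
        """Split a source or dest value into (process name, parameter name).

        The process name is None when the endpoint is a pipeline parameter.
        """
        parts = value.split('.')
        if len(parts) == 1 and parts[0]:
            return None, parts[0]
        if len(parts) == 2 and all(parts):
            return parts[0], parts[1]
        raise ValueError('invalid link endpoint: %r' % value)


    def parse_link(link_element):
        """Decode the attributes of a <link> element."""
        return {
            'source': parse_endpoint(link_element.get('source')),
            'dest': parse_endpoint(link_element.get('dest')),
            # Assumption: an omitted weak_link means "false".
            'weak_link': link_element.get('weak_link', 'false') == 'true',
        }


    exported = parse_link(ET.fromstring(
        '<link source="input_image" dest="threshold_gt_1.input_image"/>'))
    internal = parse_link(ET.fromstring(
        '<link source="threshold_gt_1.output_image" dest="mask_1.mask"/>'))
    print(exported)
    print(internal)

In the first link the source has no process name, so it exports the
``input_image`` parameter of ``threshold_gt_1`` as a pipeline parameter.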

The ``<processes_selection>`` element
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``<processes_selection>`` element defines a series of process
groups. Each process group is composed of a series of processes added
to the pipeline with the ``<process>`` element. Only one of these
process groups can be executed in the pipeline. Therefore, a new
parameter is added to the pipeline that allows the user to select the
group to execute. All processes in the selected group are activated
(*i.e.* will be executed) whereas all processes in other groups are
disabled (*i.e.* will not be executed).

The ``<processes_selection>`` element has a single ``name`` attribute
that is the name of the parameter that is added to the pipeline. It must
contain two or more ``<processes_group>`` elements. Each
``<processes_group>`` has a ``name`` attribute identifying the group and
contains one or more ``<process>`` elements having only a single
``name`` attribute. This attribute is the name of a process defined in
the pipeline (see `The <process> element <#the-process-element>`__
above).
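
The selection rule can be illustrated with a small sketch that computes
which processes are enabled for a given choice. The helper
``enabled_state`` is hypothetical and only mirrors the rule stated
above; it does not reproduce Capsul's node activation code.

.. code:: python

    import xml.etree.ElementTree as ET

    SELECTION_XML = '''
    <processes_selection name="select_method">
        <processes_group name="greater than">
            <process name="threshold_gt_1"/>
            <process name="threshold_gt_10"/>
        </processes_group>
        <processes_group name="lower than">
            <process name="threshold_lt_1"/>
            <process name="threshold_lt_10"/>
        </processes_group>
    </processes_selection>
    '''


    def enabled_state(selection_element, selected_group):
        """Return {process name: True if enabled} for the selected group."""
        groups = {
            group.get('name'): [p.get('name') for p in group.findall('process')]
            for group in selection_element.findall('processes_group')
        }
        if selected_group not in groups:
            raise ValueError('unknown group: %r' % selected_group)
        # Processes of the selected group are enabled, all others disabled.
        return {
            process: group == selected_group
            for group, processes in groups.items()
            for process in processes
        }


    state = enabled_state(ET.fromstring(SELECTION_XML), 'lower than')
    print(state)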

The ``<pipeline_steps>`` element
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Children:

``<step>``
^^^^^^^^^^

Attributes:

-  **name**: name for the step
-  **enabled** (optional): ``"true"`` or ``"false"``

Children:

``<node>``
''''''''''

Attributes:

-  **name**: name of an existing pipeline node which will be part of
   this step.
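
To make the nesting of these elements concrete, here is a sketch that
reads a ``<pipeline_steps>`` element into a dictionary. The step names
and the helper ``read_steps`` are made up for the example, and treating
an omitted ``enabled`` attribute as ``"true"`` is an assumption that the
specification does not state.

.. code:: python

    import xml.etree.ElementTree as ET

    STEPS_XML = '''
    <pipeline_steps>
        <step name="thresholding">
            <node name="threshold_gt_1"/>
            <node name="threshold_lt_1"/>
        </step>
        <step name="masking" enabled="false">
            <node name="mask_1"/>
        </step>
    </pipeline_steps>
    '''


    def read_steps(steps_element):
        """Return {step name: {'enabled': bool, 'nodes': [node names]}}."""
        steps = {}
        for step in steps_element.findall('step'):
            steps[step.get('name')] = {
                # Assumption: a step is enabled unless enabled="false".
                'enabled': step.get('enabled', 'true') == 'true',
                'nodes': [node.get('name') for node in step.findall('node')],
            }
        return steps


    steps = read_steps(ET.fromstring(STEPS_XML))
    print(steps)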

The ``<gui>`` element
~~~~~~~~~~~~~~~~~~~~~

The ``<gui>`` element makes it possible to define the position of nodes
for a graphical representation. The position of a node is given by a
``<position>`` element that contains three attributes:

-  **name**: The name of the process (as given in `the process
   element <#the-process-element>`__).
-  **x**: The x coordinate of the process.
-  **y**: The y coordinate of the process.

A single global zoom level can be given to the GUI with a ``<zoom>``
element that contains a single ``level`` attribute whose value is a
floating-point number.

Pipeline example
~~~~~~~~~~~~~~~~

.. code:: xml

    <pipeline capsul_xml="2.0">
        <process name="threshold_gt_1"
         module="capsul.process.test.test_load_from_description.threshold">
            <set name="threshold" value="1"/>
            <set name="method" value="'gt'"/>
        </process>
        <process name="threshold_gt_10"
         module="capsul.process.test.test_load_from_description.threshold">
            <set name="threshold" value="10"/>
            <set name="method" value="'gt'"/>
        </process>
        <process name="threshold_gt_100"
         module="capsul.process.test.test_load_from_description.threshold">
            <set name="threshold" value="100"/>
            <set name="method" value="'gt'"/>
        </process>
        <process name="threshold_lt_1"
         module="capsul.process.test.test_load_from_description.threshold">
            <set name="threshold" value="1"/>
            <set name="method" value="'lt'"/>
        </process>
        <process name="threshold_lt_10"
         module="capsul.process.test.test_load_from_description.threshold">
            <set name="threshold" value="10"/>
            <set name="method" value="'lt'"/>
        </process>
        <process name="threshold_lt_100"
         module="capsul.process.test.test_load_from_description.threshold">
            <set name="threshold" value="100"/>
            <set name="method" value="'lt'"/>
        </process>
        <process name="mask_1"
         module="capsul.process.test.test_load_from_description.mask">
        </process>
        <process name="mask_10"
         module="capsul.process.test.test_load_from_description.mask">
        </process>
        <process name="mask_100"
         module="capsul.process.test.test_load_from_description.mask">
        </process>

        <link source="input_image" dest="threshold_gt_1.input_image"/>
        <link source="input_image" dest="threshold_gt_10.input_image"/>
        <link source="input_image" dest="threshold_gt_100.input_image"/>

        <link source="input_image" dest="threshold_lt_1.input_image"/>
        <link source="input_image" dest="threshold_lt_10.input_image"/>
        <link source="input_image" dest="threshold_lt_100.input_image"/>

        <link source="input_image" dest="mask_1.input_image"/>
        <link source="input_image" dest="mask_10.input_image"/>
        <link source="input_image" dest="mask_100.input_image"/>

        <link source="threshold_gt_1.output_image" dest="mask_1.mask"/>
        <link source="threshold_gt_10.output_image" dest="mask_10.mask"/>
        <link source="threshold_gt_100.output_image" dest="mask_100.mask"/>
        <link source="threshold_lt_1.output_image" dest="mask_1.mask"/>
        <link source="threshold_lt_10.output_image" dest="mask_10.mask"/>
        <link source="threshold_lt_100.output_image" dest="mask_100.mask"/>

        <link source="mask_1.output_image" dest="output_1"/>
        <link source="mask_10.output_image" dest="output_10"/>
        <link source="mask_100.output_image" dest="output_100"/>

        <processes_selection name="select_method">
            <processes_group name="greater than">
                <process name="threshold_gt_1"/>
                <process name="threshold_gt_10"/>
                <process name="threshold_gt_100"/>
            </processes_group>
            <processes_group name="lower than">
                <process name="threshold_lt_1"/>
                <process name="threshold_lt_10"/>
                <process name="threshold_lt_100"/>
            </processes_group>
        </processes_selection>

        <gui>
            <position name="threshold_gt_100" x="386.0" y="403.0"/>
            <position name="inputs" x="50.0" y="50.0"/>
            <position name="mask_1" x="815.0" y="153.0"/>
            <position name="threshold_gt_10" x="374.0" y="242.0"/>
            <position name="threshold_lt_100" x="556.0" y="314.0"/>
            <position name="threshold_gt_1" x="371.0" y="88.0"/>
            <position name="mask_10" x="820.0" y="293.0"/>
            <position name="mask_100" x="826.0" y="451.0"/>
            <position name="threshold_lt_1" x="570.0" y="6.0"/>
            <position name="threshold_lt_10" x="568.0" y="145.0"/>
            <zoom level="1.0"/>
        </gui>
    </pipeline>

API
---

Definitions of processes and pipelines in Capsul XML 2.0 are recognised
by :func:`get_process_instance <capsul.api.get_process_instance>`. For an XML process, the identifier of
the process is ``<module>.<function>`` where ``<module>`` is the fully
qualified name of the Python module where the function is located and
``<function>`` is the name of the function as defined in the module. In
order to work with :func:`get_process_instance <capsul.api.get_process_instance>`, the module must be in the
Python path. For instance,
``capsul.process.test.test_load_from_description.threshold`` is the
identifier of the function ``threshold`` located in the module
``capsul.process.test.test_load_from_description``.

For an XML pipeline, :func:`get_process_instance <capsul.api.get_process_instance>` looks for the XML
file defining the pipeline. The file name must end with ``.xml`` and be
located in a directory associated with a valid Python package (i.e. a
module in a directory). The pipeline identifier is a string
``<module>.<name>`` where ``<module>`` is the fully qualified Python
module name and ``<name>`` is the file name without the ``.xml``
extension. For instance ``capsul.process.test.test_pipeline`` is the
identifier for the pipeline defined in
``<python_path>/capsul/process/test/test_pipeline.xml``.
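
The correspondence between a pipeline identifier and its XML file can
be sketched as follows. The helper ``pipeline_xml_path`` is hypothetical;
it only builds the path relative to a Python path entry and does not
search the Python path the way ``get_process_instance`` would.

.. code:: python

    import posixpath


    def pipeline_xml_path(identifier):
        """Return the XML file path, relative to a Python path entry."""
        module, separator, name = identifier.rpartition('.')
        if not separator or not module or not name:
            raise ValueError('expected <module>.<name>, got %r' % identifier)
        # Each dotted module component is a directory of the package.
        return posixpath.join(*(module.split('.') + [name + '.xml']))


    path = pipeline_xml_path('capsul.process.test.test_pipeline')
    print(path)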

One can find all the process and pipeline identifiers defined in a
module (and recursively in all its sub-modules) with the function
``find_processes(module_name)`` (in ``capsul.process.finder``). For
instance, to try to instantiate all processes and pipelines defined in
the module ``clinfmri``:

.. code:: python

    from capsul.api import get_process_instance, find_processes

    for p in find_processes('clinfmri'):
        try:
            get_process_instance(p)
        except Exception:
            print('FAILED', p)
        else:
            print('GOOD', p)

XML validation
~~~~~~~~~~~~~~

There is no validation of the XML document in :func:`get_process_instance <capsul.api.get_process_instance>`.
As a consequence, one will only get an error if the XML does not allow
building a process or pipeline class (for instance if a mandatory
attribute is missing). On the other hand, misspelling of an element or
attribute name may not raise an error (the unknown item is simply
ignored). If there is a need for a validation feature for pipeline
development, it will be added in separate functions that would be built
to give precise errors and warnings to the user (including line number
in the XML file).
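
Until such functions exist, a project can run its own checks. The
sketch below is a hypothetical stand-alone checker for process
definitions, not a Capsul feature: the function ``find_unknown_items``
and the table ``ALLOWED_ATTRIBUTES`` are assumptions built from the
attributes listed in this document, and line numbers are not reported
because ``xml.etree.ElementTree`` does not keep them.

.. code:: python

    import xml.etree.ElementTree as ET

    # Attributes described in the "Processes" section of this document.
    PARAMETER_ATTRIBUTES = {'name', 'type', 'doc', 'values',
                            'allowed_extensions'}
    ALLOWED_ATTRIBUTES = {
        'process': {'capsul_xml', 'role'},
        'input': PARAMETER_ATTRIBUTES,
        'output': PARAMETER_ATTRIBUTES | {'input_filename'},
        'return': PARAMETER_ATTRIBUTES,
    }


    def find_unknown_items(process_xml):
        """Return warnings for unknown elements and attributes."""
        warnings = []
        for element in ET.fromstring(process_xml).iter():
            allowed = ALLOWED_ATTRIBUTES.get(element.tag)
            if allowed is None:
                warnings.append('unknown element <%s>' % element.tag)
                continue
            for attribute in sorted(set(element.attrib) - allowed):
                warnings.append('unknown attribute %r on <%s>'
                                % (attribute, element.tag))
        return warnings


    SUSPICIOUS_XML = '''
    <process capsul_xml="2.0">
        <input name="a" type="int" desc="An integer"/>
        <imput name="b" type="int"/>
    </process>
    '''

    problems = find_unknown_items(SUSPICIOUS_XML)
    for problem in problems:
        print(problem)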
