6.5. Pipeline configuration details

config.json

This file describes all configurations of a pipeline. Each configuration defines the execution parameters and settings used for a pipeline run.

Note: we recommend modifying pipeline execution settings via the CONFIGURATION tab (for the default pipeline configuration) or via the Launch pipeline page (for the settings of a specific run).
Editing config.json manually is better suited to advanced users (primarily developers).

The file is in JSON format. For a newly created, empty pipeline it looks like this:

[{
  "name" : "default",
  "default" : true,
  "description" : "Initial default configuration",
  "configuration" : {
    "main_file" : "pipelineexample.sh",
    "instance_size" : "m5.large",
    "instance_disk" : "20",
    "docker_image" : "library/centos:7",
    "cmd_template" : "chmod +x $SCRIPTS_DIR/src/[main_file] && $SCRIPTS_DIR/src/[main_file]",
    "parameters": {}
  }
}]
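
As a rough illustration of how this structure can be read, the sketch below parses the initial config shown above and picks the configuration flagged as default. The helper name find_default_config is hypothetical, and the JSON is embedded as a string only so that the example runs without a file.

```python
import json

# The initial config.json content shown above, embedded as a string for the sketch.
CONFIG_JSON = """
[{
  "name" : "default",
  "default" : true,
  "description" : "Initial default configuration",
  "configuration" : {
    "main_file" : "pipelineexample.sh",
    "instance_size" : "m5.large",
    "instance_disk" : "20",
    "docker_image" : "library/centos:7",
    "cmd_template" : "chmod +x $SCRIPTS_DIR/src/[main_file] && $SCRIPTS_DIR/src/[main_file]",
    "parameters": {}
  }
}]
"""


def find_default_config(configs):
    """Return the first config flagged "default": true, or None if there is none."""
    for config in configs:
        if config.get("default") is True:
            return config
    return None


configs = json.loads(CONFIG_JSON)          # config.json is a list of configs
default_config = find_default_config(configs)
print(default_config["name"])                             # default
print(default_config["configuration"]["instance_size"])   # m5.large
```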

Pipeline settings and attributes that can be specified in this configuration file:

For each setting: description, value type, and analog in the GUI (CONFIGURATION tab), where one exists.

name
    Description: Name of the pipeline config.
    Value: String. Example: default
    Analog in GUI: [screenshot]

default
    Description: Whether the config is the default one for the current pipeline.
    Value: Boolean
    Analog in GUI: The default config is opened and used for pipeline runs by default.

description
    Description: Description of the pipeline config.
    Value: String

configuration
    Description: Container for the current config settings.

main_file
    Description: Pipeline's main script file name.
    Value: String. Example: pipelineexample.sh

instance_size
    Description: Instance type, in terms of the specific Cloud Provider.
    Value: String. Example: m5.large
    Analog in GUI: [screenshot]

instance_disk
    Description: Instance's disk size in GB.
    Value: Float. Example: 25
    Analog in GUI: [screenshot]

docker_image
    Description: Name of the Docker image that will be used for the pipeline run.
    The name can be specified without the Docker Registry path, in short format:
    <Tools_group>/<Tool_name>:<Version>
    Value: String. Example: library/centos:7
    Analog in GUI: [screenshot]

cmd_template
    Description: Command template that will be executed in the pipeline run after all initializations.
    The template can use variables:
    - to address the main_file, use [main_file]
    - to address other variables, use $<VARIABLE_NAME>, e.g. $SCRIPTS_DIR
    Value: String
    Analog in GUI: [screenshot]

parameters
    Description: Container for the pipeline execution parameters.
    See details below.
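
To make the cmd_template placeholders concrete, here is a minimal sketch of how such a template could be expanded. It is not the platform's own substitution logic, which this section does not describe; in a real run the $<VARIABLE_NAME> references may simply be expanded by the shell. The function render_cmd_template and the /code value for SCRIPTS_DIR are assumptions made for the example.

```python
import os
from string import Template


def render_cmd_template(configuration, env=None):
    """Expand [main_file] and $VARIABLE references in cmd_template (illustrative only)."""
    env = os.environ if env is None else env
    # [main_file] is replaced with the value of the main_file setting.
    command = configuration["cmd_template"].replace(
        "[main_file]", configuration["main_file"]
    )
    # string.Template expands $NAME / ${NAME}; unknown names are left untouched.
    return Template(command).safe_substitute(env)


configuration = {
    "main_file": "pipelineexample.sh",
    "cmd_template": "chmod +x $SCRIPTS_DIR/src/[main_file] && $SCRIPTS_DIR/src/[main_file]",
}

# SCRIPTS_DIR=/code is a made-up value for the example.
print(render_cmd_template(configuration, {"SCRIPTS_DIR": "/code"}))
# chmod +x /code/src/pipelineexample.sh && /code/src/pipelineexample.sh
```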

Parameters section

The parameters section is the parameters container inside the configuration container of a pipeline config in config.json.
It defines the execution parameters that will be used during a pipeline run.
These parameters can specify, for example, a path to input or output data, a sample name, or a custom flag to be used during pipeline execution.

Example of that section:

...
"parameters" : {
    "InputData" : {
        "value" : "s3://testexamplestorage/input",
        "type" : "input",
        "section" : "other",
        "required" : false,
        "no_override" : false
    },
    "FastqAvailabilityFlag" : {
        "pretty_name" : "FASTQ availability",
        "value" : "true",
        "type" : "boolean",
        "section" : "other",
        "required" : false
    }
}
...

Each parameter has a name and a set of attributes that affect how the parameter is displayed and behaves in the GUI before the run, and how it is resolved during the run.
In config.json, each parameter is described as a container whose name is the parameter name and which holds all the parameter's attributes, i.e.:

"<PARAMETER_NAME>" : {
    "value" : "<PARAMETER_VALUE>",
    "type" : "<PARAMETER_TYPE>",
    ...
}
...
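
Because every parameter is a named container of attributes, the section can be walked like any nested mapping. The sketch below lists the parameters that are marked as required (see the required attribute in this section) but carry no default value, i.e. the ones a user would have to fill in before launch. The helper missing_required and the SampleName parameter are hypothetical.

```python
def missing_required(parameters):
    """Names of parameters with "required": true and no non-empty "value"."""
    return [
        name
        for name, attrs in parameters.items()
        if attrs.get("required") is True and not attrs.get("value")
    ]


# A parameters section in the shape described above; SampleName is made up.
parameters = {
    "InputData": {"type": "input", "section": "other", "required": True},
    "FastqAvailabilityFlag": {"value": "true", "type": "boolean", "required": False},
    "SampleName": {"value": "", "type": "string", "required": True},
}

print(missing_required(parameters))  # ['InputData', 'SampleName']
```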

Parameter settings and attributes that can be specified in this configuration file:

For each attribute: description, value type, and view in the GUI (CONFIGURATION tab / Launch form).

value
    Description: Parameter value.
    The attribute may be omitted; the corresponding field in the GUI will then be left empty for manual input.
    Value: String
    View in GUI:
    - For example, "value" : "SR133451" will be shown as: [screenshot]
    - If the value attribute was not specified in config.json, the parameter will be shown with an empty field: [screenshot]

type
    Description: Parameter type.
    Value: One of the following:
    - string - for parameters with generic scalar values
    - enum - for parameters with multiple predefined scalar values.
      In the GUI, such a parameter field is displayed as a dropdown list, where one value can be selected.
      To specify all enumeration values, an additional attribute shall be used in the format:
      "enum" : ["<ENUM_VALUE1>", "<ENUM_VALUE2>", ...], where <ENUM_VALUE> is an enumeration value.
      Note: for additional features of the enum type, see below.
    - boolean - for parameters with boolean values
    - path - for specifying a path in a data storage hierarchy
    - input - for specifying an input path in a data storage hierarchy.
      During pipeline initialization, this path is used to download input data to the node for processing.
    - output - for specifying an output path in a data storage hierarchy.
      During pipeline finalization, this path is used to upload the resulting data to the storage.
    - common - for specifying a common path in a data storage hierarchy.
      Similar to the input type, but the data is not erased from the calculation node when the pipeline finishes.
      This is useful for data that can be reused by subsequent pipeline runs.
    - metadata - for loading the parameter value from a metadata entity.
      Note: for details about the value format of the metadata type, see below.
    View in GUI: [screenshots for each type]

description
    Description: Parameter description.
    It is not processed in any way; it is only shown as parameter help.
    Value: String
    View in GUI: If a parameter has a description, it is displayed under the parameter field: [screenshot]

required
    Description: Defines whether this parameter must be set.
    - If true, the parameter value can't be left empty or removed, and must be specified before the launch.
    - If false, the parameter value can be left empty or removed.
    Value: Boolean
    View in GUI: [screenshot]

visible
    Description: Defines whether this parameter is visible in the CONFIGURATION tab / Launch form.
    - If true, the parameter will be visible as usual.
    - If false, the parameter will not be visible in the CONFIGURATION tab / Launch form, but it will still be used during the run and will be visible on the Run logs page.

    The value of the visible attribute can also be an expression over parameter values that evaluates to true or false.
    The expression format shall be:
    - <PARAMETER_NAME1> <OPERAND> <PARAMETER_NAME2> or several such sequences linked by <OPERAND>
    - or <PARAMETER_NAME1> <OPERAND> <COMPARED_VALUE> or several such sequences linked by <OPERAND>
    Where:
    - <PARAMETER_NAME> - parameter whose value is used to evaluate the expression
    - <COMPARED_VALUE> - a string in quotes or a number
    - <OPERAND> - logical operators: && (AND), || (OR), == (equals), != (not equals), or comparison operators: >, <, >=, <=
    Value: String
    View in GUI:
    - Such a parameter will not be visible in the CONFIGURATION tab / Launch form: [screenshot]
    - Such a parameter will be visible only if another parameter (SampleNameExtra) has a specific value (SR133458): [screenshot]

no_override
    Description: Defines whether the parameter's default value from the config will be immutable in a detached configuration that uses this pipeline.
    - If true and the parameter has a value, that value will be read-only.
    - If false and the parameter has a value, that value will be rewritable.
    - If the parameter has no default value, this option is ignored.
    Value: Boolean
    View in GUI: In a detached configuration that uses a pipeline: [screenshot]

pretty_name
    Description: Defines a pretty name (alias) for the parameter, which will be shown near the parameter field in the CONFIGURATION tab / Launch form.
    Note: during the pipeline script execution, the original parameter name will be used.
    Value: String
    View in GUI: If the attribute is set, the pretty name is shown near the parameter: [screenshot]
    The original name can be viewed:
    - in a tooltip on the CONFIGURATION tab / Launch form
    - on the Run logs page during the run: [screenshot]

section
    Description: Groups parameters into sections in the CONFIGURATION tab / Launch form.
    Parameters with the same section will be located in one area with the corresponding header.
    Note: for parameters created via the GUI, this attribute is set as "section" : "other".
    Value: String
    View in GUI: For example:
    - for the first 2 parameters, "section" : "Samples" is set
    - for another 2 parameters, "section" : "Paths" is set
    - for the others, "section" : "Other" is set
    [screenshot]
    You can use the corresponding controls (hyperlinks) in the upper left corner of the parameters section to navigate to or focus on a specific section.

validation
    Description: Configures validation of parameter values specified via the GUI.
    The attribute is an array of containers, each with the structure {"throw", "message"}, where:
    - throw - expression to test, based on parameter value(s)
    - message - message that will be shown if the throw expression is true

    The format of the throw expression shall be:
    - <PARAMETER_NAME1> <OPERAND> <PARAMETER_NAME2> or several such sequences linked by <OPERAND>
    - or <PARAMETER_NAME1> <OPERAND> <COMPARED_VALUE> or several such sequences linked by <OPERAND>
    - or /<REGEX>/.test(<PARAMETER_NAME>)
    Where:
    - <PARAMETER_NAME> - parameter whose value is used to evaluate the expression
    - <REGEX> - regular expression
    - <COMPARED_VALUE> - a string in quotes or a number
    - <OPERAND> - logical operators: && (AND), || (OR), == (equals), != (not equals), or comparison operators: >, <, >=, <=
    Value: Array
    View in GUI: Example:
    If the value of the following parameter equals the value of the other one, the message will be thrown: [screenshot]
    And in the CONFIGURATION tab / Launch form: [screenshot]
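
The validation attribute can be pictured with a small evaluator. The sketch below is not how the GUI evaluates throw expressions. It covers only the comparison and logical forms (not the /<REGEX>/.test(...) form), translates && and || into Python operators, and relies on eval, which is tolerable only for trusted config text in a demonstration. The parameter names SampleA and SampleB, the message text, and both helper functions are made up.

```python
def to_python(expression):
    """Translate the && / || operators of a throw expression into Python syntax."""
    return expression.replace("&&", " and ").replace("||", " or ")


def failed_messages(validation, values):
    """Return the messages of all rules whose throw expression evaluates to true.

    validation: list of {"throw": ..., "message": ...} containers
    values:     mapping of parameter name -> current value
    """
    messages = []
    for rule in validation:
        # Parameter names are resolved from `values`; builtins are disabled.
        if eval(to_python(rule["throw"]), {"__builtins__": {}}, dict(values)):
            messages.append(rule["message"])
    return messages


# Hypothetical rule: two sample parameters must not hold the same value.
validation = [
    {"throw": "SampleA == SampleB", "message": "Sample names must differ"}
]

print(failed_messages(validation, {"SampleA": "SR133451", "SampleB": "SR133451"}))
# ['Sample names must differ']
print(failed_messages(validation, {"SampleA": "SR133451", "SampleB": "SR133458"}))
# []
```

Since parameter values in config.json are strings, numeric comparisons with >, <, >=, <= would need an explicit conversion that this sketch leaves out.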

Enum type features

For enumeration values, visibility can be configured for each value separately, via the visible attribute assigned to specific enumeration values.
In that case, the format of the enumeration shall be:

"enum" : [
    {
        "name" : "<ENUM_VALUE1>",
        "visible" : "<VISIBILITY_EXPRESSION1>"
    },
    {
        "name" : "<ENUM_VALUE2>",
        "visible" : "<VISIBILITY_EXPRESSION2>"
    },
    ...
]

Where:

  • <ENUM_VALUE> - enumeration value
  • <VISIBILITY_EXPRESSION> - expression that determines the visibility of that enumeration value:
    • if it equals true, the enumeration value will be visible in the parameter's dropdown list (in the CONFIGURATION tab / Launch form)
    • if it equals false, the enumeration value will not be visible
    • the value can also be an expression over parameter value(s) that evaluates to true or false.
      In that case, the expression format shall be:
      • <PARAMETER_NAME1> <OPERAND> <PARAMETER_NAME2> or several such sequences linked by <OPERAND>
      • or <PARAMETER_NAME1> <OPERAND> <COMPARED_VALUE> or several such sequences linked by <OPERAND>
      Where:
      • <PARAMETER_NAME> - parameter whose value is used to evaluate the expression
      • <COMPARED_VALUE> - a string in quotes or a number
      • <OPERAND> - logical operators: && (AND), || (OR), == (equals), != (not equals), or comparison operators: >, <, >=, <=

Example of visibility usage for separate enumeration values:

...
"parameters" : {
    "p1": {
        "type" : "enum",
        "enum" : ["human", "mouse"],
        "required" : true
    },
    "p2": {
        "type" : "enum",
        "enum": [
            {
                "name" : "hg19",
                "visible" : "p1 == 'human'"
            },
            {
                "name" : "hg38",
                "visible" : "p1 == 'human'"
            }
            ...
        ]
    }
    ...
}
...

In the example above, values hg19 and hg38 will be visible for parameter p2 only if parameter p1 is set to human.
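
The effect of per-value visibility can be sketched as a filter over the enum list. The code below is an illustration rather than the GUI's implementation: it understands only true/false and a single comparison of the form <PARAMETER_NAME> == '<VALUE>' or != '<VALUE>', and the function names are hypothetical.

```python
import re

# One comparison: <name> ==|!= '<value>' (single or double quotes).
_COMPARISON = re.compile(r"""^\s*(\w+)\s*(==|!=)\s*(['"])(.*?)\3\s*$""")


def is_visible(visible, values):
    """Evaluate a visible attribute against current parameter values."""
    if isinstance(visible, bool):
        return visible
    if visible in ("true", "false"):
        return visible == "true"
    match = _COMPARISON.match(visible)
    if match is None:
        raise ValueError("unsupported expression in this sketch: %r" % visible)
    name, operator, _quote, expected = match.groups()
    actual = values.get(name)
    return actual == expected if operator == "==" else actual != expected


def visible_enum_values(enum, values):
    """Return the enumeration values that would be offered in the dropdown."""
    result = []
    for item in enum:
        if isinstance(item, str):
            result.append(item)              # plain values carry no visibility rule
        elif is_visible(item.get("visible", True), values):
            result.append(item["name"])
    return result


p2_enum = [
    {"name": "hg19", "visible": "p1 == 'human'"},
    {"name": "hg38", "visible": "p1 == 'human'"},
]

print(visible_enum_values(p2_enum, {"p1": "human"}))  # ['hg19', 'hg38']
print(visible_enum_values(p2_enum, {"p1": "mouse"}))  # []
```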

Metadata type features

If a parameter has "type" : "metadata", the format of its value shall be: <METADATA_OBJECT_ID>:<INSTANCE_TYPE_NAME>:<INSTANCE_ID1>,<INSTANCE_ID2>,...
Where:

  • <METADATA_OBJECT_ID> - ID of the folder/project to which metadata belongs
  • <INSTANCE_TYPE_NAME> - name of the metadata instance type
  • <INSTANCE_ID> - ID of the metadata instance of the selected instance type

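
The value format above splits naturally into three parts. The following sketch shows one way to parse it; the function parse_metadata_value, the MetadataValue type, and the sample value 42:Sample:S1,S2 are hypothetical and not taken from the platform.

```python
from typing import List, NamedTuple


class MetadataValue(NamedTuple):
    object_id: str           # <METADATA_OBJECT_ID>
    instance_type: str       # <INSTANCE_TYPE_NAME>
    instance_ids: List[str]  # <INSTANCE_ID1>, <INSTANCE_ID2>, ...


def parse_metadata_value(value):
    """Split '<OBJECT_ID>:<TYPE_NAME>:<ID1>,<ID2>,...' into its parts.

    Raises ValueError if the value has fewer than three colon-separated parts.
    """
    object_id, instance_type, ids = value.split(":", 2)
    return MetadataValue(object_id, instance_type, [i for i in ids.split(",") if i])


parsed = parse_metadata_value("42:Sample:S1,S2")  # made-up value
print(parsed.object_id)      # 42
print(parsed.instance_type)  # Sample
print(parsed.instance_ids)   # ['S1', 'S2']
```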