11. Manage Runs

Overview
ACTIVE RUNS
COMPLETED RUNS
Run information page
Automatically rerun if a spot instance is terminated

Overview

"Runs" provides a list of active and completed pipeline runs. You can get parameters and logs of specific run and stop run here.

"Runs" space has two tabs:

Active runs view
Completed runs view.

Runs are organized in a table which is the same for both tabs:

"State" icon - state of the run.
Run - include:
- run name (upper row) - pipeline name and run id
- Cloud Region (bottom row)

Note: if a specific platform deployment has a number of Cloud Providers registered (e.g. AWS+Azure, GCP+Azure) - corresponding text information also has a Provider name, e.g.:

Parent run - parent run ID, if a run was launched by another run.
Pipeline - include:
- pipeline name (upper row) - a name of a pipeline
- version name (bottom row) - a name of a pipeline version
Docker image - a name of docker image.
Started - time when a run was started.
Completed - time when a run was finished.
Elapsed - include:
- elapsed time (upper row) - a duration of a run
- estimated price (bottom row) - estimated price of run, which is calculated based on the run duration and selected instance type. This field is updated interactively (i.e. each 5 - 10 seconds).
Owner - a user, which launched a run.

Note: also you can view information about runs via CLI. See here.

ACTIVE RUNS

This tab displays a list of all pipelines that are currently running.
CP_ManageRuns

Active run states

- Queued state ("sandglass" icon) - a run is waiting in the queue for the available compute node.
- Initializing state ("rotating" icon) - a run is being initialized, at this stage a new compute node will be created or an existing node will be reused.
- Pulling state ("download" icon) - now pipeline Docker image is downloaded to the node.
- Running state (stable "play" icon) - a pipeline is running. The node is appearing and pipeline input data is being downloaded to the node before the "InitializeEnvironment" service task appears.
Pausing state (blinking "pause" icon) - a run is being paused. At this moment compute node will be stopped (but persisted) and the docker image state will be kept as well.
- Paused state (stable "pause" icon) - a run is paused. At this moment compute node is already stopped but keeps it's state. Such run may be resumed.
Resuming state (blinking "play" icon) - a paused run is being resumed. At this moment compute node is starting back from the stopped state.

CP_ManageRuns

Also, help tooltips are provided when hovering a run state icon, e.g.:

Tooltips contain a state name in bold (e.g. Queued) and a short description of the state and info on the next stage.

Active run controls

Control	Description
PAUSE/RESUME	Pauses/resumes a run. Available for On-demand non-cluster instances only. Learn more about feature here.
TERMINATE	Terminates compute node of a paused run without its resuming. Available for On-demand non-cluster instances only. Learn more about feature here.
STOP	This control stops a run execution.
LOG	To open a Run information page, press LOG button.

Active cluster runs

Cluster is a collection of instances which are connected so that they can be used together on a task.
If launched run uses a cluster or an auto-scaled cluster (see sections here), it has a certain designation:
CP_ManageRuns

By default, only master-run is displaying at the table. To view nested runs (child-runs) click the Expand control in front of the muster-run ID:
CP_ManageRuns

So, you can view an information about each child-run and its state, also you can stop specific nested run without stopping a parent run. You can open Run logs page (see below) for any of the cluster runs by click it or LOG button next to run ID.

Note: you can't pause cluster runs even with On-demand price type.
Note: stopping a parent run will stop execution of all nested runs too.

For runs with the auto-scaled cluster not all of the child-runs appear in the list immediately after parent run was launched, "scale-up" runs will appear only of necessity.

Displaying additional node metrics

According to the run states and system-level settings, additional metrics (labels) could be displayed for active runs:

- Idle label - displays only for runs in Running state, for which node's CPU consumption level is below than a certain threshold for a certain period of time or longer.
- Pressure label - displays only for runs in Running state, for which node's Memory/Disk consumption level is higher than a certain threshold.

Values of these thresholds and time period are specified by admins via system-level settings (see here for more details and an example of using - here).

COMPLETED RUNS

This tab displays a list of all pipelines runs that are already finished.

CP_ManageRuns

Completed run states

- Success state ("OK" icon) - successful pipeline execution.
- Failed state ("caution" icon) - unsuccessful pipeline execution.
- Stopped state ("clock" icon) - a pipeline manually stopped.

Help tooltips are also provided when hovering a completed run state icon, e.g.:

Completed run controls

Control	Description
LINKS	This control show input/output links of the pipeline
RERUN	This control allow rerunning of a completed run. The Launch a pipeline page will be open.
LOG	To open a Run information page, press LOG button.

Completed cluster runs

If completed run used a cluster or an auto-scaled cluster (see sections here), it has a certain designation. Displaying of such runs on the COMPLETED RUNS tab is similar to the active cluster runs. You can view an information about each child-run and its state, also you can rerun specific nested run without a parent run. You can open Run logs page (see below) for any of the cluster runs by click it or LOG button next to run ID:
CP_ManageRuns

Run information page

Click a row within a run list, "Run information" page will appear.
CP_ManageRuns

It consists of several sections:

General information

This section displays general information about a run:

Field	Description
State icon	state of the run. Help tooltips are provided when hovering a run state icon, e.g.:
Run ID	unique ID of the run.
Endpoint	(available only for tools runs) endpoint hyperlink for the service launched in an interactive tool. For more details see 15. Interactive services.
Share with	(available only for tools runs) list of users/groups with whom an interactive tool application is shared. For more details see 11.3 Sharing with other users or groups of users.
Owner	a name of the user who started pipeline.
Scheduled	time when a pipeline was launched.
Waiting for/Running for	time a pipeline has been running.
Started	time when the node is initialized and a pipeline has started execution.
Finished	time when a pipeline finished execution.
Estimated price	price of a run according to a run duration and selected instance type.
Nested runs	the child-runs list in cases when a run has a number of children (e.g. a cluster run or any other case with the parent-id specified)
Maintenance	the list of rules that define automatical pausing/restarting schedule for on-demand non-clusters runs

Nested runs

Nested runs list is displaying only for master runs. It is the list with short informations about cluster child-runs:
CP_ManageRuns

Each child-run record contains:

State icons with help tooltips when hovering over them
Pipeline name and version or docker image and version
Run time duration

Similar as a parent-run state, states for nested runs are automatically updated without page refreshing. To open any child-run logs page - click its name in the list.

If there are several nested runs, only the first couple are displayed at the parent-run logs page.
To view all nested runs - click the corresponding hyperlink:
CP_ManageRuns
The full list of nested runs for the selected parent-run will be opened, e.g.:

Cluster run usage

User can view the cluster usage at the parent-run logs page - near the Nested runs label, number of nested runs active at the moment is displayed:
CP_ManageRuns

If the cluster run was completed - here, the summary number of nested runs launched during the cluster run is displayed, e.g.:

Also, user can view how the cluster usage has been changing during the run of this cluster.
This is especially useful information for auto-scaled clusters, as the number of worker nodes in such clusters can vary greatly over time.

To view the cluster usage - click the corresponding hyperlink near the number of active nested runs:
CP_ManageRuns
The chart pop-up will be opened, e.g.:

The chart shows a cluster usage - number of all active instances (including the master node) of the current cluster over time.

To view details - hover over the chart point, e.g.:
CP_ManageRuns
In this case, summary info about the number of active cluster instances in a specific moment will be displayed.

To view which runs exactly were active in the cluster (including the master node) in a specific moment - click the chart point, e.g.:
CP_ManageRuns
You can click any run ID in such a tooltip - the corresponding run's logs page will be opened.

Please note, the cluster usage chart is available for completed cluster runs as well.

Maintenance

Here user can create/view/edit/remove schedule rules for the specific active launched run. That set rules allow to pause/resume the run automatically in the scheduled day and time.

The Maintenance control is available only for active "On-demand" non-cluster runs.

E.g.:
CP_ManageRuns

To edit an existing schedule for the active launched run click the Configure button.
The popup with created rules will appear:
CP_ManageRuns
You can, for example, remove an existing rule and add new ones:

Changes will be displayed at the Run logs page and will be applied to the active run:

In general, this behavior is configured identically to the Maintenance control at the Launch page - for more details see 6.2. Launch a pipeline (item 5).

Instance

The "Instance" section lists calculation node and execution environment details that were assigned to the run when it was launched.
Note: node IP is presented as a hyperlink. Clicking it will navigate to the node details, where technical information and resources utilization is available - for more details see here.
Note: Docker image name link leads to a specific Tool's detail page (see an example).
CP_ManageRuns

Note: if a specific platform deployment has a number of Cloud Providers registered (e.g. AWS+Azure, GCP+Azure) - corresponding auxiliary Cloud Provider icon is additionally displayed, e.g.:

Note: if specific run CPU/Memory/Disk consumption is lower or higher specified in the configurations values, the IDLE or PRESSURE labels will be displayed respectively:

Parameters

The parameters that were assigned to the run when it was launched are contained in this section.
Note: parameters with types input/output/common/path are presented as hyperlinks, and will navigate to appropriate location in a Data Storage hierarchy.
CP_ManageRuns

Note: if a user specifies system environment variables in parameter (e.g. RUN_ID), GUI will substitute these variables with their values automatically in the "Run information" page.
CP_ManageRuns

CP_ManageRuns

Tasks

Here you can find a list of tasks of pipeline that are being executed or already finished. Clicking a task and its console output will be loaded in the right panel.
CP_ManageRuns

Console output

Here you can view console output from a whole pipeline or a selected task. It also shows a run failure cause if a run failed.
CP_ManageRuns

Note: the Follow log control enables auto scrolling of the console output. It is useful for logs monitoring. Follow log is enabled by default, tick the box to turn it off.
CP_ManageRuns

Also, during a pipeline run an extended node-level logging is maintained:

kubelet logs (from all compute nodes) are written to the files
Log files are streamed to the storage, identified by the storage.system.storage.name preference

Users with the ROLE_ADMIN role can find the corresponding node logs (e.g. by the hostname or ip that are attached to the run information) in that storage by the path logs/nodes/{hostname}:

Open the Run logs page of the run you want to see kubelet logs
Select the InitializeNode task, find a node hostname in the console output:

Copy the found hostname's value.
Check the storage path specified at the storage.system.storage.name preference:

Open in the Library that storage.
Navigate in the opened storage to the path logs/nodes/:
Click the "breadcrumbs" control at the upper side of the page, enter / into the end of the path and after it paste the hostname value, copied at step 2:

Press the Enter key.
The folder with kubelet logs for the specified node will be opened:

You can open it and see the list of logs files, divided by the messages type:

You can view any of these files using Cloud Pipeline facilities or download them to your local machine:

Controls

Note: Completed and active runs have different controls.
Example: controls of running Spot pipeline:
CP_ManageRuns

Here's the list of all existing buttons

Control	Description
BROWSE	Allows to open the run instance filesystem in a Storage browser Web GUI - so the user can view, download, upload, delete files and directories for the current active run.
COMMIT	Allows modifying an existing tool that has been changed via ssh. See 10.4. Edit a Tool.
EXPORT LOGS	Allows to export logs.
GRAPH VIEW	For Luigi and WDL pipelines GRAPH VIEW is available along with a usual plain view of tasks. See 6.1.1. Building WDL pipeline with graphical PipelineBuilder.
LAUNCH COMMAND	Allows to generate the CLI `pipe run` command/API `POST` request for a job launch.
PAUSE	Allows to pause a run (only for On-demand* non-cluster runs*).
RERUN	Allows to rerun completed runs.
RESUME	Allows to resume a paused run (only for On-demand* non-cluster runs*).
SHOW TIMINGS/HIDE TIMINGS	Allows to show/hide duration of each task.
SSH	Allows to shh to the instance running "sleep infinity" mode. See 6.1. Create and configure pipeline.
STOP	Allows to stop a run.
TERMINATE	Allows to terminate compute node of a paused run without resuming (only for On-demand* non-cluster runs*).

Automatically rerun if a spot instance is terminated

In certain cases - Cloud Provider may terminate a node, that is used to run a job or an interactive tool. It may be in cases:

Spot prices changed
Cloud Provider experienced a hardware issue

These cases aren't a Cloud Platform bug. In these cases:

If a job fails due to server-related issue, special message is displayed, describing a reason for the hardware failure:
If a batch job fails due to server-related issue and Cloud Provider reports one of the following instance status codes:
- Server.SpotInstanceShutdown - a spot instance was stopped due to price changes,
- Server.SpotInstanceTermination - a spot instance was terminated due to price changes,
- Server.InternalError - Cloud Provider hardware issue,
  batch job will be restarted from scratch automatically.
Note: this behavior will occur, only if administrator applied and configured it (for more information see 12.10. Manage system-level settings).