2. Getting started

Thesaurus

Cloud Provider - supported cloud computing provider.

Cloud Region - specific geographical location where compute cloud resources could be hosted. Each Cloud Provider has its own Cloud Regions.

Objects - Cloud Pipeline (CP) entities such as Folder, Pipeline, Data Storage, Run configuration, Cluster node, Docker Registry, Tool group, Tool.

Folder, or directory - CP object. It is an entity similar to the directories in the file system. Folders are used to structure other CP objects.

Project - a special type of Folder. It might be used to organize data and metadata (see below) and simplify analysis runs for a large data set.

Pipeline - CP object. Represents a workflow script with versioned source code, documentation, and configuration. Under the hood, it is a git repository.

Data Storage - CP object. It is a cloud storage (e.g. S3 bucket for AWS or Blob Storage for MS Azure) representation in a folder hierarchy.

FS Mount - a data storage based on a network FS (File System). It is a distributed file system that can be used by several nodes during high-performance computing (HPC) jobs.

Docker Registry - CP object. It is a Docker registry representation in the CP. A Docker registry stores Docker images. For more details refer to https://docs.docker.com/registry/.

Tool - CP object. It is a Docker image representation in CP.

Tool group - CP object. It allows organizing different Tools into groups.

Detached configuration - Run configuration that is not attached to any particular pipeline.

Run configuration - CP object that specifies the parameters of a run: which pipeline or tool to run, which instance type to use, and which parameters to pass. Run configurations exist both as Detached configurations and as Pipeline configurations.
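
As an illustration only, the parts of a run configuration named above can be modeled as a small Python record. This is a sketch, not the CP data model: the class name, the field names and the sample values (RunConfiguration, target, target_kind, instance_type, "m5.large", "SAMPLE_SHEET") are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RunConfiguration:
    """Hypothetical record; field names are illustrative, not the CP schema."""
    target: str                      # name of the pipeline or tool to run
    target_kind: str                 # "pipeline" or "tool"
    instance_type: str               # e.g. an AWS instance type such as "m5.large"
    parameters: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A run configuration points at either a pipeline or a tool.
        if self.target_kind not in ("pipeline", "tool"):
            raise ValueError(f"unknown target kind: {self.target_kind}")


# Sample values are made up for the example.
config = RunConfiguration(
    target="rnaseq-pipeline",
    target_kind="pipeline",
    instance_type="m5.large",
    parameters={"SAMPLE_SHEET": "samples.csv"},
)
```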

History - CP object. It shows all runs that are related to a project.

Cluster configuration - CP object. It is a Run configuration of the Detached configuration kind. It represents a set of pipelines or tools that run as one cluster. It is also used to associate pipeline parameters with a Metadata entity.

Metadata - CP object that defines custom data entities associated with raw data files (fastq, bcl, etc.) or data parameters.

Batch job - an automated job that doesn't require interaction with a human.

Interactive application - a type of application that requires interaction with a human to achieve certain results.

Docker image - a stand-alone, executable package that includes everything needed to run a piece of software. Containerized software will always run the same, regardless of the environment.

Docker container - a runtime instance of a Docker image: what the image turns into when it is actually executed.

Run - an executed "Pipeline" or "Tool" object. It contains log information and the parameters of the execution process.

Instance - virtual computing environment (e.g. EC2 instance for AWS or Azure Virtual Machines for MS Azure).

  • On-Demand Instances – "stable" instances that cannot be outbid while they are in use.

  • Spot Instances – instances that cost significantly less. Instances of this type are rented and can be outbid and then shut down by the cloud provider. Like any instance, a Spot Instance has a type (number of CPU cores, processor type, amount of RAM and disk space, etc.) that determines what the instance is optimized for and which features are available. Each Cloud Provider names these instances differently: spot instances for AWS, low priority instances for Azure, preemptible instances for GCP.

Cluster node (Calculation node) - instance used for a "Run" object.

Task - a discrete Pipeline step.

Attributes - data that provides information about other data. CP allows creating a custom set of "key=value" pairs (attributes) for its objects. This helps to maintain traceability, and the attributes can be used in search queries.
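
The idea of searching objects by their "key=value" attributes can be sketched in a few lines of Python. This is a minimal in-memory illustration and does not use the CP API: the object names, the attribute keys and the find_by_attribute helper are all hypothetical.

```python
from typing import Dict, List

# Hypothetical CP-like objects, each carrying custom key=value attributes.
objects: List[Dict] = [
    {"name": "rnaseq-pipeline", "attributes": {"owner": "lab-a", "assay": "rnaseq"}},
    {"name": "raw-reads", "attributes": {"owner": "lab-a", "assay": "wgs"}},
    {"name": "shared-tools", "attributes": {"owner": "lab-b"}},
]


def find_by_attribute(items: List[Dict], key: str, value: str) -> List[str]:
    """Return the names of the objects whose attribute `key` equals `value`."""
    return [item["name"] for item in items if item["attributes"].get(key) == value]


lab_a_objects = find_by_attribute(objects, "owner", "lab-a")
```

Here lab_a_objects holds the names of the two objects tagged with owner=lab-a. Objects that lack the requested key are simply skipped.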

Cloud Pipeline CLI - command line interface to the CP. It allows interaction with CP via command line instead of GUI.

STS (Short-Term Storage) - data storage that is used for frequently accessed files.

LTS (Long-Term Storage) - data storage that is used for infrequently accessed files. Accessing the data is more complicated, but the storage is cheaper.

Pipeline template - a template structure for creating a pipeline. Typically it includes a source code folder structure with the executable script, a doc file, and a configuration file.

Token - an authentication key used to get access to an application.

Endpoint - link to access a launched application.

Pipeline version - a particular version of pipeline source code, documentation and configuration.

Log - automatically produced and time-stamped documentation of events relevant to a particular system.

Cluster - a collection of instances which are connected so that they can be used together on a task.

Access Control List (ACL) - a list of pairs, each consisting of a user id and the permissions granted to that user: read/write/execute.
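
A permission check against such a list can be sketched as follows. This is an assumption-laden illustration, not how CP stores or evaluates permissions: the user ids, the acl mapping and the is_allowed helper are hypothetical.

```python
from typing import Dict, Set

# Hypothetical ACL for one object: user id -> granted permissions.
acl: Dict[str, Set[str]] = {
    "alice": {"read", "write", "execute"},
    "bob": {"read"},
}


def is_allowed(entries: Dict[str, Set[str]], user_id: str, permission: str) -> bool:
    """Return True only if `user_id` is listed and holds `permission`."""
    # Users that are absent from the list get no permissions in this sketch.
    return permission in entries.get(user_id, set())


bob_can_write = is_allowed(acl, "bob", "write")
```

In this sketch a user who is not in the list is denied everything. Whether a real deployment denies or inherits permissions in that case is not stated here.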

Supported Cloud Providers

The following Cloud Providers are supported at the moment:

  • Amazon Web Services (AWS)
  • Microsoft Azure services (MS Azure)
  • Google Cloud Platform (GCP)

Supported Browsers

The following web browsers are supported at the moment:

  • Google Chrome (best option)
  • Firefox
  • Microsoft Edge
  • Internet Explorer 11

Authentication

Cloud Pipeline currently uses the SAML authentication protocol.