"pipe" CLI implementation
This approach does not use the HTTP/REST API
directly. Instead, a command-line wrapper is used.
pipe CLI is a command-line utility, distributed together with the Cloud Pipeline. It offers a variety of commands for easy automation of common tasks. More details on the pipe command line interface are available in the corresponding docs section.
This utility can be used with any backend Cloud Provider enabled for the Cloud Pipeline deployment (i.e. the commands are the same whether AWS, GCP, or Azure is used).
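As a quick check that the environment is ready, the pipe CLI can be exercised directly from a terminal. A minimal sketch, using only the commands that appear later in this section (the bucket name is an example, replace it with a storage available in your deployment):
# Verify that the pipe CLI is installed and on the PATH
command -v pipe
# List the contents of an existing object storage (example bucket name)
pipe storage ls s3://my_bucket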
The cellranger.sh script, listed below, implements the "Usage scenario" using only the pipe command to communicate with the cloud platform.
#!/bin/bash
#############################################################
# Example dataset:
# https://cloud-pipeline-oss-builds.s3.amazonaws.com/tools/cellranger/data/tiny-fastq/tiny-fastq.tgz
# Example run command:
# cellranger.sh --fastqs ~/tiny-fastq \
# --transcriptome tiny \
# --workdir s3://my_bucket/tiny-example \
# --copy-back
#############################################################
#############################################################
# Parse options
#############################################################
POSITIONAL=()
while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        -f|--fastqs)
            FASTQS="$2"
            shift
            shift
            ;;
        -t|--transcriptome)
            TRANSCRIPTOME="$2"
            shift
            shift
            ;;
        -w|--workdir)
            WORKDIR="$2"
            shift
            shift
            ;;
        -i|--instance)
            INSTANCE_TYPE="$2"
            shift
            shift
            ;;
        -d|--disk)
            INSTANCE_DISK="$2"
            shift
            shift
            ;;
        -c|--copy-back)
            COPY_BACK=1
            shift
            ;;
        *)
            # Keep any unrecognized argument (also prevents an infinite loop
            # when an unknown option is passed)
            POSITIONAL+=("$1")
            shift
            ;;
    esac
done
#############################################################
# Check prerequisites
#############################################################
if ! command -v pipe > /dev/null 2>&1; then
    # Quote the heredoc delimiter so the message is printed verbatim
    # (unquoted, the backquoted `pipe` would be run as a command substitution)
    cat << "EOF"
[ERROR] "pipe" Command Line Interface is not available.
Please follow the installation and configuration instructions, available in the Cloud Pipeline GUI:
  * Login to the GUI
  * Open "Settings" (from the left panel)
  * Click "Get access key"
  * Follow the installation instructions
EOF
    exit 1
fi
#############################################################
# Validate options
#############################################################
if [ -z "$FASTQS" ] || [ ! -d "$FASTQS" ]; then
    echo "[ERROR] Path to the fastq files is not set or is not a directory"
    exit 1
fi
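# NOTE: the transcriptome locations below are deployment-specific examples,
#       adjust the S3 paths to the reference bundles available in your environment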
if [ "$TRANSCRIPTOME" ]; then
    case $TRANSCRIPTOME in
        human)
            TRANSCRIPTOME_S3="s3://genome-bucket/human/transcriptome"
            ;;
        mouse)
            TRANSCRIPTOME_S3="s3://genome-bucket/mouse/transcriptome"
            ;;
        human-mouse)
            TRANSCRIPTOME_S3="s3://genome-bucket/human-mouse/transcriptome"
            ;;
        tiny)
            TRANSCRIPTOME_S3="s3://genome-bucket/tiny/transcriptome"
            ;;
        *)
            echo "[ERROR] Transcriptome name does not match the supported types: human, mouse, human-mouse, tiny"
            exit 1
            ;;
    esac
else
    echo "[ERROR] Transcriptome name is not set"
    exit 1
fi
if [ -z "$WORKDIR" ] || [[ "$WORKDIR" != "s3://"* ]]; then
    echo "[ERROR] S3 working directory is not set or uses an unexpected schema (s3:// shall be used)"
    exit 1
else
    WORKDIR_EXISTS=$(pipe storage ls "$WORKDIR")
    if [ "$WORKDIR_EXISTS" ]; then
        echo "[ERROR] S3 working directory ($WORKDIR) already exists, please specify a new location"
        exit 1
    fi
fi
EXTRA_OPTIONS=()
if [ "$INSTANCE_TYPE" ]; then
    EXTRA_OPTIONS+=(--instance-type "$INSTANCE_TYPE")
fi
if [ "$INSTANCE_DISK" ]; then
    EXTRA_OPTIONS+=(--instance-disk "$INSTANCE_DISK")
fi
#############################################################
# Transfer the local fastq files to the S3 working directory
#############################################################
FASTQS_S3="$WORKDIR/fastq/$(basename "$FASTQS")"
echo "Transferring fastqs to the S3 working directory: $FASTQS -> $FASTQS_S3/"
pipe storage cp "$FASTQS" "$FASTQS_S3/" --recursive
if [ $? -ne 0 ]; then
    echo "[ERROR] Cannot upload $FASTQS to $WORKDIR"
    exit 1
fi
#############################################################
# Setup the paths and run options
#############################################################
RESULTS_S3="$WORKDIR/results"
DOCKER_IMAGE="ngs/cellranger:latest"
#############################################################
# Launch data processing
#############################################################
echo "Launch job with parameters:"
echo "fastqs: $FASTQS_S3"
echo "transcriptome: $TRANSCRIPTOME_S3"
echo "results: $RESULTS_S3"
pipe run --docker-image "$DOCKER_IMAGE" \
         --fastqs "input?$FASTQS_S3" \
         --transcriptome "input?$TRANSCRIPTOME_S3" \
         --results "output?$RESULTS_S3" \
         --cmd-template 'cellranger count --id cloud-cellranger --fastqs $fastqs --transcriptome $transcriptome' \
         --yes \
         --sync "${EXTRA_OPTIONS[@]}"
if [ $? -ne 0 ]; then
    echo "[ERROR] Failed to process the dataset"
    exit 1
fi
echo "[OK] Job has finished"
#############################################################
# Copy the results back, if requested
#############################################################
if [ "$COPY_BACK" ]; then
    RESULTS_LOCAL="$(pwd)/$(basename "$RESULTS_S3")"
    echo "Transferring results locally: $RESULTS_S3 -> $RESULTS_LOCAL"
    pipe storage cp "$RESULTS_S3" "$RESULTS_LOCAL" --recursive
    if [ $? -ne 0 ]; then
        echo "[ERROR] Cannot download $RESULTS_S3 to $RESULTS_LOCAL"
        exit 1
    fi
    echo "[OK] Data processing results are downloaded to $RESULTS_LOCAL"
else
    echo "[OK] Data processing results are available in the S3 working directory: $RESULTS_S3"
fi
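Note that the whole scenario is composed of just three pipe primitives: pipe storage cp to stage the input data, pipe run --sync to execute the processing job, and pipe storage cp again to fetch the results.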
To launch the data processing, the script above can be run using the following commands:
# Get the "tiny" dataset (or use your own data)
cd ~
wget "https://cloud-pipeline-oss-builds.s3.amazonaws.com/tools/cellranger/data/tiny-fastq/tiny-fastq.tgz"
tar -zxvf tiny-fastq.tgz
# Run the data processing
# It is assumed that cellranger.sh is stored in ~/cellranger.sh
chmod +x ~/cellranger.sh
./cellranger.sh --fastqs ~/tiny-fastq \
--transcriptome tiny \
--workdir s3://my_bucket/tiny-example \
--copy-back
The script produces output similar to the following:
Transferring fastqs to the S3 working directory: ~/tiny-fastq -> s3://my_bucket/tiny-example/fastq/tiny-fastq/
Launch job with parameters:
fastqs: s3://my_bucket/tiny-example/fastq/tiny-fastq
transcriptome: s3://genome-bucket/tiny/transcriptome
results: s3://my_bucket/tiny-example/results
Pipeline run scheduled with RunId: 5865
Pipeline run 5865 completed with status SUCCESS
Transferring results locally: s3://my_bucket/tiny-example/results -> ~/results
[OK] Data processing results are downloaded to ~/results
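If --copy-back is not specified, the results remain in the S3 working directory and can be inspected or downloaded later with the same pipe storage commands, e.g. (paths taken from the example above):
# List the produced results
pipe storage ls s3://my_bucket/tiny-example/results
# Download them on demand
pipe storage cp s3://my_bucket/tiny-example/results ~/results --recursive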