Direct HTTP calls via curl
Cloud Pipeline's HTTP/REST API can also be consumed directly, from any language or framework. Here we show how to implement the Usage scenario using the `curl` command to query the API.
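Each call below follows the same pattern: the personal access token is passed in the `Authorization` header and JSON is requested back. A minimal sketch of that pattern, reusing the `API_URL`/`API_TOKEN` values and the `/datastorage/find` endpoint from the script below (the `my_bucket` name is just a placeholder):

```bash
# Generic shape of a Cloud Pipeline REST API call via curl:
#   -k tolerates a self-signed certificate, -s silences the progress output
curl -ks -X GET \
     -H "Authorization: Bearer $API_TOKEN" \
     -H "Accept: application/json" \
     "$API_URL/restapi/datastorage/find?id=my_bucket"
```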
All available HTTP/REST API methods are listed in the Cloud Pipeline Swagger UI (available at https://<host>/pipeline/restapi/swagger-ui.html).
In the example script `cellranger_curl.sh`, listed below, the following API calls are performed:

- `GET /datastorage/find` to find the ID of the "WORKDIR" S3 bucket
- `POST /datastorage/tempCredentials/` to get the access token for the "WORKDIR" S3 bucket, which can be passed to the AWS SDK for the data transfer
- `POST /run` to submit a job to the Cloud Pipeline compute environment
- `GET /run/{runId}` to get the running job status and wait for the completion
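All of these endpoints return a common JSON envelope, which the script unpacks with `jq`. A minimal sketch of defensive response handling, assuming the envelope carries `status`, `message` and `payload` fields (verify the exact shape against the Swagger UI); the `assert_ok` helper name is hypothetical:

```bash
# Hypothetical helper: abort if an API response does not report success.
# Assumes the envelope looks like {"status": "OK", "message": ..., "payload": ...}
assert_ok() {
    local response="$1"
    if [ "$(jq -r '.status' <<< "$response")" != "OK" ]; then
        echo "API call failed: $(jq -r '.message' <<< "$response")" >&2
        exit 1
    fi
}
```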
```bash
#!/bin/bash
#############################################################
# Setup the parameters
#############################################################
# Cloud Pipeline API entrypoint, e.g. https://<host>/pipeline
API_URL=""
# Shall be generated in the Cloud Pipeline GUI -> Settings -> CLI -> Generate access key
API_TOKEN=""
# Data location:
# S3 bucket that is going to be used as a "working directory"
# Local FASTQ files and processing results will be uploaded there
# Example value: s3://my_bucket/workdir
WORKDIR=""
# Path to a local directory that holds the FASTQ files for processing
# E.g. ~/tiny-fastq
FASTQS=""
# Path to the cellranger transcriptome
# It shall already be located in an S3 bucket
# In this example we use the "tiny" reference, which is only 300 MB and allows jobs to be debugged quickly
TRANSCRIPTOME_S3="s3://genome-bucket/tiny/transcriptome"
# Path to the S3 location that will hold the data processing results
RESULTS_S3="$WORKDIR/results"
# Job parameters:
# The docker image, which holds a cellranger binary and environment
DOCKER_IMAGE="single-cell/cellranger:latest"
# The type of the machine that is going to process the data
# If it is not set, the default instance type for the $DOCKER_IMAGE is used
INSTANCE_TYPE="r5.xlarge"
# The size of the disk volume (in GB) that will be attached to the machine defined by $INSTANCE_TYPE
# If it is not set, the default disk size for the $DOCKER_IMAGE is used
INSTANCE_DISK="100"
#############################################################
# Get the S3 working directory bucket ID
#############################################################
BUCKET_NAME=$(cut -d/ -f 3 <<< "$WORKDIR")
RESPONSE=$(curl -ks -X GET -H "Authorization: Bearer $API_TOKEN" -H "Accept: application/json" "$API_URL/restapi/datastorage/find?id=$BUCKET_NAME")
BUCKET_ID=$(jq -r ".payload.id" <<< "$RESPONSE")
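# NOTE (assumption): jq prints "null" when the lookup returns no payload, e.g.
# if the bucket name is wrong - fail fast instead of passing "null" downstream
if [ "$BUCKET_ID" == "null" ] || [ -z "$BUCKET_ID" ]; then
    echo "Cannot find the ID of the $BUCKET_NAME bucket: $RESPONSE" >&2
    exit 1
fi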
#############################################################
# Get the S3 working directory bucket access token by its ID
#############################################################
RESPONSE=$(curl -ks -X POST -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" -d "[
    {
        \"id\": $BUCKET_ID,
        \"read\": true,
        \"write\": true
    }
]" "$API_URL/restapi/datastorage/tempCredentials/")
# The resulting keys shall be used to configure the AWS SDK for the data transfer
# Here we use the AWS CLI, but any other SDK will work as well (e.g. Java/JS/Go/Python/...)
export AWS_ACCESS_KEY_ID=$(jq -r '.payload.keyID' <<< "$RESPONSE")
export AWS_SECRET_ACCESS_KEY=$(jq -r '.payload.accessKey' <<< "$RESPONSE")
export AWS_SESSION_TOKEN=$(jq -r '.payload.token' <<< "$RESPONSE")
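# (Optional) sanity-check that the AWS CLI picks up the exported credentials
# before the transfer; sts get-caller-identity requires no extra permissions
aws sts get-caller-identity > /dev/null || { echo "Temporary credentials are not valid" >&2; exit 1; }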
#############################################################
# Transfer the local FASTQ files to the S3 working directory using a direct call to the AWS CLI
#############################################################
# Files will be uploaded to e.g. "s3://my_bucket/workdir/fastq/tiny-fastq/"
FASTQS_S3="$WORKDIR/fastq/$(basename "$FASTQS")"
aws s3 cp "$FASTQS" "$FASTQS_S3/" --recursive
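# (Optional) verify the upload by listing the uploaded objects
aws s3 ls "$FASTQS_S3/" --recursive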
#############################################################
# Run processing
#############################################################
RESPONSE=$(curl -ks -X POST -H "Authorization: Bearer $API_TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" -d "{
    \"cmdTemplate\": \"cellranger count --id cloud-cellranger --fastqs \$fastqs --transcriptome \$transcriptome\",
    \"dockerImage\": \"$DOCKER_IMAGE\",
    \"hddSize\": $INSTANCE_DISK,
    \"instanceType\": \"$INSTANCE_TYPE\",
    \"params\": {
        \"fastqs\": {
            \"type\": \"input\",
            \"value\": \"$FASTQS_S3\"
        },
        \"results\": {
            \"type\": \"output\",
            \"value\": \"$RESULTS_S3\"
        },
        \"transcriptome\": {
            \"type\": \"input\",
            \"value\": \"$TRANSCRIPTOME_S3\"
        }
    }
}" "$API_URL/restapi/run")
RUN_ID=$(jq -r ".payload.id" <<< "$RESPONSE")
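# NOTE (assumption): a "null" id means the submission was rejected - print the
# raw response to simplify debugging
if [ "$RUN_ID" == "null" ] || [ -z "$RUN_ID" ]; then
    echo "Job submission failed: $RESPONSE" >&2
    exit 1
fi
echo "Submitted run #$RUN_ID"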
#############################################################
# Poll the run status every 30s, until the run reaches a terminal state
#############################################################
RUN_STATUS="NA"
while [ "$RUN_STATUS" != "SUCCESS" ]; do
sleep 30
RESPONSE=$(curl -ks -X GET -H "Authorization: Bearer $API_TOKEN" -H "Accept: application/json" "$API_URL/restapi/run/$RUN_ID")
RUN_STATUS=$(jq -r ".payload.status" <<< "$RESPONSE")
done
echo "Run #$RUN_ID finished with status: $RUN_STATUS"
```
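Once the loop exits with the SUCCESS status, the processing results are available at `$RESULTS_S3` and can be pulled back the same way the FASTQ files were uploaded. Note that the temporary credentials obtained earlier are short-lived and may have expired while the job was running; a minimal sketch, assuming the same `tempCredentials` call is simply repeated:

```bash
# Re-request the temporary S3 credentials (same call as in the script above)
RESPONSE=$(curl -ks -X POST -H "Authorization: Bearer $API_TOKEN" \
    -H "Content-Type: application/json" -H "Accept: application/json" \
    -d "[{\"id\": $BUCKET_ID, \"read\": true, \"write\": true}]" \
    "$API_URL/restapi/datastorage/tempCredentials/")
export AWS_ACCESS_KEY_ID=$(jq -r '.payload.keyID' <<< "$RESPONSE")
export AWS_SECRET_ACCESS_KEY=$(jq -r '.payload.accessKey' <<< "$RESPONSE")
export AWS_SESSION_TOKEN=$(jq -r '.payload.token' <<< "$RESPONSE")

# Download the results produced by the run to a local directory
aws s3 cp "$RESULTS_S3" ./cellranger-results --recursive
```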