5.5. Download data from external resources to the cloud data storage

Overview

Users often get the raw datasets from the external partners for processing. CP provides comfortable way to load a list of such files as external links to the CP GUI and launch a data load procedure to the cloud storage in background mode. Users can provide CSV/TSV files with the external links and submit a data transfer job, so that the files will be moved to the cloud storage in the background.

Upload metadata with list of external resources from CSV/TSV file

Note: To upload a Metadata to a Folder you need to have WRITE permission for that folder and the ROLE_ENTITY_MANAGER role. For more information see 13. Permissions.
Note: file with list of external resources should have PATH column and http/ftp links in this column (example: sample.csv)

  1. Navigate on folder.
  2. Click on "Upload metadata" button.
  3. Navigate to CSV/TSV file with list of external resources in appeared dialog and select file. Click "Open" button.
  4. Click on "Metadata" object:
    CP_DownloadDataFromExternalResources
  5. Click on "Sample" class:
    CP_DownloadDataFromExternalResources
  6. List of samples with the external links will be appeared:
    CP_DownloadDataFromExternalResources

Download data from external resources

  1. Above the list of external links, uploaded on previous step, click "Transfer to the cloud" button:
    CP_DownloadDataFromExternalResources
  2. The pop-up window for preparing for transferring will appear:
    CP_DownloadDataFromExternalResources
    In this window fill fields:
    a. Input data storage which will be use as a destination for downloading data. Click on CP_DownloadDataFromExternalResources button. In opened pop-up window select required storage (on the left panel with folder-tree) (1). In selected storage set the checkbox opposite the folder name, where external data will be downloaded (2). Click "Ok" button (3):
    CP_DownloadDataFromExternalResources
    Note: you need to have READ and WRITE permissions for folder and data storage, that contain folder for downloading.
    b. Select CSV/TSV columns (only from PATH columns), which shall be used to get external URLs (if several columns contain URLs - all can be used).
    c. (optionally) Select whether to rename resulting path to some other value (can be specified as another column cell, e.g. sample name). For do that click on  button and select from dropdown list.
    d. (optionally) Input max threads count, if needs to limit.
    e. (optionally) Set if needs to create new folders within destination in case when several columns are selected for "Path fields" option (b). E.g. if two columns contain URLs and both are selected - then folders will be created for the corresponding column name and used for appropriate files storage.
    f. (optionally) Set if needs to update external URL within a table to the new location of the files. If set - http/ftp URLs will be changed to data storage path. Such data structure can be then used for a processing by a pipeline:
    • URLs are changed to the clickable storage-hyperlinks (checkbox is set): CP_DownloadDataFromExternalResources
    • URLs aren't changed and not-clickable hyperlinks (checkbox isn't set): CP_DownloadDataFromExternalResources
  3. Once you filled the form, click "Start download" button:
    CP_DownloadDataFromExternalResources
  4. System pipeline (transfer job) will be started automatically:
    CP_DownloadDataFromExternalResources
  5. Once the transfer job will be finished successfully, files will be located into the selected data storage:
    CP_DownloadDataFromExternalResources