Sync'ing files between NeSI and another computer with globus-automate
It is common to generate large amounts of simulation data on NeSI and then have to migrate the files to another computer for storage or post-processing.
Here we show how to transfer data from NeSI to another computer programmatically, that is, without using a web graphical user interface and without typing your credentials each time you initiate a transfer.
You can also use this approach to synchronise your files, that is, to copy only the files that don't yet exist at the destination, or to refresh the files that have changed since you last triggered a transfer.
We'll assume that you have a NeSI account, have registered at https://globus.org, and have created a guest collection on NeSI and a private mapped collection on the destination computer (follow the instructions at Data_transfer_between_NeSI_and_a_PC_without_NeSI_two_factor_authentication). A guest collection is a directory whose content is shared via Globus.
Step 1: Write a JSON file describing the transfer
On NeSI, create a file named transfer_input.json with the following content:
{
  "source_endpoint_id": "ENDPOINT1",
  "destination_endpoint_id": "ENDPOINT2",
  "transfer_items": [
    {
      "source_path": "SOURCE_FOLDER",
      "destination_path": "DESTINATION_FOLDER",
      "recursive": true
    }
  ],
  "sync_level": SYNC_LEVEL,
  "notify_on_succeeded": true,
  "notify_on_failed": true,
  "notify_on_inactive": true,
  "verify_checksum": true
}
where
ENDPOINT1 is the source endpoint UUID, which you can get from https://app.globus.org/collections by clicking on the collection of your choice. Using a guest collection will allow you to transfer the data without two-factor authentication.
ENDPOINT2 is the destination endpoint UUID, e.g. your personal endpoint UUID, which may be that of your private mapped collection if you're transferring to your personal computer.
SOURCE_FOLDER is the relative path of the source folder in the source endpoint. This must be a directory; it cannot be a file. Use "/" if you do not intend to transfer the data from sub-directories.
DESTINATION_FOLDER is the absolute path of the destination folder in the destination endpoint when the destination is a private mapped collection.
SYNC_LEVEL specifies the synchronisation level in the range 0-3. SYNC_LEVEL=0 will transfer only new files that do not exist on the destination. Leaving this setting out will overwrite all the files on the destination. Other sync_level settings can be used to update data in the destination directory based on modification time and checksums.
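If you transfer several folders, it may be convenient to generate the file rather than edit it by hand. The following is a minimal sketch in Python, not part of globus-automate: the helper name build_transfer_input and the placeholder values are illustrative, and the sync-level descriptions in the comments reflect our reading of the Globus Transfer documentation, so check them against your Globus version.

```python
import json

# Meaning of each sync level, as described in the Globus Transfer
# documentation (assumption: unchanged in the version you are using).
SYNC_LEVELS = {
    0: "copy files that do not exist at the destination",
    1: "also copy files whose size differs",
    2: "also copy files whose source timestamp is newer",
    3: "also copy files whose checksum differs",
}


def build_transfer_input(source_endpoint, destination_endpoint,
                         source_path, destination_path, sync_level=0):
    """Return the transfer description as a dict (hypothetical helper)."""
    if sync_level not in SYNC_LEVELS:
        raise ValueError(f"sync_level must be one of {sorted(SYNC_LEVELS)}")
    return {
        "source_endpoint_id": source_endpoint,
        "destination_endpoint_id": destination_endpoint,
        "transfer_items": [
            {
                "source_path": source_path,
                "destination_path": destination_path,
                "recursive": True,
            }
        ],
        "sync_level": sync_level,
        "notify_on_succeeded": True,
        "notify_on_failed": True,
        "notify_on_inactive": True,
        "verify_checksum": True,
    }


if __name__ == "__main__":
    # Replace the placeholders with your own UUIDs and paths.
    body = build_transfer_input("ENDPOINT1", "ENDPOINT2",
                                "SOURCE_FOLDER", "DESTINATION_FOLDER",
                                sync_level=0)
    with open("transfer_input.json", "w") as f:
        json.dump(body, f, indent=2)
```

Running the script writes transfer_input.json in the current directory, with the same fields as the hand-written file above.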
Step 2: Initiate the transfer
Load the globus-automate-client environment module
module purge && module load globus-automate-client/0.16.1.post1-gimkl-2022
then start the transfer using
globus-automate action run --action-url https://actions.globus.org/transfer/transfer \
--body transfer_input.json
The first printed line will display the ACTION_ID. You can monitor progress with
globus-automate action status --action-url \
https://actions.globus.org/transfer/transfer ACTION_ID
or on the web at https://app.globus.org/activity.
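To wait for a transfer inside a script, the status command can be polled. The sketch below is only a starting point and rests on an assumption not confirmed on this page: that the status command prints a JSON object with a top-level "status" field whose value is ACTIVE while the transfer is running. Check the actual output of your client version first. The helper names are hypothetical.

```python
import json
import subprocess
import time

ACTION_URL = "https://actions.globus.org/transfer/transfer"


def status_command(action_id):
    """Build the 'action status' command shown above as an argument list."""
    return ["globus-automate", "action", "status",
            "--action-url", ACTION_URL, action_id]


def extract_status(output):
    """Return the 'status' field from the command output, or None.

    Assumption: the client prints a JSON object with a top-level
    'status' key. Anything else yields None.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return None
    return data.get("status") if isinstance(data, dict) else None


def wait_for_action(action_id, poll_seconds=60):
    """Poll until the reported status is no longer ACTIVE.

    Returns the last status seen, or None if the output could not be parsed.
    """
    while True:
        result = subprocess.run(status_command(action_id),
                                capture_output=True, text=True, check=True)
        status = extract_status(result.stdout)
        if status != "ACTIVE":
            return status
        time.sleep(poll_seconds)
```

Calling wait_for_action("ACTION_ID") with the identifier printed in Step 2 blocks until the status changes. The globus-automate-client module must be loaded so that the command is on your PATH.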