
Freezer long term storage

NeSI's Freezer service, powered by Versity, is our redesigned long-term storage service for research data. It consists of a disk staging area connected to a tape library. Users of this service gain access to more persistent storage space for their research data, in return for slower access to the files stored on tape. We recommend this service for larger datasets that you only need to access occasionally and will not need to change in situ. Retrieval of data may be delayed by tape handling, queuing in the Freezer backend service, and the size of the data to be ingested or retrieved.

Because of the tape storage backend, Freezer is intended for relatively large files and should not be used for a large number of small files. This service is a replacement for Nearline. Freezer is compatible with the common S3 cloud protocol and with existing tools such as those used for accessing the AWS S3 service.
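Since tape performs best with large files, it can help to bundle many small files into a single archive before uploading. A minimal sketch using standard tar (the directory and file names are illustrative, not part of the service):

```shell
# Bundle a directory of many small files into one compressed archive,
# which is a better fit for tape than thousands of tiny objects.
mkdir -p results
printf 'a\n' > results/run1.txt
printf 'b\n' > results/run2.txt

tar -czf results.tar.gz results/

# Inspect the archive contents before uploading it with s3cmd put.
tar -tzf results.tar.gz
```

The resulting `results.tar.gz` can then be uploaded as a single object with `s3cmd put`.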

Getting started

Before getting started, you will need an allocation and credentials. To apply for an allocation, go to MyNeSI. Once onboarded, you can start using Freezer. We recommend the s3cmd tool for interacting with Freezer.

Tool Installation

The s3cmd tool is available by default.

Configure

Configuring the tool lets it remember your credentials and default settings.

s3cmd --configure

Enter your details:

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.
Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: ${AWS_ACCESS_KEY} 
Secret Key: ${AWS_SECRET_KEY}
Default Region: us-east-1
Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint: freezer.nesi.org.nz:7070
Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket: freezer.nesi.org.nz:7070
Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: 
Path to GPG program [/usr/bin/gpg]: 
When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer

When prompted for the HTTPS protocol, answer yes.

Use HTTPS protocol: Yes

You will then be presented with a summary.

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name: 
New settings:
  Access Key: ${AWS_ACCESS_KEY}
  Secret Key: ${AWS_SECRET_KEY}
  Default Region: us-east-1
  S3 Endpoint: freezer.nesi.org.nz:7070
  DNS-style bucket+hostname:port template for accessing a bucket: freezer.nesi.org.nz:7070
  Encryption password: 
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: True
  HTTP Proxy server name: 
  HTTP Proxy server port: 0
Press y to confirm.

Test access with supplied credentials? [Y/n]
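The answers given above are saved by `s3cmd --configure` to `~/.s3cfg`. If you prefer to skip the interactive prompts, a minimal configuration file with the same settings looks roughly like the fragment below (the keys are shown as placeholders, not literal values; other `.s3cfg` defaults are left untouched):

```ini
[default]
access_key = YOUR_ACCESS_KEY
secret_key = YOUR_SECRET_KEY
host_base = freezer.nesi.org.nz:7070
host_bucket = freezer.nesi.org.nz:7070
use_https = True
```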

Using s3cmd tool to interact with Freezer

List contents of a bucket

List all objects in a bucket

s3cmd ls -r -l -H s3://nearline_9776/

This can also be used to list all the objects under a path.

Warning

The listing only shows the storage class when the -l option is used. The storage class is important for determining whether the data is immediately available or must first be restored from tape.
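To quickly find which objects still need a restore, you can filter the long listing for the GLACIER storage class. The sketch below runs the filter over a saved sample listing; the exact column layout of `s3cmd ls -l` output is an assumption and may differ between versions, so check it against your real output:

```shell
# Sample 'ls -l' style listing saved to a file (layout is an assumption;
# in practice, pipe real output from: s3cmd ls -r -l -H s3://nearline_9776/ )
cat > listing.txt <<'EOF'
2024-05-01 10:12   1G  d41d8cd9  GLACIER   s3://nearline_9776/data/raw.tar.gz
2024-05-01 10:13   2M  9e107d9d  STANDARD  s3://nearline_9776/data/notes.txt
EOF

# Objects still on tape (GLACIER) that would need 's3cmd restore' first:
grep 'GLACIER' listing.txt | awk '{print $NF}'
```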

List all buckets

List all objects in all buckets (only available to NeSI project owners):

s3cmd la

Storage usage by specific bucket

s3cmd du -H s3://nearline_9776
   7G      1781 objects s3://nearline_9776/

Running s3cmd du -H without specifying a bucket is only available to NeSI project owners.

Put objects

To archive files or folders, transfer them to the S3 gateway. Change into the directory on Mahuika that contains the file or folder, then use s3cmd put:

s3cmd put yourfile s3://nearline_9776/cwil201/yourfile
upload: 'yourfile' -> 's3://nearline_9776/cwil201/yourfile'  [1 of 1]
 172202 of 172202   100% in    0s   920.89 KB/s  done

or folders

s3cmd put yourfolder s3://nearline_9776/cwil201/yourfolder/ --recursive
upload: 'yourfolder/yourfile' -> 's3://nearline_9776/cwil201/yourfolder/yourfolder/yourfile'  [1 of 1]
 172202 of 172202   100% in    0s  1691.71 KB/s  done

Once the upload is successful, as signalled by 'done', your files/folders stored as objects will automatically be archived to tape by the Freezer service; no further user action is needed. Do not delete your files from the bucket unless you do not want them archived to tape. They will remain in the bucket at least until they are copied to tape, and likely for some time afterwards, until the cache becomes too full and older files are removed.
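Because archived data is awkward to spot-check later, it can help to record checksums before `put` so files can be verified after a future `get`. A minimal sketch with coreutils `md5sum` (the directory and file names are illustrative):

```shell
# Create a checksum manifest before uploading; keep it with the data
# (e.g. also upload manifest.md5 to the same bucket prefix).
mkdir -p upload && printf 'payload\n' > upload/yourfile
( cd upload && md5sum yourfile > manifest.md5 )

# Later, after 's3cmd get' of the same files, verify the downloaded copies:
( cd upload && md5sum -c manifest.md5 )
```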

Synchronise data

Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options).

s3cmd sync yourfolder s3://nearline_9776/cwil201/yourfolder/

Preview or dry-run

Use any of the s3cmd file-transfer commands with -n or --dry-run to preview the action.

This only shows what would be uploaded or downloaded, without actually doing it. Note that s3cmd may still perform S3 requests to get bucket listings and other information (this applies only to file-transfer commands).

List objects before restore

List contained objects/files/folders:

s3cmd ls -l -H s3://nearline_9776/tb-test/openrefine01/

or all objects recursively with -r or --recursive:

s3cmd ls -r -l -H s3://nearline_9776/tb-test/openrefine01/

Restore from tape

Restore files whose storage class is GLACIER (shown as <StorageClass>GLACIER</StorageClass> in listings):

s3cmd restore --recursive s3://nearline_9776/tb-test/openrefine01/ 
restore: 's3://nearline_9776/tb-test/openrefine01/1957656657122.project/data.zip'
restore: 's3://nearline_9776/tb-test/openrefine01/1957656657122.project/metadata.json'
restore: 's3://nearline_9776/tb-test/openrefine01/1957656657122.project/metadata.old.json'
restore: 's3://nearline_9776/tb-test/openrefine01/dbextension/.saved-db-connections.json'
restore: 's3://nearline_9776/tb-test/openrefine01/workspace.json'
restore: 's3://nearline_9776/tb-test/openrefine01/workspace.old.json'
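Restores from tape are not instantaneous, so it can be useful to check that an object no longer reports the GLACIER storage class before running `get`. The helper below only inspects a listing line for its storage class; the line layout is an assumption, and in practice you would feed it lines from `s3cmd ls -l`:

```shell
# Succeed if a listing line shows the object is already on disk
# (i.e. not in the GLACIER storage class). Line layout is an assumption.
is_restored() {
    case "$1" in
        *GLACIER*) return 1 ;;  # still on tape, restore pending
        *)         return 0 ;;  # retrievable now
    esac
}

# Sample listing lines (illustrative):
line1='2024-05-01 10:12  1G  d41d8cd9  GLACIER   s3://nearline_9776/data.zip'
line2='2024-05-01 10:13  2M  9e107d9d  STANDARD  s3://nearline_9776/meta.json'

is_restored "$line1" || echo "data.zip: still restoring"
is_restored "$line2" && echo "meta.json: ready to get"
```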

Get objects after restore

Example to get or download the directory openrefine01 and all contained objects/files/folders:

s3cmd get --recursive s3://nearline_9776/tb-test/openrefine01/

s3cmd reference

s3cmd tool