How do I manage files in an S3 Deep Archive bucket

This guide covers issues relating specifically to S3 Deep Archive buckets. For more general information on using S3 Buckets (connecting to a bucket, uploading files), please refer to https://yaleits.atlassian.net/wiki/spaces/spinup/pages/829292599 .

Objective
Upload files, create retrieval jobs, track a job's status, manage expiry, and download retrieved objects from an S3 Deep Archive bucket.

Introduction

S3 Deep Archive differs from other S3 storage classes primarily in retrieval times and costs. Unlike the immediate access you get with standard S3 buckets, S3 Deep Archive is designed for long-term storage of infrequently accessed data, where retrieval can afford to take hours. In exchange, storage costs are much lower, making it an attractive option for archival purposes.

This guide will equip you with knowledge on creating retrieval jobs, tracking their status, downloading the retrieved data, and managing object expiry.

S3 Deep Archive should only be used for storage where files need to be retrieved once or twice a year. If you are actively working with files (uploading, downloading, deleting), you should consider other storage options, such as S3 (via Spinup) or Wasabi. More information on comparing storage options can be found on Storage Finder @ Yale.

Retrieving a file (via the AWS Command Line Interface)

  1. Initiate a Restore Request:

    aws s3api restore-object --bucket YOUR_BUCKET_NAME --key YOUR_FILE_KEY --restore-request 'Days=NUMBER_OF_DAYS_TO_KEEP_RESTORED,GlacierJobParameters={Tier=Standard}'
    • Replace YOUR_BUCKET_NAME with the name of your bucket.

    • Replace YOUR_FILE_KEY with the S3 key of the file you want to restore.

    • Replace NUMBER_OF_DAYS_TO_KEEP_RESTORED with the number of days you want the restored copy to remain available. After this period, the temporary copy is deleted, but the original in Glacier Deep Archive remains.

    • Keep the quotes around the --restore-request value; they stop your shell from interpreting the braces.

    • Deep Archive supports two retrieval tiers: Standard (typically completes within 12 hours) and Bulk (typically within 48 hours). Expedited retrieval is not available for this storage class.

  2. Check the Restoration Status:
    It typically takes about 12 hours for the restoration process to complete. To check the status:

    aws s3api head-object --bucket YOUR_BUCKET_NAME --key YOUR_FILE_KEY

    In the output, look for the "Restore" field. While the restore is in progress it shows ongoing-request="true"; when the file is ready, it shows ongoing-request="false" along with an expiry-date.

  3. Retrieve the Restored File:
    After restoration completes:

    aws s3 cp s3://YOUR_BUCKET_NAME/YOUR_FILE_KEY DESTINATION_PATH_ON_YOUR_LOCAL_MACHINE

    Replace DESTINATION_PATH_ON_YOUR_LOCAL_MACHINE with the local path where you want the file saved.

You are billed for the temporary restored copy only for the number of days you specified in the restore request. After that period, the restored copy is deleted, but the original in Glacier Deep Archive remains intact.
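If you need to script this workflow (for example, to restore many files), the three CLI steps above map directly onto the boto3 S3 client's restore_object, head_object, and download_file methods. The sketch below is a minimal, hedged example of that mapping: it is written against a generic client object (anything with those three methods works, including a boto3 client), and the bucket, key, and path names in the usage note are illustrative placeholders, not part of this guide's environment.

```python
# Minimal sketch of the restore -> poll -> download workflow.
# `s3` can be a boto3 S3 client (boto3.client("s3")); any object
# exposing the same three methods works.

def start_restore(s3, bucket, key, days, tier="Standard"):
    """Step 1: initiate the restore request (like `aws s3api restore-object`)."""
    s3.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
    )

def is_restored(s3, bucket, key):
    """Step 2: check the "Restore" field (like `aws s3api head-object`).

    The field looks like: ongoing-request="false", expiry-date="..."
    and is absent entirely if no restore has been requested.
    """
    restore = s3.head_object(Bucket=bucket, Key=key).get("Restore", "")
    return 'ongoing-request="false"' in restore

def download_if_ready(s3, bucket, key, dest):
    """Step 3: download the restored copy (like `aws s3 cp`), if it is ready."""
    if not is_restored(s3, bucket, key):
        return False
    s3.download_file(bucket, key, dest)
    return True
```

With a real client this might be driven as start_restore(s3, "my-bucket", "data/archive.tar", days=7), then periodic is_restored checks until download_if_ready succeeds; since restores take hours, a scheduled job or cron task is a better fit than a tight polling loop.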

Retrieving a file (via Cyberduck)

  1. Initiate a Restore Request:

    • Locate the file you wish to restore in Cyberduck.

    • Right-click on the file.

    • Select "Restore" from the context menu.

  2. Check the Restoration Status:

    • The restoration process typically takes about 12 hours.

    • To verify the status:
      a. Right-click on the file.
      b. Choose "Info" from the context menu.
      c. Navigate to the "Metadata" tab.
      d. Look for the "Restore" field. The file is ready for retrieval when it shows ongoing-request="false".

  3. Retrieve the Restored File:

    • When the file is ready:
      a. Right-click on the file.
      b. Choose "Download" from the menu.

    • Alternatively, you can drag & drop the file to your preferred directory.