What is it?

Secure data set is a Spinup offering that lets users work with a sensitive data set and manage its access and data use agreements. You can create one in any moderate- or high-risk space and then attach selected secure servers in that same space to the data set so you can run computations against it. Two types of data sets are currently supported: original (immutable once created and finalized) and derivative (can be modified and added to at any time).

An example of an original is a data set that you purchased or otherwise obtained, that is bound by a specific data use agreement and cannot be modified, but that you still want to work with or give other users access to. A derivative is more flexible. It could be derived from another data set, e.g. if you need to make modifications without affecting the original, or it could be used to store the output of specific calculations against an original. A derivative data set can be promoted to an original at any point, which makes it immutable.

How does it work?

Secure data sets in Spinup are based on AWS S3: we create a bucket where you upload your data set (this can be any collection of files). In addition, we keep various metadata about the data set and manage access to it, either by generating an access key (for the initial upload) or by assigning instance roles to selected servers.

For example, when you create an original data set, we set up an empty bucket (at that point the data set is in a pending state) and let you generate an access key that can be used to upload the data with any S3 client (e.g. from your workstation). Once all files are uploaded, you finalize the data set, which disables all access keys. To meet security requirements, once the data set is finalized you can only access it from hardened Spinup servers in that space. One data set can be attached to several different servers, and you can also attach multiple data sets to a single server.
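To make these rules concrete, here is a small, hypothetical Python model of them. None of these classes or methods are part of Spinup; they only encode what this page describes. An original starts out pending, accepts uploads through an access key, and becomes read-only when finalized, which also disables its keys. A derivative stays writable. Only servers from the data set's own space can be attached, and attachments are many-to-many. How promotion works internally is not documented here, so modeling it as "convert to an original, then finalize" is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    ORIGINAL = "original"
    DERIVATIVE = "derivative"


class State(Enum):
    PENDING = "pending"      # empty bucket; files uploaded with an access key
    FINALIZED = "finalized"  # read-only; reachable only from attached servers


@dataclass
class DataSet:
    """Hypothetical model of the data set rules above (not a Spinup API)."""
    name: str
    kind: Kind
    space: str
    state: State = State.PENDING
    access_keys: set = field(default_factory=set)
    servers: set = field(default_factory=set)
    files: set = field(default_factory=set)

    @property
    def mutable(self) -> bool:
        # Derivatives stay writable; an original is writable only while pending.
        return self.kind is Kind.DERIVATIVE or self.state is State.PENDING

    def issue_access_key(self, key_id: str) -> None:
        if self.state is State.FINALIZED:
            raise PermissionError("access keys are disabled once finalized")
        self.access_keys.add(key_id)

    def upload(self, path: str) -> None:
        if not self.mutable:
            raise PermissionError(f"{self.name} is immutable")
        self.files.add(path)

    def finalize(self) -> None:
        if self.kind is not Kind.ORIGINAL:
            raise ValueError("promote a derivative before finalizing it")
        self.state = State.FINALIZED
        self.access_keys.clear()  # finalizing disables every access key

    def promote(self) -> None:
        # Assumption: promotion = convert to an original, then finalize it.
        if self.kind is not Kind.DERIVATIVE:
            raise ValueError("only derivatives can be promoted")
        self.kind = Kind.ORIGINAL
        self.finalize()

    def attach(self, server: str, server_space: str) -> None:
        # Only servers in the data set's own space may be attached.
        if server_space != self.space:
            raise ValueError(f"{server} is not in space {self.space}")
        self.servers.add(server)
```

Mutability is a derived property rather than a separate flag, so it can never disagree with the data set's type and state.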
You will be charged for the S3 storage used by the data set, plus any servers that you create to work with the data.

How do I use it?

From your Spinup space, create a new "Secure data set" resource. Note that it is only available in moderate- and high-risk spaces. Pick the type of data set (original or derivative), enter a name and description, and click Create Dataset.

Note on derivative data sets:

On creation you should see some information about the new data set. We automatically generate a unique ID for each data set. Note that the Repository is the actual name of the S3 bucket that will hold the data.

Managing attachments

From the "Dataset attachments" panel at the bottom, click the + and upload any files associated with this data set (e.g. a Data Use Agreement or End User Agreement). You can upload as many attachments as you want, and they will be stored with the data set in a special

Once an attachment is uploaded, it will show up in the panel with a link that downloads/opens the document. The link is pre-signed and expires in 5 minutes to prevent sharing.

Note: Currently you can add and remove attachments at any time, although eventually we may need to restrict that, or at least disable attachment removal once a data set is finalized.

Uploading data set files

To upload the files that are part of the data set, you need to generate an access key and use any S3-compatible client from the laptop or workstation where the data is currently stored. In the "Dataset access" panel, click "Get initial credentials" and then click "Get access key".

Copy the

Using the above credentials, connect to the S3 bucket for this data set. The name of the bucket is listed under the

If you need help connecting to the S3 bucket, you can follow this procedure: How do I use a Spinup S3 bucket? It shows 3 ways of connecting (pick the one you like; we recommend CyberDuck):
1) AWS CLI
2) CrossFTP
3) CyberDuck

Once you connect, copy all files that are part of this data set.
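If you choose the AWS CLI, the upload can be scripted. The Python sketch below is one way to do it, assuming the AWS CLI is installed on your workstation; the helper names `summarize` and `sync_command` are hypothetical. It counts what will be uploaded (useful for estimating how long the copy will take) and builds an `aws s3 sync` command that receives the data set's access key through environment variables for that one process, instead of saving the key in a credentials file.

```python
import os
from pathlib import Path


def summarize(local_dir: str) -> tuple[int, int]:
    """Count files and total bytes under local_dir (to estimate upload time)."""
    files = [p for p in Path(local_dir).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def sync_command(local_dir: str, bucket: str, key_id: str, secret: str):
    """Build an `aws s3 sync` call that authenticates with the data set's key.

    The key is passed only in this process's environment, so it is never
    written to ~/.aws/credentials on the workstation.
    """
    argv = ["aws", "s3", "sync", local_dir, f"s3://{bucket}"]
    env = dict(os.environ)
    env["AWS_ACCESS_KEY_ID"] = key_id
    env["AWS_SECRET_ACCESS_KEY"] = secret
    # Drop any unrelated session token; it would be sent with this key pair
    # and cause the requests to be rejected.
    env.pop("AWS_SESSION_TOKEN", None)
    return argv, env
```

To run it, substitute the Repository name and the key shown in the "Dataset access" panel for the placeholders in `argv, env = sync_command("./my-dataset", "REPOSITORY-NAME", "KEY-ID", "SECRET")`, then call `subprocess.run(argv, env=env, check=True)`. Because `aws s3 sync` only copies files that are missing or changed at the destination, a long upload that gets interrupted can simply be re-run.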
Depending on the size of the data set, this may take minutes or days.

If you lose your secret key, you can reset it from the "Dataset access" panel to get a new one. Note that, for security purposes, you can only reset the key once. If for some reason you need to reset it a second time, contact Spinup support for help.

At this point we'll assume that all files have been uploaded and the data set is ready to be finalized. From the "Control Panel" click Finalize and then confirm. Note that if this is an

After finalizing, the state of our data set changes from

All access keys are also removed, and the only way to access the data set at this point is by attaching a server to it.

Managing server access to data set

By default, no servers have access to a new data set. You will need to create a server in the same space if you don't already have one. To do that, go to your space, click

From the "Dataset access" panel click

Pick a server from the list and click

After a few seconds the server should show up in the list:

At this point the server has been granted access to this data set. More specifically, it has an AWS instance profile policy that allows it to connect to the underlying S3 bucket without using any credentials. This means that when you log into that server, you can connect to the data set repository using any S3-compatible client or the AWS CLI without providing an access key and secret.

You can revoke access from a server at any time by clicking the trash bin next to it. Clicking

Mounting a data set from a Linux server

Depending on your use case, you may need to mount the data set as a file system, e.g. if you need to run a script against it that expects to read files. On Linux you can use the free

First, run the following commands to install
Or, to install on Amazon Linux:
Or, to install on Ubuntu:
Confirm it's installed:
We just need to specify the S3 bucket name (the
Note that if you give it the wrong bucket name or the server doesn't have permissions to this data set,

You should now be able to go to

If you want to make

Mounting a data set from a Windows server

Windows server instances are provisioned with Rclone and WinFsp. Rclone lets you connect to cloud-based object stores from multiple vendors and synchronize your files. WinFsp is a FUSE driver that lets you mount the cloud-based object store as a mapped network drive. For convenience, these utilities have been wrapped in PowerShell scripts. To create a persistent volume mount to your data enclave bucket, you must first launch a PowerShell terminal session with elevated privileges:
Within the elevated terminal session, enter the following command:
This will mount the specified S3 bucket to the next available drive letter on the system. This drive mapping will persist through shutdowns and reboots. In order to remove this drive, enter the following command:
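Whichever platform you mount from, a quick sanity check is to confirm that the path really is a mount point and that files from the data set are visible. Below is a minimal Python sketch, assuming Python is available on the server. `check_mount` is a hypothetical helper, and the path you pass (a Linux mount directory, or a drive root such as `X:\` on Windows) is whatever mount point you chose or the script assigned; nothing here is part of Spinup.

```python
import os
from pathlib import Path


def check_mount(path: str, sample: int = 5) -> dict:
    """Report whether `path` exists, is a mount point, and a few files under it."""
    root = Path(path)
    found = []
    # os.walk is lazy, so stopping early avoids listing the whole bucket.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if len(found) >= sample:
                break
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
        if len(found) >= sample:
            break
    return {
        # On Windows every drive root counts as a mount point, so also
        # check that the directory actually exists.
        "exists": root.is_dir(),
        "is_mount": os.path.ismount(root),
        "sample": found,
    }
```

Listing a mounted bucket translates into S3 requests, which is why the sketch stops after a handful of files instead of walking the entire data set.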
...