LUX POC Process - ASpace

  • Extract the source files: for the POC stage, I'm using EAD files from ArchivesSpace, since we already have a standard process set up to extract and validate those files. I just uploaded the set of EAD files that I used for the previous LUX dump to our Google Drive. Here's a link to a ZIP file of those records: https://drive.google.com/drive/folders/1WtxR26GG2PpzP7csmqKJQ5_2kacEL1xm?usp=sharing (Rob, hopefully there's no issue with permissions in getting access to that, but please let me know if so).
    In theory, I would use our files in GitHub, https://github.com/YaleArchivesSpace/Archives-at-Yale-EAD3, which are updated nightly. However, I haven't yet kicked off a mass re-export of all of those files since updating our export process to include additional data for LUX, so a lot of those files in GitHub are slightly out of date compared with the files that I just uploaded to Google Drive.

  • Transform the source files to LUX's JSON format: all my code and a brief overview of the process are here: https://github.com/fordmadox/EAD-to-LUX.

  • Validate the JSON files: the code is available in the same repo, but here's a direct link to my first attempt at this process: https://github.com/fordmadox/EAD-to-LUX/blob/master/python-scripts/validate.py.
    I'm relying on https://pypi.org/project/jsonschema/ to do this. Is anyone else using this library? This is one part of the process that I'd really like to update, especially since it takes the longest for me to run, but I haven't had time to do anything beyond making sure that it throws errors if there are any. I'm very curious what other folks are doing for this part of the process, especially since this is one area where we could all already use the same process and tools.

  • Upload the JSON files: for this part, I'm using https://aws.amazon.com/cli/ (version 2). That's gone great, but again, I'm curious about other options. I did make sure to add the "--delete" flag to my last batch; I hadn't originally realized that the "sync" command does not delete anything by default. Just an FYI for anyone else using this tool: "Typically, s3 sync copies missing or outdated files or objects between the source and target. However, you can also supply the --delete option to remove files or objects from the target that are not present in the source."
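
For the validation step above, here is a minimal sketch of one way to use jsonschema over a directory of records. It is not the code in validate.py; the function names and the directory layout are hypothetical. The one idea it illustrates is compiling the validator a single time and reusing it for every file, which may help with run time if the schema is currently being re-checked for each record.

```python
import json
from pathlib import Path


def build_validator(schema):
    """Compile a validator once so it can be reused for every record."""
    # Imported here so the rest of the module loads even without jsonschema.
    from jsonschema.validators import validator_for

    validator_class = validator_for(schema)  # picks the draft from "$schema"
    validator_class.check_schema(schema)     # fail early on a broken schema
    return validator_class(schema)


def validate_directory(json_dir, validator):
    """Return {filename: [error messages]} for every invalid *.json file."""
    failures = {}
    for path in sorted(Path(json_dir).glob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            failures[path.name] = [f"not valid JSON: {exc}"]
            continue
        # iter_errors reports every problem in a record, not just the first.
        messages = [
            f"{'/'.join(str(p) for p in error.absolute_path) or '<root>'}: "
            f"{error.message}"
            for error in validator.iter_errors(record)
        ]
        if messages:
            failures[path.name] = messages
    return failures
```

Assuming a schema file and an output folder (both names made up here), usage would look like `validate_directory("json-output", build_validator(json.loads(Path("lux-schema.json").read_text())))`; an empty result means every file passed.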

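For the upload step, a small wrapper around the AWS CLI can make the "--delete" behavior explicit rather than something to remember per batch. This is only a sketch: the function names, bucket, and prefix are placeholders, not the real LUX target. It also exposes the CLI's "--dryrun" flag, which prints what sync would do without changing anything, a useful check before a run that deletes objects.

```python
import subprocess


def build_sync_command(local_dir, bucket, prefix="", delete=True, dry_run=False):
    """Build the argument list for `aws s3 sync` (bucket/prefix are placeholders)."""
    target = f"s3://{bucket}/{prefix}".rstrip("/")
    command = ["aws", "s3", "sync", str(local_dir), target]
    if delete:
        # Remove objects from the target that no longer exist locally.
        command.append("--delete")
    if dry_run:
        # Show the planned operations without performing them.
        command.append("--dryrun")
    return command


def run_sync(command):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)
```

For example, `build_sync_command("json-output", "example-bucket", "lux", dry_run=True)` yields the equivalent of `aws s3 sync json-output s3://example-bucket/lux --delete --dryrun`, which can be passed to `run_sync` once the AWS CLI is installed and configured.
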
– Mark Custer (9/18/20)