New Deposits
Latest revision as of 15:47, 29 July 2024
The New Deposits Module contains workflows for processing new deposits for ingest into Rosetta. It can be accessed by going to the "Modules" menu and clicking "New Deposits."
Organizing
Requirements
- Folder with subfolders of files.
- - An example of this folder structure can be viewed here
- Knowledge of master formats represented in subfolders.
- - If unsure of the formats represented in the subfolders, use Count Formats located in the Tools menu. This tool will count all of the formats in a folder and its subfolders.
Process
- Supply path to folder with subfolders of files
- Select the master formats represented in the subfolders. You can select multiple.
- - XML files are automatically added to a 'supplementary' folder.
- - Formats not chosen will be added to the 'access' folder.
- Submit
- - An example of a successful Organize log can be viewed here
- - An example of the folder structure after the Organize process can be viewed here
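The sorting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the Processor's own code: the function name `organize` and the exact folder names it creates ('master', 'access', 'supplementary') are assumptions taken from the wording on this page.

```python
import shutil
from pathlib import Path


def organize(root, master_exts):
    """Sort the files in each subfolder of `root` into sub-subfolders.

    Hypothetical helper: XML files go to 'supplementary', files whose
    extension was chosen as a master format go to 'master', and every
    other file goes to 'access'.
    """
    master_exts = {e.lower().lstrip(".") for e in master_exts}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # sorted() builds the file list up front, so the sub-subfolders
        # created below are never revisited.
        for f in sorted(p for p in sub.iterdir() if p.is_file()):
            ext = f.suffix.lower().lstrip(".")
            if ext == "xml":
                bucket = "supplementary"
            elif ext in master_exts:
                bucket = "master"
            else:
                bucket = "access"
            dest = sub / bucket
            dest.mkdir(exist_ok=True)
            shutil.move(str(f), str(dest / f.name))
```

For example, `organize("path/to/folder", ["tif", "wav"])` would treat TIFF and WAV files as masters and everything else (apart from XML) as access copies.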
Processing
Requirements Glossary
MARCXML Batch
A MARCXML batch is an XML file that includes multiple MARCXML records. ALEPH exports multiple records in this batch format. The MARCXML batch can be used to generate the CSV Key that is required for all New Deposit processes.
Individual MARCXML records can be merged into a MARCXML batch using the Rosetta Deposit Processor. Go to Tools -> New Deposit Tools -> Create MARCXML Batch. You will be prompted to choose a folder of individual MARCXML records.
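For staff who want to see what the merge amounts to, here is a minimal Python sketch. It is not the Create MARCXML Batch tool itself. It assumes that each file uses the standard MARC21 slim namespace and holds either a single record or a collection of records; the function name `build_batch` is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Write the MARC namespace as the default namespace in the output file.
ET.register_namespace("", MARC_NS)


def build_batch(folder, out_path):
    """Merge every *.xml MARCXML file in `folder` into one <collection>."""
    collection = ET.Element(f"{{{MARC_NS}}}collection")
    for path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(path).getroot()
        # A file may hold a bare <record> or a <collection> of records.
        if root.tag == f"{{{MARC_NS}}}record":
            records = [root]
        else:
            records = root.findall(f"{{{MARC_NS}}}record")
        collection.extend(records)
    ET.ElementTree(collection).write(
        out_path, encoding="utf-8", xml_declaration=True
    )
    return len(collection)  # number of records written
```

Writing the batch outside the input folder avoids picking it up as an input record on a later run.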
CSV Key
A CSV Key is needed for ALL New Deposit processes. The key associates a folder of stream files with its system number and/or metadata file. The bulk of a CSV Key can be generated from a MARCXML file* or EAD file. Once the key is generated, the ENTITY column must be edited to include the names of folders (or, in the Dublin Core Process, filenames) associated with the system number and/or metadata file.
If the CSV Key is generated from an ArchivesSpace EAD or an Aleph MARC record, it should include the Partner Code in the PARTNER column. This Partner Code will automatically be included in the deposit CSV / DC / METS file.
* MARC records in the MARCXML Batch file MUST have a system number in the 001 controlfield to be added to the CSV Key.
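The 001 rule can be illustrated with a short Python sketch that builds key rows from a MARCXML batch. It is a rough model only. The ENTITY, LABEL and PARTNER column names come from this page, while the SYSTEM_NUMBER column name, the use of 245 $a as the label, and the function names are assumptions made for the example; the real key generated by the Processor may differ.

```python
import csv
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/MARC21/slim"}
# SYSTEM_NUMBER is a hypothetical column name used for this sketch.
FIELDS = ["SYSTEM_NUMBER", "ENTITY", "LABEL", "PARTNER"]


def key_rows(batch_xml, partner):
    """Build one CSV Key row per record that has a 001 system number."""
    rows = []
    for rec in ET.fromstring(batch_xml).iterfind("m:record", NS):
        sysno = rec.findtext(
            "m:controlfield[@tag='001']", default="", namespaces=NS
        ).strip()
        if not sysno:
            continue  # no system number in 001: record is left out of the key
        # Assumption: the label is taken from the title in 245 $a.
        title = rec.findtext(
            "m:datafield[@tag='245']/m:subfield[@code='a']",
            default="",
            namespaces=NS,
        ).strip()
        # ENTITY stays empty; staff fill it in by hand afterwards.
        rows.append(
            {"SYSTEM_NUMBER": sysno, "ENTITY": "", "LABEL": title, "PARTNER": partner}
        )
    return rows


def write_key(rows, handle):
    """Write the rows to an open text file handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```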
Partner Codes
Partner Codes are abbreviations of each Partner's name. They are holdovers from DigiTool. Partner Codes are used in the submission folder filepath and to build collections in Rosetta.
- AJH01 = American Jewish Historical Society
- ASF01 = American Sephardi Federation
- LBI01 = Leo Baeck Institute
- YIV01 = YIVO Institute for Jewish Research
- YUM01 = Yeshiva University Museum
To generate a CSV Key, go to Tools -> New Deposit Tools.
CSV Process
Process Requirements
- CSV Key
- - CSV Key must have values in the ENTITY and LABEL columns.
- - Derived from MARCXML Batch File using Tools --> New Deposit Tools --> Generate CSV Key from MARC
- - Derived from exported ArchivesSpace EAD file using Tools --> New Deposit Tools --> Generate CSV Key from EAD
- Folder with subfolders of files where the subfolders are further organized into access and master sub-subfolders.
- - Subfolders can be organized into ‘master’ and ‘access’ sub-subfolders using the Organize tab.
- - Supplemental files (e.g. indices, manifests, etc.) should be manually placed in a sub-subfolder entitled ‘supplement’.
- - An example of this folder structure can be viewed here.
- Deposit template
- - Current template version is available for download here.
- - A default deposit template path can be set in File --> Settings.
Process
- Edit the ENTITY column in the CSV Key to include the name (not the path) of each folder in "streams"
- Supply path to CSV Key
- Supply path to folder with subfolders of files
- Supply path to deposit template
- - A default deposit template path can be set in File --> Settings.
- Submit
- - The log will provide the path for the deposit folder. This folder can be copied to the appropriate submissions folder for Rosetta ingest.
- - An example of a successful CSV process log can be viewed here.
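Before submitting, it can help to confirm that the CSV Key and the folder of subfolders agree. The following Python sketch is a hypothetical pre-flight check and not part of the Processor. It assumes the key has ENTITY and LABEL header columns, as described above, and that each ENTITY value should match a subfolder name exactly.

```python
import csv
from pathlib import Path


def check_key(key_path, streams_dir):
    """Return a list of problems found in a CSV Key (empty list = OK)."""
    folders = {p.name for p in Path(streams_dir).iterdir() if p.is_dir()}
    problems = []
    # utf-8-sig tolerates the byte-order mark that Excel may add.
    with open(key_path, newline="", encoding="utf-8-sig") as fh:
        # Row 1 is the header, so data rows are numbered from 2.
        for line_no, row in enumerate(csv.DictReader(fh), start=2):
            entity = (row.get("ENTITY") or "").strip()
            label = (row.get("LABEL") or "").strip()
            if not entity or not label:
                problems.append(
                    f"row {line_no}: ENTITY and LABEL must both have values"
                )
            elif entity not in folders:
                problems.append(f"row {line_no}: no folder named {entity!r}")
    return problems
```

An empty list means every row has both values and points at an existing folder name.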
Running Fixity in the Processor
The Rosetta processor allows staff to generate MD5 checksums to include in a CSV deposit. There is also an option to verify a previously generated fixity manifest to check that the files have not changed.
Rosetta will verify the checksums in the deposit CSV during the ingest process. If a file's MD5 checksum does not match the checksum listed in the deposit CSV, the deposit will land in the Technical Analyst workbench for further review.
Review CJH's fixity generation and verification recommendations here.
Process Requirements
- All the requirements listed in the CSV process requirements.
- A fixity manifest.
- - Only required if opting to verify a previously generated fixity manifest to check that the files have not changed.
- - A fixity manifest should include a list of the files with file paths and the files' MD5 checksum.
- - The tools listed here can generate a fixity manifest.
- - An example of a fixity manifest can be viewed here.
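As an illustration of what a fixity manifest contains, the Python sketch below writes one line per file with its MD5 checksum and relative path. The line layout (checksum, two spaces, path) is an assumption borrowed from common checksum tools and may not match the sample manifest linked above; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root, manifest_path):
    """Write '<md5>  <relative path>' for every file under `root`."""
    root = Path(root)
    # Collect the file list first so the manifest itself is not included
    # when it is being created for the first time inside `root`.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    with open(manifest_path, "w", encoding="utf-8") as out:
        for f in files:
            out.write(f"{md5_of(f)}  {f.relative_to(root).as_posix()}\n")
    return len(files)
```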
Process
Generate new MD5 checksums to include in the deposit’s CSV.
- Prepare a CSV deposit as instructed here.
- Check the Run Fixity box.
- Click the Submit button.
- - The processor will create a deposit CSV with an MD5 checksum value for each file in column V.
- - An example of a deposit CSV with MD5 checksum values can be viewed here.
Verify a Fixity Manifest that includes MD5 checksums.
- Prepare a CSV deposit as instructed here.
- Check the Run Fixity box.
- Supply path to the fixity manifest.
- - When selecting the fixity manifest file, ensure that the drop-down menu next to the file name window is set to All Files. If not, the fixity manifest file may not be viewable.
- Click the Submit button
- - The processor will create a deposit CSV with an MD5 checksum value for each file in column V.
- - If the checksums in the fixity manifest do not match the checksums generated by the processor, the processor will display an error and will not create the deposit.
- - - This error indicates that a file may have changed since the fixity manifest was generated. The error will identify the file(s) with the checksum mismatch(es).
- - The processor will generate MD5 checksums for files that are not in the fixity manifest but are in the streams folder.
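The verification behavior described above can be modeled roughly as follows. This Python sketch is an assumption about the logic and not the Processor's code: it takes the manifest as a dictionary of relative path to expected MD5, recomputes a checksum for every file in the streams folder, and reports the files whose checksums differ. Files missing from the manifest still get a checksum and are not reported as mismatches.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(streams_dir, manifest):
    """Compare files under `streams_dir` with a {path: md5} manifest.

    Returns (checksums, mismatches): a checksum for every file found,
    and the relative paths whose checksum differs from the manifest.
    """
    root = Path(streams_dir)
    checksums, mismatches = {}, []
    for f in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = f.relative_to(root).as_posix()
        actual = md5_of(f)
        checksums[rel] = actual
        expected = manifest.get(rel)
        # Files absent from the manifest are checksummed but not compared.
        if expected is not None and expected.lower() != actual:
            mismatches.append(rel)
    return checksums, mismatches
```

A caller would stop and report the listed files when `mismatches` is not empty, and only build the deposit CSV when it is empty.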
Dublin Core Process
Use the Dublin Core Process for non-complex entities that have their own MARC record or EAD file-level record and no derivatives.
NOTE: Derivatives can be created once preservation masters are ingested in Rosetta. For ingests containing both masters and derivatives (access copies), use the CSV or METS workflows.
Process Requirements
- Folder of non-complex entities (i.e. entities that are only one file)
- CSV Key
- - Derived from MARCXML Batch File using Tools --> New Deposit Tools --> Generate CSV Key from MARC
- - Derived from exported ArchivesSpace EAD file using Tools --> New Deposit Tools --> Generate CSV Key from EAD
- - ENTITY values are individual filenames, NOT folder names for the DC process.
Process
- Edit the ENTITY column in the CSV Key to include filenames
- Supply path to CSV Key
- Supply path to folder of non-complex entities
- Submit
- - The log will provide the path for the deposit folder. This folder can be copied to the appropriate submissions folder for Rosetta ingest.
- - An example of a successful DC process log can be viewed here.
METS Process
Use the METS Process for complex objects that need a nested structMap.
Process Requirements
- CSV Key
- - Derived from MARCXML Batch File using Tools --> New Deposit Tools --> Generate CSV Key from MARC
- - Derived from exported ArchivesSpace EAD file using Tools --> New Deposit Tools --> Generate CSV Key from EAD
- Folder with subfolders of files where the subfolders are further organized into access and master sub-subfolders.
- - Subfolders can be organized into ‘master’ and ‘access’ sub-subfolders using the Organize tab.
- - Supplemental files (e.g. indices, manifests, etc.) should be manually placed in a sub-subfolder entitled ‘supplement’.
- - An example of this folder structure can be viewed here.
Process
- Edit the ENTITY column in the CSV Key to include the stream folder names
- Supply path to CSV Key
- Supply path to folder with subfolders of files
- Submit
- - The log will provide the path for the METS deposits. These deposit folders can be copied to the appropriate submissions folder for Rosetta ingest.
- - An example of a successful METS process log can be viewed here.
Synchronize
The Synchronize tab is new with Version 2.0. It replaces the SIP Status module and the Add DAO tab. DAO linking is now rolled into the synchronization job. To learn how to check on a SIP's status using the Rosetta software, see this tutorial.
Requirements
- Internet Access
- Rosetta IE contains either an Aleph ID in the "Identifier (DC)" field or an ArchivesSpace Archival Object ID in the "Identifier - Archivesspace (DC)" field.
- IE PID or SIP ID or IE CSV
Process
The Synchronize tab will synchronize an IE or set of IEs with either Aleph or ArchivesSpace. If the SIP is put together with a MARCXML file exported from Aleph, then each IE from the SIP should automatically contain an Aleph ID. Similarly, if the SIP is put together using an EAD exported from ArchivesSpace, then each IE from the SIP should automatically contain an Archival Object ID.
The Synchronize tab will do the following when supplied with an IE or set of IEs:
- 1. Gather metadata from Rosetta for each IE.
- 2. Look through the IE metadata for an Aleph and/or ArchivesSpace ID.
- - If an ArchivesSpace ID is found, the Processor will use the ArchivesSpace API to see whether a link is associated with the ArchivesSpace ID. If there is no link present, it will create a Digital Object in ArchivesSpace with the Rosetta IE link and associate that Digital Object with the appropriate Archival Object.
- - The default ArchivesSpace link caption is "View Online." If you'd like to customize that caption, use the IE CSV input method.
- 3. Update Rosetta IE with Aleph -> DC or ArchivesSpace -> DC metadata.
- - New with 2.5: Staff can now choose which system's metadata record to synchronize with Rosetta. Choosing "Default" from the right-hand list of options prioritizes an Aleph ID over an ArchivesSpace ID when both are found. Regardless of the choice, if an IE has both an Aleph ID and an ArchivesSpace ID, the Processor will still check whether the Archival Object in ArchivesSpace needs a link.
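The priority rule in step 3 can be summarized in a small Python sketch. It models only the decision, not the API calls, and it is a guess at the logic: the option names other than "Default" and the behavior when the chosen system's ID is missing are assumptions made for the example.

```python
def plan_sync(aleph_id, aspace_id, preference="Default"):
    """Return (metadata_source, check_aspace_link) for one IE.

    Hypothetical model: 'Default' prefers Aleph over ArchivesSpace;
    any other preference names the system to use, if its ID exists.
    """
    available = {"Aleph": bool(aleph_id), "ArchivesSpace": bool(aspace_id)}
    if preference == "Default":
        # An Aleph ID wins over an ArchivesSpace ID when both are present.
        source = next(
            (name for name in ("Aleph", "ArchivesSpace") if available[name]),
            None,
        )
    else:
        # Assumption: an explicit choice is honored only if that ID exists.
        source = preference if available.get(preference) else None
    # The ArchivesSpace link check runs whenever an ArchivesSpace ID exists,
    # whichever system supplies the metadata.
    return source, available["ArchivesSpace"]
```

With both IDs present and the default setting, the sketch returns Aleph as the metadata source and still flags the ArchivesSpace link check.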
IE CSV
Users can synchronize IE or SIP numbers one at a time by entering them in the IE PID and SIP ID input fields, respectively. These synchronization jobs will use the default "View Online" ArchivesSpace link caption. However, if a user would like to synchronize more than one IE PID or SIP ID at a time, they can put together a simple CSV.
The Processor will look in each row of the IE CSV for an IE PID first, then for a SIP ID. If there is a SIP ID, the Processor will use the Rosetta API to retrieve all of the IE PIDs for that SIP.
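One reading of that row-by-row lookup is sketched below in Python. The column headers "IE PID" and "SIP ID" are assumptions (only the ASPACE CAPTION column is named on this page), and `ies_for_sip` is a hypothetical stand-in for the Rosetta API lookup, passed in as a function so the sketch stays self-contained.

```python
import csv

DEFAULT_CAPTION = "View Online"


def resolve_rows(handle, ies_for_sip):
    """Turn IE CSV rows into a list of (ie_pid, caption) jobs.

    `handle` is an open text file; `ies_for_sip` is any callable that
    returns the IE PIDs belonging to a SIP ID.
    """
    jobs = []
    for row in csv.DictReader(handle):
        # Fall back to the default caption when none is supplied.
        caption = (row.get("ASPACE CAPTION") or "").strip() or DEFAULT_CAPTION
        ie_pid = (row.get("IE PID") or "").strip()
        sip_id = (row.get("SIP ID") or "").strip()
        if ie_pid:
            # An IE PID in the row is used first.
            jobs.append((ie_pid, caption))
        elif sip_id:
            # Otherwise expand the SIP into all of its IE PIDs.
            jobs.extend((pid, caption) for pid in ies_for_sip(sip_id))
    return jobs
```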
Exporting an IE CSV
Users can either put this CSV together from scratch or export an IE CSV from Rosetta:
- 1. Export IE information as a CSV from Rosetta.
- 2. Add custom ASPACE CAPTION column if desired
- 3. Supply IE CSV path