Focus on Your Content, Not on Ingesting Your Content

Georgetown University
Metadata Validation; Automation; Tools; Ingest; DSpace
DSpace User Group

Preparing a large collection for ingest into a repository should conceptually mirror the process for adding an individual item, just at a larger scale. In reality, the process demands its own workflow.
Web-based item submission processes are excellent for submitting a single item or a handful of items; however, they become laborious once you need to ingest more than 20 items. So what happens when you want to ingest 500 or even 1,000 items?

When a large collection is processed at one time, much of the metadata can be reused from item to item, either as-is or after minor modification. Rather than remaining an exercise in metadata authoring, the process quickly becomes tedious and demands some form of automation.
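A minimal sketch of the reuse described above, assuming each item's metadata is held as a dictionary that maps qualified Dublin Core field names to lists of values. The field names, placeholder values, and the `build_item_metadata` helper are hypothetical: shared fields are copied into every item, and an item's own values either replace a shared field or, for fields listed in `append_fields`, are added to it as a minor modification.

```python
from copy import deepcopy

# Illustrative metadata shared by every item in a batch (placeholder values).
SHARED_FIELDS: dict[str, list[str]] = {
    "dc.type": ["Text"],
    "dc.language.iso": ["en"],
    "dc.subject": ["Example Subject"],
}


def build_item_metadata(shared: dict[str, list[str]],
                        per_item: dict[str, list[str]],
                        append_fields: frozenset[str] = frozenset()) -> dict[str, list[str]]:
    """Combine shared fields with one item's own fields.

    Fields named in append_fields keep the shared values and add the item's
    values; any other field supplied by the item replaces the shared value.
    """
    merged = deepcopy(shared)  # copy so the shared template is never modified
    for field, values in per_item.items():
        if field in append_fields and field in merged:
            merged[field].extend(values)
        else:
            merged[field] = list(values)
    return merged


if __name__ == "__main__":
    items = [
        {"dc.title": ["Letter, 1862"], "dc.subject": ["Civil War"]},
        {"dc.title": ["Letter, 1863"], "dc.type": ["Manuscript"]},
    ]
    for item in items:
        print(build_item_metadata(SHARED_FIELDS, item, frozenset({"dc.subject"})))
```

Keeping the shared values in one place means a correction to collection-level metadata is made once rather than in every item record.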

Repository bulk ingest processes exist to relieve some of this tedium. They require assembling a collection of items into an ingest folder structure that contains each item's content files and its associated metadata. Creating an ingest folder manually with tools such as a text editor is likely either to fail or to require significant rework. Fortunately, the creation of an ingest folder structure lends itself well to automation.
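As an illustration, the sketch below writes one possible ingest folder structure, assuming the DSpace Simple Archive Format used by DSpace's batch import: one subdirectory per item containing a `dublin_core.xml` metadata file, the content files, and a `contents` file listing those files. The `build_archive` and `write_dublin_core` helpers and the input record shape are hypothetical, and only `dc` fields are written; check the format details against the documentation for your DSpace version.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_dublin_core(metadata: dict[str, list[str]], path: Path) -> None:
    """Write "dc.element[.qualifier]" fields as a Simple Archive Format dublin_core.xml."""
    root = ET.Element("dublin_core", schema="dc")
    for field, values in metadata.items():
        schema, element, *rest = field.split(".")
        if schema != "dc":
            continue  # other schemas belong in metadata_<schema>.xml; skipped in this sketch
        qualifier = rest[0] if rest else "none"
        for value in values:
            node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
            node.text = value
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def build_archive(items: list[dict], archive_dir: Path) -> list[Path]:
    """Create item_000, item_001, ... under archive_dir, one per item record.

    Each record is {"metadata": {...}, "files": [paths]} (a hypothetical shape).
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    item_dirs = []
    for index, item in enumerate(items):
        item_dir = archive_dir / f"item_{index:03d}"
        item_dir.mkdir()  # fails if the folder exists, so earlier output is never overwritten
        write_dublin_core(item["metadata"], item_dir / "dublin_core.xml")
        names = []
        for source in map(Path, item["files"]):
            shutil.copy2(source, item_dir / source.name)
            names.append(source.name)
        # The contents file lists the files to load, one file name per line.
        (item_dir / "contents").write_text("\n".join(names) + "\n", encoding="utf-8")
        item_dirs.append(item_dir)
    return item_dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        scan = tmp_path / "letter_1862.pdf"
        scan.write_bytes(b"placeholder content")  # stand-in for a real scanned file
        dirs = build_archive(
            [{"metadata": {"dc.title": ["Letter, 1862"], "dc.date.issued": ["1862"]},
              "files": [scan]}],
            tmp_path / "archive",
        )
        print(sorted(p.name for p in dirs[0].iterdir()))
```

Generating the XML with a library rather than by hand avoids the escaping and nesting mistakes that make manually edited ingest folders fail.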

Successfully automating the creation of an ingest folder structure requires a particular skill set. To create a collection of ingest folders on a file system, one needs solid knowledge of working on a file server and of working with data transformation tools. Without appropriate server-level training or tools to aid in the process, these steps can become unwieldy, or in some cases even impossible, for librarians and other staff. As a result, this approach leads one to focus on how the content will be processed rather than on the quality of the metadata that needs to be authored.
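One way to keep the data transformation step off the file server is to let staff work in a spreadsheet and convert it with a small script. The sketch below assumes a hypothetical CSV layout (one column per qualified Dublin Core field, a `files` column, and `||` separating multiple values) and hypothetical validation rules (a required title and issue date, with dates shaped as YYYY, YYYY-MM, or YYYY-MM-DD); rows that fail validation are reported instead of being passed on to ingest.

```python
import csv
import io
import re

REQUIRED_FIELDS = ("dc.title", "dc.date.issued")     # hypothetical required fields
DATE_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")  # YYYY, YYYY-MM or YYYY-MM-DD (shape only)
SEPARATOR = "||"                                     # assumed multi-value separator


def split_values(cell: str) -> list[str]:
    """Split one spreadsheet cell into trimmed, non-empty values."""
    return [value.strip() for value in cell.split(SEPARATOR) if value.strip()]


def rows_to_records(csv_text: str) -> tuple[list[dict], list[str]]:
    """Convert CSV rows into item records, collecting validation errors per row."""
    records, errors = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        metadata = {}
        for column, cell in row.items():
            # Skip the files column, extra unnamed cells (key None) and empty cells.
            if column is None or column == "files" or not cell:
                continue
            values = split_values(cell)
            if values:
                metadata[column] = values
        problems = [f"missing {field}" for field in REQUIRED_FIELDS if field not in metadata]
        for date in metadata.get("dc.date.issued", []):
            if not DATE_SHAPE.fullmatch(date):
                problems.append(f"bad date {date!r}")
        if problems:
            errors.append(f"data row {row_number}: " + "; ".join(problems))
            continue
        records.append({"metadata": metadata, "files": split_values(row.get("files") or "")})
    return records, errors


if __name__ == "__main__":
    sample = (
        "dc.title,dc.date.issued,dc.subject,files\n"
        '"Letter, 1862",1862,Civil War||Correspondence,letter_1862.pdf\n'
        "Untitled draft,circa 1900,,draft.pdf\n"
    )
    records, errors = rows_to_records(sample)
    print(records)
    print(errors)
```

Reporting problems per row lets the person authoring the metadata fix them in the spreadsheet they already know, rather than debugging a failed import on the server.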

At Georgetown University, when we first started with DSpace, our librarians and staff used the Bulk Item Import Tool provided by DSpace, which requires interaction at the server level and demanded a disproportionate amount of energy for planning and managing the ingestion process. Our goal was to channel this energy toward improving metadata quality and marketing the use of the DigitalGeorgetown repository, which led us to successfully develop and deploy a collection of “Automated Ingest Tools”. These tools have been adopted by three of our libraries and have greatly reduced the time, energy, and troubleshooting that the traditional bulk ingest process used to require.
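For context, the server-level step that the Bulk Item Import Tool requires can be sketched as follows. The option names follow the DSpace ItemImport documentation (`dspace import --add` with `--eperson`, `--collection`, `--source`, and `--mapfile`); the install path, account, collection handle, and file locations are placeholders, and `run_import` defaults to a dry run that only prints the command. Verify the options against your DSpace version before running it for real.

```python
import shlex
import subprocess
from pathlib import Path


def import_command(dspace_dir: str, eperson: str, collection: str,
                   source: str, mapfile: str) -> list[str]:
    """Build the argument list for DSpace's ItemImport ("dspace import --add")."""
    return [
        str(Path(dspace_dir) / "bin" / "dspace"), "import", "--add",
        f"--eperson={eperson}",
        f"--collection={collection}",
        f"--source={source}",
        f"--mapfile={mapfile}",
    ]


def run_import(cmd: list[str], dry_run: bool = True) -> int:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    cmd = import_command(
        dspace_dir="/dspace",           # placeholder install directory
        eperson="staff@example.edu",    # placeholder submitter account
        collection="123456789/42",      # placeholder collection handle
        source="/tmp/archive",          # folder holding the item subdirectories
        mapfile="/tmp/archive.map",     # file where the import records item-to-handle mappings
    )
    run_import(cmd)                     # dry run: prints the command only
```

Wrapping the command this way is one option for hiding the server-level step from staff; a real deployment would also need to handle permissions and where the ingest folder is staged.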

This presentation will describe a series of simple applications that were combined to eliminate the tedious and error-prone steps within the bulk ingestion process for DigitalGeorgetown. It will also describe the challenges encountered with the bulk ingestion process and the tools we created to address them. A segment will be dedicated to the training that was provided and the lessons learned in deploying these applications. Lastly, the presentation will provide an overview of the applications that the Georgetown University Libraries have made available for use by other institutions.

Attachment: OpenRepositoriesPresentationProposal.docx (20.64 KB)