Opportunities and Challenges of Digital Curation

Digital technologies allow us to create, manipulate, store, and make accessible information in forms and quantities never before possible, yet these same technologies imperil the longevity of the very objects they produce and require management very different from that practiced in the paper-based world. A few institutions have engaged in digital curation activities for several decades, but most are very new to digital curation and do not yet have established practices or resource streams for ensuring success. Many fundamental research questions related to long-term digital preservation also remain open. Fortunately, there is a growing awareness of the need to preserve access to digital assets and a recognition that digital curation is one of the grand challenges of the early 21st century (Brophy and Frey 2006; Charter on the Preservation of the Digital Heritage 2003; Fitzgibbon and Reiter 2004; Hedstrom 2003; Hedstrom and Ross 2003; Levy 1998; Library of Congress; NSF Cyberinfrastructure Council 2007; Ross 1998; Rothenberg 1995; The State of Digital Preservation 2002; Tibbo 2003).

A decade of work in digital preservation and access since the Task Force on Digital Archiving report (Garrett and Waters 1996) to the Commission on Preservation and Access (CPA) and the Research Libraries Group (RLG) has resulted in a set of strategies, technological approaches, and activities now termed “digital curation.” While still an evolving concept, “digital curation” can be defined as “the active management and preservation of digital resources over the life-cycle of scholarly and scientific interest, and over time for current and future generations of users” (Joint Information Systems Committee 2003). The Joint Information Systems Committee (JISC), the UK funder of the Digital Curation Centre (DCC), notes that implicit in data curation specifically, and digital curation more generally, “are the processes of digital archiving and digital preservation,” but that “it also includes all the processes needed for good data creation and management, and the capacity to add value to generate new sources of information and knowledge” (2003). Digital curation involves the management of digital objects over their entire lifecycle, ranging from pre-creation activities, in which systems are designed and file formats and other data creation standards are established, through the ongoing capture of evolving contextual information for digital assets housed in archival repositories.

Digital curation involves selection and appraisal by creators and archivists; evolving provision of intellectual access; redundant storage; data transformations; and, for some materials, a commitment to long-term preservation. Digital curation is stewardship that provides for the reproducibility and re-use of authentic digital data and other digital assets. Development of trustworthy and durable digital repositories; principles of sound metadata creation and capture; use of open standards for file formats and data encoding; and the promotion of information management literacy are all essential to the longevity of digital resources and the success of curation efforts. The foundational vision of the DCC is that “long term stewardship of digital assets is the responsibility of everyone in the digital information value chain” and that “the maintenance, usability and survival of digital resources depends on regular planned interventions; care needs to be taken at conception, at creation, during use, and as use transitions to lower levels” (Rusbridge 2005). Digital curation extends far beyond repository control and involves attention to content creators and future users.

Work on digital preservation and access in the past decade has resulted in many projects and networks (e.g., CAMiLEON; CASPAR; CEDARS; DELOS Preservation Cluster; DigCCurr; DigitalPreservationEurope; Giaretta 2006; InterPARES; PLANETS; LDB; Lee, Tibbo, and Schaefer 2007; Library of Congress; nestor; Potter 2002; PRESTO; PrestoSpace; Slats and Verdegem 2004; van der Werf-Davelaar 1999); numerous metadata and encoding standards (e.g., Dublin Core; METS; PREMIS); open repository platforms (DSpace; Fedora; LOCKSS; SRB); and a set of common concepts and terminology, elaborated in the Reference Model for an Open Archival Information System (OAIS) (2002). The latter has become the foundation upon which most, if not all, serious digital archives and repositories trusted to preserve and provide access to digital assets over the long term are being built. A working group of RLG and OCLC has described the attributes and responsibilities of such trusted repositories (2002). In August 2005, a joint Task Force of RLG and the U.S. National Archives and Records Administration (NARA) published "An Audit Checklist for the Certification of Trusted Digital Repositories." Although digital repositories serve functions analogous to those of the libraries and archives of the paper-based world, the certification checklist makes clear that they require staff with a different set of skills, particularly technical expertise. This realization led to the workshop that served as the basis for this special issue of JODI.

Workshop on Digital Curation and Trusted Repositories: Seeking Success

The Workshop, Digital Curation and Trusted Repositories: Seeking Success, was held on June 15, 2006 in conjunction with the Joint Conference on Digital Libraries (JCDL 2006, June 11-15, 2006 - Chapel Hill, NC, USA). With more than 60 participants, it was the best-attended workshop at JCDL 2006. Its main organizers were Helen Tibbo and Christopher (Cal) Lee of the University of North Carolina at Chapel Hill (UNC-CH), with considerable assistance from Carolyn Hank, TRLN Doctoral Fellow, UNC-CH. Members of the Workshop Program Committee were: Philip Eppard, SUNY-Albany; Cal Lee; Karen Markey, University of Michigan; Soo Young Rieh, University of Michigan; Helen Tibbo; and Elizabeth Yakel, University of Michigan.

Participants in the Workshop were asked to consider the following questions: Will adherence to the RLG-NARA Checklist, by itself, ensure a successful digital repository, especially for the institutional repositories emerging on university campuses today? What are the most promising approaches for implementing its attributes? What does trust really mean in the context of a contributor-based repository, and will individuals or organizations contribute to a repository just because they trust that it will preserve digital assets over time? What incentives and assistance are needed? What is the role of the archivist vis-à-vis the digital life cycle and the stewardship of digital assets over time? What, indeed, constitutes a successful digital repository, and how can we ascertain and measure such success?

The workshop served as a forum for discussing how the emerging principles of digital curation, "the active management and appraisal of data over the life-cycle of scholarly and scientific interest" ("What is Digital Curation?"), can work with technical and managerial models to produce trustworthy long-term digital repositories. The workshop was planned to serve an audience of professionals engaged in digital curation and digital repository activities, including digital repository developers and curators; digital archivists and electronic records managers; institutional repository developers; institutional administrators and policy developers; digital librarians; scholars engaged in research intended to benefit the above; and researchers and administrators charged with preserving research data. The overall objective of the Workshop was to bring together a diverse group of professionals, representing a range of experience and expertise, to explore in detail the issues of digital curation, trusted digital repositories, and the assessment of success.

There was a rich set of discussions at the workshop, including sixteen presentations (listed in the order in which they were presented):

Major Workshop Themes

This section is based on the "selective synthesis" of major themes that Cal Lee provided at the end of the workshop. Lee began by presenting the audience with an image of an emergency checklist for an airplane. He then posed the question: Is the availability of the checklist a sufficient condition for getting the plane safely onto the ground? After members of the audience answered that it would not be sufficient, he asked, "What else would you need?" This question framed the remainder of Lee's presentation. He pointed out that, just as an aviation checklist is not a sufficient condition for safely flying or landing a plane, an audit checklist is not a sufficient condition for establishing, managing and maintaining a trustworthy repository. A good checklist can provide extremely valuable support for knowledge transfer, professionalization, instruction, verification and evaluation by succinctly summarizing appropriate professional practices and lessons (which may have taken many years and many individuals to develop and refine). There are very good reasons why pilots keep checklists in the cockpit and consult them during various phases of a flight. However, a checklist will never take the place of appropriate institutional structures, routines and commitments, or of the skills, habits, insights, problem solving and collaboration of experienced professionals. When faced with an emergency, a pilot must often draw on these resources to act immediately, and then consult the emergency checklist (as a memory aid and verification tool) after the situation has been stabilized. In short, it's a checklist, not a do list.

Current environment and trends

The content of the presentations suggested that this is an opportune time to be asking the fundamental questions raised in the workshop call for papers. On the one hand, there is an urgent need to inform the growing amount of institutional repository planning and development. On the other hand, most institutions are not yet locked into specific answers. According to Markey's presentation, 51% of respondents were not yet planning an institutional repository, and only 10% had implemented one. LeFurgy explained that there is also a wide diversity of approaches, even among those apparently doing the same things. Lee suggested that this situation is analogous to the circumstances surrounding the development of the OAIS, when "actors within several streams of activity related to digital preservation perceived the need for a high-level model but had not themselves developed one" and "several actors now felt they had knowledge from their own recent digital archiving efforts, which could inform the development of the OAIS" (Lee, 2005). Workshop presentations also raised important points related to metadata creation and harvesting. Efron found that institutional repositories held a relatively small number of items but relatively rich descriptions of those items. Speakers suggested that a great deal of metadata sharing and federation is already taking place, but that there is a pressing need for more sharing and federation of content.

What are the most promising approaches for implementing the attributes of the RLG-NARA Audit Checklist?

In the case of institutional repositories in academic environments, Markey and Kim both suggested that it might not be faculty or students who directly serve as the main Producers (i.e. submitters) of content. Lynch cautioned that information professionals should be careful not to neglect the substantial body of research being conducted outside of "big science."

Many of the lessons from the workshop speakers related to system architecture and sustainability. In the long-term management of digital collections, it is important not to assume the permanence of any specific hardware or software; Moore and Smith discussed the promise of various forms of virtualization, and Markey reported on respondents' plans to migrate their institutional repositories within four years.

The workshop also addressed promising ways to structure and coordinate work on digital repositories. Thibodeau elaborated on the benefits of addressing archival issues as a foundation for other systems within a cyberinfrastructure. Speakers explored various promising efforts to generalize and share products, including open source software licenses; user and developer communities, e.g. DSpace, Fedora, Virtual Data Center (VDC); common modular and extensible tools, e.g. JSTOR/Harvard Object Validation Environment (JHOVE), Storage Resource Broker (SRB); the Global Digital Format Registry (GDFR); joint efforts to support small players, e.g. OhioLINK; and building guidance documents from other existing documents, e.g. the Task Force on Digital Archiving Report, OAIS, Attributes of Trusted Digital Repositories. It is also important to share copies of digital objects in order to ensure redundancy, which will require architectures that support federation (e.g. SRB) and policies for collection sharing when institutions have collections of very different sizes (Donakowski). One potential tension in priorities is between building services directly into the applications used by data producers (e.g. NARA’s Records Management Services) and instead moving data out of live systems in order to reduce the risks associated with lock-in and obsolescence. For example, Lynch stated that learning management systems are "dreadful places to archive," so it is risky to depend on them for implementing digital curation requirements. Several speakers emphasized that one should not only write or follow rules but also provide evidence of compliance with them (Dale; McHugh; Moore and Smith; Strathmann). Finally, Efron raised a fundamental point that is often overlooked in discussions of repository trustworthiness: make sure your XML is well-formed.
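Efron's point is easy to act on. As a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree parser, a repository could test each metadata record for well-formedness before ingest. The function name is_well_formed and the sample records are hypothetical illustrations, not drawn from any workshop paper.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document.

    Well-formedness only means the text obeys XML syntax rules (a single
    root element, properly nested and closed tags, quoted attributes).
    It says nothing about validity against a schema such as METS or PREMIS.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample records for illustration only.
sample_ok = "<record><title>Annual report</title></record>"
sample_bad = "<record><title>Annual report</record>"  # <title> is never closed

print(is_well_formed(sample_ok))   # True
print(is_well_formed(sample_bad))  # False
```

Well-formedness is only a floor: a record can parse cleanly and still fail validation against the schema it claims to follow, which would require a separate, schema-aware check.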

Lee ended his presentation with "Issues for Future Research or Things to Think about on your Trip Home," which were based on selected evocative points or quotations from the spoken remarks of workshop contributors. These included:

Special Issue of JODI

The authors of the Workshop papers have all provided copies of their papers, which are publicly available through the Papers and Presentations section of the Workshop web site. Many participants also encouraged us to pursue a publication venue that would allow a subset of interested authors to further develop, expand and disseminate their workshop products. Cal Lee and Helen Tibbo approached Cliff McKnight with the idea of a special JODI issue on Digital Curation and Trusted Repositories, and he graciously accepted this proposal. We hope that readers will agree that the 10 papers in this issue are valuable contributions to the evolving professional discussions regarding digital curation and the management, evaluation, audit and certification of repositories.

Given the rapidly evolving nature of current work on certification and audit, it is important to read the papers in this issue within the context in which they were written. The RLG-NARA Audit Checklist was then the most widely recognized English-language guidance document, and several of the Workshop papers built directly on the Checklist. Since then, the Center for Research Libraries (CRL) has taken responsibility for carrying on the effort and has recently published the revised and expanded Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist. In December 2006, nestor released an English-language version of the Catalogue of Criteria for Trusted Digital Repositories (Dobratz et al. 2006). DigitalPreservationEurope and the DCC have created a risk-based approach to audit and certification called DRAMBORA (DPE and DCC 2007). The CRL, the DCC and the certification working group of nestor are each attempting to establish audit processes for their own localities (US, UK and Germany, respectively). These three groups, along with DigitalPreservationEurope, have been collaborating to find points of commonality, as well as regional differences, in their complementary efforts (see, e.g., the Core Requirements for Digital Archives (2007)). A working group was also recently formed to produce an ISO standard for the audit and certification of digital repositories.

Many of the authors of this special issue have led and contributed substantially to these recent developments. Several of the research studies represented in this special issue have also generated important further products and findings in recent months, and we encourage interested readers to be on the lookout for further publications from those efforts.


We would like to acknowledge the University of North Carolina at Chapel Hill School of Information and Library Science and University Libraries for sponsoring the JCDL workshop, and Carolyn Hank and members of the Program Committee for helping us to make the workshop such a success. The authors of the pieces in this special issue deserve recognition for their numerous insights and contributions, as well as their patience throughout the production, review and editorial process. We would like to thank Cliff McKnight, Scott Phillips and Anita Coleman for their considerable support and assistance in moving the special issue from conception to final product.


