Nationwide Census of Institutional Repositories: Preliminary Findings

Abstract

The MIRACLE (Making Institutional Repositories A Collaborative Learning Environment) Project is a multi-phased research project that is investigating the development of institutional repositories (IRs) in U. S. colleges and universities to identify models and best practices in the administration, technical infrastructure, access to, and preservation of repository collections. This paper features preliminary findings from the project's first phase, a nationwide census that will reveal the extent of college and university involvement with IRs in the United States. Preliminary findings address the types of investigative activities that institutions are conducting prior to making the decision to implement an IR, perceived benefits of IRs, staffing for IRs, methods of recruiting digital content, characteristics of operational IRs including their costs, and participating institutions' next steps on the road to implementing an IR.

1. INTRODUCTION

The development of institutional repositories (IRs) is a very new enterprise for many colleges and universities nationwide. Successful IRs could revolutionize scholarly publication in learning communities, opening up access to research much earlier in the discovery process and reaching entirely new audiences (Lim, 1996; Prosser, 2003). This paper documents preliminary findings from a U. S. census that investigates the extent of college and university involvement with IRs. The census is the first of several investigative activities undertaken by the MIRACLE (Making Institutional Repositories A Collaborative Learning Environment) Project in the course of accomplishing its goal of identifying specific factors contributing to the success of institutional repositories and effective ways of accessing and using repositories.

2. IR SURVEYS TO DATE

Originally, MIRACLE Project investigators had proposed to survey operational institutional repositories (IRs) in North America; however, we were concerned that we would be duplicating the efforts of Charles Bailey and his associates, who were working on an ARL (Association of Research Libraries) SPEC Kit that reports on the results of a recent survey of Association members regarding their IR activity (George, 2006). Other surveys target specific user groups such as CNI members in the United States (Lynch and Lippincott, 2005), CNI members abroad (Van Westrienen and Lynch, 2005), members of the Canadian Association of Research Libraries (Shearer, 2004), members of the Association of Research Libraries (Bailey et al., 2006), and early adopters of IR technology worldwide (Ware, 2004). Examining these surveys' results, MIRACLE Project investigators decided not to limit their efforts to a particular user group, membership, or affiliation, or to restrict participation to institutions with an operational IR. Instead, we sought to cast our net broadly and fill a void. By conducting a U. S. census of academic institutions about their involvement with IRs, MIRACLE Project investigators will not be excluding institutions that have not jumped on the IR bandwagon. Being more inclusive will not only increase our confidence that we will be able to identify the wide range of practices, policies, and operations in effect at institutions where decision-makers are contemplating, planning, pilot testing, or implementing IRs but also enable us to learn why some institutions have ruled out IRs entirely.

3. PROJECT DESIGN AND CENSUS METHODS

The MIRACLE Project features these five investigative activities over a three-year period:

1. Nationwide census of institutional repositories, 11 months, October 2005-August 2006
2. Telephone interviews with IR staff, 6 months, September 2006-March 2007
3. Case studies of five model IRs, 5 months, March 2007-August 2007
4. Survey of IR users, 9 months, August 2007-April 2008
5. Experimental study of IR searching, 9 months, January 2008-September 2008

Because the nationwide census was ongoing during the writing of this paper, this report of census findings is preliminary and subject to change in subsequent documents. Census findings are foundational and will help project investigators plan and execute future activities.

Project investigators are recruiting the library directors of four-year colleges and universities in the United States to participate in the census. We purchased mailing lists of library director names and addresses from Information Today's American Library Directory Online and Thompson-Peterson's. We subtracted duplicates and community colleges to arrive at a sample of 2,147 email addresses. Using the results of a comprehensive literature search for inspiration for questions and answer categories, project investigators are drafting survey instruments and programming them on SurveyMonkey for distribution via the World-Wide Web. Project staff are sending email messages to library directors asking them to participate in the census by first characterizing the extent of their involvement with IRs as follows: (1) implementation of an IR (IMP), (2) planning & pilot testing an IR software package (PPT), (3) IR planning only (PO), or (4) no IR planning to date (NP). In response to their answer to this question, project staff send them a link to one of four survey instruments. Many of the same questions are listed across two, three, or all four survey instruments so that comparisons can be made based on the extent of institutions' involvement with IRs. Some directors are completing questionnaires on their own and others delegating the task to decision-makers at their institution who are knowledgeable about their institution's plans for IRs.

4. PRELIMINARY CENSUS FINDINGS

Section 4 features respondents' answers to census questions received through mid-May 2006 that pertain to these topics:

Extent of IR implementation by census respondents
Investigative activities for decision-making
Benefits of IRs
Pilot-test and operational IR(s)
Responsibility for the IR
Recruiting IR content
IR costs
Next steps on the road to an IR

4.1 Extent of IR Implementation by Census Respondents

To date, project investigators have been successful sending email messages to library directors at 2,117 college and university libraries to invite them to participate in the nationwide census. We are researching names from 30 institutions that have bounced our email messages back to us as "undeliverable." Of the 273 respondents to date, 28 (10%) have characterized their IR involvement as having an operational IR (IMP), 42 (15%) as planning & pilot testing IRs (PPT), 65 (24%) as planning only (PO), and 138 (51%) as no planning to date (NP). All of these respondents have completed questionnaires pertaining to one of these four levels of IR involvement.
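The percentages above follow directly from the reported counts. A minimal sketch of the arithmetic, assuming simple rounding to the nearest whole percent and using the 2,117 successfully delivered invitations as the denominator for the response rate (variable names are illustrative only):

```python
# Counts reported in Section 4.1 (responses received through mid-May 2006).
counts = {"IMP": 28, "PPT": 42, "PO": 65, "NP": 138}
delivered = 2117  # invitations delivered successfully

total = sum(counts.values())

# Share of respondents at each level of IR involvement, as whole percents.
shares = {level: round(100 * n / total) for level, n in counts.items()}

# Response rate relative to delivered invitations (assumed denominator).
response_rate = round(100 * total / delivered)

print(total, shares, response_rate)
```

The four counts sum to the 273 respondents reported, and the rounded shares reproduce the 10%, 15%, 24%, and 51% given in the text.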

4.2 Investigative Activities for Decision-Making

Three questionnaires (IMP, PPT, & PO) ask respondents to rate the importance of 12 investigative activities in influencing decisions about IR implementation. Because most respondents give high ratings to listed activities, we weight their ratings so that we can rank order them from top to bottom. Table 1 lists the top three and bottom three ranked activities per respondent type, respectively. Ranks other than these are enclosed in parentheses.
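The weighting scheme itself is not specified in this paper. As a rough illustration of how weighted ratings can be turned into a rank order, the sketch below assumes a four-point rating scale with hypothetical weights of 3 to 0 and uses made-up tallies for placeholder activities; none of these values are census data.

```python
# Hypothetical weights: the paper does not state the actual scale or weights.
WEIGHTS = {
    "very important": 3,
    "somewhat important": 2,
    "somewhat unimportant": 1,
    "not important": 0,
}

def weighted_score(tally):
    """Mean weight of one activity's ratings; tally maps rating label -> count."""
    n = sum(tally.values())
    return sum(WEIGHTS[label] * count for label, count in tally.items()) / n

def rank_activities(tallies):
    """Return activity names ordered from highest to lowest weighted score."""
    return sorted(tallies, key=lambda name: weighted_score(tallies[name]), reverse=True)

# Made-up tallies for three placeholder activities (not census data).
example = {
    "activity A": {"very important": 6, "somewhat important": 4},
    "activity B": {"very important": 9, "somewhat important": 1},
    "activity C": {"very important": 5, "somewhat important": 3, "not important": 2},
}

print(rank_activities(example))
```

Weighting in this way separates activities that would otherwise look alike when most respondents choose one of the top two ratings.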

Table 1. Investigative Activities

All three respondent types are almost in agreement about the importance of learning from other institutions' IR activities. PPT and PO respondents want to learn about available expertise and assistance from library consortia, networks, groups of libraries, etc. This activity is not as helpful to decision-makers at IMP institutions, perhaps because they are early adopters of IRs who have chosen to speed ahead of the pack and get involved with IRs before others.

Because two of the bottom-ranked activities pertain to waiting to begin the IR effort until a critical mass of IR implementation happens, we conclude that census institutions want to get involved with IRs now rather than wait.

4.3 Benefits of IRs

All four questionnaires (IMP, PPT, PO, & NP) ask decision-makers to rate the anticipated importance of 16 benefits of IRs during the IR planning process. Project staff weighted their ratings so that benefits could be rank ordered from top to bottom. Table 2 lists the top three and bottom two ranked benefits per respondent type. Ranks other than these are enclosed in parentheses. T's indicate a ranked benefit that tied another benefit's weight.

Table 2. Top-ranked Benefits

IR decision-makers give top rankings to benefits that have a direct impact on their institution's learning community. This includes preservation of digital assets. Only decision-makers at IMP institutions take note of the IR's ability to increase the library's role as a viable research partner. Decision-makers rank indirect benefits at the bottom of the heap. One low-ranked benefit pertained to an increase in citation counts. Had project investigators asked active scholars and scientists about citation counts, this benefit might have gotten a higher rating due to their interest in other researchers citing their work.

Questionnaires for implementers (IMP) ask them whether certain benefits increased or decreased in importance following their implementation of an IR. These two benefits register at least a 33% increase in importance: "an increase in your library's role as a viable partner in the research enterprise" and "longtime preservation of your institution's digital output."

4.4 Pilot-test and Operational IRs

Of the 28 decision-makers whose institutions have operational IRs, 21 have one, four have two, and one has three such IRs available to members of their institution's learning community. Not all decision-makers identify their IR's software package by name. Those who do name these packages: (1) 9 for DSpace, (2) 5 for bePress, (3) 4 for ProQuest's Digital Commons, (4) 2 for local solutions, and (5) 1 each for Ex Libris' DigiTool and Virginia Tech's ETD. At pilot-test institutions, decision-makers are trying these packages: (1) 17 for DSpace, (2) 9 for OCLC's ContentDM, (3) 5 for Fedora, (4) 3 each for bePress, DigiTool, ePrints, and Greenstone, (5) 2 each for Innovative Interfaces, Luna, and ETD, and (6) 1 each for Digital Commons, Encompass, a local solution, and Opus. Consistent with previous surveys (Lynch and Lippincott, 2005; Van Westrienen and Lynch, 2005; Shearer, 2004), the MIRACLE Project census confirms that DSpace leads in both systems implemented and pilot-tested.
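The package counts listed above can be tallied to check which system leads in each group. This is only a sketch: the dictionary keys are shortened labels chosen for illustration, and the totals are counts of package mentions, not counts of institutions.

```python
# Package mentions as listed in Section 4.4 (shortened labels).
operational = {
    "DSpace": 9, "bePress": 5, "Digital Commons": 4,
    "local solution": 2, "DigiTool": 1, "ETD": 1,
}
pilot = {
    "DSpace": 17, "ContentDM": 9, "Fedora": 5,
    "bePress": 3, "DigiTool": 3, "ePrints": 3, "Greenstone": 3,
    "Innovative Interfaces": 2, "Luna": 2, "ETD": 2,
    "Digital Commons": 1, "Encompass": 1, "local solution": 1, "Opus": 1,
}

def leader(tally):
    """Return the package with the most mentions in a tally."""
    return max(tally, key=tally.get)

for label, tally in (("operational", operational), ("pilot-test", pilot)):
    print(label, leader(tally), sum(tally.values()))
```

DSpace has the most mentions in both tallies, in line with the statement that it leads among implemented and pilot-tested systems.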

Decision-makers are asked to estimate the number of documents in their operational or pilot-tested IR. Table 3 gives preliminary results.

Table 3. Numbers of Documents in IRs

Generally, operational IRs contain more documents than IRs in the pilot-testing phase. About two-thirds of pilot-tested IRs contain up to 500 documents. This proportion drops to about two-fifths for operational IRs. Over 5,000 documents are contained in 17% of the operational and 8% of pilot-test IRs.

IMP questionnaires ask decision-makers to rate their system's capabilities. Table 4 lists the percentages of respondents who rate listed capabilities as "very adequate" or "somewhat adequate."

Table 4. Adequacy of IR-system Capabilities

Generally, IR-system functionality for browsing, searching, and retrieving digital content is satisfactory, and the user interface receives middle-ground grades. Because the user interface is connected to two less-than-satisfactory features, controlled vocabulary and authority control, IR systems could benefit from improvements to system features that people use to retrieve digital content. While systems rate high in terms of file formats and adherence to open access standards, digital preservation rates in the bottom half of system capabilities. Asked why they would migrate to a new IR system, 100% of IMP respondents say they would migrate for greater capacity for handling preservation. Thus, IR systems could also benefit from digital preservation improvements.

4.5 Responsibility for the IR

Several questions address staffing for IR planning, pilot testing, and implementation. For example, one question asks respondents what percentage of the responsibility for an operational IR has been given (IMP) or should be given (PPT and PO) to various campus units. For each listed unit, respondents typed a percentage from 0% to 100% into a dialog box. Project staff averaged percentages so they added to 100% overall. Figure 1 gives preliminary results. Although PPT and PO decision-makers envision the library sharing operational responsibility for an IR, the library shoulders much of the responsibility. Decision-makers from institutions with full-fledged operational IRs choose responses that show library staff bearing much of the burden of responsibility for the IR.
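The exact averaging procedure is not described in detail. One plausible reading, sketched below under that assumption, is to average each unit's percentage across respondents and then rescale the averages so they total 100%. The function name, unit names, and response values are hypothetical illustrations, not census data.

```python
def average_and_normalize(responses):
    """Average each unit's percentage across respondents, then rescale to sum to 100.

    Each response maps a campus unit to the percentage a respondent typed;
    units a respondent left out are treated as 0.
    """
    units = sorted({unit for response in responses for unit in response})
    means = {
        unit: sum(response.get(unit, 0.0) for response in responses) / len(responses)
        for unit in units
    }
    total = sum(means.values())
    return {unit: 100.0 * mean / total for unit, mean in means.items()}

# Made-up responses; the second respondent's entries sum to 110, not 100.
responses = [
    {"library": 80, "central computing": 20},
    {"library": 60, "central computing": 30, "archives": 20},
]

result = average_and_normalize(responses)
print({unit: round(share, 1) for unit, share in result.items()})
```

Rescaling after averaging keeps the relative shares intact even when individual respondents enter percentages that do not sum to exactly 100.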

Project investigators will follow up this finding in subsequent project activities such as phone interviews and case studies to determine the reasons why most of the responsibility for the operational IR eventually falls into the hands of library staff.

4.6 Recruiting IR Content

IMP questionnaires ask decision-makers how they would assess their success with nine methods of recruiting IR content. Table 5 lists the percentages of respondents who rate listed methods as "very successful" or "somewhat successful."

Table 5. Success of Content Recruitment Methods

Staff report high levels of success with working one-on-one with early adopters, giving presentations about the IR at faculty meetings, and making personal visits to potential contributors. Decision-makers are least positive about institution-wide mandates, which sort to the bottom of the list.

A follow-up question asks decision-makers who are (IMP) or who they think would be (PPT and PO) the major contributors to their institution's IR. Figure 2 summarizes results.

Decision-makers at PPT and PO institutions expect that contributions by faculty will outnumber contributions by other groups. At IMP institutions, a different picture emerges: faculty are active contributors, but others (graduate students, librarians, archivists, research scientists, and undergraduate students) are also making contributions.

4.7 IR Costs

Several questions address costs of IRs. For example, one question asks decision-makers what percentage of their IR's annual budget is allocated to each of 7 line items. For each listed line item, respondents type a percentage from 0% to 100% into a dialog box. Project staff averaged percentages so they add to 100% overall. Figure 3 gives preliminary results.

At IMP institutions, costs for staff and vendor fees dominate costs. At PPT institutions, costs for staff dominate costs.

4.8 Next Steps on the Road to an IR

Decision-makers at IMP institutions are asked how long they would stick with their IR system before migrating to a new one. Up to three years is the response of 54% and four to six years is the response of 31%. Only 15% say they would stick with their current IR for more than six years. On average, respondents will stay with their current system for four years before migrating. An institution's decision to migrate may be swayed by future developments with regard to for-profit vendors that enter the IR marketplace, opportunities to participate in a consortium, and improved systems that are more versatile for handling preservation, customization, the wide range of digital formats, etc.

Decision-makers at PPT and PO institutions are asked to assess their next steps connected with IR implementation. Questionnaires listed 6 possible next steps.

Table 6 gives preliminary results.

Table 6. Next Steps for PPT and PO Institutions

The top-ranked next step chosen by both respondent types is "Your institution supports implementation of an IR software package." Tied for the top rank amongst decision-makers at PO institutions is the step "Your institution widens the scope of its investigation into IRs." Ranked last for both respondent types is the step "Your institution terminates its investigation of IRs." Clearly, in terms of this project's respondents who are in the PPT and PO phases, the momentum is on the side of an IR effort that eventually culminates in IR implementation. Waiting to partner with other institutions or join a consortium may be in the cards for a minority of PPT and PO institutions in the MIRACLE Project census.

We ask decision-makers at institutions where there is no planning for an IR to rate the importance of 15 reasons why no such planning has taken place to date. Their top two reasons are:

Other priorities, issues, activities, etc., are more pressing than an IR, and
We have no available resources to support planning.

Their bottom three reasons are:

We do not understand or believe in the value or effectiveness of an IR,
We have no support from our library's administration, and
We do not need an IR.

These top and bottom reasons indicate that decision-makers at NP institutions are knowledgeable about IRs, have considered them for their institution, and are probably too busy with other tasks including finding resources to support IR planning in the future. We can safely say that for our study's respondents, few have entirely dismissed the idea of IRs for their institution.

5. CONCLUSIONS

The results presented herein are based on data collected through mid-May 2006. In early June 2006, MIRACLE Project investigators will send one more set of reminders to prospective census participants and then terminate the census in mid-June 2006. It may be premature for us to draw conclusions here because we are still actively collecting data. Based on the preliminary data analysis, however, we are able to identify a few prominent findings from census responses.

Overall, census respondents are positive about IRs. They anticipate a number of benefits such as the ability of the IR to capture the intellectual capital of their institution, provide long-term preservation of digital output, and provide better service to contributors specifically and their institution's learning community generally. The IR effort is a collaboration of several units including the library, CIO's office, archives, central computing, central administration, schools, and departments. Census respondents have a strong interest in learning more about successful IR implementations from other institutions. In spite of the enthusiasm of census respondents, IRs have not yet "come into their own," as the number of documents held in both pilot-test and operational IRs is very small. To increase submissions to IRs, IR staff may want to consider content recruitment methods that have been successful at other institutions. Responses regarding IR-system capabilities indicate that IR systems can be improved, especially in the system features that are directly related to end-user searching such as user interface, controlled vocabulary searching, and authority control.

This paper reports preliminary census findings based on the responses of 273 respondents (13% response rate) collected to date. As MIRACLE Project staff gather more responses, we will undertake additional data analyses that shed light on the many issues, concerns, and experiences pertaining to the success of IRs in academic institutions. Publication of the final report of the MIRACLE Project census is expected in January 2007. Please consult the MIRACLE Project's web site (http://miracle.si.umich.edu/) for more up-to-date information on the census report specifically and on subsequent project activities generally.

6. ACKNOWLEDGMENTS

Our thanks to the Institute of Museum and Library Services (IMLS), which is supporting the MIRACLE Project through a National Leadership Grant (LG 06-05-0126).