Merging Metadata and Content-Based Retrieval: Deniman et al.: JoDI

Abstract

Educational digital libraries employ resource discovery systems that are aimed at providing educators and learners with curriculum materials to support learning in both formal and informal settings. The article describes a "hybrid" educational resource discovery system, which combines metadata and content-based retrieval methods. This hybrid system was implemented and evaluated in the context of the Digital Library for Earth System Education (DLESE). A pilot study was conducted to compare this hybrid system with an existing metadata-based system, with the aim of finding out if the hybrid system helps educators locate relevant resources with less effort. The results of the study suggest that the hybrid system decreased the variability in the number of user actions required to locate learning resources. The hybrid system interface featured embedded links, pointing to inner pages within a larger compound learning resource; study participants made use of these embedded links to locate individual learning objects.

1 Introduction

The past decade has witnessed the increasing ubiquity of the World Wide Web in homes and schools, the emergence of new kinds of electronic learning communities (Preece 2002), and the widespread creation and distribution of digital educational materials. Educational digital libraries have emerged as a means for disciplinary communities to share, organize and assess their intellectual holdings. In both Europe and the United States, there are many educational digital library efforts underway aimed at improving primary, secondary and undergraduate science education across a range of disciplines (Murumatsu and Agogino 1999; Borgman et al. 2000; Hoffman et al. 2001). While educational digital libraries offer a diverse array of services to educators and learners, a key service is educational resource discovery. Educational resource discovery systems are the software interfaces and information retrieval algorithms that help educators and learners to locate curricular materials and to incorporate them into classroom or informal learning settings.

This article presents an innovative hybrid educational resource discovery system that merges metadata and content-based information retrieval methods. Metadata-based retrieval methods search over library catalogs, i.e. sets of structured records that describe individual library resources. These records typically include information such as title, subject, description, resource type, audience, URL, etc., and are usually created by skilled personnel. Indexing and describing resources with this approach therefore requires human effort, experience, and cost. The structured nature of metadata enables retrieval methods to support fielded searching, where users can construct queries that selectively search over values in specific fields. Some fields, such as resource type and audience, are often assigned values based on controlled vocabularies. Other fields, such as title and description, typically consist of unstructured text and support subject or keyword-based searching. Usability studies of early bibliographic search systems highlighted end-user tendencies to favor keyword-based searching over fielded searching. These findings influenced the design of second generation bibliographic search systems, which support both fielded and keyword-based retrieval capabilities (Marchionini 1995).

Content-based discovery systems search across primary content, i.e. the resources themselves. These systems rely on automatic and often opaque methods of information indexing. As such, no human effort is required to describe and index library resources. Two prominent types of content-based retrieval models are the probabilistic retrieval and the vector space models (Salton 1986, Callan et al. 1992, Furnas et al. 1988). Probabilistic models rank documents based on the probability of their relevance to the user's query, often relying on term weighting schemes to estimate relevance (Salton 1986, Berger and Lafferty 1999). In his influential work, Salton (1986) argued that automated text retrieval systems that analyzed the full text of stored documents performed as well as systems relying on manual indexing. Other efforts have explored automatic citation analysis as a means of estimating relevance, some exploiting the link structures in hypertext (Brin and Page 1998) and others analyzing the relationships between bibliographic citations in scholarly documents (Giles et al. 1998). The last decade has witnessed rapid developments in content-based retrieval methods, both in commercial search engines (Sullivan 2003; Brin and Page 1998) and in research systems (Wang and Du 2001).

Both retrieval methods have their strengths and weaknesses. Prior research suggests that strategies combining manual and automatic indexing can be more effective in terms of improving the precision of query results (Callan et al. 1993, Adriani and Croft 1997). Our research complements this prior work by considering how these methods can be fruitfully blended to serve the specific discovery needs of educators and learners. The remainder of this article begins with a description of our research context and design methodology, and then discusses three design considerations for educational discovery systems that were identified through a series of formative evaluations with educators. The core of this article presents a hybrid discovery system based on a merged retrieval method, and the results of a pilot usability study comparing this system with an existing metadata-based system.

2 Design Methodology and Considerations

This research is being conducted in the context of the Digital Library for Earth System Education (DLESE). A key objective of DLESE is to provide searchable access to high-quality, online educational resources for primary, secondary and undergraduate earth system science education (Marlino et al. 2001). These heterogeneous resources include objects such as maps, images, simulations, lesson plans, lab exercises, data sets, virtual field trips, and interactive demonstrations. These resources are created by a wide variety of individual faculty members, agencies and institutions. Collections are distributed in that individual resources are held by contributors on their local servers and are currently accessed through the library via a database of searchable metadata records that describe them.

The DLESE project is committed to engaging in a user-centered design process that encourages broad-based participation and is focused on addressing concrete user needs. Towards this end, we have adopted a task-centered design methodology consistent with best practices in designing highly interactive systems. The central tenets of this methodology are: early and continual involvement of users in the design process, a focus on real users and their tasks, and iterative design guided by frequent formative evaluations (Gould et al. 1991; Lewis and Rieman 1993). To date, we have conducted numerous formative evaluations as part of this overall methodology to help us understand user needs and to evaluate different versions of our educational discovery system (Davis and Dawe 2001; Sumner and Dawe 2001). Our evaluation activities have included:

Interviews with 12 current and potential library users;
Collaborative use case development with 10 educators;
Focus group sessions with 23 master primary and secondary school (K-12) science teachers;
Relevancy analyses of search results using a test suite of 20 queries;
Three rounds of formal usability studies involving 15 participants.

When considered together, the results from these studies provide a convergent account of the major challenges educators and learners face in locating and using digital educational resources.

2.1 Lack of confidence and expertise

Educators prefer trusted sources where they can be reasonably sure that the curricular materials are scientifically accurate and pedagogically sound. In the United States, many K-12 science educators are teaching out-of-area (NCES 1999), i.e. they have no formal educational background in the scientific field they are teaching and thus lack confidence in their ability to evaluate the quality of science education resources. This lack of confidence was even evident in the master teachers involved in our focus groups who were officially recognized by their school district for their science education leadership.

This leads to several implications for resource discovery systems. First, indexing systems should support and reflect the library's selection criteria, i.e. the guidelines for deciding what resources can be accessioned into a given collection. This can be tricky in content-based retrieval methods where indexing is often the result of unguided traversal of Web page links. Second, discovery system interfaces should be designed to support resource comprehension, i.e. to help educators understand and evaluate resources returned by the system. Our formal usability studies found that the uniform descriptions provided by metadata records can play an important role in supporting resource comprehension. The design of digital learning objects is incredibly diverse. The huge variety in both presentation and structure of these objects creates a steep learning curve where considerable time can be required to comprehend a resource in order to determine if it is indeed relevant or not.

2.2 Two distinct types of planning

Through interviewing and use case development, it emerged that both K-12 and undergraduate educators engage in two distinct types of curriculum planning which we refer to as "prepare for a lesson" and "prepare for a course". For K-12 educators, the distinctions between these two types of planning activities are particularly sharp. Preparing for a course is a long-term planning activity that occurs over days or even weeks. For K-12 teachers, this is often performed in advance of the start of the school year over summer break. Many of our interview participants report planning in teams with other teachers at their school. Resource discovery in this context is exploratory and open-ended in nature: teachers are looking for interesting and potentially relevant resources within a broad topical area. Relevant resources in this context are diverse, including learning objects appropriate to their student population (such as visualizations, simulations, lab activities, texts), curricular guideline materials (e.g. lesson plans or field trip guides), and professional development materials (such as sites providing conceptual overviews to help with their own understanding).

Daily planning activities often involve preparing for a particular lesson or learning activity that will occur in the next few hours or the next day. K-12 teachers report having between 10 minutes and 2 hours for daily planning, with 30 minutes being typical. Resource discovery in this context is targeted and defined: teachers are looking for specific learning objects (e.g. a lab activity or simulation) to slot into an existing curriculum where the topic and the time available for the lesson are already defined. Teachers report wanting small, very precise search result sets in this context, such as the best three to five resources.

2.3 Granularity Mismatches

All resources in DLESE are Web-based, but vary widely in their granularity: some are single learning objects (such as an image or simulation); others are entire Web sites or online courses consisting of numerous learning objects distributed across hundreds or even thousands of Web pages. Our formal usability studies identified a granularity mismatch problem in the existing version of our educational discovery system. As discussed above, educators are often looking for individual learning objects, such as a lesson plan or simulation. However, the discovery system often returns search results containing pointers to top-level pages of resources consisting of multiple learning objects. The lesson plan that the user is seeking may be several clicks away from this top-level page. This mismatch is particularly noticeable when the metadata description clearly notes that the resource contains learning objects of a particular type and these are not readily visible from the top page. Conversely, if each individual learning object in a large site is indexed, this can lead to search results pages with too much redundancy, i.e. many of the results could be similar resources from the same content provider.

3 The Hybrid Discovery System

Discovery systems must support four chief tasks: indexing of source texts, searching across indexes, compiling search results, and displaying results to the end-user. Building a hybrid discovery system requires a detailed understanding of the strengths and weaknesses of both metadata and content-based approaches in each of these four areas. Towards this end, two independent discovery systems were built and analyzed (Dawe and Deniman 2001), and the results of these analyses were used to inform the design of the hybrid system discussed in this section. To set the stage for this discussion, we first describe the open source search engine that underpins all three systems, and briefly present the metadata-based implementation. The results of a pilot study comparing the hybrid and the metadata-based systems are presented in Section 4.

3.1 Search Engine

We are using the Java-based open-source Lucene search engine implementation as part of our discovery system research substrate (Goetz 2002). Lucene was selected because it provides a powerful, flexible and scalable architecture that enables library developers to implement custom indexing policies. Its application program interface supports rapid prototyping and experimentation across a range of indexing and search strategies. Central to Lucene's design is the notion of "documents". For every source to be indexed, the appropriate parser is used to create a Lucene document of searchable fields, where each field is constructed using a name-value pair. Lucene documents allow indexed text to be stored for easy retrieval and display, without having to retrieve the source from a secondary store.

Lucene supports indexing policy with the use of analyzer objects. Analyzer objects are applied when indexing Lucene documents and when parsing user queries. An analyzer object implements one or more tokenizers, which in turn apply appropriate filters to source text in order to create tokens for indexing. Lucene provides standard filters to support, among others, case-insensitive searches and the exclusion of stop words (i.e. common words to be excluded from the index). A library developer may optionally implement custom tokenizers, and creatively combine tokenizers using custom analyzers.
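As a rough illustration of the analyzer idea described above, the following Python sketch chains a tokenizer with a lowercase filter and a stop-word filter. It does not use Lucene's API; the function names and the stop-word list are assumptions made purely for illustration.

```python
import re

# Hypothetical stop-word list (an assumption, not the list used in DLESE).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for"}

def tokenize(text):
    """Split source text into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)

def lowercase_filter(tokens):
    """Normalize case so that searches are case-insensitive."""
    return [t.lower() for t in tokens]

def stop_filter(tokens, stop_words=STOP_WORDS):
    """Drop common words that should not be indexed."""
    return [t for t in tokens if t not in stop_words]

def analyze(text):
    """Apply the same chain when indexing documents and when parsing queries."""
    return stop_filter(lowercase_filter(tokenize(text)))

tokens = analyze("The Causes of Acid Rain")
```

Running the same `analyze` function over both source text and user queries is what keeps indexed tokens and query tokens comparable.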

When indexing, Lucene maintains document term counts and position information, i.e. the frequency and location of indexed terms within a particular document. Lucene uses this information to determine relevancy of query terms and for results ranking, i.e. the order in which search results are presented to end-users. In this article, references to keyword relevancy refer to this scoring strategy. The implemented discovery systems make use of two separate Lucene indexes, one for metadata and one for content. A metadata record ID is stored with every indexed document in both indexes, so that results from one index can be paired with results from the other.

3.2 Metadata-Based System

DLESE's metadata framework is based on the IEEE Learning Object Metadata framework (LTSC 2001). Some fields in this framework consist of free form text (e.g. 'description' is a short narrative summary), while values of other fields are based on selections from controlled vocabularies (e.g. 'subject' is chosen from a carefully crafted list). Metadata records are created by skilled personnel and guided by library policies. While considerable effort, experience and cost are required to create DLESE metadata records, a perceived benefit is collection quality: resources described by metadata records conform to library cataloging policy and the metadata records have been through a quality assurance process to promote accurate and consistent descriptions.

Metadata records are stored in well-formed, valid XML files that reside on local servers. These consistently structured files are easily parsed using standard XML parsers. Each record is guaranteed to be identifiable by a unique ID, and to map uniquely to a specific URL. In DLESE, however, a resource with varied content may be difficult to characterize adequately in a single metadata record. DLESE cataloging policy advocates using a single record unless parts of the resource differ substantially in technical requirements, description or educational data. Thus, multiple records may exist for a single Web domain.

Our metadata-based discovery implementation indexes and queries different fields within the metadata separately. The indexed fields include title, description, subject, grade level, resource type, resource creator, and others. Early user testing revealed that educators often search for earth system science concepts using multiple terms, e.g. acid rain, Lake Tahoe, plate tectonics, earthquake lesson plan, without making use of phrase or Boolean searching features. At the same time, they often expressed strong preferences for smaller, precise search result sets. To accommodate these findings, we decided to logically AND the search terms entered by users by default, to emphasize precision. To promote recall, we elected to perform a Boolean OR across the title and description fields. Thus, this implementation attempts to strike a careful balance between precision and recall. Phrase and Boolean search features are also supported.
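One possible reading of this default query behaviour can be sketched in a few lines of Python. The sketch assumes that every query term must appear (AND) in at least one of the searched fields (OR); the record layout, the `matches` function and the sample records are hypothetical and are not taken from the DLESE implementation.

```python
def matches(record, query_terms, fields=("title", "description")):
    """Every term must appear (AND) in at least one searched field (OR)."""
    field_tokens = [set(record.get(f, "").lower().split()) for f in fields]
    return all(
        any(term.lower() in tokens for tokens in field_tokens)
        for term in query_terms
    )

# Hypothetical metadata records, for illustration only.
records = [
    {"id": "r1", "title": "Acid rain lab",
     "description": "Students measure pH of rain samples"},
    {"id": "r2", "title": "Rain gauges",
     "description": "Build a simple gauge"},
    {"id": "r3", "title": "Lake chemistry",
     "description": "Effects of acid deposition and rain"},
]

hits = [r["id"] for r in records if matches(r, ["acid", "rain"])]
```

Here the second record is excluded because "acid" appears in neither field, while the third is retained even though neither term appears in its title.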

The display of search results is fairly conventional, including the title, the resource URL, the first 200 characters of the description, and a link to view the full metadata record. Results are ordered by Lucene's determination of query keyword relevancy, and because DLESE metadata is expertly constructed, our studies suggest that there is a strong correlation between the matched query terms and semantic content of the resources (Dawe and Deniman 2001). Overall, the metadata-based system favors precision, but can fail to locate resources for queries where the terms do not appear in the indexed fields.

3.3 Combining Metadata and Resource Content

Content-based retrieval methods search across the actual resources using automatic processes. A perceived benefit of automatic processing is scalability (in that minimal human mediation is required), yet a difficulty exists in the heterogeneity of object types and media with which automated processes must contend (Brin and Page 1998). Heterogeneity is certainly an issue in DLESE, as earth science resources include many non-textual objects such as maps, images, simulations and data sets.

Content-based retrieval is potentially useful for addressing the granularity mismatch issue, since individual learning objects embedded in a resource can be indexed. To leverage the strengths of both metadata and content-based retrieval methods, our design considers merging across the four chief task areas required of discovery systems. These are summarized in Table 1, along with the relationships observed between query terms and returned results.

**Table 1. Characteristic representation of DLESE prototype discovery systems**
	Metadata Discovery	Hybrid Discovery
Indexing	Words in text of descriptive metadata fields - mediated terms validated by experts - normalized and consistent textual data easily parsed	Words in HTML pages - subjective terms specific to author's intent, contextual - heterogeneous objects with inconsistent text structures, difficult to parse
Searching	Query keywords located in title and description fields of descriptive record	Query keywords located in title, HTML header tags, links and body text of individual pages
Matched Query Terms	High semantic relevance to principal content of resource, does not require term to be present in resource	Significant semantic relevance only to individual pages of content, does not require term to be present in metadata
Results Compilation	Straightforward ordering based on search engine ranking of terms in title and description fields of metadata	Modification of search engine standard results by grouping pages with their related metadata records, ranked by order of the pages in search engine result list
Display	Customary format with brief display of metadata	Hierarchical format with links to resource pages framed inside brief metadata record display

The hybrid implementation first requires that resource content be indexed. In contrast to fully automated content-based systems, we take advantage of DLESE metadata to target resources for indexing, i.e. the resource URL is retrieved from each metadata record and passed to the content-based system crawler for indexing. The system uses the resource URL as a control to index only pages that are within the resource, i.e. share the same root URL. This design decision is motivated by a desire to create a system that supports stated library collections policy, i.e. to ensure to a reasonable degree that the results returned from the content-based search have passed the library selection criteria and been formally accessioned into the library for discovery by users.

Variability in the structure, type and scope of the resources presents more difficulties for indexing than does the uniform structure of metadata records. While the resource URL retrieved from the metadata record targets the initial content page, hyperlinks to included pages must be programmatically identified and traversed. Web-based resources can contain hundreds or thousands of pages and their hyperlinks may point to many different object types, not all of which contain parsable text. For our implementation, indexing of resource content is restricted to a maximum of one thousand successfully parsed HTML pages per resource. This number was chosen based on empirical experimentation to provide a balance between coverage and indexing speed; it is large enough that most library resources are completely indexed, leaving only a few very large sites partially indexed. Each page equates to a single indexed document, and is related to the appropriate metadata record by the stored metadata record ID.
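The scoped traversal described above might look roughly like the following Python sketch. It assumes that "within the resource" can be approximated by a URL prefix test, and it replaces network access with an in-memory link table; `crawl_resource`, `get_links` and the example URLs are all hypothetical.

```python
from collections import deque

def crawl_resource(root_url, get_links, max_pages=1000):
    """Breadth-first traversal restricted to URLs that start with root_url.

    get_links(url) returns the links found on a page, or None if the
    page could not be parsed (e.g. an image or other non-HTML object).
    """
    seen = {root_url}
    queue = deque([root_url])
    indexed = []
    while queue and len(indexed) < max_pages:
        url = queue.popleft()
        links = get_links(url)
        if links is None:
            continue  # skip objects without parsable text
        indexed.append(url)  # one indexed document per parsed page
        for link in links:
            if link.startswith(root_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed

# Toy link table standing in for a real Web site.
root = "http://example.org/climate/"
site = {
    root: [root + "lab1.html", root + "map.gif", "http://other.example.com/"],
    root + "lab1.html": [root],
}
pages = crawl_resource(root, site.get)
```

In this toy run the external link is never followed and the image is skipped as unparsable, so only the two HTML pages under the root URL are indexed.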

As with the metadata-based system, separate indexing fields are constructed. For each page of successfully indexed content, the fields include: the text of the HTML title element, HTML heading text (H1, H2, etc.), HREF link text, and the HTML body text. As with default searches in the metadata-based system, the users' search terms are logically ANDed to emphasize precision, and a Boolean OR search is performed across the fields to promote recall.

The most significant challenge to building a hybrid discovery system is the compilation of search results for display. The same query, when performed on the two disparate indexes, produces distinctly different result sets. To order the displayed results list, there are several design options to consider. The result set from either index could be used to guide compilation or an aggregate set based on both indexes could be compiled. Using a single result set restricts the displayed list to only those records contained within that set. On the other hand, aggregating the two involves the relevancy rankings of both and presents challenges for resolving a single highest-to-lowest ordering. Currently, we base ordering on the content-based system's result set.

For the display of search results, a two-level hierarchical representation was conceived in which a resource's list of pages resulting from the content-based search is 'framed' within the related metadata record (Figure 1). The metadata displayed is essentially the same as in the metadata-only system, but a frame has been constructed to make the record appear as a coherent unit wherein the important contextual information provided by the metadata brief description is preserved. Within the frame, below the brief description, users are informed as to the number of pages within the resource satisfying their query. The top five pages are shown as "embedded links", i.e. the text of each page's HTML title element is rendered as a hyperlink to the actual Web page.

Figure 1. Display of search results obtained in the DLESE hybrid discovery system. Each metadata record 'frame' creates a coherent context for the resource pages returned from a content-based search

To construct the display, we traverse the content-based result set, starting with the highest ranked result, and assign each one to the appropriate metadata record using the stored ID. If the current result is the first 'hit' for a particular metadata record, then the ID of that record is appended to a 'hit list'. Each record is added to the 'hit list' only when it is first encountered, assuring that it is displayed just once and in the same relative order as it appears in the content-based result set. This creates an ordered list for displaying the framed records. A separate 'page list' for the hit record is then created, with the current result as the first page added to the list. As the content-based result list is traversed, each subsequent result is appended to the appropriate 'page list'. An example illustration of applying this algorithm is shown in Figure 2.

Figure 2. Algorithm used to compile results for display in DLESE hybrid discovery system. For clarity, lines are drawn for only two individual result pages. Note that the individual pages appear as hyperlinks in the final display, and are described by the HTML title element of the page
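The compilation procedure described above can be sketched in Python as follows. This is a minimal illustration and not the DLESE code: it assumes the content-based result set arrives as a ranked list of (record ID, page title, page URL) tuples, and the function name, output layout and sample data are hypothetical.

```python
def compile_results(content_hits, max_links=5):
    """Group ranked page hits under their metadata records.

    content_hits: ranked list of (record_id, page_title, page_url),
    highest ranked first.
    """
    hit_list = []    # record IDs, in order of first appearance
    page_lists = {}  # record ID -> its pages, in ranked order
    for record_id, title, url in content_hits:
        if record_id not in page_lists:
            # First hit for this record: it enters the hit list once.
            hit_list.append(record_id)
            page_lists[record_id] = []
        page_lists[record_id].append((title, url))
    return [
        {
            "record_id": rid,
            "page_count": len(page_lists[rid]),
            "embedded_links": page_lists[rid][:max_links],
        }
        for rid in hit_list
    ]

# Hypothetical ranked output of a content-based search.
ranked = [
    ("rec-B", "El Nino overview", "http://example.org/b/elnino.html"),
    ("rec-A", "Climate lab", "http://example.org/a/lab.html"),
    ("rec-B", "La Nina overview", "http://example.org/b/lanina.html"),
]
frames = compile_results(ranked)
```

In this example the record `rec-B` is framed first because one of its pages is the top-ranked hit, and its second page is folded into the same frame instead of occupying a separate position in the results list.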

Our earlier studies demonstrate that content-based systems tend to favor recall over precision, and can locate resources metadata-based systems cannot. The same studies, however, indicate that content-based systems can return many similar or redundant results (Dawe and Deniman 2001). A potential advantage of framing content-based results within corresponding metadata records is the reduction in redundancy. That is, a resource with many highly ranked pages will appear only once in the results display, providing opportunity for another resource to be displayed among the top results.

4 Pilot Study

We conducted a pilot study to get a preliminary understanding of how users' experiences compared across the two systems. Although the DLESE discovery systems are aimed at both educators and learners, this study was performed only with educators as participants. In this pilot study, we wanted to examine whether the hybrid discovery system, with its content-based indexing and 'framed display', helped to alleviate the granularity mismatch problem. Do the embedded links to internal parts of educational resources help educators to locate individual learning objects? Furthermore, given that the hybrid system consistently produces larger numbers of search results, we wanted to assess whether the system had adversely shifted the precision-recall balance from the users' perspective. That is, do educators have to expend more effort to wade through the search results in order to find resources they believe to be relevant? Given the small number of pilot study participants, the results presented here should be viewed as descriptive in nature, rather than generalizable to other contexts. As such, these results can best be used to inform the development of targeted research questions appropriate for future studies with larger numbers of participants.

4.1 Procedure

Eight participants volunteered to take part in the study. All were science educators in primary, secondary, or undergraduate settings. Four participants used the metadata-based system and four used the hybrid discovery system. A short questionnaire prior to the test was used to eliminate volunteers without any prior computer or Internet experience and to distribute the remaining participants evenly across the two test conditions, according to their computer and Internet experience. Participants in both conditions reported similar high levels of familiarity with environmental science, the general topic of the two experimental tasks.

Each testing session took approximately 45 minutes to one hour. Participants were first shown a demonstration of the discovery system by a test facilitator, and were then asked to complete a short training task. During the experimental phase, each participant completed the two tasks shown in Table 2. The tasks were designed to reflect the two types of curriculum planning activities important to educational users. Participants were asked to bookmark the resources that they chose for each task. The order of the tasks was reversed across participants.

**Table 2. Two tasks used in the pilot study**
Prepare for a Lesson	Prepare for a Course
Pretend you are a high school teacher preparing for a lecture tomorrow on the earth's climate. This is not your normal teaching assignment but you are filling in for a colleague. You have a 30-person class of restless 10th graders and need an active, hands-on activity to keep them engaged and interested. You have 7 minutes of free time until your next faculty meeting and you want to use the DLESE discovery system to locate a good learning activity suitable for 10th graders that will help them to understand some aspect of the earth's climate. You will bookmark the site that you choose.	Pretend you are a high school teacher, doing some preliminary planning for a new 6-week module you would like to teach on environmental science for both 11th and 12th graders. You have 20 minutes of free time till your next faculty meeting and you want to use the DLESE discovery system to locate five resources that will help you plan this course. Specifically, you are looking for one lesson plan or syllabus, and four sites that provide a good comprehensive overview of environmental science. Your goal is to find five promising sites or resources that you will examine in more detail later. You will bookmark the five sites that you choose.

To shed light on the granularity and effort questions, we devised a structured observation procedure to track:

the overall effort, in terms of number of interface actions, involved in locating a resource believed to be relevant
the distribution of actions across the discovery system and the resource itself.

This procedure, adapted from one developed by Armento et al. (1999), assumes that a well-defined discovery system reduces the overall number of actions required to locate a relevant resource, and in particular reduces the number of actions that take place in the resource itself. We elected first to conduct this exploratory pilot study with a small number of participants, to assess and refine this procedure, before attempting to scale to larger numbers of participants.

During each test session, a test facilitator looked over the shoulder of the participant and kept an activity log of the participant's actions using a specially prepared activity log worksheet. Using the worksheet, facilitators tracked the participant's search patterns by:

Noting the search terms the participant used to look for resources
Noting when and where bookmarks were made
Keeping track of the participant's progress through the discovery system and resource pages by ticking appropriate columns in the worksheet every time the participant went to DLESE pages (First Search Results Page, Additional Search Results Pages, and Full Metadata Description pages) or to pages in the resource (we distinguished between 'top' pages or 'sub' pages).

All test facilitators were trained in the procedure prior to the experiment to ensure consistent observations and data collection. After each participant completed the two tasks, they filled out a short questionnaire on their subjective experience with the discovery system, i.e. how they perceived the quality of their search results and the resources they bookmarked. They were also asked to rate the usefulness of various parts of the discovery interface for comprehending and evaluating library resources.

4.2 Results

All eight participants completed both tasks, with five of the eight participants electing to bookmark more than the required number of resources during the 'prepare for a course' task. Overall, the participants did not appear to be rushed, despite the imposed time limits for completing the assigned tasks. Table 3 shows the total number of resources chosen for each task in both experimental conditions. The numbers in parentheses indicate how often the resource selected as relevant was a sub-page of a resource, instead of a top page. Consistent with the design considerations outlined earlier, participants in this study selected individual learning objects located within larger resources most of the time in the 'Lesson' condition, and approximately half the time in the 'Course' condition.

**Table 3. Number of resources chosen for each treatment, with number of decisions made at resource subpages in parentheses**
	Prepare for Lesson		Prepare for Course
System	Metadata	Hybrid	Metadata	Hybrid
Number of resources bookmarked	6 (5)	4 (3)	25 (12)	20 (9)
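
The sub-page proportions described before Table 3 follow directly from its counts. A minimal sketch of that arithmetic, using only the figures in the table (the dictionary layout and the function name `subpage_share` are illustrative choices of ours, not part of the study's tooling):

```python
# Counts from Table 3: (resources bookmarked, bookmarks made at a sub-page).
TABLE_3 = {
    ("Lesson", "Metadata"): (6, 5),
    ("Lesson", "Hybrid"): (4, 3),
    ("Course", "Metadata"): (25, 12),
    ("Course", "Hybrid"): (20, 9),
}


def subpage_share(total: int, at_subpage: int) -> float:
    """Fraction of bookmarks that were made at a sub-page of a resource."""
    return at_subpage / total


for (task, system), (total, sub) in TABLE_3.items():
    print(f"{task:6} {system:8} {sub}/{total} = {subpage_share(total, sub):.0%}")
```

The shares come out at roughly 83% and 75% for the Lesson task, and 48% and 45% for the Course task, which matches the "most of the time" and "approximately half the time" reading.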

We looked for overlap between the resources selected by the participants. There was no overlap in the resources selected by participants using the metadata-based system in either task condition. Four of the 20 resources selected by participants using the hybrid system to complete the 'prepare for a course' task shared two root URLs, i.e. participants selected different parts of two large resources to bookmark. This lack of overlap could be explained in part by the variations in search terms used. For instance, some participants interpreted the environmental science task narrowly, depending on their personal interests and backgrounds, and searched for subtopics like streams, watersheds, California water, Great Lakes water levels, etc. Of the 52 total queries made by pilot participants, 29 (56%) were purely subject-based; the remaining 23 queries were subject-based with either resource type or grade-level modifiers (e.g. climate 9-12, geology lesson plan).
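
The overlap check described above amounts to grouping bookmarked pages by their root URL. The following is a minimal sketch of one way to do this, assuming that "root URL" means scheme plus host; the function names and the `example.org` addresses are placeholders of ours, not resources or tooling from the study.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_root(urls):
    """Group bookmarked URLs by root, taken here to be scheme plus host."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        groups[f"{parts.scheme}://{parts.netloc}"].append(url)
    return dict(groups)


def shared_roots(urls):
    """Roots under which more than one distinct page was bookmarked."""
    return {
        root: pages
        for root, pages in group_by_root(urls).items()
        if len(set(pages)) > 1
    }


# Placeholder URLs for illustration only (not resources from the study).
bookmarks = [
    "http://example.org/water/streams.html",
    "http://example.org/water/watersheds.html",
    "http://example.net/climate/index.html",
]
print(shared_roots(bookmarks))
```

Under this definition, two bookmarks that point at different inner pages of the same large resource are reported together, which is the kind of overlap noted for the 'prepare for a course' task.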

We also examined the overall effort to locate a resource believed to be relevant in each of these quadrants. For both tasks, we calculated the average number of actions executed before a resource was bookmarked, and the variance in the number of actions required for each bookmark. Executing a query, making a bookmark, and any interface action that resulted in a page request (e.g. clicking a link, pressing the Back button) each counted as one action. As shown in Tables 4 and 5, there were no discernible differences in the average number of actions required across the two types of discovery systems for either task. However, for both tasks the effort to locate a resource using the metadata-based system exhibited greater variability (e.g. for the 'Course' task, a standard deviation of 4.17 for Metadata against 2.10 for Hybrid).

**Table 4. Actions required for a bookmark during the lesson activity**
Prepare for Lesson	Mean	Median	Std. Dev.
Metadata	6.83	6.5	4.4
Hybrid	6.5	6.5	2.38

**Table 5. Actions required for a bookmark during the course activity**
Prepare for Course	Mean	Median	Std. Dev.
Metadata	5.15	3.5	4.17
Hybrid	4.25	4.0	2.10
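
The counting rule behind Tables 4 and 5 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the event names are hypothetical, we assume the count restarts after each bookmark, we assume a sample (not population) standard deviation, and the log is made up and is not study data.

```python
from statistics import mean, median, stdev

# Hypothetical event kinds; the worksheet categories in the study may differ.
COUNTED = {"query", "page_request", "bookmark"}


def actions_per_bookmark(events):
    """Return the number of counted actions leading up to each bookmark.

    Each query, page request, and bookmark counts as one action. The
    running count restarts after every bookmark (an assumption).
    """
    counts, running = [], 0
    for kind in events:
        if kind not in COUNTED:
            continue  # e.g. scrolling, which triggers no page request
        running += 1
        if kind == "bookmark":
            counts.append(running)
            running = 0
    return counts


# Made-up log for illustration only (not study data).
log = ["query", "page_request", "scroll", "page_request", "bookmark",
       "page_request", "bookmark"]
counts = actions_per_bookmark(log)
print(counts, mean(counts), median(counts), stdev(counts))
```

A log of this shape is also the kind of record that automated instrumentation of the discovery system would need to produce in order to replace the hand-kept worksheets.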

To understand this variability in effort better, we computed frequency histograms for both tasks to make visible the distribution in the number of actions required for each bookmark. As shown in Figure 3, in the course preparation task, the metadata-based system exhibited less consistent results. Approximately 50% of the time, participants found a resource believed to be relevant in fewer than three actions; the other 50% of the time required anywhere from 4 to 18 actions. Conversely, the effort to locate resources using the hybrid system exhibited greater consistency. Participants required 4-6 actions to locate a resource about 50% of the time, with the remainder of the data being closely grouped around this interval. The frequency histogram for the lesson preparation task was similar.

Figure 3. Frequency histogram showing actions per bookmark for Course task

We believed that the hybrid system would help alleviate the granularity mismatch by:

indexing the content of the resource
providing links in to the interior of the resource where individual learning objects are more likely to be found.

If this is true, the hybrid system should require fewer participant actions in the resource. To examine this, we divided each participant's actions into those originating in DLESE and those originating in the resources. As Table 6 shows, for both tasks the hybrid system required fewer resource actions on average than the metadata-based system (e.g. for the Lesson task, Metadata: 3.5 resource actions, Hybrid: 2.75 resource actions), and within the hybrid system the average number of resource actions was lower than the average number of DLESE actions. This suggests that the hybrid system may have been effective at pointing users towards individual learning objects.

**Table 6. Effort spent in DLESE against effort spent in Resource**
Average Number of Actions	Prepare for Lesson		Prepare for Course
System	Metadata	Hybrid	Metadata	Hybrid
In DLESE	3.33	3.75	2.34	2.15
In the Resource	3.5	2.75	2.8	2.10
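
As a quick consistency check, the two rows of Table 6 should add up to the mean actions per bookmark reported in Tables 4 and 5. The sketch below performs that arithmetic with the published figures only; the variable and function names are our own.

```python
# Mean actions per bookmark (Tables 4 and 5).
OVERALL = {
    ("Lesson", "Metadata"): 6.83,
    ("Lesson", "Hybrid"): 6.5,
    ("Course", "Metadata"): 5.15,
    ("Course", "Hybrid"): 4.25,
}

# (average actions in DLESE, average actions in the resource) from Table 6.
SPLIT = {
    ("Lesson", "Metadata"): (3.33, 3.5),
    ("Lesson", "Hybrid"): (3.75, 2.75),
    ("Course", "Metadata"): (2.34, 2.8),
    ("Course", "Hybrid"): (2.15, 2.10),
}


def split_gap(key):
    """Absolute gap between the overall mean and the sum of its two parts."""
    in_dlese, in_resource = SPLIT[key]
    return abs(OVERALL[key] - (in_dlese + in_resource))


for key in OVERALL:
    print(key, round(split_gap(key), 2))
```

The parts agree with the overall means to within 0.01, the largest gap being for the Metadata system in the Course task (2.34 + 2.8 = 5.14 against a reported 5.15), which is consistent with rounding.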

We then examined the use of the embedded links, which appeared in the framed display when using the hybrid system. Our vision for these links was that they would be used more when performing focused searches rather than when performing less defined searches. Indeed, as Table 7 shows, the links were used 100% of the time to complete the Lesson task and only 20% of the time to complete the Course task. This suggests that without explicit instructions, participants were able to identify the design purpose, or affordance, of these links and used them appropriately.

**Table 7. Use of embedded links**
	Prepare for Lesson	Prepare for Course
Bookmarks where Links Used	4	4
Total Bookmarks	4	20
%	100%	20%

Even though the average number of DLESE actions looks numerically similar in both systems (Table 6), the nature of these actions is quite different in each. Many of the actions in the metadata system were searches rather than exploration of the results, as 28.6% of the searches using the metadata system returned zero search results. This is in stark contrast to the hybrid system, which had no zero-result searches.

We examined the subjective experience of the participants with the two discovery systems by analysing the post-task questionnaires. Participants were asked a series of questions about the perceived quality of the search results, the resource design, and the resource content. The format used a four-point scale ranging from 'not satisfied' to 'very satisfied'. Keeping in mind the small number of participants in the pilot study, the analysis revealed two differences in the way participants perceived the two systems.

Previous interviews suggested that many educators would favor smaller, more precise result sets, such as the best five resources on a particular topic. One of our concerns with the content-based system was that the larger number of results might be overwhelming or perceived to be less useful. This appears not to be the case for participants in this study. Participants using the metadata-based system reported that the number of results returned was too low, while all the participants using the content-based system were satisfied with the larger number of search results that they received per query. One possible explanation is that users of the metadata-based system may have been disappointed with the number of null result sets, which negatively influenced their overall perception of the value of the search results.

Another of our concerns with the content-based system was that users would perceive the quality of resources to be lower as a result of being taken directly into the interior of large resources, which tend to be less clearly labeled and sometimes lacking in the clear navigation and orienting design features of top pages. Previous interviews with educators suggest that quality is an important factor influencing the use of DLESE: educators prefer trusted sources that they believe to contain high quality educational resources. This study's results are somewhat mixed on this quality perception question. For instance, participants using the metadata-based system uniformly reported being satisfied with the resources they found, while the satisfaction with resources of participants using the content-based system varied from somewhat satisfied to satisfied. However, participants using both discovery systems responded similarly to questions about the perceived quality of resources. They agreed that the resources were well designed, and that they were not difficult to understand. They also unanimously agreed that the scientific and educational quality of resources returned by the two systems was very good, and that they trusted DLESE to contain quality resources. Overall, the similarity of opinion across the two groups of participants on these questions is good news, suggesting that the two discovery systems maintained a similar baseline with respect to the perceived quality of resources catalogued by DLESE.

5 Discussion

The results of the pilot study suggest that the hybrid system helped decrease the variability in finding relevant resources, providing a more consistent and predictable user experience. This could be beneficial for educators preparing a lesson: to make use of digital libraries in this context, they need to be reasonably sure they can find useful resources within their tight time constraints. The results also suggest that the hybrid system was more effective at pointing educators toward individual learning objects. There was also an indication that within the hybrid system, educators spent more time exploring search results, whereas in the metadata system, they spent more time conducting searches. Thus, it can be argued that while the effort to locate relevant resources was similar across the two systems, the hybrid system helped users to spend their time more productively.

Clearly one limitation of the results reported here is the small number of participants in the pilot study. With only four participants for each treatment, individual differences in information seeking patterns and system use could sway the results. However, we first performed the analyses per participant to check for significant individual variations in task performance and none were discerned. As part of our iterative design methodology, it is our intention to scale this study to larger numbers of participants in order to develop robust baseline performance measures that can be used to compare and assess future generations of DLESE discovery systems. A necessary first step for scaling up will be instrumenting the discovery system research substrate to automatically collect and collate the requisite data, based on the categories developed for the activity log worksheets.

We believe that a key strength of the hybrid system is the way it uses the human-crafted metadata records to contextualize search results, which should assist learners and educators to comprehend and evaluate digital learning resources. This contextualization is presented to the user in the search results using a framed display. This display approach was inspired, in part, by the hierarchical format used in the Cha-Cha system (Chen et al. 1999). A key difference is that Cha-Cha's aim is to reflect the underlying link structure within the resource, whereas our aim is to reflect the semantic content as it pertains to the user's query. Web page titles form the embedded links featured in the framed display. As such, the utility of this display depends to a significant degree on the quality of the titles assigned by Web page creators. However, many creators do not assign semantically meaningful titles to each page of a large resource. Specifically, we have noticed redundancy in page titles within many large resources, i.e. the same title is used for multiple pages. This naming practice reduces the usefulness of the framed display for helping users identify individual learning objects within a large site. Part of our future work will include an analysis of page title redundancy, as well as missing titles, to determine the impact of these practices on the framed display approach.

A significant enabling factor for this hybrid approach is the synergy between the affordances of the algorithm and library policies. Selection filters specified in the DLESE collection's policy contribute to the effectiveness of the hybrid discovery system by enabling the targeted indexing to focus on quality assured content. DLESE cataloging policy seeks to strike a delicate balance between adequately describing large resources and avoiding redundant search results from the same content provider. Multiple metadata records are generated only when there are significant differences within a resource with respect to educational concerns and uses, in effect separating large resources into meaningful semantic units. These units guide the discovery system, and thus enable the algorithm to group fine-grained entry points into a meaningful context.

Clearly, this hybrid approach could also be applied and extended to other contexts without these policy restrictions. For instance, suspending the requirement that indexed resources share a root URL would enable the algorithm to traverse off site to feature all 'related resources', rather than just embedded resources. An alternative display could be constructed that highlighted the relationships between different sites, instead of the nesting of learning objects within a site. Such a display might promote closely interleaved searching and browsing behaviors. This extension would also require careful re-consideration of the traversal strategy and indexing cut-off point to decide which pages, and how many, of large related sites should be indexed.

One known limitation of this hybrid implementation is its treatment of non-textual resources such as images, data, etc., which are retrieved using metadata-based indices. The implementation described in this article uses the content-based system's result set to guide the process of compiling search results for display to the user. As such, resources retrieved as part of the content-based search are featured above resources retrieved as part of the metadata-based search, thus placing non-textual resources at a disadvantage. Given the importance of non-textual resources to the DLESE community, future work will need to explore alternative results compilation methods.
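
The ordering effect described above can be illustrated with a small sketch. It assumes, purely for illustration, that each retrieval method returns a ranked list of resource identifiers and that duplicates are shown once; the function name `compile_results` and the identifiers are hypothetical and do not reproduce the actual implementation.

```python
def compile_results(content_hits, metadata_hits):
    """Merge two ranked lists, content-based hits first.

    Resources found by the content-based search keep their order and come
    first; resources found only by the metadata-based search follow.
    Removing duplicates is an assumption made for this sketch.
    """
    seen = set()
    merged = []
    for resource_id in list(content_hits) + list(metadata_hits):
        if resource_id not in seen:
            seen.add(resource_id)
            merged.append(resource_id)
    return merged


# Hypothetical identifiers: an image reachable only through its metadata
# record ends up below every textual resource matched on content.
content_hits = ["lesson-plan", "field-guide"]
metadata_hits = ["satellite-image", "lesson-plan"]
print(compile_results(content_hits, metadata_hits))
```

Here the image is top-ranked by the metadata search yet appears last in the merged list, which is the disadvantage for non-textual resources that alternative compilation methods would need to address.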

6 Summary

This article frames three design considerations for educational discovery systems. Specifically, discovery systems need to:

Support resource comprehension, that is, educators should be able to decide quickly, and with less effort, whether a resource is relevant or not
Strike a delicate precision-recall balance, so that educators can find appropriate resources while performing both broadly and narrowly focused searches
Effectively address the granularity mismatch problem - wherein resources returned by search results can either be individual learning objects or entire Web sites consisting of numerous learning objects within them.

Understanding these design considerations, we proposed a hybrid discovery system based on a merged retrieval method. The hybrid discovery system combines the two indexing mechanisms of content and metadata discovery systems, and presents a modified search results page that hierarchically groups links to individual resource pages with descriptive information from their related metadata record in a framed display.
