MatDL: Integrating Digital Libraries into Scientific Practice

Laura M. Bartolo, Cathy S. Lowe, Louis Z. Feng* and Brook Patten**
Materials Informatics Research Laboratory, College of Arts and Sciences, Kent State University, Kent, OH 44242, USA
Email: {lbartolo/ clowe}
* Computer Science Department, University of California, Davis, CA 95616, USA
**Computer Science Department, University of Cincinnati, Cincinnati, OH 45221, USA


Digital repositories can be catalysts for new knowledge by providing information space and tools to facilitate the work of students, educators, or scientists. The NSF NSDL Materials Digital Library (MatDL) is adapting existing open source "tools", such as an image gallery and a version control system, to meet the needs of users within the materials science community. The tools are being modified to make submission to MatDL an easy step within a user's existing workflow and to avoid redundant effort. These satellite services provided by MatDL are intended to become an integral part of the user's "laboratory or workspace". The paper investigates whether digital repositories can expand their communities and collections by building tools that integrate a digital repository into researchers' workspaces. In the long term, making submission an easy part of users' regular workflow is expected to increase the likelihood that they will submit resources to the repository. Ultimately, the goal of integrating a repository into users' workspaces is to strengthen the interaction between research and education. Initial experience of providing these tools and responding to user feedback through MatDL is discussed.

Keywords: materials science, digital libraries, user workflow, open source tools, participant involvement

1 Introduction

New approaches in materials science are enabling scientists to design materials by assembling molecules from the bottom up. Microstructures of these materials can provide visual data that give a student or researcher ideas for designing structures with specific properties and functionalities. But how does the scientist generating the microstructures describe them and attach the necessary information so that a student can find the picture and understand how the structure was assembled? And how does the materials science educator bring together teaching resources to train the next generation of scientists in new ways of conducting research?

The mission of the Materials Digital Library (MatDL), as part of the National Science Foundation (NSF) National Science Digital Library (NSDL) program, is to investigate research issues associated with the effective delivery of materials science information. Current areas of investigation include:

One anticipated outcome of the investigation is to reduce the time it takes for new knowledge to move from the laboratory to the classroom in the materials science community.

This paper describes the MatDL project, its current community of users, and approaches for encouraging participant involvement. One specific strategy for increasing user participation is discussed: placing MatDL's domain-customized submission mechanisms into users' workflow, which is anticipated to increase contributions and promote integration of MatDL into the materials science community.

2 MatDL project

One goal of the MatDL project is to help users gain knowledge about the range of strategies, methods, capabilities, and limitations in developing, modeling, and using materials. Currently, MatDL (see Figure 1) is a collaborative effort of the Materials Science and Engineering Laboratory at the National Institute of Standards and Technology (MSEL/NIST), Kent State University (KSU), Massachusetts Institute of Technology (MIT), University of Michigan (U-M), and the University of Colorado at Boulder (CU-Boulder).


Figure 1. MatDL project collaborators

To maximize opportunities to create and disseminate rich scientific content to students, educators, and researchers, MatDL adopts a multi-faceted approach:

KSU and CU-Boulder are constructing a digital repository with resources and services intended to meet the needs and interests of the materials community. Prior to MatDL's official launch, MSEL/NIST, U-M, and MIT are acting as a "kernel community" by submitting initial content, using the repository, and providing feedback. The participation of these institutions, as recognized leaders in the materials community, promotes trust and establishes a foundation for future growth of MatDL within the materials community.

3 User community

MatDL was established to support the work of the materials community, a sizable national and international constituency. A focus of MatDL is soft matter research, which generates and consumes novel, complex information. Innovation in materials science impacts a wide range of industries in the global economy and holds promise for revolutionary discoveries in nano- and biotechnology.

The long-term target community of MatDL is materials science and engineering undergraduate and graduate students, educators, and researchers. Materials are building blocks for other goods, so materials scientists are widely distributed among manufacturing and service industries, such as aerospace, computer and electronic, defense, medical and pharmaceutical, metal and machinery, transportation, Federal and State governments, as well as professional, scientific, and technical services (Bureau of Labor Statistics 2004). Chemists and materials scientists work in occupations with strong growth in education, employment, and earnings. Recent advances in nano- and biotechnology have underscored how materials science in areas such as medicine and pharmaceutical manufacturing is crucial to quality of life, national defense, and economic competitiveness.

Given the multidisciplinary nature of materials science, the interests and needs of the materials community resonate across its cognate domains. In addition to the materials community, a centralized repository of scientific visualization, codes, research results, homework assignments, and student work focusing on materials would be of benefit to biological sciences, chemistry, mathematics, and physics.

The use of digital libraries and related services within the scientific community is a relatively new development with the potential to change and improve communication and collaboration for researchers, educators, and students. One anticipated outcome of the investigation in the MatDL project is to reduce the time it takes for new knowledge generated in the laboratory to reach the classroom for the materials science community. It may also encourage new collaborations both within and outside this community. The potential for new technologies and the conventions that develop around those technologies to encourage new collaboration between separate but related groups within and between scientific disciplines has been noted. One example comes from the area of brain imaging (Beaulieu 2002). The development of functional imaging and the Talairach system, a digital space in which both structure and function can be measured, has made possible new collaborations between the scientists who study the mind and those who study the brain. While the application of the new technology was an important factor in bringing these groups closer together, the development of the Talairach system enabled scientists in historically distinct disciplines such as psychology and neurology to establish the common ground necessary for meaningful collaboration.

3.1 Kernel community and participation

The MatDL kernel group of NIST, MIT, and U-M represents a microcosm of the education, research, and government sectors within the materials community. The members of the kernel group provide initial feedback about the effectiveness of MatDL based on their use and submissions to the developing repository. Like the materials world, this group is multidisciplinary in approach and multi-institutional in composition.

Currently MatDL provides tools and archives the resulting work for the following active participants:

3.1.1 MSE courses

Supporting the education of the next generation of materials scientists is fundamental to the long-term advancement of science. The MSE departments at MIT and U-M are highly ranked undergraduate and graduate programs, playing pivotal roles in the network of higher education materials science programs. While the three "real world" example courses of the kernel community are taught at specific universities, they are representative of the range of courses taught at private and public institutions throughout the country:

Introduction to Solid State Chemistry is a large freshman general chemistry course taught at MIT. It is designed to give students the basics of chemistry that will prepare them for engineering, science, architecture, management, and the humanities, but within the framework of an analytical education. The course teaches elementary principles of chemistry and shows how they apply in describing the behavior of the solid state. The relationship between electronic structure, chemical bonding, and crystal structure is developed. Attention is given to the characterization of atomic and molecular arrangements in crystalline and amorphous solids: metals, ceramics, semiconductors, and polymers (including proteins). The course does not include a laboratory component, which would serve to teach:

An open research question is to what extent new technology, such as digital repositories, can address the absence of a traditional laboratory experience. Furnishing students with representative data would place them in a position to proceed with data analysis and report writing without conducting a traditional experiment. MatDL is working with students and faculty in the course to help them gain access to real data so that they can test hypotheses, analyze methods, and contribute their findings in oral, written, and multimedia formats back to MatDL.

Transport Phenomena in Materials Engineering, taught at MIT, combines a rigorous introduction to fluid dynamics (including the Navier-Stokes equations, the Bernoulli equation, turbulence, and boundary layer theory) with heat and mass transfer (including radiation, convection, and the estimation of heat and mass transfer coefficients). It also introduces students to numerous examples of materials processing and performance, from vacuum arc remelting of titanium and nickel superalloys, to polymer extrusion, to thin film deposition, to mixed-limitation transport through encapsulated liposomes for long-term drug delivery. Students must also review material from differential equations and are introduced to methods for solving partial differential equations.

Computational Nanoscience of Soft Materials, taught at U-M, is a graduate course that introduces students to cutting-edge research using materials simulation methods in the context of building nanomaterials. Course topics span from ab initio to continuum methods, but emphasize the molecular and mesoscale simulation tools used by computational materials researchers to predict morphologies of nanostructured and nano-assembled soft materials. Students are introduced to concepts of scientific computation and provided with the background and skills to critically evaluate the materials simulation literature. They also gain a fundamental understanding of simulation methods so that they can develop their own codes and use commercial materials simulation software packages intelligently and appropriately. As part of their coursework, students use the simulation software to generate images of nanostructures. The class displays and compares the images using an image gallery made available by MatDL.

3.1.2 Academic research group

The Computational Nanoscience and Soft Matter Simulation Research Group at U-M focuses on the prediction of assembled nanostructures composed of soft matter and soft/hard hybrid nano building blocks. In general, the research group analyzes how and why nanostructures form as well as how to design novel advanced materials using nontraditional methods. The characterization of nanostructures presents unique challenges to researchers due to the complexity of these materials and the experimental difficulty of directly viewing the arrangement of matter on nanometer scales. Consequently, materials simulation plays a particularly important role in understanding how and why different nanostructures form under different conditions. A specific set of information must accompany a nanostructure for it to be useful to a materials scientist. For example, the description might include the simulation method, force fields, approximations, geometric parameters, temperature and other thermodynamic variables, cooling schedule, and equilibration time. MatDL is collaborating with the research group members to determine the optimal metadata for describing nanostructures.
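To make the idea of nanostructure metadata concrete, the information listed above can be captured as a structured record rather than free text. The following is a minimal Python sketch under stated assumptions: the class name `NanostructureRecord`, its field names, and the `missing_fields` helper are hypothetical illustrations, not MatDL's actual schema. The helper shows one way a submission tool could prompt a researcher for whatever is still missing.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class NanostructureRecord:
    """Hypothetical metadata record for one simulated nanostructure.

    Field names mirror the kinds of information listed in the text;
    they are illustrative, not an actual MatDL schema.
    """
    simulation_method: Optional[str] = None
    force_field: Optional[str] = None
    approximations: list[str] = field(default_factory=list)
    geometric_parameters: dict[str, float] = field(default_factory=dict)
    temperature: Optional[float] = None        # record the units used alongside
    cooling_schedule: Optional[str] = None
    equilibration_time: Optional[float] = None

    def missing_fields(self) -> list[str]:
        """Names of fields left empty, in declaration order."""
        missing = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None or value == [] or value == {}:
                missing.append(f.name)
        return missing
```

For example, a record created with only `simulation_method` and `temperature` set reports the remaining five fields as missing, which a submission form could then request before deposit.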

3.1.3 Government research laboratories

The mission of the Materials Science and Engineering Laboratory at the National Institute of Standards and Technology (MSEL/NIST) is to integrate ongoing research from industry, academia, NIST, and other government labs by forming multidisciplinary and multi-institutional research teams that use the nation's talents and resources effectively to attack key materials issues. MSEL/NIST comprises six divisions: Ceramics, Center for Theoretical and Computational Materials Science, Materials Reliability, Metallurgy, the National Center for Neutron Research, and Polymers. To maximize dissemination of its work, MSEL is providing MatDL with ongoing research results and tools as well as division preprint archives of staff publications. This content establishes the initial foundation of the MatDL repository for use in the MSE courses.

3.2 Participant involvement

Recent research indicates that participant involvement benefits repository development (Renninger and Shumar 2002). A report of the NSDL-sponsored Participant Interaction in Digital Libraries Workshop also supports this view. Workshop participants concluded that participant involvement is crucial to the success of educational digital libraries (Giersch et al. 2004). Successful digital repositories have the potential to be focal points for dynamic, engaged communities rather than just collections of resources. The report points out the reciprocal nature of the relationship between the digital library and its participants. It suggests that if users come to need and value the digital library, they are more likely to participate in sustaining it. Several preconditions for promoting user participation were recommended, including placing the digital library in the context of users' needs and integrating its resources and services into users' regular daily activity.

MatDL is developing tools that aid users' work and integrate submissions to the repository into users' workspace. The project anticipates that promoting participant involvement will provide mutual benefit to the community and the project. By participating, users can build collections that reflect their interests and needs. Community participation will inform MatDL about users' needs and interests, enabling the project to expand its resources in areas of expressed interests as well as suggesting new ways of managing and manipulating information resources. It is anticipated that laying the foundation for MatDL with a kernel of trusted leaders in the materials community will promote awareness in the larger community, increasing the likelihood others will participate by searching, using, and contributing to the repository. Social factors have been found to be important in the learning process (Lave and Wenger 1991, Wenger 1999) as well as in developing and sustaining scholarly communication forums on the Internet (Kling et al. 2003).

4 Tools and integration

MatDL recognizes that users have many demands on their time and energy. The necessity of integrating content and services into a user's regular workflow has been recognized by both research and commercial groups (Dempsey 2004, LexisNexis 2003). To increase the likelihood that users will contribute to the repository, MatDL aims to make efficient use of users' time, minimize their effort, and eliminate redundant work. Strategies to increase submissions include:

Requests by faculty at the kernel institutions for several software tools to facilitate teaching and coursework offered the opportunity to integrate MatDL into user workflow. Open source software applications, which are freely available and can eliminate or decrease development costs, were identified and adapted to meet these needs. Making these tools available as satellite services of MatDL provides both direct educational benefits to students and pragmatic benefits to MatDL. Including options for submitting resources in the student's workflow is expected to increase resource submissions to MatDL. Currently, MatDL has installed two open source tools: an image gallery and a version control system. Both tools have been modified so that users can submit to MatDL directly from the tool.

4.1 Image gallery

The image gallery was proposed as a teaching tool for the U-M Computational Nanoscience of Soft Materials course (section 3.1.1). As part of their course assignments, students generate and compare images of nanostructures. The Coppermine Photo Gallery open source image archive was adapted for this purpose. The gallery permits the instructor to project thumbnail images of many student resources simultaneously during class time, allowing students to easily view, compare, and discuss work completed by their classmates (Figure 2).


Figure 2. Full image gallery display

A larger individual image can be displayed by clicking on a thumbnail (Figure 3).


Figure 3. Individual image gallery display

When students upload images to the gallery, they provide some basic metadata such as author, image title, description, and keywords. They are also asked to answer four domain-specific questions:

As part of the MatDL investigation of optimal description of nanostructures, the class instructor suggested these specific questions as a means of eliciting from participants more complete and consistent metadata that materials scientists would find useful.

All uploaded images are retained in the gallery indefinitely. Students are asked to transfer representative examples of class work to the MatDL repository. Students may choose to submit an image to MatDL by clicking a button provided within the gallery. Once an image is selected for transfer, a script displays all previously entered metadata to allow students to make corrections or additions before the resource is finally deposited in the class collection in the MatDL repository. Students may also choose to deposit relevant software code along with an image. In the past, students taking this course had no mechanism for systematically tagging and storing their work. Consequently, nearly all of their output was lost. By archiving resources in MatDL, current student work is made available to inform future classes.
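The transfer step described above (show the stored gallery metadata, let the student correct or extend it, then deposit) can be sketched in Python. This is a hedged illustration only: the gallery field names, the mapping onto Dublin Core elements, and the functions `review` and `to_dublin_core` are assumptions for the example; the paper does not specify the repository's metadata format or the transfer script's internals.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from gallery upload fields to Dublin Core elements.
GALLERY_TO_DC = {
    "author": "creator",
    "title": "title",
    "description": "description",
    "keywords": "subject",
}


def review(gallery_fields: dict, edits: dict) -> dict:
    """Apply the student's corrections and additions; blank edits keep the old value."""
    merged = dict(gallery_fields)
    merged.update({k: v for k, v in edits.items() if v})
    return merged


def to_dublin_core(metadata: dict) -> str:
    """Serialize reviewed metadata as a flat Dublin Core record, one element per value."""
    record = ET.Element("record")
    for field_name, dc_element in GALLERY_TO_DC.items():
        value = metadata.get(field_name)
        if not value:
            continue
        for item in (value if isinstance(value, list) else [value]):
            ET.SubElement(record, f"{{{DC_NS}}}{dc_element}").text = item
    return ET.tostring(record, encoding="unicode")


# Example: the student fixes the title and adds a description before depositing.
meta = review(
    {"author": "A. Student", "title": "run 1", "keywords": ["nanostructure"]},
    {"title": "Run 1 (corrected)", "description": "Final snapshot of run 1."},
)
xml_record = to_dublin_core(meta)
```

Keeping the review step separate from serialization mirrors the workflow in the text: nothing is written to the repository until the student has confirmed the metadata.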

Students enrolled in the Fall 2003 class used the gallery and found it to be a useful classroom tool. However, they experienced difficulty supplying the metadata. Students seemed unclear about what to enter in general fields such as keywords and description. They also found the wording of the domain-specific questions to be ambiguous, causing them to sometimes enter relevant information into the wrong field. Based on this feedback, modifications have been made to the submission script for use by subsequent classes (Figure 4). The new script takes advantage of relational dependencies among the domain-specific questions. For instance, selecting "simulation type" from a drop-down menu causes information about the simulation method, model and system as well as keywords to be automatically generated. Additional detailed information about the simulation is requested including number of particles, number of time steps, temperature, number density, concentration of particle A, final phase, delta A, AA:BB interaction ratio, and A:B interaction. A description paragraph is automatically generated that incorporates the responses to all of these questions. Additional descriptive text is permitted.


Figure 4. Import form incorporating changes based on user feedback

After the form has been completed, the item is submitted to the MatDL repository (Figure 5).


Figure 5. Item imported into MatDL repository using revised import script based on user feedback

In general, the new submission script is expected to meet the needs of not only this specific group but also similar courses taught at other institutions. Ultimately, a flexible template will be made available that will allow instructors to design an input form that meets the specific needs of their class.
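The revised script's behavior, where choosing a simulation type fills in dependent fields and a description paragraph is assembled from the detail answers, can be sketched as follows. This is a minimal Python sketch, not the actual MatDL script. The `SIMULATION_TYPES` entries are placeholder values, and the form keys are hypothetical. Only the list of detail questions comes from the text. A lookup table like this is also one plausible shape for the per-class template mentioned above, since an instructor would only need to edit the table.

```python
import copy

# Hypothetical drop-down options: each simulation type implies the method, model,
# system and keywords, so the student no longer types them. Placeholder values only.
SIMULATION_TYPES = {
    "type-1": {
        "method": "Brownian dynamics",
        "model": "coarse-grained bead-spring",
        "system": "diblock copolymer melt",
        "keywords": ["self-assembly", "copolymer"],
    },
}

# Detail questions named in the text, as (hypothetical form key, label in description).
DETAIL_FIELDS = [
    ("n_particles", "number of particles"),
    ("n_steps", "number of time steps"),
    ("temperature", "temperature"),
    ("number_density", "number density"),
    ("conc_a", "concentration of particle A"),
    ("final_phase", "final phase"),
    ("delta_a", "delta A"),
    ("aa_bb_ratio", "AA:BB interaction ratio"),
    ("ab_interaction", "A:B interaction"),
]


def fill_form(sim_type: str, details: dict, extra_text: str = "") -> dict:
    """Derive dependent fields from the simulation type and build the description."""
    derived = copy.deepcopy(SIMULATION_TYPES[sim_type])  # auto-filled from the drop-down
    answered = [f"{label} {details[key]}" for key, label in DETAIL_FIELDS if key in details]
    description = (f"{derived['method']} simulation of a {derived['system']} "
                   f"using a {derived['model']} model")
    if answered:
        description += ", with " + ", ".join(answered)
    description += "."
    if extra_text:  # free text the student may still add
        description += " " + extra_text
    return {**derived, **details, "description": description}
```

Generating the description from structured answers, rather than asking for it directly, addresses the feedback that students were unsure what to write in general fields such as description and keywords.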

While the Computational Nanoscience and Soft Matter Simulation Research Group (section 3.1.2) does not currently use the gallery tool, the script written to route submissions between the image gallery and the MatDL repository has been used to streamline the submission process for these users. Research group members may upload images and code through this script. In the original submission script, they were prompted to supply all of the metadata initially asked of the students as well as two additional requests:

This script will also be modified based on user feedback.

In contrast to the student group, the research group has retained past output. However, group members often found identifying and retrieving specific work to be difficult and time consuming. Results of MatDL's investigation of metadata about nanostructures that materials scientists find useful are expected to have the practical advantage of greatly improving retrieval. In addition, adoption of standardized description mechanisms would facilitate the easy exchange of data between collaborating research groups.
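The retrieval benefit claimed above comes from being able to query structured fields instead of scanning free text. A minimal Python sketch, assuming records are plain dictionaries whose keys follow a shared (hypothetical) vocabulary; the record contents, including the phase names, are made-up illustrations rather than real research output.

```python
from typing import Any


def _matches(value: Any, want: Any) -> bool:
    """Exact match, or inclusive numeric range when `want` is a (low, high) tuple."""
    if isinstance(want, tuple):
        low, high = want
        return value is not None and low <= value <= high
    return value == want


def find_runs(records: list[dict], **criteria: Any) -> list[dict]:
    """Return the records that satisfy every criterion."""
    return [r for r in records
            if all(_matches(r.get(key), want) for key, want in criteria.items())]


# Hypothetical archive of past simulation runs.
runs = [
    {"id": "run-1", "final_phase": "lamellar", "temperature": 1.0},
    {"id": "run-2", "final_phase": "cylindrical", "temperature": 1.5},
    {"id": "run-3", "final_phase": "lamellar", "temperature": 2.0},
]
hits = find_runs(runs, final_phase="lamellar", temperature=(0.5, 1.5))
```

Such a query only works if collaborating groups agree on field names and value conventions, which is why standardized description mechanisms also ease data exchange between groups.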

4.2 Version control system

A version control system was proposed to facilitate coursework that engages students in writing software in team-based projects. The Concurrent Versions System (CVS) open source version control system was chosen to meet this user need. CVS records the history of software source files (Cons et al. 2004). It stores all the versions of a file in a single file, retaining only the differences between versions to avoid wasting disk space. It can help users ranging from individual developers to large, distributed teams. For both individual and group projects, the version history can be helpful in pinpointing a modification that introduced a bug into the software. For groups, CVS provides a mechanism that allows people to work on the same project without risking overwriting each other's code. CVS does this by having individual developers work in their own directories and merging the work only when each developer is finished. In Spring 2005, students will begin using CVS to jointly write simulation codes that generate images. As part of student workflow, these resources will be uploaded to MatDL through the CVS Web interface.
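The two CVS ideas in the paragraph above, storing only the differences between versions and preventing one developer from overwriting another's work, can be illustrated with a toy Python sketch. It loosely follows the RCS approach that CVS builds on (newest revision kept in full, older revisions as reverse deltas) and imitates CVS's refusal to commit from an out-of-date working copy. It is not CVS's file format or API; the class and function names are hypothetical, and the delta encoding is built on the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Compact delta turning line list `old` into `new`: copy ranges of old, store only new lines."""
    delta = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))
        # "delete": nothing stored; those old lines are simply not copied
    return delta


def apply_delta(old, delta):
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class VersionStore:
    """Newest revision kept in full, older ones as reverse deltas (loosely RCS/CVS-style)."""

    def __init__(self, first_lines):
        self.head = list(first_lines)
        self.reverse_deltas = []  # reverse_deltas[k] rebuilds revision k from revision k + 1

    @property
    def head_rev(self):
        return len(self.reverse_deltas)

    def commit(self, new_lines, base_rev):
        # Refuse commits made from a stale copy, so one developer cannot silently
        # overwrite another's change (compare CVS's "up-to-date check").
        if base_rev != self.head_rev:
            raise ValueError("working copy is out of date; update and merge first")
        new_lines = list(new_lines)
        self.reverse_deltas.append(make_delta(new_lines, self.head))
        self.head = new_lines
        return self.head_rev

    def checkout(self, rev):
        if not 0 <= rev <= self.head_rev:
            raise IndexError(f"no revision {rev}")
        lines = list(self.head)
        for k in range(self.head_rev - 1, rev - 1, -1):
            lines = apply_delta(lines, self.reverse_deltas[k])
        return lines
```

Keeping the newest revision in full makes the common case (checking out the latest code) cheap, while any earlier revision, for instance the one before a bug appeared, can still be rebuilt by replaying deltas.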

In addition to software, CVS can be used to handle various data formats. Currently, the CVS tool is being used to manage the development of a Transport Archive (Powell 2004). The content of the archive initially comprises materials from the Transport Phenomena in Materials Engineering course (section 3.1.1) taught at MIT. Resources include exercises, handouts, and courseware in LaTeX and PDF formats, with postscript or PNG figures. Educators interested in this material may mix and match problems (Figure 6) to form assignments. They may access the latest or any previous versions of each resource and correct or amend problems as necessary. Frequent contributors will be granted write access to the archive. Initial content and major revisions will be submitted to the MatDL repository. The intent of the Transport Archive is to store educational resources developed and accessed by the international transport community of educators.


Figure 6. Example problem from the Transport Archive
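Because Transport Archive problems are kept as separate LaTeX sources, "mixing and matching" problems into an assignment can be as simple as generating a wrapper document. The following Python sketch rests on assumptions: the file paths, revision strings, and the `build_assignment` function are hypothetical, and the sketch assumes each problem file has already been checked out at the chosen revision. The revision is only recorded in a LaTeX comment.

```python
from pathlib import PurePosixPath


def build_assignment(title, problems):
    """Return LaTeX source for an assignment assembled from selected problem files.

    `problems` is a list of (path, revision) pairs. Each file is assumed to be
    checked out already at that revision; the revision is noted in a comment.
    """
    lines = [
        r"\documentclass{article}",
        r"\begin{document}",
        rf"\section*{{{title}}}",
        r"\begin{enumerate}",
    ]
    for path, rev in problems:
        stem = PurePosixPath(path).with_suffix("")  # \input conventionally omits .tex
        lines.append(rf"  % {path} at revision {rev}")
        lines.append(rf"  \item \input{{{stem}}}")
    lines += [r"\end{enumerate}", r"\end{document}"]
    return "\n".join(lines)


# Hypothetical selection of two archived problems.
src = build_assignment("Problem Set 1", [("problems/heat/fin.tex", "1.3"),
                                         ("problems/fluids/pipe.tex", "1.1")])
```

Recording the revision next to each problem means an assignment can later be rebuilt exactly, even after the archived problems have been corrected or amended.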

5 Discussion

5.1 Collaboration with the kernel community

The multidisciplinary, multi-institutional nature of the kernel community mirrors the opportunities and challenges within the larger materials community. The MatDL project benefits from the wealth of practical experience and expert knowledge shared by kernel community researchers and educators. They are able to communicate practical knowledge about their everyday work environments as well as expert domain knowledge about critical elements of materials information. Valuable feedback is also obtained as investigators and students interact with the tools and repository. For example, U-M contributed to the development of the gallery tool submission form by providing expert information and feedback concerning the descriptive information needed to accompany the nanostructures generated by the Computational Nanoscience of Soft Materials class (section 3.1.1). They also knew that similar MSE classes at other institutions tend to run the same simulation types, increasing the likelihood that the final version of the submission form will generalize to those courses. They have also offered valuable input on the development of a submission form suitable for the research group, and have raised important practical considerations, such as the willingness of researchers to submit resources to a generally accessible repository before results have been officially published and intellectual property rights have been established. These participants have expressed the need for a secure, private lab notebook to store selected resources away from public access until a resource is ready for distribution.

The challenges of interacting with the kernel community include geographical separation, limited time, coordinating busy schedules, and making progress on several facets of the project simultaneously. To capitalize on the benefits and address potential problems, the MatDL project has tried to establish communication mechanisms with its kernel community that are efficient and require minimal time commitments. Email lists for the entire project and for the principal investigators and senior staff allow brief news of project developments to be disseminated widely or selectively on a continuous basis. For full discussion of the project's accomplishments and direction, all project participants (principal investigators, senior research investigators, and the undergraduate and graduate students supported through MatDL) meet twice a year for an all-day meeting to keep the project on track. Each institution hosts one of these all-day project meetings, which acquaint the members with one another and strengthen connections within the project. Based on the discussions at these biannual meetings, institutional and project-wide action plans and timelines are developed to reflect the needs and priorities of the kernel community within MatDL. The action plans and timelines are continually reassessed and revised, in light of the experience gained in working with the kernel community, to address MatDL's original goals and objectives.

5.2 Outreach

The kernel community provides a manageable testing environment to investigate the interests and needs of the materials community to facilitate greater participant submissions in building the MatDL repository. Offering existing tools to new users as well as adding tools to meet other user needs is expected to expand the community. In addition, presentations at materials professional societies and collaborations with other materials-focused funded initiatives can extend participation beyond the kernel group to the larger materials community. Drawing upon the breadth of expertise among the collaborative team in the MatDL project, presentations and papers about MatDL are being delivered at the education, multidisciplinary, and research sessions of conferences across the cognate disciplines.

Collaboration among individual projects that are externally funded under separate programs but have similar interests can lead to greater cumulative impact by reaching larger numbers of materials researchers, educators, and students while meeting the immediate goals of each program. An example of a separately funded initiative with goals similar to those of the NSDL program is the NSF Research Experience for Undergraduates program (REU). The REU program involves undergraduates who work in the research programs of faculty at host institutions. Each student is associated with a specific faculty research project, where he or she works closely with the faculty and other researchers. As a result of the REU project, students prepare individual presentations of their research that are made available on the REU host sites. The NIST Summer Undergraduate Research Fellowship (SURF) and the KSU Undergraduate Summer Research Experience program host materials-related REUs. These REUs and MatDL are collaborating by having the students use and submit their research to the repository. MatDL will benefit from the collaboration by expanding participation beyond the kernel community. The REUs will benefit by having one centralized location for materials-related REU projects as well as having metadata about the REU projects stored and disseminated through the NSDL Metadata Repository.

6 Conclusion

A critical challenge in the development and sustainability of digital repositories is fostering ongoing participant involvement within digital communities. To encourage contributions and broaden participation, MatDL is investigating the integration of its submission process into adapted tools as part of the user's workflow. Open source "tools", such as an image gallery and a version control system, have been adapted to facilitate users' work through satellite services provided within MatDL. MatDL aims to extend the strategy from the kernel group to the larger materials community by offering use of these tools as well as adapting other broad-based, open source tools.  By integrating submissions into users' workflow and increasing submissions from students, educators, and researchers, digital libraries can act as catalysts for interaction between research and education.


MatDL is part of the National Science Digital Library project and is supported by National Science Foundation grant DUE-0333520 and National Institute of Standards and Technology grant 70NANB3H1079. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.


Beaulieu, A. (2002) "A Space for Measuring Mind and Brain: Interdisciplinarity and Digital Tools in the Development of Brain Mapping and Functional Imaging, 1980-1990". Brain and Cognition, Vol. 49, 13-33

Bureau of Labor Statistics (2004) "Career Guide to Industries, 2004-05 Edition, Pharmaceutical and Medicine Manufacturing", Bureau of Labor Statistics, U.S. Department of Labor, February 27

Cons, L., et al. (2004) "CVS--Concurrent Versions System v1.11.14: 1 Overview". Concurrent Versions System (CVS), March 11, 2004

Dempsey, L. (2004) "Places, collections and services". VALA2004: Breaking Boundaries: Integration & Interoperability, the 12th Biennial Conference and Exhibition of the Victorian Association for Library Automation Inc., Melbourne (PowerPoint slide presentation)

Giersch, S., et al. (2004) "Participant Involvement in Digital Libs". NSF/NSDL Workshop: Participant Interaction in Digital Libraries, Philadelphia, PA

Kling, R., McKim, G. and King, A. (2003) "A Bit More to It: Scholarly Communication Forums as Socio-Technical Interaction Networks". Journal of the American Society for Information Science and Technology, Vol. 54, No. 1, 47-67

Lave, J. and Wenger, E. (1991) Situated Learning: Legitimate Peripheral Participation (Cambridge, UK: Cambridge University Press)

LexisNexis (2003) "Integration: Capitalize on the Opportunities Now". EContent, Leadership series of White Papers, June

Powell, A.C. (2004) "Transport education in the materials curriculum: Innovative approaches, exercises, and new challenges". Presented at the Minerals, Metals & Materials Society Annual Meeting, Charlotte, NC, March

Renninger, K.A. and Shumar, W. (2002) "Community Building with and for Teachers at The Math Forum". In Building Virtual Communities: Learning and Change in Cyberspace, edited by K.A. Renninger and W. Shumar (New York, NY: Cambridge University Press), pp. 60-95

Wenger, E. (1999) Communities of Practice: Learning, Meaning and Identity (Cambridge, UK: Cambridge University Press)


ASM Materials Solutions Conference, Fall 2004

Computational Nanoscience and Soft Matter Simulation Research Group

Concurrent Versions System (CVS) open source version control system

Coppermine Photo Gallery: Documentation and Manual

"Educating Tomorrow's Materials Scientists and Engineers" session, Spring 2004 Materials Research Society Conference

KSU Undergraduate Summer Research Experience program

Materials Digital Library

Materials Science and Engineering Laboratory at the National Institute of Standards and Technology (MSEL/NIST)

NIST Summer Undergraduate Research Fellowship

NSF Research Experience for Undergraduates