Expressing Personal Interpretations of Music Collections in Spatial Hypertext

Konstantinos A. Meintanis
Department of Computer Science & Engineering, Texas A&M University
kam2959@cse.tamu.edu

Frank M. Shipman
Department of Computer Science & Engineering, Texas A&M University
shipman@cse.tamu.edu

Abstract

Managing music collections often involves prioritizing explicit metadata fields (e.g., artist, album, year) to structure the collection on a storage device or display it in an interface. These metadata values are used because they are independent of one's interpretation of the music and, thus, equally recognizable for all users. This paper presents a study of how people develop interpretive organizations for music in spatial hypertext. The resulting organizations included a variety of personal interpretations that drew from participants' knowledge of songs, memories associated with songs, and assessment of the mood of songs. Participants valued the expressive capabilities of spatial hypertext but missed the metadata-based tree views of the music collections for locating music.

1. Introduction

Many people's approach to managing and listening to their music collections involves grouping and selecting songs with shared explicit attributes (metadata values, access frequency, access recency, etc.). For example, new songs are assigned to categories based on the metadata fields that users consider important (often the genre, artist, and album) while songs already in their collection are retrieved by searching based on metadata values. Thus, the organization, selection, and ordering of songs are based on explicit attributes and not on the audio characteristics of music, unless they are somehow represented in the metadata.

The common metadata fields attached to music are valuable for providing context-free information about the music - the artist of a recording does not change between playbacks - but are not necessarily the music characteristics that express what users really seek. How can users pick music that they find happy, energizing, or calming? What about music that reminds them of high school, or of college, or of particular family members or friends? While users can add metadata fields, they rarely do so because the potential value is outweighed by the overhead of expression (Shipman and Marshall 1999), especially when their expression involves interpretation that is likely to change over time, such as the feelings and memories triggered by listening to the music.

Several research groups have tried to develop techniques for assessing music relatedness based on explicit and implicit information about songs. Liu and colleagues (2003) have developed a method for extracting the intensity, timbre and rhythm of songs and combining these features for mood detection. Tsai and Wang (2005) follow a similar approach where they isolate and analyze the solo vocal signals of music to extract vocal-related features. Zadel and Fujinaga (2004) use collaborative filtering to find similar artists by measuring their co-occurrence in the Amazon Listmania! database. van Gulik and colleagues (2004) developed visualizations using metadata-based clustering of music files to aid access to music on small screen devices. In these examples, the evaluation of relatedness relies on feature extraction for songs, metadata, or statistics that aggregate user preferences. More direct input from users as they manage their collections is not taken into consideration.

To explore how people would organize music if there were a low-effort way of expressing personalized interpretation of music, we performed a study of music organization in spatial hypertext. Spatial hypertext provides an environment designed to reduce the overhead of user expression. The results of this study are meant to inform the design of environments for managing personal music collections, including a greater understanding of the characteristics of music people might want to express that they currently do not and the roles of such expression as compared to standard metadata.

The next section briefly describes the features of the Visual Knowledge Builder as they pertain to music management. Section 3 describes the study design and data collection. Section 4 presents the results including participants' prior experience with music organization, their organizational strategies, and their assessment of the task and tool. Section 5 discusses implications for the design of systems for personal music management.

2. Visual Knowledge Builder

The Visual Knowledge Builder (VKB) is a general purpose spatial hypertext system (Marshall and Shipman 1995). It has been used to search, organize, and attach metadata to documents retrieved from NSDL and Google (Shipman et al. 2004). Here, we only present an overview of its capabilities related to the study of music organization presented in this paper.

Spatial hypertext emerged from node-and-link and map-based hypertext in order to better support the evolving and emergent interpretations that often occur during the early stages of information analysis tasks (Marshall and Shipman 1995). A spatial hypertext consists of a set of information objects whose visual attributes (e.g., color, border width) and spatial layout (e.g., lists, piles) indicate relations between information entities. In VKB, these objects are placed into a hierarchy of visual workspaces called collections. The barrier of expression is lower than in traditional hypertext because assigning visual attributes and arranging objects requires less effort than creating explicit relations between the objects. Because users can modify a variety of visual attributes and organize materials in space, they can easily express relations of varying type and strength without having to verbally state the meaning and degree of each relation.
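
To make this model concrete, the sketch below shows one plausible data representation for such a workspace: information objects carry a position and visual attributes, and collections are themselves objects that can nest. This is an illustrative assumption in Python, not VKB's actual implementation; all class and attribute names are hypothetical.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class InfoObject:
        """An information object in a spatial hypertext workspace (illustrative)."""
        label: str
        x: float                  # position within the parent collection
        y: float
        color: str = "#FFFFFF"    # visual attributes carry implicit meaning
        border_width: int = 1
        metadata: Dict[str, str] = field(default_factory=dict)  # e.g., ID3 fields

    @dataclass
    class Collection(InfoObject):
        """A collection is itself an object and may nest further collections."""
        children: List[InfoObject] = field(default_factory=list)

    # Relations are implied by placement and appearance rather than explicit links:
    workspace = Collection(label="Favorites", x=0, y=0, children=[
        InfoObject(label="Song A", x=10, y=10, color="#FFD700", border_width=3),
        InfoObject(label="Song B", x=10, y=45, color="#FFD700"),  # same color, near A
    ])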

Figure 1 shows VKB being used to organize music. The layout of this space is the result of using drag-and-drop to bring a directory tree containing albums from a number of artists into VKB. In this case, the directory structure of the file system is preserved as a hierarchy of collections. VKB also attaches the metadata contained in the ID3 tags of each audio file to its object and sets the object display to show the title and artist of each song. When the cursor lingers over the border of an object, the metadata is shown in a popup. The color and border width variations are the result of the user's expression.

Figure 1. A VKB workspace resulting from importing a directory tree of music.
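
Continuing the illustrative data model above, the sketch below shows how a directory tree of MP3 files might be mirrored as nested collections with ID3 metadata attached to each song object. The use of the mutagen library and the layout constants are assumptions for illustration; the paper does not describe how VKB reads ID3 tags.

    import os
    from mutagen import File as MutagenFile  # ID3 tag reader; an illustrative choice

    def import_directory(path, x=20, y_step=30):
        """Mirror a directory tree as nested collections of song objects (sketch)."""
        collection = Collection(label=os.path.basename(path), x=0, y=0)
        for i, name in enumerate(sorted(os.listdir(path))):
            full = os.path.join(path, name)
            if os.path.isdir(full):
                collection.children.append(import_directory(full))
            elif name.lower().endswith(".mp3"):
                tags = MutagenFile(full, easy=True) or {}
                collection.children.append(InfoObject(
                    label="%s - %s" % (tags.get("title", [name])[0],
                                       tags.get("artist", ["unknown"])[0]),
                    x=x, y=y_step * (i + 1),          # simple initial vertical layout
                    metadata={k: v[0] for k, v in tags.items()},
                ))
        return collection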

VKB plays the audio file while the mouse cursor lingers over an object. This is an auditory form of progressive disclosure similar to providing metadata, snippets of content, or thumbnail images of textual or image content in a popup. In the following study, audio playback on mouse-over was limited to the first 10 seconds of a song to increase system stability. Subjects could still listen to the whole music file in Windows Media Player by double-clicking on the object in the VKB workspace.
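
A 10-second hover preview of this kind could be implemented by wrapping an audio playback call in a timer. The sketch below uses pygame for playback; the library choice and function names are assumptions for illustration rather than a description of VKB's internals.

    import threading
    import pygame

    pygame.mixer.init()
    _preview_timer = None

    def start_preview(path, seconds=10):
        """Start playing a song on mouse-over and stop it after `seconds` (sketch)."""
        global _preview_timer
        stop_preview()
        pygame.mixer.music.load(path)
        pygame.mixer.music.play()
        _preview_timer = threading.Timer(seconds, pygame.mixer.music.stop)
        _preview_timer.start()

    def stop_preview():
        """Cancel any pending timer and halt playback when the cursor leaves."""
        global _preview_timer
        if _preview_timer is not None:
            _preview_timer.cancel()
            _preview_timer = None
        pygame.mixer.music.stop()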

3. Study Design

The study was conducted in the Center for the Study of Digital Libraries at Texas A&M University. Twelve graduate students (10 men and 2 women) aged 24 to 38 were recruited to take part in the study. Participants received no compensation. The majority (75%) of the participants had previously used VKB for other tasks, reducing the impact of software novelty on results. Regardless of prior experience, participants were trained in the use of VKB prior to the study task.

Collections of 100 pre-selected songs were created for four music genres (rock, dance, lounge, and classical). The participants were asked to select a genre and then given 60 minutes to organize the songs for that selected genre. Participants were encouraged, but not forced, to "think out of the box" of the traditional metadata classification and create collections based on their own interpretation of the music. After their organization was complete, participants were asked to create three playlists for activities or events of their own choosing. Subjects were allowed up to an additional 30 minutes to create playlists.

Demographic data about the participants was collected via a pre-task questionnaire. The organizational process and results were recorded via monitoring code built into VKB, screen capture software, and the resulting VKB files. Post-task questionnaires and semi-structured interviews were used to gather information about the participants' perceptions of the task, tool, and experience as well as their strategies and practices in organization and expression.

4. Results

All participants completed the first task of organizing the 100 songs. Eleven of the twelve participants completed the second task of creating three playlists; one participant ran out of time before this portion of the task.

4.1 Experience with Digital Music Collections

The pre-task questionnaire asked about participants' experience in using applications for managing songs and generating playlists. All participants had previous experience in organizing songs. 67% (8 of 12) had collections of more than 200 songs, and 67% spent at least 15 minutes organizing their music every week. Most participants spent a significant amount of time listening to their collections: 67% spent more than 30 minutes during each sitting, and 50% listened to or organized their songs more than 5 times each week. Only one subject was satisfied with the playlist creation techniques found in most commercial and freeware software, where selection and ordering of songs is based on filtering metadata and usage statistics. 83% (10 of 12) replied that they create their playlists manually by browsing their collection and dragging-and-dropping songs into their players.

4.2 Organization and Expression

Only two genres were selected by participants, with four participants selecting rock and eight participants selecting classical. Because the music was pre-selected, participants were confronted with both known and unknown music. Listening to unknown music became a significant activity during organization.

Figure 2. Organization using categories and subcategories with labels.

Figure 2 shows part of the finished workspace for one participant. The songs are divided into those the participant knew and those he did not. The unknown songs were organized based on the participant's opinion about the artist ("generally like the artist", "neutral about the artist"). The songs he knew were grouped based on personal assessments of the music ("like but hard to listen to", "cheesy", "hate", "fun songs", and "too slow") and associations the music had for the participant ("remind me of my wife"). Some of these categories had further subcategories such as the "I swear my wife has these songs on a mix-CD" under "remind me of my wife" and "classics" under "fun songs". This participant's workspace shows a greater degree of structure and interpretation than the workspaces created by most of the participants.

Figure 3 shows a one-level categorization of songs with the explicit categories related to the lyrics, themes and mood of songs. There are collections for "story" songs, "philosophical" songs, "love/romantic" songs, and "lonely/forlorn" songs. In addition, there are categories related to style of music ("punk"), for disliked songs, and songs that did not fit into categories he had already created. This participant used the color and border width of objects to indicate additional features of the music beyond his labeled categories.

Figure 3. Single-level organization using collections, color and border width.

Table 1 lists the labels for collections and lists created by participants as well as the labels for all the playlists created. Participant 4 did not create playlists due to a time constraint. These descriptors refer to user preferences, characteristics of the music, and characteristics of the activity or situation for listening to the music.

Table 1. The textual descriptors used for collections and playlists for study participants.

Seven participants' organizations included both positive and negative descriptors of their preference for songs. One participant did not include a positive descriptor but had a "don't care/dislike" collection. The other four participants did not express preferences in the labels attached to their organizations. Seven participants included descriptors related to musical features in their organization. Six of these included characterizations of the mood of the music ("serious", "peaceful", "calm", "aggressive") and three included terms that relate to genres ("funky", "ethnic/folk", "punk"). Finally, three of the participants had organizations that included labels for the contexts in which they would want to listen to music ("programming", "gloomy day", "off to sleep"). Overall, the personal interpretation of these spaces is consistent with the "idiosyncratic genres" found in Cunningham and colleagues' (2004) ethnographic study.

In the post-task questionnaires and interviews, participants detailed their strategy for organizing songs into collections and categories. 83% (10 of 12) of the participants reported grouping music based on its dynamics, especially its tempo, energy, harmonic structure and tonality. Several participants also reported that they created collections based on how well they knew the songs (33%, 4 of 12) and how serious/important the songs sounded (25%, 3 of 12).

With regard to creating playlists, 83% of the participants again reported that music dynamics were important to placing music into playlists. Some participants (25%, 3 of 12) said that they formed playlists based on the lyrics (e.g., whether the song has lyrics, what the lyrics say, and whether they are "singable").

Besides creating collections and placing text labels in the workspace to describe groupings, participants expressed their opinions about music visually. The visual attributes used most were background color (58%) and border thickness (33%). Background color was used to distinguish songs with different dynamics, express categories and preference, and to indicate familiarity with the song. Border thickness was used to indicate familiarity with the song, express preference, and distinguish songs with different dynamics (mainly tempo).

Five of the twelve participants reported using the relative position of the songs (arrangement and distance between two objects) to indicate order of playback, degree of importance, and difference in dynamics. Two participants reported using absolute position (coordinates of objects in space) to indicate preference in a specific collection or song.

Participants were asked to avoid using metadata only during the organizational part of the study, not during the playlist creation portion. The resulting playlists indicate that participants chose to put together music that they found related (e.g., placed in similar portions of the space or given similar visual attributes) but that is not necessarily similar in terms of explicit metadata values. Figure 4 shows a playlist of classical music that participant 6 put together for a party. Creating playlists is neither a process of collecting resources that share the maximum number of attributes nor one of selecting songs at random. Participants indicated that creating playlists requires the selection of items that sound good and fit well together. This requirement explains why only one subject reported being happy with the automatic playlist creation approaches found in the commercial and freeware music software that they use.

Figure 4. Example Playlists.

4.3 Comments on System Features

While they liked using visual expression for organizing music, participants still wanted direct access to the metadata values and the explicit associations between the songs they manage. Some participants indicated the need for interactive hierarchical/tree views of the music collection as it is stored in the file system, similar to current commercial music management software. Participants also appreciated the visibility of metadata information in the music objects in VKB, as it supported an initial, rough assessment of what might sound good together without having to listen to the songs.

Consistent with the experiences of previous VKB users, participants' comments indicated that the VKB workspace offers greater expressive power and freedom than the traditional hierarchical, folder-like views of the file system. They were able to create abstract structures, express granularity in their associations, describe various types of relationships (other than similarity), and even create alternative views of the same original collection in the same workspace. Moreover, 66% of the participants said that VKB helped them to organize the songs efficiently and 83% enjoyed doing the task.

Participants liked the preview feature of the workspace, where they could easily listen to the first few seconds of a song by hovering the mouse cursor over its object. This made it possible to listen to a music snippet without additional steps or applications. Comments and observations show that the preview feature proved beneficial for helping users identify songs they already knew. However, playing the first 10 seconds was not sufficient for becoming familiar with new songs.

5. Discussion

The results of the study show that there are benefits and weaknesses both to organizing personal music collections based on the context-independent metadata found in current tools and to organizing them based on the malleable personalized interpretation found in spatial hypertext.

5.1 Supporting Personal Interpretations

Knowing ahead of time what characteristics of music are going to be important to a particular user is difficult. Most music management systems support personal interpretation through the addition of new metadata fields and values. Users rarely do this because of the effort required in the human-computer interface and because users may wish to express characteristics that are difficult to describe textually (attribute names and values are generally textual). Expressing that Blondie's Rapture is kind of funky but not as funky as The Sugarhill Gang's Rapper's Delight via metadata changes personal interpretation into a form of knowledge engineering.

Such relative assessments of musical characteristics were part of why participants positively assessed the ease of expression in spatial hypertext for personal interpretation. They found visual expression facilitated their interpretation of mood, memories, and musical dynamics. Yet, participants also indicated that the lack of views of their collection based on traditional metadata made it more difficult to locate songs that they knew they wanted. Visual personal interpretation, at least in the time-limited task of the study, enhanced users' expression but the resulting expressions were not always efficient representations for locating specific songs.

5.2 Metadata Visibility, Access, and Manipulability

Systems need the predictability, consistency, and formality found in the context-independent metadata fields associated with music files. This is the strength of current commercial applications. Eight participants in the study indicated the need for having access to views of the collection based on metadata through either metadata filtering or metadata-based tree views.

The personal interpretation found in the VKB collections was based on subjective characteristics far more sensitive to change than an organization where membership is decided based on well defined and consistent criteria. Music understanding, perception and mood are volatile factors that differ not only from person to person, but also from time to time. An organization relying only on user perception can be so dynamic that locating specific pieces requires remembering the context in which the music was positioned in order to predict where it can be found.

The study also found that users view traditional metadata as insufficient for expressing their desires for playlists. Participants reported that playlists involve selection of music that includes variation yet fits well together in the current context. Six participants reported that they found visual expression in VKB useful as compared to their prior experiences organizing music collections.

These results indicate that traditional metadata (artist, composer) is valuable for navigating a collection but not for directly specifying desired music, while the personal interpretation found in the visual expression is valuable for selecting music but not for navigating within the collection.

5.3 Combining User Interpretation and Context-Free Metadata

What is missing are environments that combine the easily expressed interpretations of music found in spatial hypertext systems with the predictable and consistent explicit descriptions found in current metadata-based systems.

Besides providing dual views of music collections according to traditional metadata and personal interpretation, those environments should also attempt to bridge the gap between personal interpretation and features of music that the system can interpret. In addition to metadata, systems can assess music similarity based on signal processing of the audio content (Aucouturier and Pachet 2002; Foote 1997; Logan and Salomon 2001), collaborative filtering (van Breemen and Bartneck 2003; Crossen et al. 2002; Li et al. 2004), and lyric analysis (Logan et al. 2004). Such techniques provide alternate, and potentially divergent, assessments of music similarity.
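
As a rough illustration of how such divergent assessments might be blended, the sketch below computes a weighted combination of an audio-feature similarity, a listener co-occurrence similarity, and a lyric similarity. The data shapes (feature vectors, listener sets, term-weight vectors) and the weights are hypothetical and are not drawn from the cited systems.

    import math

    def cosine(u, v):
        """Cosine similarity between two equal-length numeric vectors."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def combined_similarity(a, b, weights=(0.5, 0.3, 0.2)):
        """Blend audio, co-occurrence, and lyric similarities (hypothetical shapes).

        Each song is a dict with keys 'audio' (e.g., timbre/rhythm features),
        'listeners' (a set of users who played it), and 'lyrics' (term weights).
        """
        audio = cosine(a["audio"], b["audio"])
        union = a["listeners"] | b["listeners"]
        cooccur = len(a["listeners"] & b["listeners"]) / len(union) if union else 0.0
        lyric = cosine(a["lyrics"], b["lyrics"])
        return sum(w * s for w, s in zip(weights, (audio, cooccur, lyric)))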

Spatial hypertext systems like VKB include spatial parsers that employ heuristic techniques to recognize the interpretive structures created by users (Francisco-Revilla and Shipman 2004). The recognized visual structures can indicate what music characteristics the user finds relevant for their organization. These characteristics can then be the basis for computing a personalized clustering of music collections or for personalized weighting in relevance feedback algorithms (Hoashi et al. 2002).
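
A minimal sketch of one such heuristic appears below: it groups objects that are roughly left-aligned and vertically adjacent into candidate lists, a crude stand-in for the richer structure recognition described by Francisco-Revilla and Shipman (2004). The tolerances and the single list-detection rule are assumptions for illustration.

    def detect_vertical_lists(objects, x_tolerance=15, y_gap=60):
        """Group objects that are roughly left-aligned and vertically adjacent.

        `objects` have x/y positions as in the earlier data-model sketch; real
        spatial parsers also weigh color, size, and nesting of collections.
        """
        lists = []
        remaining = sorted(objects, key=lambda o: (o.x, o.y))
        while remaining:
            seed = remaining.pop(0)
            group = [seed]
            for other in list(remaining):
                if (abs(other.x - seed.x) <= x_tolerance
                        and 0 < other.y - group[-1].y <= y_gap):
                    group.append(other)
                    remaining.remove(other)
            if len(group) > 1:
                lists.append(group)   # a candidate list the user may have intended
        return lists

Structures recognized in this way could then seed the personalized clustering or the weighting used in relevance feedback, as suggested above.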

5.4 Easy Access to Music

Creating and managing collections based on how music sounds requires sufficient knowledge of the music content. Organizing a small set of familiar songs can be a fairly easy task of simply remembering and associating the basic melodies. However, classifying a large quantity of music, such as people collect over years, can be a challenging and time-consuming process requiring extended periods of listening to and comparing songs. The study showed that having direct access to the music content, without the overhead of launching additional applications, simplifies the process. Access to short snippets of music helps people remember what a song sounds like, while access to the music as a whole is beneficial for assessing unfamiliar music. To improve the efficiency of access, future music management environments should support music summaries (Logan and Chu 2000; van Breemen and Bartneck 2003) to provide more time-efficient overviews of the musical content.

6. Conclusion

Software supporting music management currently emphasizes the application and use of context-independent attributes of music files. While this metadata is valuable for locating specific files, it is not satisfactory for generating playlists automatically.

When encouraged to organize collections of music without using traditional metadata, participants used personal characteristics such as how well they knew or liked the song, memories they associated with the song, and their assessment of a song's mood or musical characteristics. Such assessments are highly personal and may even vary across time and contexts for the same user.

The study shows that the management of personal music collections benefits from both the predictable access provided by metadata and the personalized interpretation found in spatial hypertext systems. By combining metadata, user interpretation, and system assessment, the next generation of music management environments could support users more completely, pairing context-independent access capabilities with the personal interpretation capabilities of spatial hypertext.

7. References