1 Introduction
The broad area of digital video has opened up huge possibilities for future use of video content: video clip libraries for broadcast companies, video editing systems for film producers and various home video entertainment systems such as DVD, WebTV and video-on-demand. Commercial interest in digital video technology and applications is great because the potential market for exploiting video as a digital artifact is huge.
In domestic environments, most households own at least one conventional TV and VCR and use them in their everyday lives. Set-top boxes such as TiVo (TiVo) and ReplayTV (ReplayTV), which record broadcast TV programmes in a digital format on internal disks, are starting to become common household items. These products are currently the digital version of the conventional VCR and get rid of the problems inherent in conventional systems, such as degrading video quality and managing a number of video tapes. DVDs are spreading quickly, with their high quality video contents and attractive extra features such as director comments, multiple subtitles, changeable camera angle, and direct access to various pre-indexed scenes. Video on the Web is also becoming a reality, though development is still in its infancy, based on video streaming technologies which allow playback of digital video files on the client device while simultaneously receiving the file data. With higher bandwidth networks, this will become much more reliable and faster, as if playing from local data storage. Tying all these applications of digital video technology together is the huge potential behind digital video libraries, large collections of video information which can be accessed directly by users and from which a range of services can be offered, from simple playback to searching, alerting, summarisation, hyperlinking, and so on.
As with any field driven largely by technological development, however, human considerations are often neglected or overlooked when a new digital video product or service is developed. The user interface for accessing digital video, often overwhelmingly complex with its multimedia elements and control widgets, is a good example of where technological possibilities largely dictate the shape of the end product or service. Human-computer interaction and usability are often addressed not within the engineering and feature-driven process itself but through minor cosmetic work done at the end of development. The conventional user interface development process, which starts from a task specification and proceeds through a development cycle, becomes difficult because digital video systems have no established system environment and no established user group. The result is often a product with surprising functionality but a poorly crafted user interface.
This paper considers in depth the development of the user interface to digital video library systems. Our main focus is the part of the user interface that allows the user to browse video content via sets of keyframes extracted by the system. We have developed a design framework that allows us to present many different and distinctive keyframe-based video browsing interfaces for a digital library. This framework allows the design of browsing interfaces to be considered systematically, through a well-defined list of design options and their possible combinations. Using this framework we have designed and implemented many example browsers, some of which are described in this paper. While this framework provided us with the initial stage of novel interface design, we have carried out several rounds of user testing throughout the development to fix usability problems with these browsers and to get feedback on our design. In doing this we acknowledge the importance of user-based evaluation in the development of our interface work.
The browsers to be tested have been incorporated into a working digital video system called Físchlár (O'Connor et al. 2001) for the purpose of obtaining real usage information on the browsers. The system has been operational for almost two and a half years and currently has more than 1500 registered users freely recording, browsing, searching and watching broadcast TV programmes from within our campus environment. At any time we have almost 400 hours of video content available, and this has provided us with a unique and important environment for user evaluation of the designed browser interfaces and other usage aspects of a digital video library.
The paper is organised as follows. Section 2 reviews user interfaces to current video library systems, with the focus moving from overall user interfaces to keyframe-based content browsing interfaces. In section 3, the framework we have developed to design keyframe-based browsing interfaces is introduced, with many example systems referred to in relation to this framework. Three distinctive browsers that we have designed and implemented using the framework are introduced in section 4, and some usability testing results are presented which show several important findings. Section 5 presents our video digital library system called Físchlár, and the results of system usage measured by observation (by logging analysis and user interviews) are also briefly presented. Section 6 concludes with directions for future work.
2 User Interfaces to Digital Video Libraries - the Current Scene
This section outlines the state-of-the-art in user interfaces to video digital library systems, covering both experimental and commercial systems. The interfaces to video digital library systems can be described in terms of many different features for supporting different user tasks. We have tried to categorise different features to see roughly which elements are well researched and used, which elements are under development, and which are yet to be investigated. From this overall picture we then focus on general user interfaces for content browsing, and then on keyframe-based content browsing interfaces, with which this article is concerned.
2.1 User Interface to Digital Video Libraries
Systems developed for accessing digital video libraries naturally lead to more complex front-end interfaces, as the rich information attributes of the video medium itself (visual, audio, and other meta information), plus their indicators and controlling widgets, all have to be used and displayed on the screen. As in most areas of computer system development, the user interface to digital video library systems is often not treated as an aspect of central importance, in either the research or the commercial sector. In practice, much emphasis has been placed on developing video data-specific storage, compression, and description methods such as the MPEG standards, as well as automatic content analysis primitives such as camera shot boundary detection and object tracking through frames. The outcomes of these technical considerations often dictate what the user interfaces of the resulting video library systems are like. When the enormous possibilities of digital video are perceived and implemented from the technical point of view, it is hardly surprising that technologically-driven system development is so dominant.
To identify some current trends in user interface design in digital video library research and products, Table 1 (taken from Lee et al. (1999)) lists 24 typical user interface features of digital video library systems.
Table 1. Feature-system matrix
Feature \ System | SWIM | MediaSite (Informedia) | VISION | WebSEEk | Internet CNN NEWSROOM | VideoSTAR | FRANK | CAETI | Pop-Eye/OLIVE | VideoQ | MovEase | NeTra-V | EUROMEDIA | Screening Room | VideoLogger |
CATALOGUING TOOL | |||||||||||||||
Semi-automatic cataloguing tool | O | O | O | O | |||||||||||
Manual cataloguing tool | O | ||||||||||||||
Threshold adjustable before automatic segmentation | O | O | |||||||||||||
TEXTUAL QUERY | |||||||||||||||
Natural language (or Keyword) input | O | O | O | O | O | O | O | O | O | O | O | O | O | ||
Category or Keyword list browsing | O | O | O | O | O | O | O | O | O | O | O | ||||
Use audio information for indexing/searching | O | O | O | O | O | O | O | O | |||||||
Automatic full transcript generation* | O | O | |||||||||||||
VISUAL QUERY | |||||||||||||||
Keyframe-based sketch-drawing | O | O | |||||||||||||
Histogram manipulation | O | ||||||||||||||
Keyframe-based QBE | O | O | |||||||||||||
Motion-based sketch-drawing | O | O | |||||||||||||
Motion-based QBE | O | O | O | ||||||||||||
VIDEO BROWSING METHOD | |||||||||||||||
Textual description | O | O | O | O | O | O | O | O | O | O | O | O | |||
Transcript | O | O | O | O | O | O | O | ||||||||
Single keyframe | O | O | O | O | O | O | O | O | O | O | O | ||||
Keyframe list in chronological order (storyboard) | O | O | O | O | O | O | O | ||||||||
Option for different granularity of keyframes set | O | O | O | ||||||||||||
Interactive hierarchical keyframe browser | O | ||||||||||||||
Keyframe slide show | O | O | |||||||||||||
Video summary playing | O | ||||||||||||||
Playback | O | O | O | O | O | O | O | O | O | O | O | O | O | O | O |
Transcript + Playback synchronisation | O | O | |||||||||||||
Keyframes + Playback synchronisation | O | O | O | O | |||||||||||
Text search with Playback and/or Keyframes sync | O | O | O | ||||||||||||
INTELLIGENT KEYFRAME SELECTION | O | O | O | O | O |
Detailed explanations of each of the features and references to each system can be found in Lee et al. (1999). Observing the distribution of features in systems in Table 1, it can be seen that:
- Most systems use a textual querying interface and few systems provide any form of visual query interface, probably indicating the need for further development in this area;
- Most systems use keyframe(s) as their video browsing method;
- Playback is provided in all listed systems, indicating that playback is regarded as a most important interface feature;
- Whereas most systems provide more than one video browsing method (often transcript + playback and/or keyframe + playback), browsing aids such as synchronisation between different browsing methods are not often facilitated.
Although our overview of these 15 systems is quite cursory, it is clear that no individual system provides all possible interface features, as different systems have been designed to cater for different users, tasks and domains. However, there are cases where a feature is provided while some other related feature is absent, even though the two would be more useful if provided together.
To help us think and reason about user interfaces to digital video library systems more clearly, it is useful to break down the interface into several elements supporting different user actions. Here the idea of identifying different stages of a user's information-seeking behaviour becomes useful: starting with the decision of which information source to use, searching for a document in the selected collection, searching for a part of a document, reading that point in the document, then returning to search, and so on. This is described in, for example, the seven stages of action (Norman 1988, p48), the various information-seeking sub-processes (Marchionini 1995, p49), the four-phase search process (Shneiderman et al. 1997) and the eight sequences of interaction cycle (Hearst 1999, p263). Although these different stage models emphasise the unpredictable and non-purposeful change of directions between their respective stages, thinking in terms of the individual stages is useful in helping us to consider the user interface clearly, because each stage of information seeking usually requires different interface support features. Figure 1 illustrates a simple categorisation of supporting user interface features, each roughly responsible for different stages of the four classical information-seeking models mentioned above.
On the right-hand side of Figure 1, we have added our own rough grouping of the supporting user interface (UI) elements by the stages of information seeking. They are interface elements used to support:
- browsing and then selecting video programmes (as a collection)
- querying within a video programme (content querying)
- browsing the content of a video programme
- watching (part of) a video programme
- re-querying the video digital library and/or within a video programme
Different systems targeted for different uses will have more emphasis on certain elements than on others. For example, a large video clip library designed for searching by news broadcasting staff will have a more elaborate library querying interface to help users efficiently retrieve a specified video clip, whereas a home digital video system will have stronger interface features in the areas of content browsing and playing.
It appears from our literature study summarised in this paper that all of the different interface elements need more investigation and experimentation than has been reported to date. Only a small number of interface studies and experiments can be found in the literature for the different interface facilities introduced above. We will now examine these under the five rough groupings we have developed.
2.1.1 Interfaces for Browsing and then Selecting Video Programmes
Searching through a digital video library is often done with queries on some form of metadata from the video such as title, date or description. Most systems with very large video collections will require an interface for such querying. The main interface of the Físchlár system (Mc Donald et al. 2001b) organises video items in different folders so they can be easily sorted, filtered and browsed, as in common email management software. FilmFinder (Ahlberg and Shneiderman 1994) offers a novel visualisation of a query result, with a set of slider bars as a user's query filter. The resultant films are dynamically and immediately reflected as the user drags the slider bars. When there are a large number of items to be displayed, novel visualisation methods originally developed for bibliographic data collections (such as in Veerasamy and Belkin (1996), Hearst (1995), and Rao et al. (1995)) can be adapted for visualising a video collection. Information space visualisation is one relatively well investigated area and many interesting ideas have been demonstrated. A good example of this in the video library area is the Informedia project (Christel and Martin 1998) where the videos from the collection are visualised in a scatter-plot display, with the relationships among video items in terms of query terms visible, as Figure 2 shows.
Figure 2. Video query result visualisation interface in the Informedia Digital Video Library (with permission from the authors)
Whether the unit of retrieval is an individual scene, a video clip or a complete programme, visualisation of the user's query result is a good way of informing the user of the characteristics of the retrieved item, thus providing some clues on how to go about re-querying to retrieve the desired video items more precisely, or to commence viewing some of the retrieved objects.
2.1.2 Interfaces for Querying Video Content
When a query is specified by a user in terms of actual video content attributes, more specialised interfaces than those used to capture text-based queries are required. Manually constructed video databases such as VideoSTAR (Hjelsvold et al. 1995) have a text-based query tool where the user can select content attributes such as persons, events and locations from a list, all manually pre-indexed. When it comes to automatically indexed video collections, VideoQ (Chang et al. 1997), NeTra-V (Deng et al. 1998) and MovEase (Ahanger et al. 1995) are rare examples of content-based querying interfaces where the user can query either by drawing objects with certain shapes and colours on a canvas, by specifying motion, or by selecting example video clips from the screen. Specifying motion in a query can be useful when a user is looking for video clips where something is moving from one spatial location to another. In the VideoQ interface, for example, a user can draw a shape on the query canvas and then draw the trajectory of the shape's movement as a series of connected lines leading from the shape to another spatial location on the canvas. The DICEMAN query application (Dunlop and Mc Donald 2000) is an example of an elaborate query interface for digital video content where the user can compose a query from pre-defined sets of elements and attributes such as persons, overall colour, movement, the order of objects appearing in the sequence, and so on.
2.1.3 Interfaces for Browsing Video Content
Directly browsing video content is a new concept but can be compared with a text retrieval interface feature for viewing abstracts or full-text. In video retrieval interfaces, what is known as "browsing video content" would be equivalent to Fast Forward/Rewind on a home VCR machine, or some selective keyframe browsing on a digital video library.
2.1.4 Interfaces for Watching Video Content
Watching or playback of video may or may not be the final goal of a user's video searching task, but in the context we have seen so far, playback is often the last stage of the user's interaction with the video library and interfaces supporting this are already common in the form of video player software such as RealNetworks RealPlayer (RealPlayer), Windows Media Player (Media Player) and Apple QuickTime Player (QuickTime). Basically, all these players have the same playback interface with buttons for play/pause/stop/rewind, volume control, and a timeline bar indicating the current point of playback in the context of the whole video clip. Digital recording set-top boxes such as TiVo (TiVo) and ReplayTV (ReplayTV) are also playback-centred and have remote control interfaces supporting playback features.
2.1.5 Interfaces for Re-Querying
In many information retrieval systems, the facility for re-querying based on the current interaction status has become an important element of the interaction as the user keeps on modifying his/her goals as well as information needs while interacting with the systems. In video library systems, early experimentation on systems such as WebSEEk (Smith 1996) and SWIM (Zhang et al. 1995) provided rudimentary interfaces for re-querying based on initial query results. More recently, the DICEMAN query application (Dunlop and Mc Donald 2000) provides a re-query interface that automatically composes a visual query on a query panel based on a user's selection of an example clip from the results set display, from which the user can further modify the query to refine his/her second or subsequent query.
2.2 Video Content Browsing - Some General Issues
There are currently only a handful of ideas proposed for browsing digital libraries of video content which allow browsing of different granularities or levels of abstraction of video using different styles, constructed either manually or automatically. This is because the technological developments needed to make video data processing on desktop computers feasible have only recently become available. There are now several problem areas which are worth considering.
2.2.1 Problems of Sequential Browsing
With conventional VCR machines such as those found at home, "browsing" the video content has so far meant simple Fast Forward/Rewind with a jog-shuttle controller or buttons. Most software video players have the same kind of control functions, with play/pause/stop buttons and a timeline bar indicating the current point of playing, similar to Figure 3.
Figure 3. Player control panel: Microsoft Media Player (Media Player)
The problems of fast forwarding and rewinding have been noted in several places in the literature (Boreczky et al. 2000, p186; Yeo and Yeung 1997, p44; Christel et al. 1997, p21; Taniguchi et al. 1995; Elliott and Davenport 1994; Arman et al. 1994, p97). Apart from the technical problem of transferring large video files across a networked environment, more fundamentally there is a problem of the sequential, single access-point, linear nature of the video medium when a user is trying to access the content. Such browsing will always be constrained by time: as we play faster and faster it becomes more and more difficult to recognise the content, and going back takes longer and longer. If a user is searching for a particular scene in a video, the user is faced with fast forwarding while struggling to concentrate on the fast-moving sequence. As the playing point moves further and further away from the starting point, the user becomes aware that if the particular scene is not found in the rest of the sequence, s/he has to go backwards even further to resume the search. All this arises because the user can look at only one point in the video at a time.
The same kind of problem occurs when one browses an audio recording, blindly moving forwards and backwards with no indication or clue of the context. Attempts to make audio data browsable were explored in SpeechSkimmer (Arons 1997). SpeechSkimmer is a small, handheld device that plays speech recordings with various controlling mechanisms such as time-compression, pause shortening, emphasis detection and non-speech audio feedback. The fundamental problem we are examining here is that these media (audio and video) are time-based, making it difficult to browse within the data. An additional problem in video browsing is its multimedia nature, mixing rich visual and aural information. A rudimentary browsing facility is found in some films distributed in DVD format, allowing the viewer to start playback from one of the pre-selected scenes in the video. These manually created access schemes with graphically attractive interfaces still need more elaborate presentation and interaction design, but they can be regarded as a starting point for browsing interfaces in that they allow direct access to the various points in the video by providing visual summaries of each segment. With the enormous possibilities for manipulating data with the computing power available today, what we are confronting is the problem of design ideas for new methods of browsing video content, rather than the problem of manipulating the data computationally.
2.2.2 Browsing Video in Terms of the Level of Detail
Probably the best way of thinking about video browsing is in terms of various granularities (or levels of detail) of the browsable content. This is often called video abstraction because the object the user browses is something abstracted from the original video sequence, thus something less in amount, less time-consuming and requiring less effort to look at, which is the whole point of browsing.
Describing video abstractions in terms of their level of detail is useful in video browsing interface design. As Shneiderman (1998, p523) amply emphasises, the idea of "overview first, zoom and filter, then details on demand" is regarded as an important mantra for any field where information is displayed. In conventional bibliographic information search facilities since the mid-1990s, a search result often shows a single line of highly condensed representations (usually the title and a date of a document as an overview), and clicking one of them displays the abstract at the bottom of the screen (zoom). Further user initiation would open up a new screen showing the full text of the document (with detail on demand).
In video browsing interfaces, the Apple video magnifier (Mills et al. 1992) used hierarchically arranged frames taken from the video content, allowing a very coarse overview on the first row of the screen, then a level of further detail on the second row, then more details on the next row, all under user control by pointing with a mouse. Tonomura (1997, p197) also explains the levels of indication granularity for multimedia representations, starting from an overview of a simple poster, then a composite poster, to a viewer and magnifier where a certain part of the content can be viewed in detail.
The abstractions possible for video at its various levels of detail are examined by Christel et al. (1997), using as examples the Informedia Digital Video Library System's various video abstractions:
- title: a video document's name in text format
- poster frame: a single frame taken from a video document's content
- filmstrip: a set of frames taken from a video document's content
- skim: an assembly of significant bits of video sequences taken from the original video document
Various user-system interaction methods for each abstraction, and combinations of/transitions between abstractions are illustrated. The idea of different levels of detail will be usefully exploited in our work reported in this paper in the context of analysing the keyframe-based browsing interfaces (see section 3).
2.2.3 Further Video Content Browsing Ideas
How a sequential, time-based video sequence can be made into something browsable is an open question, and new ideas for achieving this are needed. Among the limited examples of creative ideas for video content browsing, Video Streamer (Elliott and Davenport 1994) presents a set of adjacent frames of a video stacked one on top of another, showing the edges of the frames under the top one. Being able to see the continuity of the frames without seeing most of the contents in those frames is a good way of condensing the visual contents of video. This idea can be useful in short clip editing and browsing, and has been applied in an experimental SCR video browser (Arman et al. 1994). A comic book-style presentation (Boreczky et al. 2000) is another interesting idea, where automatically extracted representative frames are arranged and resized on a single page in a style similar to a page of a comic book. Other ideas on content browsing include various timeline representations (Aigrain et al. 1995) showing keyframes on the timelines in three dimensions to reduce the displayed space, automatic generation of a single synthesised image or a set of synthesised images containing video sequence information (Teodosio and Bender 1993; Kreyss et al. 1997), and other graphic representations of related scenes. Different video abstractions need different interaction mechanisms, and part of the difficulty in coming up with browsing ideas lies in the interplay between video content representation and the kind of interaction required for that representation.
Browsing video content through some kind of an abstraction can and should be aided by supporting information and other interaction styles. Ways of indicating the current time in the video can also vary, from a straightforward timeline bar as currently used in almost all video player software (such as RealPlayer (RealPlayer) and QuickTime Player (QuickTime)), to indicating small quantifiable objects (as in Arman et al. (1994)) in the video abstraction, to more intuitively visualising a time length as the depth of each browsing unit (as in Aigrain et al. (1995)). In browsing video content, users who have already seen the video could benefit from having some form of time information available along with the content abstraction itself (this notion of particular support for particular users is discussed in more detail in Lee (2001)).
2.3 Keyframe-Based Video Browsing
A particular way in which video content is abstracted is to identify and extract a set of still images or keyframes from the video. This has the advantage that it can be done automatically. Video is composed of a sequence of still images, or frames, that change continuously, typically at a rate of 25 frames per second, thus creating an illusion of movement for the human eye. Much useful analysis can be done by applying image content analysis techniques to the data contained in individual frames and comparing the data between adjacent frames. For example, we can tell when a camera shot suddenly changes, or whether the scene has lots of motion or is stationary. Currently, most automatic video indexing systems are based on such frame-by-frame image comparison. Given the dominant use of frame analysis in video indexing, the most common approach to content browsing interfaces in digital video systems (as seen from Table 1) is to display selected frames on the browsing screen.
Frames that can represent video well are referred to as "keyframes", and can be used to turn the temporal content into static images that can be glanced at. The effectiveness and usefulness of browsing such keyframes are evident in many places: a movie poster showing some snapshots of action scenes, a video tape or DVD cover showing interesting images from the video, Web-based video databases showing attractive still images taken from the movie, and so on. Once the video sequence in its temporal form is turned into a set of still images or keyframes, it becomes a matter of presenting them in different ways. Keyframe-based browsing is similar to the now de facto standard feature of "thumbnail browsing" in image retrieval interfaces, which show a number of small images before showing a larger single image if a user requests it. Indeed, it is difficult to find any image retrieval system that does not provide a thumbnail browsing interface: CD-ROM picture collections, museum kiosks, Internet art gallery sites and image search engines all have this thumbnail browsing feature, allowing the user to browse quickly through a large number of pictures. The efficiency of displaying keyframes from video is similar to the efficiency in image browsing interfaces, but there is one crucial difference: keyframes are more than a set of pictures because there is a clear idea of time progression among the keyframe images. A set of keyframes is a temporally ordered set of images and thus there is one more aspect to be considered in keyframe browsing when compared with picture browsing interfaces. This is illustrated in Figure 4.
When displaying keyframes, the important decision is which frames in the video, and how many of them, should be selected to represent the video content. The keyframe selection process can be done manually or automatically. There are various ways of selecting keyframes automatically, which can be summarised as follows:
- Subsampling: the simplest approach is to select frames at regular intervals, for example taking one frame every minute. This will result in a set of keyframes evenly distributed throughout the video (see Figure 5). The selected keyframes might not show meaningful or pertinent visual content, however. For example, a selected keyframe might show an image captured while the camera was unfocused, or before a main character walks into the centre of the frame, or many keyframes may show the same image because of a long camera shot, and so on.
- Automatic video segmentation: by segmenting a video sequence into meaningful chunks and selecting keyframes from each segment, the resultant set of keyframes can be more visually meaningful than in subsampling. The dominant segmentation method at the moment is shot boundary detection, where individual camera shots taken in the video are detected, thus chunking the video into camera shots. This can be done by analysing individual frame contents and comparing each frame with its adjacent frames. Within one camera shot the differences between adjacent frames are relatively small because the shot is a continuous, uninterrupted recording. When one camera shot ends and the next begins, there is a sudden large change between adjacent frames, and this can be detected. Once camera shot boundaries are detected, keyframes are selected based on this segmentation. The easiest way of selecting keyframes from shots is to take the first, last or middle frame from each of the detected camera shots, as is done in most currently available systems. The size of the keyframe set then corresponds to the number of camera shots in the video (see Figure 6).
Figure 6. Selecting the first frame of each camera shot
Most current systems take one keyframe per camera shot, but sometimes there is a need for more than one keyframe per shot (Taniguchi et al. 1995). Automatic selection of the "right" keyframe is an area worth much consideration. Some ideas and experiments concern "intelligent" keyframe selection - meaning any method other than an arbitrary choice such as the first, middle or last frame - based on heuristic rules such as selecting a keyframe after camera motion in a shot stops (Uchihashi et al. 1999; Dimitrova et al. 1997; Aoki et al. 1996; Smith 1996; Wactlar et al. 1996; Wolf 1996; Zhang et al. 1995).
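The two automatic approaches described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: each frame is represented by a hypothetical normalised colour histogram, adjacent frames are compared by the sum of absolute bin differences, and a single fixed threshold decides where a shot boundary lies. The function names and the threshold value are ours and are not taken from any of the cited systems.

```python
from typing import List, Sequence

# Assumption: each frame is summarised by a normalised colour histogram.
Histogram = Sequence[float]


def subsample(num_frames: int, interval: int) -> List[int]:
    """Subsampling: return the index of one frame per regular interval."""
    return list(range(0, num_frames, interval))


def histogram_difference(a: Histogram, b: Histogram) -> float:
    """Sum of absolute bin differences between two frames (0 means identical)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_shot_boundaries(frames: Sequence[Histogram], threshold: float) -> List[int]:
    """Return the index of the first frame of every detected camera shot."""
    starts = [0] if frames else []
    for i in range(1, len(frames)):
        # A sudden large change between adjacent frames marks a new shot.
        if histogram_difference(frames[i - 1], frames[i]) > threshold:
            starts.append(i)
    return starts


def keyframes_from_shots(starts: List[int], num_frames: int,
                         position: str = "first") -> List[int]:
    """Pick the first, middle or last frame of each shot as its keyframe."""
    keyframes = []
    for n, start in enumerate(starts):
        # A shot ends one frame before the next shot starts.
        end = starts[n + 1] - 1 if n + 1 < len(starts) else num_frames - 1
        if position == "first":
            keyframes.append(start)
        elif position == "middle":
            keyframes.append((start + end) // 2)
        elif position == "last":
            keyframes.append(end)
        else:
            raise ValueError("position must be 'first', 'middle' or 'last'")
    return keyframes
```

With nine toy frames forming three shots, `detect_shot_boundaries` returns the first frame of each shot and `keyframes_from_shots` yields one keyframe per shot, so the size of the keyframe set equals the number of shots. A real detector would also need to handle gradual transitions such as fades and dissolves, which this sketch ignores.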
Whichever way the keyframes are selected, the result is a set of keyframes extracted from the original video content, and it then becomes a design issue of how to provide a user interface that allows the user to browse through this set of keyframes.
Keyframe-based browsing can be categorised in different ways - a detailed categorisation will be introduced in the next section - but to briefly illustrate the point, some common browsing interfaces are listed here:
- Storyboard: showing a set of miniaturised keyframes spatially on the screen in chronological order, allowing quick browsing (as in most currently available systems (Taniguchi et al. 1995; Virage VideoLogger)). This is also termed "keyframe lists" or "filmstrips";
- Slide show: flipping a set of keyframes one by one on the screen (as studied and developed by Komlodi and Marchionini (1998), Smith (1996), DVB-VCR (1998));
- Hierarchically arranged browser: showing different levels of detail hierarchically, the user browsing keyframes in a drill-down manner, which is especially suitable when the content itself is a structured video such as news or magazine programmes (as done by Mills et al. (1992) and Zhang et al. (1995)).
What should become clear to the reader at this point is that there are many different ways of presenting a set of keyframes, in large and small numbers, spatially and otherwise. However, there is a need for some kind of structure or model within which new and existing keyframe-based video browsing techniques can be related. We clarified and classified the possible designs of keyframe-based browsing interfaces by analysing their elements and possible design options, and in this way constructed a "design space". Keyframe browsing interfaces available in current digital video library systems can thus be understood and placed within this larger perspective. The design space can also provide a starting point for many different interfaces for different users and tasks, moving away from a single interface that tries to accommodate all situations. In the following section, a comprehensive list of systems and their keyframe-based interfaces is introduced, as examples of particular features, and located in the constructed design space. The development of such a model, or design space, is one of the main contributions of this paper.
3 Design Framework for Keyframe-Based Video Browsing
A set of video keyframes can be displayed in many different ways. Most of the keyframe-browsing styles developed by current systems, however, seem to come from arbitrary decisions and some knowledge of how previous systems looked. A more systematic way of understanding the characteristics of browsing interfaces would help us see how existing interfaces differ from each other, and also how we could come up with new interfaces.
Another strong argument for developing a systematic design framework is that, in practice, a single all-encompassing video browsing interface that works well for all user tasks is impossible. Several variations on a browser theme are needed, each tackling a well-defined specification that covers, for example, how the video data can be exploited and what the prospective users require.
Exploiting the video data is a natural starting point for developing a framework. By way of analysis, we have identified different underlying dimensions of the video data and different values that each dimension can take. In other words, the video data we want to provide an interface to can be modelled in a multi-dimensional space where each dimension contains a certain number of values. "Designing an interface" in this context becomes a matter of choosing different values for each dimension, different choices resulting in different interfaces.
The data to be managed consists of sets of large or small numbers of keyframes in temporal order. Dealing with a number of keyframes, the first dimension we identified is "Layeredness" - whether to display all available keyframes directly to users or to allow for more selective presentation. The temporal nature of the data leads us to think of two more dimensions: "provision of temporal orientation", or how to make time information from the video visible to the user; and keyframe presentation style, which we call "spatial vs temporal presentation". Each of these dimensions is explained in this section.
3.1 Layeredness
Browsing interfaces for digital video library systems which are based on shot boundary detection need to display a very large number of keyframes. Most video systems take one keyframe from each of the detected camera shots, but others take more than one keyframe per shot (Taniguchi et al. 1995; EUROMEDIA).
TV programmes usually contain a large number of camera shots, e.g. a typical 30-minute soap opera contains about 300 to 400 camera shots. Even with only one keyframe per shot, this produces a set of 300 to 400 keyframes to be browsed in the browsing interface. A two-hour film can easily generate well over 1000 keyframes for the user to browse.
Browsing a large number of keyframes might not be an efficient method for navigating a digital video library, especially for users who need a concise view of the whole video content without having to spend too much time. Thus filtering and reducing the number of keyframes to be viewed should be considered in designing a video browser. Methods of reducing the number of keyframes ("keyframe filtering") vary from simply grouping all keyframes within a regular interval and selecting an arbitrary keyframe from each group, to more sophisticated content-based and semantic-level keyframe grouping. Whatever the method of reducing the number of selected keyframes, this implies a layered presentation in which the user can choose between several different levels of granularity.
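The simplest form of keyframe filtering mentioned above, grouping keyframes by regular interval, can be sketched as follows. This is a minimal illustration, assuming each keyframe is identified by its timestamp in seconds and that the first keyframe in each interval is the one kept; the function names `filter_by_interval` and `build_layers` are hypothetical.

```python
from typing import List, Sequence


def filter_by_interval(timestamps: Sequence[float], interval: float) -> List[float]:
    """Keep the first keyframe that falls in each regular time interval."""
    selected = []
    seen_buckets = set()
    for t in timestamps:
        bucket = int(t // interval)  # index of the interval this keyframe falls in
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            selected.append(t)
    return selected


def build_layers(timestamps: Sequence[float],
                 intervals: Sequence[float]) -> List[List[float]]:
    """Build layers from coarsest to finest; the last layer is the full set."""
    layers = [filter_by_interval(timestamps, i)
              for i in sorted(intervals, reverse=True)]
    layers.append(list(timestamps))
    return layers
```

Calling `build_layers` with intervals of 60 and 30 seconds gives three layers of increasing detail, which corresponds to the layered presentation described above. Content-based or semantic grouping would replace `filter_by_interval` with something considerably more elaborate.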
At one end of the layeredness of keyframe-based presentations will be a single keyframe that represents the whole video clip, as is done for displaying search results in experimental systems such as Internet CNN NEWSROOM (Compton and Bosco 1995) and VISION (Li et al. 1996). One cannot expect this single keyframe to convey the content of the whole video, but it is a kind of icon for the video and a way of visualising one video clip within the library of clips. At the other end of the layeredness scale will be the full listing of keyframes with each representing a camera shot in the video, as in the Virage VideoLogger (Virage), Excalibur Screening Room (Excalibur) and MediaSite Publisher (MediaSite). While this full keyframes listing shows the most detailed view of the video content, it loses some of its purpose for browsing because it can take so long to look at all the keyframes.
Between these two extreme layers, we can think of intermediate layers with different levels of detail, i.e. different numbers of keyframes to be used as the user's index into the whole video programme. It will not be possible to say which level of granularity is best for every situation as one user in a situation will have different needs from another user - a busy user might want to see the entire video content condensed to just one screen of small keyframes, while a user looking for a particular scene might want to browse in as detailed a way as possible and may thus be willing to spend more time on the task. The important elements to consider in providing layered keyframe presentations are the number of different layers available in an interface and the navigational link between these different layers while the user browses the keyframes.
In looking at current systems, our initial set of options has been based on whether there is more than a single layer provided, and if so, whether there is a navigational link provided between these layers. Having an explicit link means that in the interface the user can browse from one point in a layer to the same point in a different layer, thus maintaining the current browsing point in the video while jumping between different layers. Our choice can be visualised as Figure 7, which will later become one of the axes in our final design space.
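One way to picture a navigational link is as a lookup that carries the current point in the video from one layer to another. The sketch below assumes each layer is a sorted list of keyframe timestamps in seconds and maps the current time to the last keyframe at or before it in the target layer. The function name `linked_index` is hypothetical, and this is only one of several reasonable ways to define "the same point".

```python
from bisect import bisect_right
from typing import Sequence


def linked_index(target_layer: Sequence[float], current_time: float) -> int:
    """Index in target_layer of the last keyframe at or before current_time.

    Falls back to the first keyframe when current_time precedes all of them.
    """
    i = bisect_right(target_layer, current_time) - 1
    return max(i, 0)
```

For an overview layer `[0, 61, 125]` and a detail layer `[0, 5, 12, 31, 58, 61, 95, 118, 125]`, jumping from the overview keyframe at 61 seconds lands on detail index 5, and jumping back from the detail keyframe at 95 seconds lands on overview index 1, so the browsing point is preserved in both directions.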
3.2 Provision of Temporal Orientation
Since video has a temporal nature as one of its fundamental characteristics, it is important to consider providing (or not providing) a cue for this sense of time during user browsing. A set of keyframes usually provides no proper temporal orientation other than a rough progression from one keyframe to the next. Looking at keyframes selected based on camera shots can even distort and mislead the time orientation, as the number of shots and thus keyframes from one time segment can be very different from those in another segment of the video of the same length, depending upon the frequency of camera shot cuts in the video content. A familiar time orientation used in many interfaces is the "timeline bar" (either as an indicator or controller), found in most video player software applications such as Microsoft's Media Player or RealNetworks' RealPlayer.
In keyframe browsing, it is easy for the user to lose a sense of time and of the point in the video s/he is currently viewing. Ways of indicating this sense of time are an important dimension in browsing. Several examples can be found, such as simply time-stamping each keyframe with its numeric time (MediaArchive (EUROMEDIA), Virage VideoLogger (Virage), Screening Room (Excalibur)), displaying quantifiable objects to visualise shot length (Arman et al. 1994) or a timeline bar indicating the current position of the user's keyframe browsing (MediaSite Publisher (MediaSite)).
As before, we identified a set of values in this dimension, that is, a set of possible design options:
- Indication of absolute time in the currently viewed keyframe or set of keyframes (exactly how far into the video the current keyframe is)
- Relative time - an indication of the current browsing point in relation to the whole length of the video
- No indication of time (as a deliberate decision not to)
The above ideas can be visualised as Figure 8, to be used later in constructing our design space.
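The first two options can be made concrete with a short sketch. It assumes a keyframe is described by its offset in seconds from the start of the video; `absolute_label`, `relative_position` and `timeline_bar` are hypothetical names, and the text-based bar merely stands in for a graphical timeline widget.

```python
def absolute_label(seconds: float) -> str:
    """Absolute time: exactly how far into the video the keyframe is."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}min {secs}sec"


def relative_position(seconds: float, duration: float) -> float:
    """Relative time: the browsing point as a fraction of the whole video."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return min(max(seconds / duration, 0.0), 1.0)


def timeline_bar(seconds: float, duration: float, width: int = 20) -> str:
    """Draw a text timeline of the given width with '|' at the current point."""
    pos = min(int(relative_position(seconds, duration) * width), width - 1)
    return "-" * pos + "|" + "-" * (width - 1 - pos)
```

An absolute indication tells the user the exact time of a keyframe, whereas the bar shows only where the keyframe sits in relation to the whole video. The third option, no indication of time, simply omits both.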
3.3 Spatial vs Temporal Presentation
The conventional keyframe browsing idea would be to miniaturise each keyframe into a small thumbnail image and scatter these spatially on the screen, in much the same way as image retrieval interfaces used in art gallery kiosks or CD-ROM picture collections. It is also possible to present keyframes on a user's screen one by one, the system automatically, or the user manually, flipping through the keyframes. Considering the fact that the original video is time-based and sequential, it makes sense to present keyframes temporally, and this might even provide the feeling of watching the video.
Current digital library system interfaces based on keyframes mostly show thumbnail-size keyframes spatially, as in TV Ram (Taniguchi et al. 1995), FRANK (Simpson-Young and Yap 1996), MediaArchive (EUROMEDIA), VICAR (VICAR), MediaSite Publisher (MediaSite), Excalibur Screening Room (Excalibur), Virage VideoLogger (Virage) and AT&TV (Mills et al. 2000). Systems presenting keyframes temporally, such as WebSEEk (Smith 1996), DVB-VCR (DVB-VCR 1998), and the movie browser tool (Marchionini et al. 1998), are found less frequently.
An interesting video browsing study comparing a spatial presentation with a temporal presentation in terms of different user tasks reported that spatial presentation was better for locating and identifying a particular object in the video, whereas temporal presentation was better for getting the gist of the video (Tse et al. 1999). It is reasonable to assume that one particular keyframe presentation method is more suitable for a particular kind of task than others, and the same might hold for particular users' personal preferences.
Again, this dimension can be visualised as Figure 9.
3.4 Constructing a Design Space
So far three dimensions of our keyframe-based browsing interface design space have been established, and can be used as a decision-making tool. Assembling these dimensions leads to our design space, as in Figure 10. A location in this space determines three values, one taken from each of the dimensions, that together make one specific browser interface. In this way "designing a browser" becomes simply a matter of pointing to one location in the design space, turning the interface design process into a concrete, simple and systematic decision-making process. A visible set of possible design space values, rather than a specific design, is thus constructed, providing an interface designer with all sorts of design ideas and a way of clearly comparing different design possibilities.
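The idea of a design space whose points are browser designs can be expressed directly as a data structure. In the sketch below the three dimensions and their values follow the options discussed in sections 3.1 to 3.3, but the identifier names are ours, and treating each dimension as a small set of discrete, mutually exclusive values is a simplifying assumption.

```python
from enum import Enum
from itertools import product
from typing import List, NamedTuple


class Layeredness(Enum):
    SINGLE_LAYER = "single layer"
    MULTIPLE_LAYERS_NO_LINK = "multiple layers without navigational link"
    MULTIPLE_LAYERS_WITH_LINK = "multiple layers with navigational link"


class TemporalOrientation(Enum):
    NONE = "no time information"
    RELATIVE = "relative time"
    ABSOLUTE = "absolute time"


class Presentation(Enum):
    SPATIAL = "spatial"
    TEMPORAL = "temporal"


class BrowserDesign(NamedTuple):
    """One location in the design space: one value from each dimension."""
    layeredness: Layeredness
    temporal_orientation: TemporalOrientation
    presentation: Presentation


def design_space() -> List[BrowserDesign]:
    """Enumerate every combination of values across the three dimensions."""
    return [BrowserDesign(*combo)
            for combo in product(Layeredness, TemporalOrientation, Presentation)]
```

Under these assumptions the space contains 3 x 3 x 2 = 18 candidate designs, and "designing a browser" amounts to choosing one `BrowserDesign` value. An interface that mixes values within a dimension would need a richer representation than this.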
The positions of several existing video browsing interfaces can be located in this space, e.g. the SWIM hierarchical browser (Zhang et al. 1995), DVB-VCR (DVB-VCR 1998) and AT&TV (Mills et al. 2000) (see red dots in Figure 10).
The power of the constructed design space is that it shows different browsing interfaces more globally in their relative positions in terms of three dimensions, as well as revealing possible new interfaces. All three existing interfaces located in Figure 10 are in fact choices among alternative ways of designing interfaces.
The particular design space constructed here is based on analysis of keyframe sets from video, and in analysing the presentation values and options, we were strongly supported by the review of features that are currently available in various systems. The resultant framework would be most useful in describing interfaces that are available today, and in comparing their features. Assuming that this descriptive space can be equally useful for designing a new interface that has different combinations of features, the next section presents some of the browsers we have designed and implemented using the space. We will focus on three distinctive browsers - Timeline browser, Slide Show browser and Dynamic Overview browser - and present their locations in the design space, some screen shots, and the analysis of user testing we have conducted.
4 Some Design Examples
Using the design space specified above, we have designed and implemented a number of different browsers. These have been put through a series of user tests to discover usability problems introduced in the design/implementation stage. Once the initial browser interfaces were stabilised, we conducted more elaborate usability tests to capture what users think of the browsers and to get further insights into the browser design. The test used an automated, Web-based suite that asks the user to find particular scenes in video clips using different browsers. Clicking on any of the presented keyframes opened video player software and started playing from that moment. After the task was completed, users graded and typed in their comments. For a full description of this user testing, see Lee (2001).
Some of the selected browsers have been integrated into the larger video library application called Físchlár, which allows recording and browsing of broadcast TV programmes on the Web. Integrated browsers are again tested within the larger context of the overall usage of this system, as described in section 5.
Here we present three of the browsers based on the design space. These browsers have gone through many informal and formal user tests. Each browser is briefly described by its exact point within the design space, a screen shot of the browser, and some analysis from the user testing results.
4.1 Timeline browser
This browser shows two different layers: an overview layer with 32 keyframes selected from throughout the video, providing an overview at a glance; and a detailed layer with full shot-level keyframes (i.e. one keyframe from each shot) organised by timeline (see Figure 11). To change between the two layers, a user clicks the buttons at the top. In the detailed view, the user initially sees the first 24 keyframes with that part of the timeline highlighted in bright yellow. The user can bring the mouse cursor to any part of the timeline, which is mouse-over activated and immediately replaces the keyframes below with the ones from the current mouse position on the timeline. When the user stops moving the mouse cursor, a ToolTip box pops up at the current position indicating the absolute time of that segment (e.g. "20min 24sec - 23min 50sec").
Figure 11. Design space and Timeline browser
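The mouse-over behaviour of the detailed view can be sketched as a mapping from a cursor position on the timeline to a group of 24 keyframes and a ToolTip string. This is not the actual Timeline browser code: it assumes that the timeline is divided into equal-width segments of 24 keyframes each, that keyframes are given as timestamps in seconds, and that a segment ends where the next one begins. All function names are hypothetical.

```python
import math
from typing import List, Sequence

PAGE_SIZE = 24  # keyframes shown at a time in the detailed view


def format_time(seconds: float) -> str:
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}min {secs}sec"


def segment_for_mouse(x: float, timeline_width: float, num_keyframes: int,
                      page_size: int = PAGE_SIZE) -> int:
    """Map a horizontal cursor position on the timeline to a segment index."""
    if timeline_width <= 0:
        raise ValueError("timeline_width must be positive")
    num_segments = max(1, math.ceil(num_keyframes / page_size))
    fraction = min(max(x / timeline_width, 0.0), 1.0)
    return min(int(fraction * num_segments), num_segments - 1)


def keyframes_in_segment(timestamps: Sequence[float], segment: int,
                         page_size: int = PAGE_SIZE) -> List[float]:
    """Keyframes to display below the timeline for the given segment."""
    start = segment * page_size
    return list(timestamps[start:start + page_size])


def tooltip(timestamps: Sequence[float], segment: int, video_duration: float,
            page_size: int = PAGE_SIZE) -> str:
    """Absolute time range of a segment, as shown in the ToolTip box."""
    page = keyframes_in_segment(timestamps, segment, page_size)
    if not page:
        return ""
    next_start = (segment + 1) * page_size
    end = timestamps[next_start] if next_start < len(timestamps) else video_duration
    return f"{format_time(page[0])} - {format_time(end)}"
```

Because shots differ in length, segments holding the same number of keyframes cover different amounts of time, which is one reason an absolute time indication is helpful in this view.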
User testing on this browser revealed several useful points, both minor usability problems and ideas for improved design space analysis. For example, the usefulness of the overview page was frequently mentioned, while some complained about the small size of keyframes in the overview. More importantly, the fact that the browser had no navigational link between the two layers was discussed quite intently in user comments and in after-session discussions. Throughout the testing it was noted that people wanted to be able to relate keyframes in the overview to the ones in the detail, checking if each keyframe in the overview corresponds to each segment in the detailed view. What was highlighted by the testing was that those users who had seen the test video mentioned their wish to have some kind of link between the layers, though this was expressed in different ways:
"...but it might be better if clicking on a picture opened up the detailed view rather than [playing] an actual clip." (Test User 11)
"It would be great to be able to select a number of adjacent segments... which can be expanded into the detail view." (Test User 12)
"Clicking on an image when the browser is in the overview state should maybe bring up the section of the overview that that image is from, rather than the video player." (Test User 4)
"...the problem is that when clicking on image there was no indication of which group it belonged to, I had to search through the groups until the area came up... while the overview is a great idea, it is ruined by not expanding down in detail" (Test User 2)
"Might be nice to expand from overview to detailed by clicking on a scene to expand" (Test User 13)
"If the images in the overview mode were used to open the corresponding section in the Detailed View, this would be very beneficial for finding your chosen scene." (Test User 4)
The phrases used in the users' comments above - "open up detailed view", "expand", "bring up the section", "showing corresponding detail view" - all reflect the same expectation of users wishing to be able to navigate between the layers while maintaining the current point of browsing, i.e. a navigational link between the layers. This might indicate that multiple layers with a navigational link is an additional feature that is more or less always more useful than not having it. The design space framework currently has no better-or-worse points, just different options for different usage and user preferences.
4.2 Slide Show browser
This browser automatically flips through large-size keyframes one by one as in a slide show (see Figure 12). While flipping through the keyframes, a small timeline under the keyframe indicates the current point of the keyframe in relation to the length of the whole video. Presenting a set of keyframes temporally is quite different from the spatial presentation used by most digital video systems.
Figure 12. Design space and Slide Show browser
Users' opinions of the Slide Show browser were very low. The obvious reason was its constantly changing keyframes. Although this reduces user interaction and mouse clicking, having no user control over the pace of keyframe flipping was considered a definite problem, as some of the users' comments show:
"...speed of the frame change was too fast for me... some kind of system which would allow me to control the rate of frame change." (Test User 9)
"...as it relies on reaction time you are more likely to click in the neighbourhood..." (Test User 13)
"The constant movement is unrestful and distracting to the eye. An improvement in my view would be to have the browser move frames only when the mouse button is clicked/held down over it." (Test User 15)
A temporal presentation that automatically flips through keyframes, although expected to let the user watch passively without requiring frequent interaction, seems to be more of a disadvantage. This could be because of the nature of the task given in this user testing - searching for a particular scene. If the task had instead been simply to get the overall gist of the video content, it might have required less control of flipping, and test users might thus have given less negative feedback on the browser. This illustrates that a user interface design should be concerned not only with users' personal preferences but also with the task specification.
4.3 Dynamic Overview browser
In this browser interface the screen displays spatially a small number of keyframes selected from throughout the video content, providing an overview (see Figure 13). When a user brings the mouse cursor to any of these keyframes, the one pointed at will start flipping through a number of adjacent and subsequent keyframes in that segment, temporally presenting more detailed keyframes within that part of the video. While flipping through these keyframes, a small timeline appears below the keyframe indicating the current point of browsing. This browser is an interesting combination of spatial and temporal presentation for two separate layers (that is, overview spatially, detail temporally).
Figure 13. Design space and Dynamic Overview browser
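The combination of a spatial overview with a temporal detail layer can be sketched as follows. The sketch assumes that the full keyframe list is split into consecutive segments of near-equal size, that the first keyframe of each segment is the one shown in the overview, and that hovering over it cycles endlessly through that segment's keyframes. The names and the equal-size split are assumptions rather than a description of the implemented browser.

```python
from itertools import cycle
from typing import Iterator, List, Sequence


def split_into_segments(keyframes: Sequence[int], num_overview: int) -> List[List[int]]:
    """Split keyframes into num_overview consecutive segments of near-equal size."""
    base, extra = divmod(len(keyframes), num_overview)
    segments, start = [], 0
    for n in range(num_overview):
        size = base + (1 if n < extra else 0)  # spread the remainder over early segments
        segments.append(list(keyframes[start:start + size]))
        start += size
    return segments


def overview_keyframes(segments: Sequence[Sequence[int]]) -> List[int]:
    """The keyframes displayed spatially: one per non-empty segment."""
    return [segment[0] for segment in segments if segment]


def flipping_loop(segment: Sequence[int]) -> Iterator[int]:
    """The keyframes flipped through, in a loop, while the cursor hovers."""
    return cycle(segment)
```

Since each loop covers only one segment, it is much shorter than a slide show over the whole video, and the interface would stop drawing from the iterator when the cursor leaves the keyframe.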
User testing showed that this browser was highly praised for its novel interaction and usefulness, for example:
"...the browser which looks like an overview of the whole program - but each scene actually moves when you hold the mouse over it - is excellent, it's my favourite by far." (Test User 15)
However, users commented negatively on being unable to control the flipping speed or pace of the keyframes. This is basically the same problem mentioned by test users of the Slide Show browser (see above), but it drew less negative comment here, possibly because some feeling of control is present, as the flipping of keyframes can be stopped and started by moving the mouse cursor off or onto the keyframe, and also because the flipping loop in each segment is usually shorter than in the Slide Show. If the user misses a keyframe, s/he can wait a short while until the next loop comes round.
"...picture flips maybe a bit too quickly and it is easier to get lost than the other browsers" (Test User 3)
"The interface took over and I had to passively (and not too passively because if I binked I'd miss it) watch for the scene which is hard to recognise... more seeking [mechanism needs to be] supported in that interface " (Test User 1)
"...if the user loses a picture, it has to wait for the next occurence..." (Test User 7)
One test user mentioned as a positive point that the automatic flipping saved clicking, which was partly the rationale for presenting keyframes temporally:
"I like the way that on this browser it skipped through the images - so there was no need for extra & annoying clicking." (Test User 20)
4.4 Analysis of User Testing Results
Feedback from user testing presented above consists of a set of selected comments chosen to illustrate the main points from the experiments. The usability testing conducted throughout the development of the browsers has raised many interesting and useful insights. Considering the early stage of the browser design and the less defined nature of the tasks and purposes, usability testing was qualitative, with the emphasis on getting test users' comments in order to understand various aspects of the browser interaction, rather than trying to come up with quantitative values based on specific task performance measures. As the browsers become more refined and are incorporated into focused applications, more quantitative evaluation will be conducted with specific measures and experimental settings, and this will be carried out in a framework such as the TREC video track (Video TREC 2001).
Overall thematic results from the analysis of usability testing conducted throughout the development of these browsers, with comments from formal testing conducted with 20 test users in November 2001, are summarised here. These tests used as many as eight different browsers designed from the design space, including the three introduced in the previous section (for details on all of these browsers, see Lee (2001)). The main thematic points to emerge from this testing are now presented.
The Entertaining Nature of Video Browsing
We felt that the video browsers engaged the test users in the task well, as the data displayed are images taken from videos such as movies, and users are often interested in such entertainment. Sometimes it was easy to identify usability problems when the user was not engaged in the browsing task and was instead fiddling with the widgets on the browser. In most browsers that display a large amount of information spatially rather than with complicated widgets and indicators, the information (keyframes) often worked as an interaction trigger. Several users wanted to watch more of the test video material after completing the given task. The ultimate goal of user testing and consequent refinement is to develop the browsers (and any other user interface elements) to such an extent that the user can concentrate on the browsing task and forget the interface handling side. As Norman (1988, p180) says, "the best computer programs are the ones in which the computer itself disappears, in which you work directly on the problem without having to be aware of the computer".
Having Different Browser Features as Options, or as an Integrated Single Browser
Some people saw the potential usefulness of the browser features, and suggested the use of several different browsers together as options. Also, some test users suggested combining the good points of different browsers into a single browser.
As more comments were received from test users, it became clear that features can be liked by some and disliked by others at the same time, and there does not seem to be one single feature that is liked by everybody. For example, automatic flipping of keyframes in the Dynamic Overview was condemned by many for its uncontrollable pace of flipping, but liked by others for not requiring annoying mouse clicking! Many conflicting comments from different users tell us that opting for one particular browser is not the best way of interface provision, rather that we should provide a way to switch between different features flexibly. It can be said that being able to select the right interface for the right user in the right situation is the key to a successful strategy. Being able to customise the interface, the provision of options, and ultimately the intelligent provision of the appropriate interface to the user performing a particular task, would be an ideal.
Smoother Browsing-Playback Transition
The idea was suggested of making the browsers more closely related to the video player in terms of user interaction. When considering video browsing in the context of the users' whole experience, browsing and playback should not really be separate, independent interactions even though in our design framework throughout this article we have dealt with browsing interaction as quite separate from playback. The user browses keyframes and plays at an interesting point, then wants to go back to browsing, then play again, and so on. A more complete interface that helps the user browse the video content would blur the boundary between browsing and playback. It can be said that playback should be understood as one part of browsing. The idea of a more integrated browser-player especially in terms of synchronising browsing with playback is also discussed in Simpson-Young and Yap (1996) and Mc Donald et al. (2001b).
Initial Difficulty in Understanding How to Use
The designed browsers are not particularly targeted at first-time or infrequent users, but the initial learning time is still worth considering and was noted for the different browsers. Browsers that use a timeline bar, and the simple Slide Show browser, were relatively easily understood in terms of how to use them (a timeline is something people normally relate to when playing video, and is a familiar concept). Novel browsing interfaces such as the Dynamic Overview browser (and others not introduced here), and the idea of overview/detail, were often deemed confusing when first exposed to users.
Questions on How Well Keyframes Represent Content
In initial informal demonstrations and discussions, as new browsers were being implemented, people were often curious about how the keyframes were selected. A brief explanation of automatically detecting camera shots and selecting one frame from each camera shot was given, and people were impressed by the "smart" technique which the system uses. However, the way keyframes are selected, and especially the way selected keyframes are arranged in multiple layer browsers, were sometimes questioned.
It is not obvious how we can measure the 'correctness' of the right choice of keyframes and the right structuring. This is especially true in multiple layer browsers as the way groupings are done for a more concise keyframe set on the higher layers is not currently based on any standard technique. Currently the mechanism that automatically selects keyframes for our browsers does not contain any elaborate scene grouping method that does this intelligently. Although this is more of a technical concern (thus not directly our concern here), being able to display truly representative keyframes is important and influential to users' browsing behaviour and must be investigated in digital video library research.
Besides carrying out a series of user tests on the three browsers introduced earlier in this paper, we have also incorporated these, plus five others derived from our design space, into a digital video library system which we developed, called Físchlár. This is described in the next section.
5 Físchlár - Video Browsers in Use
The Web-based Físchlár system allows users to record from today's and tomorrow's broadcast TV programmes by browsing an online TV schedule (see Figure 14). As they are transmitted, selected programmes are digitally recorded and analysed for shot boundary detection and keyframe selection. Processed programmes can then be browsed with keyframes presented using various browsers (see Figure 15), and clicking on any of the keyframes will pop up a separate window and play the video from that point onwards, using video streaming technology. The user can select one of the eight browsers provided to browse keyframes in a programme. The system has been deployed within the University campus (computer labs and residences) since early 2000 and is capable of streaming to over 200 users simultaneously.
Figure 14. Físchlár's user interface: recording
Figure 15. Físchlár's user interface: browsing/playback
The system's functionality (recording and browsing/playing TV programmes), which is similar to that of a home VCR, has made it popular within the campus community. At the time of writing the system has more than 1500 registered users, who actively use it to record the TV programmes they want and to watch programmes recorded by themselves or by other users. To give a rough idea of its usage, logging data shows that during a five-week period at the beginning of an academic semester (24 September - 31 October 2001), the system had:
- 251 new registered users;
- 792 TV programmes recorded and analysed;
- 431 registered users who accessed the system at least once.
A more detailed description of the Físchlár system can be found in Lee et al. (2000), O'Connor et al. (2001) and Smeaton et al. (2001).
This communal recording system with a relatively large number of users has been a valuable test ground for user interface development, allowing us to obtain feedback and usage patterns from the users. Since the system was initially deployed, we have been logging all user interaction to discover usage patterns for various features. We have also been selectively interviewing our users to find out what they do with the system and why. More in-depth interviews and observations will be conducted in due course.
Observing usage patterns has been an important mechanism for studying our browsers. Our early analysis of logging data (Mc Donald et al. 2001a) showed that the default browser, the Timeline Bar, is the browser which the majority of users depend upon for browsing video content, whereas other browsers are used by only a small number of highly dedicated, frequent users. Different people in different contexts will have different preferences for various browsers, and an ideal system would be one that flexibly allows users to use whichever browsing style they want at a particular time.
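As an illustration of the kind of tally that underlies such a finding, the sketch below counts, for each browser, how many distinct users selected it and how many times it was selected overall. The log format (a sequence of user and browser pairs) and all user and browser names other than the Timeline Bar are hypothetical; the actual logging data and analysis are described in Mc Donald et al. (2001a).

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def browser_usage(log: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Map each browser to (number of distinct users, number of selections).

    `log` is assumed to be a sequence of (user_id, browser_name) records,
    one per occasion on which a user opened a programme in that browser.
    """
    users_per_browser = defaultdict(set)
    selections = Counter()
    for user_id, browser_name in log:
        users_per_browser[browser_name].add(user_id)
        selections[browser_name] += 1
    return {
        name: (len(users_per_browser[name]), selections[name])
        for name in selections
    }


# Hypothetical records: three users stay with the default browser,
# while one frequent user also explores an alternative.
sample_log = [
    ("u1", "Timeline Bar"),
    ("u2", "Timeline Bar"),
    ("u3", "Timeline Bar"),
    ("u1", "Alternative Browser"),
    ("u1", "Alternative Browser"),
]
usage = browser_usage(sample_log)
```

Separating distinct users from total selections matters here: a browser with many selections but few users points to a small group of dedicated users rather than broad adoption.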
We are working on personalisation of the features of the Físchlár system, starting with automatic TV programme recommendation on an individual basis by incorporating the ClixSmart personalisation engine from PTV (Smyth and Cotter 2000). The engine uses community-based inference, drawing on an individual user's history of use and like-dislike indications (see the thumb indication icons on the right side of the screen in Figure 14), to come up with programmes the user is likely to find interesting. This saves the user from tediously going through all TV channels one by one in search of programmes to record or play. In future this personalisation could be extended to providing a particular browser for individual users.
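The general idea of community-based inference from like-dislike indications can be sketched as follows. This is a generic user-based collaborative filtering example under our own simplifying assumptions (ratings of +1 for like and -1 for dislike, similarity as the sum of rating products over shared programmes); it is not the ClixSmart algorithm, and the function names and sample data are hypothetical.

```python
from typing import Dict, List

Ratings = Dict[str, int]  # programme title -> +1 (like) or -1 (dislike)


def similarity(a: Ratings, b: Ratings) -> int:
    """Agreements minus disagreements over programmes both users rated."""
    shared = set(a) & set(b)
    return sum(a[title] * b[title] for title in shared)


def recommend(community: Dict[str, Ratings], user: str, top_n: int = 3) -> List[str]:
    """Suggest programmes the user has not rated, weighted by similar users."""
    own = community[user]
    scores: Dict[str, int] = {}
    for other, theirs in community.items():
        if other == user:
            continue
        weight = similarity(own, theirs)
        if weight <= 0:
            continue  # ignore users with no overlap or opposite tastes
        for title, rating in theirs.items():
            if title not in own:
                scores[title] = scores.get(title, 0) + weight * rating
    liked = [title for title, score in scores.items() if score > 0]
    # Highest score first; ties broken alphabetically for a stable order.
    liked.sort(key=lambda title: (-scores[title], title))
    return liked[:top_n]


# Hypothetical community of three users.
community = {
    "ann": {"News": 1, "Friends": 1},
    "bob": {"News": 1, "Friends": 1, "The Simpsons": 1, "Golf": -1},
    "cat": {"News": -1, "Golf": 1},
}
suggestions = recommend(community, "ann")
```

In this sample, the first user agrees with the second on two programmes and so inherits that user's liking for an unseen programme, while the third user's opposite tastes are ignored.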
We have found that there are three common forms of use of the Físchlár system within the campus:
- In computer labs: students working on projects in the evening use the Físchlár system during their breaks. Breaks of 20-30 minutes spent watching popular comedy programmes such as The Simpsons and Friends are common, before students swap back to their project windows;
- In campus residences: students (undergraduate and postgraduate) and some staff use their PC in their bedroom to watch recorded programmes in full-screen mode;
- A tailored version of the system called Físchlár-News is used to record the main evening TV news bulletin from the Irish national broadcaster, and has created a growing archive of over nine months of news broadcasts. This system is used throughout the campus, from the University library to the residences, and is used by staff and students for research and study of news material.
As use of the Físchlár system becomes more of an everyday activity for our campus users, and the system becomes a "living laboratory" (Abowd et al. 2000, p47) for video technology, continuous assessment from various angles and feedback from our campus-wide users have become the main source for further refinement of the browsers and for a deeper understanding of the design of video browser interfaces. Our system logging analysis results in Mc Donald et al. (2001a), and subsequent interviews with selected users, showed:
- The majority of users only watch (browse/playback), and do not record any programmes; only a very small number of frequent users keep on recording programmes;
- Users find it awkward to swap between the browser interface and player software, and so end up just playing even if playback starts where they do not want (e.g. during an advertisement); as a result, users want to have some kind of browsing feature within the player, rather than on a separate browser screen;
- Users have no clear idea where the personalised recommendations come from, and want to know how they are decided;
- Users want to be able to watch live broadcast TV, as well as watching recorded programmes from the digital video library.
With both current and past interfaces, numerous small and large iterative refinements on the keyframe browsers have been completed and integrated into the operational Físchlár library system. As our users become more familiar with the system and find their own ways of making use of it and its various features for their own benefit in their everyday lives, more ideas and changes will be suggested and reflected in the interface. The evolving face of the Físchlár system has become the testbed for our long-term research on navigation within digital video libraries.
6 Conclusions
In most cases the process of designing a novel interface for multimedia applications such as browsing digital video libraries is still arbitrary and unpredictable, as in many other fields that deal with human action and interaction. While we need a better way of understanding and clearly specifying how interactions in various stages occur, we also need to verify and evaluate our initial designs.
Our video browser design framework demonstrates an example case for adopting an analytic approach to interface design (MacLean et al. 1989; Stary 2000; Calvary et al. 2000) specifically applied to keyframe-based video browser design, in which important elements of possible design features are specified and selected. Continuous user input during the development of the browsers provided us with many important ideas to consider in future designs and in improvements to the current design. Unlike many experimental prototype systems that remain in a computer lab until the end of the project, we have created a long-term, real-usage environment, the Físchlár system, which is robust enough to be used reliably by a large number of users in their everyday lives and so keeps us informed on the usage of the browsers in this environment. While thinking about how new technology and its resultant features could be integrated into the system to enhance user interaction, we are continuously discovering usage patterns and reflecting the most important of these in newer versions of the system interface.