2001 Conference Proceedings

Recent Developments with Digital Talking Books at NLS

Michael Moodie
Research and Development Officer
National Library Service for the Blind and Physically Handicapped, Library of Congress
Washington, DC 20542
E-mail: mmoo@loc.gov

Introduction

The National Library Service for the Blind and Physically Handicapped (NLS), Library of Congress, produces approximately 2000 recorded books annually in about 1000 copies each, and distributes them on 4-track 15/16 ips cassettes through a network of 140 regional and subregional libraries across the United States. The network libraries provide direct service to an eligible readership of 764,000. Over forty audio magazines are mailed directly to subscribers. NLS also manufactures compatible cassette players and supplies them to readers. The entire program is free of charge for eligible readers, thanks to funding provided by Congress to NLS and by local, state, and federal sources to the network.

In July 1998, NLS published a document entitled Digital Talking Books: Planning for the Future. It can be found at http://www.loc.gov/nls/dtb.html. It lays out a twenty-step plan that NLS will follow to migrate from its current analog cassette system to a digital audio system. We made some minor adjustments to the plan internally in late 2000, but the basic process is unchanged. This paper will review progress on most of the first eight tasks, where NLS has been focusing significant effort, with greatest emphasis on the first, the development of a standard for digital talking books.

Planning for the Future

For ease of comparison with the 1998 document, the task numbers and descriptions included here are drawn directly from the original document.

Task 1: Define and prioritize digital talking-book (DTB) features

NLS chose to use the standards process to accomplish this work, as it allows full input from the widest possible range of stakeholders and ensures that their concerns are considered. NLS began work on the standard in May of 1997, under the auspices of the National Information Standards Organization (NISO), bringing together a large group of experts in the fields of disabilities and technology. NISO created Committee AQ -- Performance Specifications for the Digital Talking Book -- to undertake this work. A full list of committee members can be found on the NISO web site at http://www.niso.org/commitaq.html. It is important to note that representatives from the DAISY Consortium, which has made enormous contributions to the development of DTBs, have been part of the NISO DTB committee since its inception.

The committee began its work by developing two requirements documents, focused on playback devices and document navigation respectively. The first, Prioritized List of Features for Digital Talking Book Playback Devices (www.loc.gov/nls/niso/features.htm), details the many features users would like in a DTB player. These include such items as variable speed with pitch restoration, FM-quality sound, immediate interruption of audio messages when any key is pressed, and the ability to read a text file, if present, through a connected braille display. This features list will not be a part of the standard itself but will serve as a set of guidelines to agencies designing DTB players.

The second document is the foundation on which the standard is based. The Document Navigation Features List (www.loc.gov/nls/niso/navigation.htm) describes in detail the capabilities desired by future users of DTBs. It is deliberately comprehensive to ensure that the file specifications which form the core of the standard allow for the implementation of a very rich set of features. Not all players (or all books for that matter) will support every feature mentioned in the features list, but the standard will describe how each should be implemented, ensuring interoperability among players and books produced by different manufacturers and producers.

A digital talking book (DTB) is a collection of electronic files arranged to present information to the target population via alternative media, namely, human or synthetic speech, refreshable braille, or visual display, e.g., large print. When these files are created and assembled into a DTB in accordance with the standard, they make possible a wide range of features such as rapid, flexible navigation; bookmarking and highlighting; keyword searching; spelling of words on demand; and user control over the presentation of selected items (e.g., footnotes and page numbers). The content of DTBs will range from audio alone, through audio, text, and images (for use by visually impaired or dyslexic readers), to text alone.

DTB players will also take a variety of shapes. The simplest might be portable devices with audio-only capabilities. More complex portable players could include text-to-speech capabilities as well as audio output for recorded human speech. The most comprehensive playback systems are expected to be PC-based, supporting visual and audio output, text-to-speech capability, and output to a braille display.

The standard itself will consist primarily of file specifications, that is, detailed descriptions of how each of the key files that make up a DTB is formatted. This is not the venue for describing those specifications; rather we will list the critical files and briefly describe the function of each.

Package File

The Package File contains administrative information about the DTB and the files that comprise it. One of the key components of this file is the "metadata" it contains, which serves as a sort of library catalog record to enable readers to locate a DTB in a networked environment. The Package File is drawn from a specification created by the Open eBook Forum (OEBF) (http://www.openebook.org), an organization formed to create and maintain standards and promote the successful adoption of electronic books. Groups involved in the NISO DTB standards effort are working with the OEBF to solve common problems, as both initiatives face similar issues.

Document Text File

A DTB may contain part or all of the text of the document, as an Extensible Markup Language (XML) file marked up in accordance with a set of tags defined for the standard. The document text file enables a playback device to spell words on demand, carry out keyword searches, and permit finely-grained navigation, such as moving cell by cell in a table, moving through nested lists at various depths, or stepping through a poem line by line. The text file may also be accessed directly via refreshable braille display, synthetic speech, or screen-enlarging software.
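
As a rough illustration of the kind of navigation a marked-up text file makes possible, the following Python sketch steps through a small table cell by cell and runs a keyword search. The element names used here (book, p, table, row, cell) are invented for the example and are not the tag set defined for the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only; the real tag set is
# defined by the standard and is not reproduced here.
SAMPLE = """
<book>
  <p>Rainfall varies widely by region.</p>
  <table>
    <row><cell>Region</cell><cell>Rainfall</cell></row>
    <row><cell>Coast</cell><cell>High</cell></row>
  </table>
</book>
""".strip()


def table_cells(root):
    """Yield (row_number, column_number, text) for each cell in reading order."""
    for r, row in enumerate(root.iter("row"), start=1):
        for c, cell in enumerate(row.findall("cell"), start=1):
            yield r, c, (cell.text or "").strip()


def keyword_search(root, word):
    """Return the text of every element whose own text contains the word."""
    word = word.lower()
    return [el.text.strip() for el in root.iter()
            if el.text and word in el.text.lower()]


root = ET.fromstring(SAMPLE)
cells = list(table_cells(root))          # one entry per table cell
hits = keyword_search(root, "rainfall")  # case-insensitive match
```

A player could announce each tuple from `table_cells` as "row 2, column 1: Coast", which is the sort of positional feedback that audio alone cannot easily supply.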

Audio Files

A DTB will normally include human-speech recordings of the document, embodied in audio files encoded in one of a specified group of audio formats. The standard allows the use of a small number of formats such as MP3 and WAVE.

Synchronization Files

To synchronize the different media files during playback, the standard uses the World Wide Web Consortium's (W3C) Synchronized Multimedia Integration Language (SMIL). The DTB SMIL files define a sequence of "media events." During each event, text elements and corresponding audio clips as well as any additional visual elements are presented simultaneously. For dyslexic readers, this would mean that the text displayed on the screen would be synchronized to the audio presentation. For readers using a braille display connected to their PC, the braille output would follow the audio rendition.
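
The idea of a sequence of media events can be sketched in a few lines of Python. This is a simplified model rather than SMIL itself: the class name, field names, file names, and timings below are all assumptions made for the illustration. Each event pairs one text element with one audio clip, and a lookup function reports which text element should be shown at a given moment of playback.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaEvent:
    """One synchronized unit: a text element and its audio clip (hypothetical model)."""
    text_id: str       # id of an element in the document text file
    audio_file: str    # audio file that holds the narration
    clip_begin: float  # clip start, in seconds into audio_file
    clip_end: float    # clip end, in seconds into audio_file

    @property
    def duration(self) -> float:
        return self.clip_end - self.clip_begin


def event_at(events: List[MediaEvent], elapsed: float) -> Optional[MediaEvent]:
    """Return the event being presented after `elapsed` seconds of playback."""
    start = 0.0
    for ev in events:
        if start <= elapsed < start + ev.duration:
            return ev
        start += ev.duration
    return None  # playback has run past the last event


# Illustrative data only.
events = [
    MediaEvent("heading1", "chapter1.mp3", 0.0, 2.5),
    MediaEvent("para1", "chapter1.mp3", 2.5, 10.0),
    MediaEvent("para2", "chapter1.mp3", 10.0, 18.0),
]
current = event_at(events, 4.0)
```

A visual display or braille device would use the returned `text_id` to decide which passage to present while the corresponding clip plays.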

Navigation Control File

The DTB system supports two modes of navigation, global and local. Global navigation -- movement by structural element (chapter, section, subsection) -- is effected through the Navigation Control file for XML applications (NCX). The NCX presents a dynamic view of the document's hierarchical structure, allowing the user to move through the document in large steps corresponding to its major divisions, or in progressively smaller steps down to a limit set by the level of detail to which the document has been marked up. Text and audio elements present to the user the document's headings, so the reader experiences movement through the NCX as a sort of dynamic table of contents. Local (more finely-grained) navigation is not handled by the NCX but is enabled through the document text file or through time-based movement through the audio presentation, depending on the document and on the player.
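
Global navigation in large or small steps can be modeled as a walk over a list of headings, each tagged with its depth in the hierarchy. The Python sketch below is an assumption-laden illustration, not the NCX format: the `NavPoint` class, the `next_point` function, and the sample headings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NavPoint:
    """A heading in the book's structure (hypothetical model, not the NCX format)."""
    label: str
    level: int  # 1 = chapter, 2 = section, 3 = subsection


def next_point(points: List[NavPoint], current: int, max_level: int) -> Optional[int]:
    """Return the index of the next heading no deeper than max_level, if any."""
    for i in range(current + 1, len(points)):
        if points[i].level <= max_level:
            return i
    return None


# Illustrative table of contents in reading order.
toc = [
    NavPoint("Chapter 1", 1),
    NavPoint("Section 1.1", 2),
    NavPoint("Subsection 1.1.1", 3),
    NavPoint("Section 1.2", 2),
    NavPoint("Chapter 2", 1),
]

by_chapter = next_point(toc, 0, max_level=1)  # skips sections and subsections
by_section = next_point(toc, 0, max_level=2)  # stops at the next section
```

Raising or lowering `max_level` corresponds to the reader choosing larger or smaller steps through the document.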

Bookmark/Highlight File

The standard supports user-set, exportable bookmarks and highlights to which text and audio labels or notes may be applied. The reader can thus set a large number of bookmarks or highlights, label them by keying in text or recording a short audio clip, and export the resulting file to another device. The standard specifies how such a file would be formatted and identified to ensure it would be usable on any standards-compliant device.
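
The export step can be illustrated with a short Python sketch. JSON is used here purely as a stand-in for whatever file format the standard specifies, and the `Bookmark` fields, position strings, and file names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class Bookmark:
    """A user-set mark with an optional text or audio label (hypothetical fields)."""
    position: str                      # made-up reference to a point in the book
    text_label: Optional[str] = None   # note keyed in by the reader
    audio_label: Optional[str] = None  # file name of a recorded note


def export_bookmarks(book_id: str, marks: List[Bookmark]) -> str:
    """Serialize bookmarks so they can be moved to another device."""
    return json.dumps({"book": book_id, "bookmarks": [asdict(m) for m in marks]})


def import_bookmarks(data: str) -> List[Bookmark]:
    """Rebuild the bookmark list from an exported string."""
    return [Bookmark(**m) for m in json.loads(data)["bookmarks"]]


marks = [
    Bookmark("chapter3#para12", text_label="Key definition"),
    Bookmark("chapter5#para02", audio_label="note001.wav"),
]
exported = export_bookmarks("example-book", marks)
restored = import_bookmarks(exported)
```

The point of the round trip is that a second device, knowing only the agreed format, recovers exactly the marks the reader set on the first.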

Resource File

The resource file contains various text segments, audio clips, and/or images that provide alternative representations of navigational information -- feedback on the user's current location in the document. It supplies information normally presented in a print book via typographical clues. For example, in a book where subsections are clearly indicated, but only by a larger typeface on the first line, the resource file could supply the word "subsection" in text and audio at appropriate places.

Digital Rights Management

"Digital Rights Management" is the term used to describe techniques for protecting the intellectual property embodied in works presented in digital form. This area is currently in a state of extreme ferment, with no approach clearly superior. At this writing, the standards committee has not identified a technology that will provide the requisite protection without unduly burdening the user. The NISO DTB committee is committed, however, to finding a solution that adequately protects the rights of intellectual property holders.

At this writing (October, 2000), the standard has passed through several internal drafts and has been released for the first time to interested parties outside the disabilities community. Further drafts will be prepared for review in the coming months before the document is readied for standards status. More details on the components of the standard and how they interrelate can be found in an NLS-authored paper at http://www.rit.edu/~easi/itd/itdv07n1/article3.htm.

Task 2: Simulate a DTB using a personal computer

Work on this task is discussed in detail in a companion paper by John Cookson, Tom McLaughlin, and Lloyd Rasmussen, entitled Digital Talking Books: Developing a User Interface.

Task 3: Develop a computer-based life-cycle cost-analysis tool for the NLS system and candidate digital systems

NLS completed development of a life-cycle cost tool in September, 2000. This tool models the full talking book system, including production of audio books and magazines by NLS and circulation through the network of cooperating libraries. Its first use will be to gather annual data on the cost of the current system, allowing NLS to track cost trends. It will then be used to develop reasoned estimates of the cost of alternative delivery systems, such as CD-ROM, internet delivery, or solid-state memory devices. The detailed analyses that the tool permits will be critical in making thoughtful comparisons between different candidate systems and in demonstrating to funding agencies the rationale behind requests for large budget increases.

Tasks 4 and 5 are not yet sufficiently advanced to merit discussion here.

Task 6: Select an acceptable copyright protection system

As mentioned above, NLS has been investigating alternative copyright protection (Digital Rights Management) schemes through the NISO DTB effort but has not yet selected an approach.

Task 7: Design or select digital mastering and playback systems

NLS has begun experimenting in its in-house studios with digital mastering systems. A DTB, as mentioned above, is far more than just a file of recorded human speech. DTB production tools must therefore incorporate user-friendly methods for creating the other files that make up a DTB. NLS is developing some tools itself to meet its unique needs and is evaluating tools from other sources to determine their applicability to the NLS environment, which ranges from professional studios producing the bulk of NLS titles each year to volunteer groups that record materials of local interest at regional libraries in the NLS network. To ensure that NLS has an adequate store of digital materials available when it begins its transition from analog to digital distribution, all contractors producing for NLS have been given a schedule for converting their studios, culminating in 100% digital mastering in fiscal year 2004. In the area of playback systems, NLS is beginning development of one of the most critical components -- the user interface -- as mentioned under Task 2. Further development will await identification of the delivery system and will be based heavily on the playback device features list.

Task 8: Examine distribution methods from a systems perspective, focusing on cost and convenience

NLS has begun this process with a preliminary evaluation of CD-ROM as a delivery medium. Costs are being gathered for incorporation in the life-cycle cost tool and discussions are underway regarding the convenience of this medium for NLS patrons. It is no doubt suitable for younger readers who regularly use CDs for music consumption. However, concerns remain about its viability among older users, who form the majority of NLS patrons, given the high degree of manual dexterity required to handle CDs without damaging them.

Conclusion

NLS has been laying the foundation for a richly-featured digital talking book, in cooperation with a number of partners, through the development of the NISO DTB standard. Simultaneously, it has begun to develop key components of a digital talking book system that will replace the analog cassette system in use at its producers and throughout its network of libraries.




Reprinted with author(s) permission. Author(s) retain copyright.