1998 Conference Proceedings


FEATURES AND BENEFITS FOR THE INTERFACE OF THE NEXT GENERATION DIGITAL TALKING BOOK

Dennis P. DeVendra
Recording for the Blind & Dyslexic

1.0 Overview of a Digital Talking Book (DTB)

1.1 The DTB consists of several components that together deliver a book in many formats with many navigation capabilities. The objective is to give a reader using a DTB the same navigation capabilities that a sighted reader has with a print book, while maintaining a high-quality recorded audio experience.

1.2 The DAISY Consortium (Digital Audio Information System) consists of a group of libraries from around the world that serve print disabled people. The Consortium is creating standards for the DTB so that DAISY compliant books can be used worldwide. As a member of The DAISY Consortium, Recording for the Blind & Dyslexic is working to help set standards for the DTB, which will be used by all libraries serving people with print disabilities. DTBs will also need to be built on current international open standards for text and audio recording files, which will guard against obsolescence in this fast-paced technical world.

1.3 Currently, there are two methods for reading a book for those who have a print disability: analog tape and a computer readable electronic text file. Combining the benefits of each format will provide a better book.

1.3.1 If a book is provided in an analog tape format, the recording is usually easy to understand. The playback devices are light and very portable. The limitation of an analog tape is the inability to navigate to a specific point in the book quickly and efficiently. To find a place, a reader must count tones placed on the tape while fast-forwarding or rewinding.

1.3.2 With a computer readable electronic text file, a reader has many navigation capabilities. The reader can move from chapter to chapter, page to page, or search and begin reading at the desired point in the book. At any point, words can be spelled for clarification or understanding. The disadvantage is that listening to the book requires a computer and a screen reading system. Most readers will use some kind of screen reader software with synthesized voice or refreshable Braille. This works, but the synthesized voice can be difficult to understand and is often tiring over a long period. The required computer equipment is rarely portable, making it difficult for those wishing to transport books.

1.4 The DTB will provide navigation capabilities similar to those of the electronic file. The text, if provided, can be read or spelled with synthesized voice or a refreshable Braille device. Having the text displayed may also give those with learning disabilities the capability to follow along with the recorded voice. The audio recording will be in a human voice, providing maximum comprehension and a reduction in fatigue.

2.0 Creation considerations

2.1 The recorded audio is just one component of the raw materials that make up the DTB. The details of the components of a DTB are outside the scope of this presentation, but a brief overview of these components will help with understanding the output capabilities discussed later.

2.2 Whatever files are used to make up the DTB, they need to fall within a larger international standard. An example of this would be a WAV sound file, which can be used by anyone with a computer and a sound card. If we adopt an attractive proprietary format that has only limited use around the world, it will not take long for obsolescence to set in. This will result in high costs for playback devices and will limit quality enhancements in the future.

2.2.1 The most obvious component of the DTB is the sound recording file. This file can be created in many popular forms, including CD-ROMs, which many computer and home sound system users have experienced. The objective is to create a master archive consisting of a good quality sound recording, which can be used directly or converted to other formats for playback purposes. A book recorded as a WAV sound file is usually too large to transmit over the Internet or put on a single CD-ROM. There are sound compression formats that can shrink a file by a factor of ten to one hundred, allowing an entire book to be listened to over the Internet or copied onto a single CD-ROM disk. In addition, sound compression can provide speed-up and slow-down capabilities.

In the future, the compression ratios and playback features should be enhanced while maintaining high-quality sound. These enhancements will most likely be experienced if non-proprietary sound files are used.
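
The size argument in 2.2.1 can be made concrete with a little arithmetic. The sketch below is only an illustration: the recording length (10 hours), the recording settings (44.1 kHz, 16-bit, mono) and the 650 MB CD-ROM capacity are assumptions chosen for the example, not figures from this paper, and the function names are hypothetical.

```python
# Rough size arithmetic for an uncompressed recording (all figures assumed).
CD_ROM_BYTES = 650 * 1024 * 1024  # assumed nominal CD-ROM capacity


def wav_size_bytes(hours, sample_rate_hz=44_100, bytes_per_sample=2, channels=1):
    """Size of uncompressed PCM audio, ignoring the small WAV header."""
    seconds = hours * 3600
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


def fits_on_cd(size_bytes, capacity=CD_ROM_BYTES):
    """True if a file of this size fits on one disk of the assumed capacity."""
    return size_bytes <= capacity


raw = wav_size_bytes(hours=10)  # hypothetical 10-hour book
for ratio in (1, 10, 100):      # 1 = uncompressed; 10 and 100 are the ratios cited above
    compressed = raw // ratio
    print(f"ratio {ratio}: {compressed / 2**20:.0f} MiB, fits on CD: {fits_on_cd(compressed)}")
```

Under these assumptions the uncompressed recording is several times larger than one disk, while a ten-to-one compression already brings it within the capacity of a single CD-ROM.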

2.2.2 The second component of a DTB is the text file. As with the sound file, the objective is to choose a non-proprietary format. The text file must also have structuring capabilities to represent the book. This will allow for proper book presentation along with navigation capabilities. Users of an audio-only playback device can also experience these navigation capabilities.

2.2.2.1 Most people are familiar with text presented on the Internet. The text is structured using a tagging standard known as the Hypertext Markup Language (HTML). When a person browses the Internet, the browser uses the tags found in the HTML file to display and navigate the information as intended by the author. These HTML files can be located on the Internet or locally on a computer. HTML has rapidly become a widely used file format. The World Wide Web Consortium has created a recognized HTML standard for browsers and editors to use. Other file formats have been considered, but HTML version 4.0 has the most support within The DAISY Consortium.

2.2.3 A third component of the DTB is used to coordinate the events of the book playback. This component will identify what is to happen and the order in which it will happen. Recently the World Wide Web Consortium has been creating a new file format known as the Synchronized Multimedia Integration Language (SMIL), pronounced "smile". This file is similar to an HTML file in the way it is written. It was originally intended to synchronize a video presentation with its associated audio track.

The synchronization of text and audio was considered as a result of input from The DAISY Consortium. This means that as the audio track is being played, the text can scroll along. If desired, the corresponding word, sentence, or other section of the displayed text can be highlighted. The highlighting may give those with a learning disability increased comprehension, or give a blind reader the capability to follow along with a refreshable Braille device.
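
The following is a minimal sketch of how a player might use such a synchronization file to decide which text to highlight at a given moment in the audio. The markup is modelled loosely on SMIL; the element and attribute names (par, text, audio, clip-begin, clip-end), the file names, and the fragment identifiers are all assumptions for illustration and should be checked against the finalized specification.

```python
import xml.etree.ElementTree as ET

# Illustrative synchronization file: each <par> pairs a text fragment
# with the stretch of audio in which it is spoken (names are assumed).
SMIL_SKETCH = """
<smil>
  <body>
    <seq>
      <par>
        <text src="book.html#s1"/>
        <audio src="chapter1.wav" clip-begin="npt=0.0s" clip-end="npt=4.5s"/>
      </par>
      <par>
        <text src="book.html#s2"/>
        <audio src="chapter1.wav" clip-begin="npt=4.5s" clip-end="npt=9.0s"/>
      </par>
    </seq>
  </body>
</smil>
"""


def parse_clock(value):
    """Convert a value such as 'npt=4.5s' to seconds as a float."""
    return float(value.replace("npt=", "").rstrip("s"))


def load_sync_points(smil_text):
    """Return (begin, end, text_target) tuples in document order."""
    root = ET.fromstring(smil_text.strip())
    points = []
    for par in root.iter("par"):
        text = par.find("text")
        audio = par.find("audio")
        points.append((parse_clock(audio.get("clip-begin")),
                       parse_clock(audio.get("clip-end")),
                       text.get("src")))
    return points


def text_to_highlight(points, seconds):
    """Return the text fragment spoken at the given audio time, if any."""
    for begin, end, target in points:
        if begin <= seconds < end:
            return target
    return None


points = load_sync_points(SMIL_SKETCH)
print(text_to_highlight(points, 5.0))
```

The same table of sync points could drive a refreshable Braille display instead of on-screen highlighting, since it only maps audio time to a location in the text.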

2.3 There may be other components included within The DAISY Consortium standards, but the three listed above are the main ones identified at this point. It is important to note that although international standard components are being considered, only a limited subset will be used for book creation. For example, HTML has a wide range of tags for all kinds of presentation purposes. The DAISY compliant book will have a well-defined subset of these tags and guidelines for how they will be used.

3.0 Types of outputs

3.1 The components and the way they are structured play a big part in the presentation of the DTB. Once the raw materials for the DTB are gathered, there are many possibilities for creating materials for output to the reader. The form a DTB takes depends primarily on the consumer. Once the components are created, a combination of user needs and technology will shape what the book will look like.

3.2 When DTB content is first being created, the most common form of output may be an analog tape. This is likely because most people are using tapes today. The players are available and people know how to use them. It is expected that at some point in the future analog tapes will be used less than any other form.

3.3 When the DTB is discussed, it is most likely to be in terms of presenting text with recorded digital audio using some kind of display program. This would provide the fullest effect of the DTB. This content could then be put on a CD-ROM or an Internet server for distribution. The quality of the audio may vary depending on the capacity of the medium. For example, if the audio were placed on a CD-ROM, the quality would probably be higher than if the audio were set up to be transferred over the Internet. There are three typical configurations of text combined with audio.

3.3.1 The hybrid book refers to full text with full recorded digital audio. This would probably be the most desirable form of a DTB, but the most difficult to produce. All of the text from the book could be displayed along with the entire audio spoken by a human.

3.3.2 A variation of the hybrid DTB would be a book with all recorded audio and very little text. The text would still be in HTML but be used to identify a table of contents, pages, and other structural book elements for navigation.

3.3.3 The third type of DTB would be all text and a little audio. The best example of this would be a dictionary with full text and an audio recording for each word pronunciation.

3.4 Other forms could be produced depending on customers' desires and technology. The audio track from the DTB could be copied onto many types of media for playback besides an analog tape. An example of this would be a CD-ROM for playback on a CD-ROM player found in most typical home sound systems or in cars.

4.0 Tools for playing the DTB

4.1 By using HTML, audio, and SMIL files that comply with international standards, many players developed for the general public can be used to play a DTB.

4.2 If the DTB is created on an analog tape then the current tape players can still be used. This, as stated above, will be the majority of DTB usage at the beginning.

4.3 The combination of text, audio, and the SMIL components copied onto a CD-ROM will allow for many types of players to be used.

4.3.1 A few companies have already started building hand-held devices to play the audio of the DTB. The players will be able to read all components and know what to do with them. With no screen reader or visual display, only audio will be heard. By using non-proprietary standards, more companies are expected to create these types of devices.

4.3.2 A person will be able to read the DTB on a computer using the same CD-ROM that is played on a hand-held device. The text will be displayed on a video monitor using an Internet browser. A typical Internet browser can display HTML files located on the Internet or locally on a computer's hard drive. The browsers will need to have the capability to read a SMIL file to play the DTB. This is expected to be generally available once the SMIL specifications are finalized. SMIL compliant browsers are being developed for DAISY Consortium members. It will be necessary to have a sound card to play the audio when using a computer.

4.4 The three components of the DTB could be placed on a server to be delivered to anyone who is capable of getting onto the Internet or an Intranet. An Intranet is a private network that looks like an Internet connection but can only be accessed within an isolated environment. Intranets are usually found in corporations and schools. Depending on copyright laws, the DTB will probably be available first on Intranet systems rather than the Internet.

4.5 Other devices could be used. If the audio is copied to a CD-ROM then a typical CD-ROM player could be used to hear the book read. If a device can store and play back sound on a flash card memory device, then again the recorded audio can be played. The options are limited only by the types of storage and playback devices available. Over the next few years, it is expected that many more devices will be made available for storing and playing sound information.

5.0 Navigation features and benefits

5.1 Unlike analog recordings and playback devices, DTB navigation can vary in many ways, depending on how the DTB was produced and on the player used.

5.1.1 The structure identified within the HTML will play a large part in how a book can be navigated. If the DTB creator defines structural elements in the book, then the playback device will have those elements to use for moving through the book, that is, by chapter, section, or whatever is defined. If the DTB creator does not identify these elements, then they do not exist for the player to use. To be DAISY compliant a content creator must use the tags defined, but this does not mean all book elements must be identified.
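
As a rough sketch of the idea in 5.1.1, the code below collects heading elements from a small HTML fragment and lets a player jump to the next heading at a chosen level. The choice of h1 to h3 as the structural tags, the id values, and the class and function names are assumptions for illustration; the actual DAISY tag subset is not specified in this paper.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3")  # assumed structural tags for this sketch


class HeadingCollector(HTMLParser):
    """Collect (level, id, title) for each heading found in the HTML."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # heading being read, or None

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = [int(tag[1]), dict(attrs).get("id"), ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag in HEADING_TAGS:
            level, anchor, title = self._current
            self.headings.append((level, anchor, title.strip()))
            self._current = None


def next_heading(headings, position, level):
    """Index of the next heading at the given level after position, or None."""
    for i in range(position + 1, len(headings)):
        if headings[i][0] == level:
            return i
    return None


# Hypothetical book fragment: two chapters, one section.
BOOK_HTML = """
<h1 id="ch1">Chapter 1</h1><p>First paragraph.</p>
<h2 id="ch1s1">Section 1.1</h2><p>Second paragraph.</p>
<h1 id="ch2">Chapter 2</h1><p>Third paragraph.</p>
"""

collector = HeadingCollector()
collector.feed(BOOK_HTML)
toc = collector.headings
print(toc)
print(next_heading(toc, 0, 1))  # index of the next chapter after the first
```

If the producer had left out the h2 tag, the section would simply be absent from the list, which is the point made above: elements that are not marked up do not exist for the player.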

5.1.2 The SMIL file can also assist in the navigation of a DTB. The HTML file determines structure; the SMIL file determines what is displayed or played, in what order, and when. For example, for a reader with a learning disability, the SMIL file could direct the player to ignore all figure descriptions.
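
The division of labor in 5.1.2 can be sketched as a playback queue that is filtered before it is played. This is a simplified model, not the SMIL mechanism itself: the PlayItem type, the kind labels, and the clip names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlayItem:
    kind: str   # hypothetical label, e.g. "paragraph" or "figure-description"
    audio: str  # hypothetical audio clip name


def build_play_queue(items, skip_kinds=()):
    """Keep the original order, dropping items whose kind is to be skipped."""
    return [item for item in items if item.kind not in skip_kinds]


book = [
    PlayItem("heading", "ch1_title.wav"),
    PlayItem("paragraph", "ch1_p1.wav"),
    PlayItem("figure-description", "ch1_fig1.wav"),
    PlayItem("paragraph", "ch1_p2.wav"),
]

# A reader who prefers not to hear figure descriptions:
queue = build_play_queue(book, skip_kinds=("figure-description",))
print([item.audio for item in queue])
```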

5.1.3 The quality and functionality of the recorded audio will depend on the original sound quality and the compression used. The original sound file can be recorded at many different quality levels. The original sound file for a book will probably be too large to fit on a CD-ROM or to be transmitted over the Internet. To reduce the size of the original file, compression is used. The compression method determines both how large the file is and the quality of the sound. If the original sound recording is poor then the compression will not make it better. The compression method used will also determine whether the recorded audio will have speed-up or slow-down capabilities. Therefore, no matter which playback device is used, if the file cannot support speed-up then it will not occur.

5.2 The analog tape players have the navigation capabilities commonly known by current users. The sound of the tape is heard during fast forward and reverse. A low tone placed on the tape is heard as a high-pitched tone during fast tape movement. Tones are placed on the tape to indicate chapters, pages and other information, allowing a reader to skip unwanted audio recordings. This method of navigation is slow, but is better than not having the tones at all.

5.3 DAISY compliant hand-held devices will play only recorded audio. The audio will be navigated using the structure that was inserted into the HTML file. Although text is not seen, the structure elements will assist in navigating a book.

Unlike analog tapes, direct access into a book can be performed. For example, the table of contents may be the first item presented in a book. A person can move to each line in the table of contents (chapter, section or subsection) with a keystroke. With another keystroke, the person can move directly to that chapter, section or subsection. Pages and figures are examples of other book elements that can also be navigated. The navigation will depend on the structural elements placed into the HTML file and the player's ability to identify those elements.

5.4 The DTB can also be played back using an Internet browser. The browser could be reading the DTB from a CD-ROM inserted into the computer or from the Internet or an Intranet. Where the DTB is coming from should not make much difference in navigation.

The response of the book and the quality of the audio may be affected, however. By using a computer with a browser there will be more navigation capabilities, although the amounts of text and audio provided will affect the capabilities to navigate through the book.

5.4.1 Full text and audio will provide the maximum amount of navigation potential. Full text and audio will provide the capability to search the DTB for specific words. This will provide for a more direct access into the book depending on your interest.

A reader could be reading through the DTB with a screen reader and at any time ask to jump into the recorded audio presentation. The recorded audio will have the capability for speed-up or slow-down depending on the compression method used. As the audio recording is played the corresponding text of the paragraph, sentence, or word can be highlighted. The text displayed may assist a person with a learning disability to follow along for increased comprehension or a blind person could follow along with a refreshable Braille device. The audio recording could be stopped at any time placing the reader back in the text. This may be helpful for a blind person to spell a word using a screen reader.
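
The word search described in 5.4.1 could work roughly as follows when each text segment is linked to a starting time in the recorded audio. The segment data, the times, and the function name are hypothetical; in a real DTB the links would come from the synchronization file.

```python
import re

# Hypothetical text segments, each paired with its audio start time in seconds.
SEGMENTS = [
    (0.0, "The DTB consists of several components."),
    (4.5, "The recorded audio is one component."),
    (9.0, "The text file provides structure."),
]


def find_audio_offsets(segments, word):
    """Return the audio start time of every segment containing the whole word."""
    needle = word.lower()
    hits = []
    for start, text in segments:
        words = re.findall(r"[a-z']+", text.lower())
        if needle in words:
            hits.append(start)
    return hits


print(find_audio_offsets(SEGMENTS, "audio"))
print(find_audio_offsets(SEGMENTS, "structure"))
```

A player could then begin audio playback at the first returned time, which is the "jump into the recorded audio" behavior described above. With limited text, the same search would work but only over the words that were actually provided.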

5.4.2 A DTB that contains full audio and limited text will still have many navigation capabilities built in, but the viewing, spelling, and searching of text will be limited. A browser will still be employed, as with a full text and audio DTB. The text that appears on the screen may be limited to a table of contents, page numbers, and an index. Although full text is not available, searching and spelling could still be performed on the words that are available. After moving to a text item on the screen such as a chapter title or a page number, the corresponding audio recording could be started with the press of a key. Visually there would not be much to see while the audio recording is being played. Speed-up and slow-down still could be performed. The key benefit of this book would be the navigation capabilities along with the high-quality recorded audio. The navigation will still depend on the structure built into the HTML file and the browser used.

5.4.3 The book that contains all text and little audio will have its own navigation properties. The browser will be used to display this DTB. The navigation capabilities would be similar to those of the full-text and full-audio book described above. The major difference would be the limited amount of recorded audio. An audio clip could be played as the reader moves to a point that has a recorded sound file. There would probably not be much in the way of text scrolling while the audio is read. The recorded word or words would be played to pronounce a word or provide a brief description, such as a word in a dictionary. Again, the text could be read using a screen reader.

5.5 The devices used to play only the audio track would have the least navigation of all. As with a CD-ROM player in a home sound system, navigation would be limited to how the sound files were broken up. When listening to music on a CD-ROM player you can skip by time and track. If this information were made available on the CD then the CD-ROM player could use it.

6.0 Summary

6.1 As described in this paper, the objective for the DTB is to provide a worldwide book, made of standard components, which can be used with the same navigation capabilities as a sighted person would have with a printed book. The creation of the book will determine how the book will be navigated, the forms available, and the playback devices used. Whichever production methods are used, the words and meaning of the book will not change.



Reprinted with author(s) permission. Author(s) retain copyright.