1999 Conference Proceedings


Digital Talking Book Production Process


George Kerscher
Project Manager
PM to the DAISY Consortium
Recording For the Blind & Dyslexic
Email: kerscher@montana.com 
Phone: 406/549-4687


Organizations serving persons who are blind or print disabled start by selecting a title for production. This can be a complex task for an organization, but I want to start with the assumption that a title has been selected and is then put into the production process.

There are four products. The figure below uses colored arrows to show the different paths the products take through the production process. One product is an audio book with navigation features; another is a book that is electronic text only; another is a braille version produced from the electronic text; finally, there is the book that is full text and full audio, completely synchronized.


Flow Chart described below:


The figure shows the title entering the first box, called "Product Description." From this box there are two arrows: the blue arrow, pointing down and to the left, points to a box titled "Full Text," and the red arrow on the right points to "Navigation Only*." Next to the box marked "Navigation Only" is a note that reads, "NOTE: Following this path means that braille, E-Text, and full text with Audio are not possible." From the box marked "Full Text" there are two blue arrows. The one on the left points to "Publisher Files" and the one on the right points to "Scanning." From the box marked "Publisher Files" the path leads to a box marked "Obtain." From that box there are two blue arrows: one points to the right and up to "Scanning," and the other points down to "Convert." From "Convert" a blue arrow points to "Tag and Proof," and from "Scanning" a blue arrow points to "Tag and Proof" as well. From "Navigation Only" a red arrow points to the corner of "Tag and Proof." The "Tag and Proof" box is centered; it is where all the production processes converge.

From "Tag and Proof" there are two arrows, one blue and one red. The blue arrow on the left points down to "E-Text Quality Control" and the red arrow on the right points to the corner of the same box. There are four arrows pointing out of "E-Text Quality Control." The green arrow on the right points to "Braille Translation," the magenta arrow in the center points down to "Archive," and the double arrow (blue and red) on the left points to "Assign for Recording." From the "Braille Translation" box a green arrow points down to "Braille Proof," and from there a green arrow points to "Archive."

From "Assign for Recording" the blue/red double arrows point down to "Record with SMIL" and from that box is the double arrow pointing down to "Collect Recordings." From that box is the double arrow pointing down to "Synchronization Quality Control." From the "Synchronization Quality Control" the double arrows point to "Archive." From "Archive" there are four arrows (one for each production path) pointing down to "Distribute."

From "Distribute" there are five arrows. The blue arrow points to "Full Text and Audio," a blue and a magenta arrow point to "E-Text," a green arrow points to "Braille," and the red arrow points to "Navigation and Audio only."


Full text or navigation only

The figure shows that if the book is to have navigation and audio only, then the full text, the E-Text, and the braille are not possible. This follows because the navigation-only path provides the high-level structures for navigation but does not provide full text. The text is necessary for braille and for full-text searching.

It is also important to point out that the production process for the navigation and audio only book is much simplified. The arrow shows that this type of book just touches the tagging and proof phase and the E-Text quality control phase. The only items tagged here are the highest-level headings and perhaps the page numbers that provide the navigation.
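The flow chart described above can be sketched as a small directed graph, which makes the navigation-only restriction easy to check. This is only an illustration: the edge list is transcribed from the text description, the `stages` helper is a hypothetical name, and the Publisher Files to Obtain link is assumed to be blue because the description does not state its color.

```python
from collections import deque

# (source box, target box, arrow colors), transcribed from the description.
# Assumption: the Publisher Files -> Obtain link is treated as blue.
EDGES = [
    ("Product Description", "Full Text", {"blue"}),
    ("Product Description", "Navigation Only", {"red"}),
    ("Full Text", "Publisher Files", {"blue"}),
    ("Full Text", "Scanning", {"blue"}),
    ("Publisher Files", "Obtain", {"blue"}),
    ("Obtain", "Scanning", {"blue"}),
    ("Obtain", "Convert", {"blue"}),
    ("Convert", "Tag and Proof", {"blue"}),
    ("Scanning", "Tag and Proof", {"blue"}),
    ("Navigation Only", "Tag and Proof", {"red"}),
    ("Tag and Proof", "E-Text Quality Control", {"blue", "red"}),
    ("E-Text Quality Control", "Braille Translation", {"green"}),
    ("E-Text Quality Control", "Archive", {"magenta"}),
    ("E-Text Quality Control", "Assign for Recording", {"blue", "red"}),
    ("Braille Translation", "Braille Proof", {"green"}),
    ("Braille Proof", "Archive", {"green"}),
    ("Assign for Recording", "Record with SMIL", {"blue", "red"}),
    ("Record with SMIL", "Collect Recordings", {"blue", "red"}),
    ("Collect Recordings", "Synchronization Quality Control", {"blue", "red"}),
    ("Synchronization Quality Control", "Archive", {"blue", "red"}),
    ("Archive", "Distribute", {"blue", "red", "green", "magenta"}),
    ("Distribute", "Full Text and Audio", {"blue"}),
    ("Distribute", "E-Text", {"blue", "magenta"}),
    ("Distribute", "Braille", {"green"}),
    ("Distribute", "Navigation and Audio only", {"red"}),
]


def stages(start, color):
    """Return every box reachable from `start` by following arrows of `color`."""
    seen = {start}
    queue = deque([start])
    while queue:
        box = queue.popleft()
        for src, dst, colors in EDGES:
            if src == box and color in colors and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


# The red (navigation-only) path never reaches scanning, braille, or E-Text.
red_path = stages("Product Description", "red")
print(sorted(red_path))
```

Following only the red arrows from "Product Description" reaches "Navigation and Audio only" but none of the full-text boxes, which matches the note in the figure.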

Source files or scanning

If the product is determined to have full text, the full text must be produced. This is done through scanning and hand tagging or through a conversion process with publisher files.

The whole process of using publisher files is quite complicated and will not be discussed here, but some items are worth mentioning:


In the figure at the box marked "Obtain" there is an arrow pointing up to "Scanning." If it is determined that the publisher does not have files, then they will need to be scanned. Also, even with good files from publishers, some items may need to be scanned. For example, the index and table of contents may be automatically generated by software and many times these files are not part of the publisher's source files. These are items that may need to be scanned or keyed in by hand.

Following the Structure Guidelines

Many decisions need to be made in the production process. It is essential that the "Structure Guidelines" (currently under development within the DAISY Consortium) be followed. This set of guidelines explains what to do with certain types of items. Using these guidelines ensures that books are consistent in how information is presented to the reader of DAISY Talking Books. The guidelines help producers identify certain classes of information and apply the appropriate markup defined in the DAISY-NISO 3.0 DTD.

Using the DAISY-NISO XML Document Type Definition (DTD)

The DAISY-NISO 3.0 Document Type Definition (DTD) ensures consistency in talking books and in braille. The DTD is designed as a "conversion" DTD. This allows many types of books with many types of structures to be marked up with a single DTD. Materials that come from publishers with carefully prepared files can be converted directly into this DTD. Publisher materials that come with poorly marked files or text that must be created from scanning can all use a single DTD for conversion purposes. When you combine the DTD with the Structure Guidelines you have what you need to create books that will pass the E-Text quality control phase of the production process.

It is essential to point out that the conversion software, the training program for production staff, the Structure Guidelines and the DAISY-NISO 3.0 DTD are all required to produce the tagged files that are used from this point on. Some people have referred to this as a single source file used in the production process.
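To make the idea of a single tagged source file more concrete, here is a minimal sketch of how one marked-up fragment could yield both the navigation items and the full text. The element names (`book`, `level1`, `h1`, `pagenum`, `p`) and the helper functions are assumptions chosen for illustration; they are not quoted from the DAISY-NISO 3.0 DTD or the Structure Guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged fragment. Element names are illustrative assumptions,
# not taken from the DTD itself.
SAMPLE = """
<book>
  <level1>
    <h1>Chapter 1</h1>
    <pagenum>1</pagenum>
    <p>First paragraph of the chapter.</p>
    <pagenum>2</pagenum>
    <p>Second paragraph.</p>
  </level1>
  <level1>
    <h1>Chapter 2</h1>
    <pagenum>3</pagenum>
    <p>Another paragraph.</p>
  </level1>
</book>
"""

# Items a navigation-only book would carry: headings and page numbers.
NAVIGATION_TAGS = {"h1", "pagenum"}


def navigation_items(xml_text):
    """Return (tag, text) pairs for headings and page numbers, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, (el.text or "").strip())
        for el in root.iter()
        if el.tag in NAVIGATION_TAGS
    ]


def full_text(xml_text):
    """Return the paragraph text, as a braille or E-Text product would need it."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter("p")]


print(navigation_items(SAMPLE))
print(full_text(SAMPLE))
```

The same file feeds both outputs, which is the sense in which the tagged file acts as a single source for the later phases.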

Importing the XML into LPStudio/PRO

Digital audio recordings that are synchronized with the structure and full text, or with the navigation components only, can be added in LPStudio/PRO. This software provides for importing a book, or a portion of a book, for recording. This is the step where the narrator reads the marked-up text that has been imported. The software automatically generates the SMIL information that provides the synchronization between the DAISY-NISO structures and the audio recording.

It is possible to break a book up into logical divisions and send them to different places for recording. If this is done, the parts are collected and combined to make up the whole book. This is the reason for the box marked "Collect Recordings."
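The "Collect Recordings" step can be pictured as reassembling the parts in book order and refusing to continue if any part is missing. The sketch below is a hypothetical illustration, not a description of how LPStudio/PRO works; the function name and the section identifiers are assumptions.

```python
def collect_recordings(expected_sections, received_parts):
    """Combine separately recorded parts into book order.

    expected_sections: list of section ids in book order.
    received_parts: dict mapping section id -> recording (any object).
    Raises ValueError if a section is missing or an unknown part arrived.
    """
    missing = [s for s in expected_sections if s not in received_parts]
    unexpected = [s for s in received_parts if s not in expected_sections]
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    return [received_parts[s] for s in expected_sections]


# Parts may come back from different studios in any order.
book_order = ["front", "chapter-1", "chapter-2"]
parts = {
    "chapter-2": "studio_b_ch2.wav",
    "front": "studio_a_front.wav",
    "chapter-1": "studio_a_ch1.wav",
}
print(collect_recordings(book_order, parts))
```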

Quality Control using the Navigation Control Center

The end user, the reader of DAISY Talking Books, uses the Navigation Control Center of each book. No matter what access device they use, the "NCC" provides the high-level structural navigation. Using LPStudio/PRO or any other playback software or hardware, it is possible to test the correctness of the structures. For example, if you go to page 90 and hear "age 90," you know that the synchronization point is off by a fraction of a second. It is a simple matter to then slide the synchronization point a little so you hear "page 90" correctly. This is one of the checks that take place in this phase of quality control. It is also essential to make sure the project is not missing any parts, so the major structures are compared against the printed book.
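The "page 90" correction amounts to shifting one synchronization point by a fraction of a second. A minimal sketch follows, assuming a synchronization point is just a structure id plus a start time in seconds; the `SyncPoint` class, the `nudge` function, and the sample values are hypothetical and do not reflect the actual SMIL data written by LPStudio/PRO.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SyncPoint:
    target_id: str     # id of the structure element, e.g. a page number
    clip_begin: float  # seconds into the audio where playback starts


def nudge(point, offset_seconds):
    """Return a copy with clip_begin shifted by the offset, never below zero."""
    return replace(point, clip_begin=max(0.0, point.clip_begin + offset_seconds))


# Hearing "age 90" suggests playback starts just after the "p", so in this
# sketch the start time is moved a quarter of a second earlier in the audio.
page_90 = SyncPoint("page-90", 1234.50)
fixed = nudge(page_90, -0.25)
print(fixed)
```

Only the timing changes; the structure element the point refers to stays the same.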

Braille Production

To produce braille, the book marked up with the DAISY-NISO 3.0 DTD is passed to braille production staff who use a braille translator. The DAISY Consortium is working with several of the companies that make these translators to ensure that the job of producing braille is automated as much as possible. While files have been imported into braille translation software in the past, none have used this DTD and the Structure Guidelines to ensure high quality. The speed and accuracy of this translation process should be better than the braille production community has ever seen. There will always be extremely difficult braille-formatting issues that a trained braillist needs to address, but the mundane activities and even some of the tricky items should be handled by software. We can make these statements because of the high quality of the data moving into the braille translation process.

The All-Important Archive

All forms of a book are placed into the archive for safekeeping and as input for the distribution process. The archive contains the data that has passed one or more quality control checkpoints. The audio files are stored here in the richest, purest form (44.1 kHz or 22.05 kHz) if possible. It is also likely that a book will be taken from the archive and passed through another phase to create another format. For example, a book that contains full text may move into a braille production phase at a later time. Also, a book that has the E-Text component archived may have already been sent out to a needy student, and at a later time the book could enter the recording process to add the human recordings. The archive is the heart and soul of a library's collection.


Distribution begins by taking the data from the archive. The distribution mechanisms are extremely flexible with digital data. It is important to point out that the audio files are very large. A 44.1 kHz recording requires 360 megabytes per hour. A 30-hour book would take about 10 gigabytes of storage in its rich, uncompressed archive state.
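The storage figures can be checked with a few lines of arithmetic. The sketch below rests on assumptions the paper does not state: 16-bit samples, a single channel, and 1 megabyte counted as 1,000,000 bytes. Under those assumptions raw 44.1 kHz audio comes to roughly 318 megabytes per hour, somewhat below the 360 megabytes per hour quoted here, so the quoted figure may reflect a different sample format or file overhead. The function names are hypothetical.

```python
def pcm_megabytes_per_hour(sample_rate_hz, bytes_per_sample=2, channels=1):
    """Uncompressed PCM size per hour, with 1 MB = 1,000,000 bytes.

    Assumption: 16-bit (2-byte) mono samples unless stated otherwise.
    """
    return sample_rate_hz * bytes_per_sample * channels * 3600 / 1_000_000


def book_gigabytes(hours, megabytes_per_hour):
    """Total archive size for a book, with 1 GB = 1000 MB."""
    return hours * megabytes_per_hour / 1000


print(pcm_megabytes_per_hour(44_100))   # 16-bit mono at 44.1 kHz
print(pcm_megabytes_per_hour(22_050))   # 16-bit mono at 22.05 kHz
print(book_gigabytes(30, 360))          # 30-hour book at the paper's 360 MB/hour
```

Using the paper's own rate, a 30-hour book works out to 10.8 gigabytes, consistent with the "about 10 gigabytes" stated above.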

One planned distribution mechanism is the standard 682-megabyte CD-ROM. To make a large book fit on one CD, a coding system like "MPEG" may be used. The encoder processes uncompressed sound files to remove unnecessary sounds and encodes the data to leave the smallest possible footprint. Only the audio coding in the files changes in this process. The structure, the content, and all the synchronization points remain unchanged. The archive is kept as pure as possible, so when a new, improved audio encoding system is developed, the archive can be tapped and the encoder with the better sound quality applied.

NOTE: Much effort goes into deciding what types of encoders are used in the distribution process. These decisions are made in conjunction with the player manufacturers to ensure that the players can take advantage of the encoding mechanism. For example, speed up and slow down must be possible with the encoded file. It is not just a matter of being able to play the files. The quality and the compression ratio are also considerations. With MPEG-2 Layer 3, more than 50 hours of acceptable-quality sound can be placed on one CD-ROM.
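As a rough check on the 50-hour figure, the average bitrate that fits a given number of hours on one disc can be worked out directly. This sketch assumes 1 megabyte is 1,000,000 bytes and that the whole 682-megabyte disc is available for audio, ignoring the text, structure files, and file system overhead; the function names are hypothetical.

```python
def max_bitrate_kbps(capacity_megabytes, hours):
    """Highest average bitrate (kilobits/second) that fits `hours` on the disc."""
    bits = capacity_megabytes * 1_000_000 * 8
    return bits / (hours * 3600) / 1000


def compression_ratio(uncompressed_mb_per_hour, capacity_megabytes, hours):
    """How many times smaller the encoded audio must be than the archive copy."""
    return uncompressed_mb_per_hour * hours / capacity_megabytes


# 50 hours on a 682 MB CD-ROM, against the 360 MB/hour archive figure.
print(round(max_bitrate_kbps(682, 50), 1))
print(round(compression_ratio(360, 682, 50), 1))
```

Under these assumptions, 50 hours on one disc corresponds to an average of about 30 kilobits per second, or roughly a 26-fold reduction from the archive copy.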

The E-Text type of book and the braille files ready for embossing are tiny in comparison to the audio files. You can fit 500 or more of these books on one CD-ROM. Transferring these types of files over the Internet by FTP is possible. Also, a full text and audio book can be streamed over the Internet with the normal modem connections of today. It is important to mention that the security of this information is a major consideration of the DAISY Consortium, and the implementation of any of these mechanisms depends on our protecting the intellectual property of the copyright holder. This is another topic outside the scope of this paper.


The production process described by this flow chart is not simple. The DAISY Consortium has sponsored software development to make this job simpler. The Structure Guidelines are under development, and the XML DTD we will be using is undergoing testing and evaluation. Conversion and editing tools are being examined. Finally, training and technical support programs are beginning to be put in place. The members of the DAISY Consortium understand the whole and the parts. We are working together to standardize the process and to create the tools that make it as easy as possible to produce the high-quality information that persons who are blind or print disabled have the human right to expect.


Reprinted with author(s) permission. Author(s) retain copyright.