2004 Conference Proceedings

WORD FOR WORD: EXPLOITING SYNERGIES BETWEEN SPEECH RECOGNITION AND WORD PREDICTION SOFTWARE

Presenter(s)
Bob Follansbee, EdD
55 Chapel St.
Education Development Center
Newton, MA 02458
Phone: 617-969-7100 x2716
Fax: 617-969-3440
Email: bfollansbee@edc.org

Fraser Shein, Ph.D., P.Eng.
Tom Nantais, P.Eng.
Rose Nishiyama, B.Sc.
Shae Birch, B.Sc.
Bloorview MacMillan Children's Centre
350 Rumsey Rd.
Toronto, ON M4G 1R8
Phone: 416-424-3855 x 3538
Fax: 416-425-1634
Email: fshein@bloorviewmacmillan.on.ca

The proposed presentation will explain and demonstrate a unique concept in speech recognition implementation, Word for Word. This concept combines speech recognition with other assistive technologies to address problems that are commonly encountered in implementing speech recognition with individuals with disabilities. An implementation of this concept will be demonstrated through a prototype product being developed under the auspices of a US Dept. of Ed./National Institute of Disability and Rehabilitation Research (NIDRR) grant to Education Development Center, Newton, MA, with additional design expertise and support from the Bloorview MacMillan Children's Centre, Toronto, Canada.

The Problem
Speech recognition is a tool that enhances the writing abilities of many individuals with disabilities (e.g., Raskind & Higgins, 1998), but some potential candidates for speech recognition do not wind up using it successfully for a variety of reasons, two of which will be addressed in this paper.

One accessibility problem for some individuals is the operation of current continuous speech recognition programs. Older discrete speech recognition technology required the user to dictate one word at a time, while current products are designed to function most effectively with multiple-word utterances. However, some users still prefer the more slowly paced operation of discrete speech recognition, where the user says a single word and then immediately confirms the accuracy of the software's guess. This pace better matches some users' styles of composition or their speaking and articulatory abilities. In addition, the initial training (enrollment) used to create a voice file often fails when a person cannot effectively speak in multiple-word utterances.

Another significant accessibility problem with existing speech recognition software is the reading demand. First, the initial process of training the software to recognize one's voice requires a considerable amount of reading. In addition, speech recognition is never 100% accurate, so there is always a need to review recognized text and make changes. Further, the actual process of correcting recognition errors often relies upon a user interface that presents a set of alternative choices. Users must be able to read these alternatives and decide which of them, if any, matches their original utterance. All of these tasks can be difficult or impossible for users who have difficulty reading, and they interfere with the main task at hand - producing written work.

Specific Design Considerations
In developing the concept of Word for Word, we identified several existing assistive technologies that seemed to have promise in addressing these two issues, particularly word prediction and text-to-speech (TTS). Word prediction itself is another technology designed in part to enhance the writing abilities of individuals with disabilities (Laine & Follansbee, 1996). Word prediction has not been an explicit element of speech recognition products to date, but its utility in Word for Word was suggested by the operation of an early discrete speech recognition product, DragonDictate. Most obviously, both word prediction and discrete speech recognition operate at the slower, word-by-word pace preferred by some users. Beyond this, however, the operation of DragonDictate suggested a closer connection between the two technologies.

When using DragonDictate, the speaker dictates a single word and is immediately shown the software's best guess along with a list of alternative guesses.

This list is based on the acoustic signal of the speaker's last word, and the target word can be selected from it. Then, in those cases when the correct word does not appear in the initial choice list, DragonDictate offers a very elegant means of correcting words. Working from the initial list of words based on the acoustic sound of the target word, DragonDictate can accept keystroke input (i.e., allow the user to begin to type the word) to dynamically refine its list of alternatives. The list of alternatives changes after the first keyboard input to reflect both the acoustic sound of the word and the first letter of the word. This strategy is remarkably accurate at calling up the correct word after just one keystroke in most circumstances. In fact, it operates very much like word prediction once the user starts typing letters. The idea of integrating speech input with keyboard-based word prediction in a similar manner seemed a promising avenue for achieving greater accuracy and flexibility in Word for Word.
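
As a rough illustration of this refinement idea, the short Python sketch below is our own hypothetical model, not DragonDictate's actual implementation: an acoustically ranked candidate list is simply filtered by the letters typed so far.

    # Hypothetical sketch of DragonDictate-style refinement (not the real product's code):
    # an acoustically ranked candidate list is narrowed as the user types the first
    # letters of the intended word.

    def refine_candidates(acoustic_candidates, typed_prefix):
        """Keep only candidates matching the letters typed so far.

        acoustic_candidates: list of (word, score) pairs, best score first,
            as might be returned by a recognizer for the last spoken word.
        typed_prefix: the letters the user has typed to correct the word.
        """
        prefix = typed_prefix.lower()
        return [(word, score) for word, score in acoustic_candidates
                if word.lower().startswith(prefix)]

    # Example (invented data): the user said "write", but the recognizer's top
    # guess was the homophone "right". One keystroke ("w") brings the intended
    # word to the top of the list.
    candidates = [("right", 0.46), ("write", 0.31), ("rite", 0.14), ("wright", 0.09)]
    print(refine_candidates(candidates, "w"))   # [('write', 0.31), ('wright', 0.09)]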

When contemplating the use of TTS to address the reading demands in Word for Word, we considered uses of TTS in similar products. Over the years, some developers have added the functionality of screen readers with TTS to speech recognition to support some of the reading demands. These solutions have been successful for some users, but the combination of two pieces of software was often complex and balky in operation, and usually considerably more expensive. Word prediction software has also been augmented with TTS capabilities to provide support to writers with learning disabilities (Lewis et al., 1998). Similar to the DragonDictate interface, word prediction software presents choices in response to user input, in this case from the keyboard (see Figure 1). However, word prediction systems designed for this application have the useful addition of TTS in the choice list: the computer can read the choices aloud in a synthesized voice to help the user choose among them.

Figure 1. A word processing screen with the partially typed sentence "Acrylic paint is easy to u...". Under the letter u is a smaller window listing five words, all beginning with the letter u. The first word in the list, "use", is presumably the correct choice for completing the sentence.
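
To make the interaction in Figure 1 concrete, the following Python sketch is again our own illustration: the vocabulary, the frequency counts, and the speak() stand-in are all hypothetical and do not reflect WordQ's internals. It generates a ranked choice list from a typed prefix and hands each choice to a text-to-speech routine so it can be heard before being selected.

    # Illustrative sketch of keyboard-driven word prediction with spoken choices.
    # The vocabulary, frequencies, and speak() stand-in are hypothetical.

    VOCABULARY = {"use": 5000, "under": 3200, "until": 2100, "upon": 900, "usual": 600}

    def predict(prefix, max_choices=5):
        """Return up to max_choices vocabulary words starting with prefix,
        most frequent first."""
        matches = [w for w in VOCABULARY if w.startswith(prefix.lower())]
        return sorted(matches, key=VOCABULARY.get, reverse=True)[:max_choices]

    def speak(text):
        """Stand-in for a text-to-speech call; a real product would hand
        the string to its TTS engine here."""
        print(f"(speaking) {text}")

    # The Figure 1 scenario: the user has typed "u" after "Acrylic paint is easy to ".
    for i, choice in enumerate(predict("u"), start=1):
        speak(f"choice {i}: {choice}")  # read each choice aloud for review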

Our main goal in exploring the Word for Word concept is to design a speech recognition product with word prediction features and TTS capabilities that addresses the challenges current speech recognition products pose for some users. Namely, users can work at a slower pace if they wish, and those with reading difficulties can complete the review and correction processes more independently and efficiently.

In exploring the optimal way to realize this product, it seemed promising to integrate the desired speech recognition functionality into an existing product that already incorporated word prediction and TTS. The program WordQ (Shein et al., 2001) was selected for that purpose.

WordQ is a word prediction product that incorporates TTS features. It can read possible choices aloud as the user cycles through them with hotkeys or points with a mouse. It can also optionally echo characters, words and sentences as they are typed. Finally, a read-back mode permits users to hear sections of their document read back to them for review and correction purposes. WordQ's word prediction algorithm has been shown in simulation to present predictions accurate enough that approximately 40-50% of keystrokes can be saved by selecting predicted words (Nantais et al., 2001).
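
As a simplified illustration of what such a keystroke-savings figure means (the accounting below is ours, with invented example values, and is not the simulation procedure of Nantais et al., 2001), one can compare the keystrokes a passage would need if typed in full with the letters typed plus one selection keystroke per word when predictions are used.

    # Simplified illustration of keystroke savings from word prediction.
    # The sample words and "keystrokes spent" values are hypothetical; this is
    # not the simulation procedure used by Nantais et al. (2001).

    def keystroke_savings(words):
        """words: list of (word, keystrokes_spent) pairs, where keystrokes_spent
        counts the letters typed plus one keystroke to select the prediction."""
        full = sum(len(word) + 1 for word, _ in words)   # every letter plus a trailing space
        actual = sum(spent for _, spent in words)        # letters typed + selection keystroke
        return 1.0 - actual / full

    sample = [("acrylic", 4), ("paint", 3), ("is", 3), ("easy", 3), ("to", 3), ("use", 2)]
    print(f"Estimated keystroke savings: {keystroke_savings(sample):.0%}")  # about 38%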

Although WordQ is a suitable match for realizing the Word for Word concept, the challenge in using it, or any similar product, is that it was not designed for spoken input. As one integrates a new form of technology into an existing product, many design and interface issues arise along the way. As we demonstrate the Word for Word product in the proposed presentation, we will describe many of these issues, the design decisions we made, and their implications for the operation of the product.

In moving forward with the Word for Word concept and a resulting product, we recognize that many potential beneficiaries of speech recognition are still not being well served. We hope to continue the development of a speech recognition product that enables an ever-wider range of individuals with disabilities to use speech recognition successfully.

References:

Laine, C.J., & Follansbee, R. (1996). Word prediction technology and the writing of low-functioning, deaf students. In M.C. Sitko & C.J. Sitko (Eds.), Exceptional Solutions. London, Ontario, Canada: The Althouse Press.

Lewis, R.B., Graves, A.W., Ashton, T.M. & Kieley, C.L. (1998). Word processing tools for students with learning disabilities: A comparison of strategies to increase text entry speed. Learning Disabilities Research & Practice, 13(2), 95-108.

Nantais, T., Shein, F., & Johansson, M. (2001). Efficacy of the word prediction algorithm in WordQ. Proceedings of the RESNA 2001 Annual Conference (pp. 77-79). Washington, D.C.: RESNA Press.

Raskind, M. H., & Higgins, E. L. (1998). Assistive technology for postsecondary students with learning disabilities: An overview. Journal of Learning Disabilities, 31, 27-40.

Shein, F., Nantais, T., Nishiyama, R., Tam, C. & Marshall, P. (2001). Word cueing for persons with writing difficulties: WordQ. The Sixteenth Annual International Conference on Technology and Persons with Disabilities, California State University at Northridge, Los Angeles, CA, March.




Reprinted with author(s) permission. Author(s) retain copyright.