2000 Conference Proceedings

Choosing and Using Speech Recognition Technology

Terry Thompson
Computer Learning Center Coordinator
Independence, Inc.
Lawrence, KS
Phone: 785-841-0333
Email: tt@peakware.com 
Website: http://www.sunflower.com/~indepinc

Kevin L. Price
Disabilities and Computing Program, UCLA
Los Angeles, CA
Phone: 310-206-7133
Email: PriceK@ucla.edu 
Website: http://www.dcp.ucla.edu

The purpose of this paper is to provide a comparative overview of current speech recognition products. These products can benefit everyone, including persons with disabilities, but they differ in accuracy, user interface, and usability. For anyone interested in speech recognition, it is important to understand these differences and how they may affect one’s success in using the products. At the CSUN conference, we will propose criteria for evaluating these speech recognition products, including the following:

How much training does it take to be productive at using the speech recognition program?
Is the user interface easy to understand and logically arranged for the new user of the product?
How “hands-free” is the system? How easily can a user interact with the computer system using only voice?
Is the training and correction of individual speech files easy and intuitive for a wide range of users?
Does the program support a wide range of speech patterns and accents?
Is the program customizable to allow adjustment for different speaking styles of individuals?
Does the program integrate transparently and without slowing recognition within a wide variety of applications?
Can the speed of a person’s speech vary without reducing the program’s recognition accuracy?
Does the program include speech output or integrate with other speech output programs to help people with learning disabilities or visual impairments?
Is the program well supported by the company?
Is the program affordable?
Is the program still being developed for increased usability?

Speech recognition is a technology that is constantly evolving. It is experiencing tremendous growth in the commercial market, apart from its original niche as an assistive technology product. There are presently three major companies with speech recognition products: Dragon Systems, Lernout & Hauspie (L&H), and IBM. Stiff competition between these companies, together with growing demand from consumer and business markets, has led to a tremendous drop in prices over the last few years. Competition has also fueled the development of a plethora of new products. Each company has several products available, ranging in price, features, and the applications that they support. This paper seeks to make sense of the overwhelming array of products so that persons who are shopping for speech recognition will have a better understanding of their choices.

At the time of this paper’s submission (October 1999), new products were on the horizon for each of the companies profiled here. Our presentation at CSUN will include any new products that have been released in Fourth Quarter 1999 or First Quarter 2000.

What are the Types of Speech Recognition?

Discrete
Slower dictation process - better for persons with difficulty in language processing or in fluid speech
Word-by-word style, rather than phrases, reflects the way beginning writers form sentences

Continuous
Processes speech by phrase
Takes context into account
Is less accurate if phrases are interrupted
Advantages: Speed and accuracy (for most users)

Who Can Benefit from Speech Recognition?

Persons with mobility impairments or injuries that prevent keyboard access
Persons who have or who are seeking to prevent repetitive stress injuries
Persons with writing difficulties
Any person who wants hands-free access to the computer
Any person who wants to increase their typing speed (reportedly up to 160 wpm)

What is Required to Use Speech Recognition?

A powerful computer
Consistent speech (not necessarily intelligible)
Fluid speech (i.e., not pausing between words) desirable for use of continuous speech products
Patience
Basic knowledge of computers
Fairly high cognitive ability

What Do You Mean by Fairly High Cognitive Ability?

Ability to voice appropriate capitalization and punctuation
Ability to assess the accuracy of the dictation (text-to-speech is available in some products, but no highlighting of words)
Ability to correct incorrect dictation, which usually requires the ability to spell at least the first few letters of a word, and the ability to recognize a correct spelling among similarly spelled words
Ability to memorize commands and procedures

Current Speech Recognition Products (October 1999)

As mentioned above, the major players in the speech recognition market are Dragon Systems, Lernout & Hauspie (L&H), and IBM. Each company offers several products, ranging in price and features. Because of the variety of products available, shopping for a speech recognition system can be an overwhelming experience. This presentation will present the differences between the products that are presently available. The following is a brief summary of the products available as of October 1999:

Company: Dragon Systems
Web: www.dragonsys.com
Phone: 1-800-TALKTYP (1-800-825-5897)

Dragon’s original product, Dragon Dictate, is currently the only product that uses the discrete speech model. Discrete speech, as mentioned above, is the best solution for persons with difficulty in language processing or in fluid speech, or who form sentences one word at a time, rather than in phrases. The latest version, 3.0 Classic, offers fully functional voice control across all applications. It is the only current speech recognition product that supports Windows 3.x. Because it uses discrete speech, it is better than current continuous speech products at recognizing the speech patterns of persons who naturally pause between words, and seems to be better at learning to recognize persons with unique speech patterns. Unfortunately, Dragon Systems has discontinued development on this product, as the company’s focus is now on continuous speech products, which are more viable in the larger commercial market.

Dragon’s current continuous speech product line, known as Dragon NaturallySpeaking, includes a Standard, Preferred, and Professional edition, listed in order from low end to high end. The Preferred edition includes dictation playback and text-to-speech, features that distinguish it from the Standard edition. The Preferred edition also supports input from an external recording device, although no recording device is provided. A special version of the Preferred edition, Dragon NaturallySpeaking Mobile, does include a digital recording device for additional cost. On the high end of Dragon’s NaturallySpeaking product line, the Professional edition is distinguished by its expanded macro and scripting capabilities, which allow users to dictate long sections of text or complex computer operations with simple commands. The Professional edition also comes in Legal and Medical versions, which feature custom vocabularies for these disciplines.

Dragon has also developed a teen version, which includes special teen voice models and an easier-to-use interface, including easier documentation and on-line help.

As of October 1999, a Macintosh version of Dragon NaturallySpeaking was scheduled for release near the end of 1999.

Company: Lernout & Hauspie (L&H)
Web: www.lhs.com/voicexpress/
Phone: 800-380-1234

L&H products are based on speech recognition technology developed by Kurzweil, a major pioneer in speech recognition. The current L&H product line, called VoiceXpress, includes a Standard, Advanced, and Professional edition. The differences among these editions are fairly straightforward. In the Standard edition, VoiceXpress’s natural language command interface works only in L&H’s own word processing application, called XpressPad. The Advanced edition extends natural language support to include Microsoft Word. The Professional edition further extends natural language support to encompass the entire Microsoft Office suite, plus Internet Explorer. The Professional edition also provides support for recorded dictation, and includes a bundled digital recorder.

Company: IBM
Web: www.software.ibm.com/speech/
Phone: 1-800-825-5263 (IBM Speech Systems)

IBM has been a major player in speech recognition for many years. Its discrete speech product, IBM VoiceType, was a major competitor of Dragon Dictate. However, IBM has discontinued this product and is now focusing all its efforts on developing continuous speech products. Its current product line, IBM ViaVoice Millennium, includes a Standard, Web, and Professional edition. The Web edition features natural language commands for Internet Explorer, Netscape Communicator, and America Online. The Web edition also features a specialized vocabulary for on-line chats. The Professional edition provides most of the features of the Web edition, but also provides natural language commands for the entire Microsoft Office suite, and specialized business, finance, and computer vocabularies.

Although speech recognition got its start as an assistive technology product, the commercial market has fueled its rapid development in recent years, and the primary target market of each of the companies described above is now the general public, rather than persons with disabilities. In this presentation, we will return our attention to speech recognition as a tool for persons with disabilities. A person who has a disability, or who works with persons with disabilities, will come out of this presentation with a clearer picture of which speech recognition products will work best for them. There is a lot of confusion today about speech recognition products. The main focus of this presentation will be to clarify many of these issues and, ultimately, to guide people with disabilities to the programs best suited to them.

Reprinted with author(s) permission. Author(s) retain copyright.