2006 Conference General Sessions

CONTINUOUS SPEECH RECOGNITION FOR CLIENTS WITH NONSTANDARD VOICES OR VISION

 

 

Presenter(s)
Ed Hitchcock
Rehabilitation Institute of Chicago

345 B Superior St. #1543
Chicago IL 60611
Day Phone: 312-238-2997
Email: ehitchcock@ric.org

Continuous speech recognition programs such as Dragon NaturallySpeaking are potentially a very fast and efficient method of accessing the computer for clients. Traditionally, continuous speech recognition has required that clients have a “normal” voice pattern. This has ruled out a number of clients who may have coexisting difficulties with their speech. This presentation will illustrate some ways that clients with non-standard voices can still access speech recognition. Additionally, good to above-average visual perception is generally required to identify errors made by the speech recognition program. Clients with coexisting visual deficits may benefit from the use of companion software, which can significantly improve their ability to correct recognition errors made by the program. This in turn can lead to more successful use of the program for vocational, educational or leisure pursuits on the computer.

This presentation is based on clinical experience. All of my experience is with Dragon NaturallySpeaking versions 5.0 through 8.0. The presentation will focus on practical methods of using speech recognition with these clients. Some techniques will be demonstrated, and case studies, including audio recordings of actual clients, will be presented.

I have successfully trained NaturallySpeaking with the following types of voices:
Ventilator dependent
Mild spastic dysarthria
Mild flaccid dysarthria
Partially paralyzed vocal cords

Some of the general principles for training are as follows: Continuous speech recognition works best with a long phrase or sentence. Clients should be coached to speak in the longest phrase possible (WITHOUT trailing off at the end of the phrase). Phrases should be spoken with context in mind, because Dragon is very dependent on context to identify and match word sounds. Hardware should be optimized on the system.

When running the audio setup wizard: Pause a couple of times during the reading of the paragraph. The client does not have to read the actual paragraph during the audio setup wizard; a client can simply read multiple sentences from any source. It is only important that they are speaking in their natural tone of voice. The client should be coached to read slowly and clearly. Hardware changes such as a USB microphone should be considered to increase the score on the audio setup wizard. The visual display (during the audio quality check) should show more green than yellow; green is the sound of the voice, yellow is the sound of the background noise of the computer and environment. The audio setup wizard should score higher than 20.

Other options to enhance accuracy for all users: Run the acoustic optimizer from the accuracy center (Archive must be checked under miscellaneous options). The computer should be restarted at lunch (or after an intense correction session) to clear RAM. The user should also “save user files” every hour to clear RAM.

The following modifications can be used with ventilator-dependent clients and other clients with non-standard voices: Use a noise-canceling headset microphone pointed up (away from vent noises). If the client is unable to run the audio setup wizard, have a similar voice run the audio setup wizard and the first two lines of training.

Modulate volume and pitch appropriate to the client. Make sure microphone is hearing ventilator noises.

Manual correction should be performed for these clients until greater accuracy is achieved.
The client should be coached for appropriate sentence/phrase formulation. Dictation shortcut or command editor can be used for long phrases such as Web addresses or e-mail addresses.
An array or desktop microphone can be used with these clients if they are not independent with donning a headset microphone. Use of an alternative mouse should be considered for controlling the microphone and the audio setup wizard. The international alphabet or letter phrases can facilitate correction by voice. Dragon frequently has difficulty distinguishing between similar-sounding letters of the alphabet such as “e, b, c, t, v and p”; use of the international alphabet can eliminate this issue. Many clients with mild dysarthria can use these suggestions for successful use of Dragon.
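As a rough illustration of why the international alphabet helps, the sketch below maps each letter to a distinct code word, so that letters that sound alike are spoken as words that do not. It assumes that “international alphabet” refers to the ICAO/NATO spelling alphabet; the table, the function name spell_out and the sample word are illustrative only and are not taken from Dragon, and the exact words Dragon accepts should be checked in the program itself.

```python
# Illustrative sketch only: assumes the "international alphabet" is the
# ICAO/NATO spelling alphabet. Nothing here is part of Dragon itself.
SPELLING_ALPHABET = {
    "a": "Alpha", "b": "Bravo", "c": "Charlie", "d": "Delta",
    "e": "Echo", "f": "Foxtrot", "g": "Golf", "h": "Hotel",
    "i": "India", "j": "Juliet", "k": "Kilo", "l": "Lima",
    "m": "Mike", "n": "November", "o": "Oscar", "p": "Papa",
    "q": "Quebec", "r": "Romeo", "s": "Sierra", "t": "Tango",
    "u": "Uniform", "v": "Victor", "w": "Whiskey", "x": "X-ray",
    "y": "Yankee", "z": "Zulu",
}


def spell_out(text):
    """Return one code word per letter; non-letters are passed through unchanged."""
    return [SPELLING_ALPHABET.get(ch.lower(), ch) for ch in text]


# "v", "e" and "t" are easily confused when spoken as bare letters,
# but their code words are clearly different from one another.
print(" ".join(spell_out("vet")))  # prints "Victor Echo Tango"
```

A printed list like this can also serve as a cue card for a client who is still learning the code words.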

Preventing vocal strain is an important consideration for these clients: Drink plenty of decaffeinated beverages, preferably water or tea with honey. (Consider use of a drinking system if a client is not independent with drinking.) Consistent rest breaks are critical. Clients should not use speech recognition if they have a sore throat. Clients on a ventilator should not try to talk through a breath cycle even if they are able; this is a particular strain on the voice.

 

Options for learning-disabled clients or clients with low vision: The training documents can be printed out in a larger font. Care should be taken that the client is reading in their natural speaking voice; it is important that they not focus so much on the reading that their voice becomes unnatural. Reading the passage to them without having them look at the monitor may help them to speak more naturally. Training with a smaller vocabulary will enhance recognition of the remaining words, leading to overall increased accuracy. Visual enhancements such as the high contrast display, magnifier or ZoomText magnifier may make the training process easier. Auditory feedback is available in the Preferred version of Dragon NaturallySpeaking, which is able to play back either a recording of the individual's voice or a computer-synthesized voice reading of the actual text. This can help to determine accuracy and the correction method to be used in the event of errors. Clients who display poor visual attention may benefit from hiding the results box that appears with Dragon.

Jawbones, J-say and Keystone screen speaker all allow for extensive auditory feedback both during training and during use. I am not directly familiar with J-say; my understanding is that it is very similar to Jawbones. Both Jawbones and Keystone allow for auditory read back of the correction list. This will allow a client with poor reading ability to determine the correct choice. Jawbones and J-say are geared towards low vision or blindness, whereas Keystone is geared towards learning disability. Keystone includes spelling suggestions in the “Spell That” box if a client with a learning disability is unsure of the correct spelling of a given word.
If the client is unable to perform corrections (due to learning disability), a trained caregiver can listen to their voice recordings using the playback feature and manually enter corrections to the voice file. Dragon NaturallySpeaking Professional allows you to save a copy of the voice recordings for editing at a later time.

Manual correction techniques: Select text with the correction hotkey or mouse. Correction hotkey is the key on a desktop keyboard. This must be changed for a laptop (Tools > Options > Hotkeys tab). The correction hotkey will bring up the quick correct list and the “Spell That” option. Use the mouse and keyboard to manually enter corrections. Manual correction should be repeated for these clients until greater accuracy is achieved. Responsibility for correction should then be transitioned over to the client in a graded process.

 

Use of continuous speech recognition with these clients is challenging but possible. Ultimately, it can be an excellent method of computer access!




Reprinted with author(s) permission. Author(s) retain copyright