CONTINUOUS SPEECH RECOGNITION FOR CLIENTS WITH NONSTANDARD VOICES OR VISION
Presenter(s)
Ed Hitchcock
Rehabilitation Institute of Chicago
Chicago IL 60611
Day Phone: 312-238-2997
Email: ehitchcock@ric.org
Continuous speech recognition programs such as Dragon NaturallySpeaking are
potentially a very fast and efficient method of accessing the computer for
clients. Traditionally, continuous speech recognition has required that clients
have a “normal” voice pattern. This has ruled out a number of clients who have
coexisting difficulties with their speech. This presentation will illustrate
some ways that clients with nonstandard voices can still access speech
recognition. Additionally, good to above-average visual perception is generally
required to identify errors made by the speech recognition program. Clients
with coexisting visual deficits may benefit from the use of companion software,
which can significantly improve their ability to correct recognition errors
made by the program. This in turn can lead to more successful use of the
program for vocational, educational or leisure pursuits on the computer.
This presentation is based on clinical experience. All of my experience
relevant to this presentation is with Dragon NaturallySpeaking versions 5.0
through 8.0. The presentation will focus on practical methods of using speech
recognition with these clients. Some demonstration of techniques, as well as
case studies including audio recordings of actual clients, will be presented.
I have successfully trained NaturallySpeaking with the following types of voices:
Ventilator dependent
Mild spastic dysarthria
Mild flaccid dysarthria
Partially paralyzed vocal cords
Some of the general principles for training are as follows. Continuous speech
recognition works best with a long phrase or sentence. Clients should be
coached to speak in the longest phrases possible (WITHOUT trailing off at the
end of the phrase). Phrases should be spoken with context in mind; Dragon is
very dependent on context to identify and match word sounds. Hardware should be
optimized on the system.
When running the audio setup wizard: Pause a couple of times during the reading
of the paragraph. The actual paragraph does not have to be read during the
audio setup wizard; a client can simply read multiple sentences from any
source. It is only important that they speak in their natural tone of voice.
The client should be coached to read slowly and clearly. Hardware changes such
as a USB microphone should be considered to increase the score on the audio
setup wizard. The visual display (during the audio quality check) should show
more green than yellow; green is the sound of the voice, and yellow is the
background noise of the computer and environment. The audio setup wizard should
score higher than 20.
Other options to enhance accuracy for all users: Run the acoustic optimizer
from the accuracy center. (Archive must be checked under miscellaneous
options.) The computer should be restarted at lunch (or after an intense
correction session) to clear RAM. The user should “save user files” every hour,
which also clears RAM.
The following modifications can be used with ventilator-dependent clients and
other clients with nonstandard voices: Use a noise-canceling headset microphone
pointed up (away from vent noises). If the client is unable to run the audio
setup wizard, have a similar voice run the audio setup wizard and the first two
lines of training.
Modulate volume and pitch as appropriate to the client. Make sure the
microphone is hearing ventilator noises.
Manual correction should be performed for these clients until greater accuracy
is achieved.
The client should be coached in appropriate sentence/phrase formulation.
Dictation shortcuts or the command editor can be used for long phrases such as
Web addresses or e-mail addresses.
An array or desktop microphone can be used with these clients if they are not
independent with donning a headset microphone. Use of an alternative mouse
should be considered for controlling the microphone and the audio setup wizard.
The international alphabet or letter phrases can facilitate correction by
voice. Dragon frequently has difficulty distinguishing between similar-sounding
letters of the alphabet such as “e, b, c, t, v and p”; use of the international
alphabet can eliminate this issue. Many clients with mild dysarthria can use
these suggestions for successful use of Dragon.
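As an illustration of the international alphabet technique, the short Python sketch below turns a word into the spelling words a client could read aloud from a printed cue card. It assumes that "international alphabet" refers to the ICAO/NATO spelling alphabet (alpha, bravo, charlie and so on); the helper name spell_out is hypothetical and is not part of Dragon.

```python
import string

# Assumption: "international alphabet" means the ICAO/NATO spelling
# alphabet, written here with common English spellings.
SPELLING_WORDS = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliet "
    "kilo lima mike november oscar papa quebec romeo sierra tango "
    "uniform victor whiskey xray yankee zulu"
).split()

# Map each lowercase letter a-z to its spelling word.
LETTER_TO_WORD = dict(zip(string.ascii_lowercase, SPELLING_WORDS))


def spell_out(text):
    """Return the spelling words for each letter in text (hypothetical helper).

    Letters are replaced by their spelling word, whitespace is skipped,
    and any other character (digits, '@', '.') is passed through unchanged.
    """
    words = []
    for ch in text.lower():
        if ch.isspace():
            continue
        words.append(LETTER_TO_WORD.get(ch, ch))
    return " ".join(words)


if __name__ == "__main__":
    # Easily confused letters become clearly distinct words.
    print(spell_out("bet"))
```

A cue card generated this way could be printed in a large font for clients who also have low vision.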
Preventing vocal strain is an important consideration for these clients: Drink
plenty of decaffeinated beverages, preferably water or tea and honey. (Consider
use of a drinking system if a client is not independent with drinking.)
Consistent rest breaks are critical. Clients should not use speech recognition
if they have a sore throat. Clients on a ventilator should not try to talk
through a breath cycle even if they are able; this is a particular strain on
the voice.
Options for learning-disabled clients or clients with low vision: The training
documents can be printed out in a larger font. Care should be taken that the
client is reading in their natural speaking voice; it is important that they
not focus so much on the reading that their voice becomes unnatural. Reading
the passage to them without having them look at the monitor may help them to
speak more naturally. Training with a smaller vocabulary will enhance
recognition of the remaining words, leading to overall increased accuracy.
Visual enhancements such as the high contrast display, magnifier or ZoomText
magnifier may make the training process easier. Auditory feedback is available
in the Preferred version of Dragon; NaturallySpeaking is able to play back a
recording of the individual's voice or a computer-synthesized voice reading of
the actual text. This can help to determine accuracy and the correction method
to be used in the event of errors. Clients who display poor visual attention
may benefit from hiding the results box that appears with Dragon.
JawBones, J-say and Keystone screen speaker all allow for extensive auditory
feedback both during training and during use. I am not directly familiar with
J-say; my understanding is that it is very similar to Jawbones. Both Jawbones
and Keystone allow for auditory read back of the correction list. This will
allow a client with poor reading to determine the correct choice. Jawbones and
J-say are geared towards low vision or blindness, whereas Keystone is geared
towards learning disability. Keystone includes spelling suggestions in the
Spell That box if a client with a learning disability is unsure of the correct
spelling of a given word.
If the client is unable to perform corrections (due to learning disability), a
trained caregiver can listen to their voice recordings using the playback
feature to manually enter corrections to the voice file. Dragon
NaturallySpeaking Professional allows you to save a copy of the voice
recordings for editing at a later time.
Manual correction techniques: Select text with the correction hotkey or mouse.
Correction hotkey is the key on a desktop keyboard. This must be changed for a
laptop (Tools > Options > Hotkeys tab). The correction hotkey will bring up the
Quick Correct list and the Spell That option. Use the mouse and keyboard to
manually enter corrections. Manual correction should be repeated for these
clients until greater accuracy is achieved. Responsibility for correction
should then be transitioned over to the client in a graded process.
Use of continuous speech recognition with these clients is challenging but possible. Ultimately, it can be an excellent method of computer access!
Return to 2006 Table of Contents