Go to previous article
Go to next article
Return to 1999 Conference Table of Contents
Renee L. Griffith, CTO
7201 Archibald Ave. #441
Rancho Cucamonga, CA 91701
With the large variety of speech recognition products now available, choosing a package can be confusing. There are at least 20 different titles on the retail shelves today, including both continuous speech and discrete speech products. The latest releases of continuous speech software have just started to offer command and control of not only Word 97, but many other Windows applications. We can now dictate continuously into virtually any Windows application and exercise some command and control of those applications, all using natural language commands.
The high-end continuous speech recognition packages can now perform many functions. Continuous dictation lets the user speak directly into applications such as Word, WordPerfect, or Excel in a continuous and natural manner. Command and control, offered by some packages, lets the user operate the Windows desktop and environment by voice, including moving between applications and documents, dropping menus, controlling applications, and creating shortcuts. Navigation means moving through a document in order to edit or format it. Natural language commands allow you to say "go to the fifth paragraph" instead of driving the cursor to the fifth paragraph using "arrow" navigation commands.
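As a rough illustration of what a natural language navigation command involves, the sketch below turns the spoken phrase quoted above into a paragraph position. It is a toy example and not how any commercial package works; the names ORDINALS and paragraph_index are hypothetical, and only the one phrasing "go to the <ordinal> paragraph" is handled.

```python
# Hypothetical ordinal table; real products recognize far more phrasings.
ORDINALS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
}


def paragraph_index(command, paragraph_count):
    """Return the zero-based paragraph index for a command, or None."""
    words = command.lower().split()
    # Expect exactly five words: go to the <ordinal> paragraph
    if len(words) != 5:
        return None
    if words[:3] != ["go", "to", "the"] or words[4] != "paragraph":
        return None
    number = ORDINALS.get(words[3])
    # Reject unknown ordinals and paragraphs beyond the end of the document.
    if number is None or number > paragraph_count:
        return None
    return number - 1


# In an eight-paragraph document, the fifth paragraph is at index 4.
print(paragraph_index("go to the fifth paragraph", 8))
```

The point of the sketch is that one spoken phrase replaces a series of cursor movements: the command names the destination, and the software works out how to get there.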
Editing is one of the most productive uses of speech recognition software. Editing commands are used to select text, paragraphs, or pages and to cut, copy, or paste within the same document, between documents, or even between applications. Formatting is also greatly enhanced by speech recognition. Simply saying "paragraph-justified" or "underline-line" is considerably faster than using a mouse to select text and then executing the keystrokes to perform the formatting. Again, with natural language commands, editing and formatting can frequently be combined in a single continuous spoken command.
Recognition rates are generally around 95%. The speed of dictation varies with the type of hardware used, the experience level of the individual, and whether the product is discrete or continuous. Discrete speech dictation can reach 60 words per minute with 97% accuracy. Continuous speech users are achieving 140 words per minute or more, with 95% accuracy.
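The speed and accuracy figures above can be combined into a rough estimate of how many words come out right each minute and how many need correcting. The sketch below is simple arithmetic on the quoted numbers, assuming accuracy is the fraction of dictated words recognized correctly; the function name dictation_throughput is illustrative only.

```python
def dictation_throughput(words_per_minute, accuracy):
    """Return (correct words per minute, misrecognized words per minute)."""
    correct = words_per_minute * accuracy
    errors = words_per_minute - correct
    return correct, errors


# Speed and accuracy figures quoted in the text above.
modes = {
    "discrete": (60, 0.97),
    "continuous": (140, 0.95),
}

for name, (wpm, accuracy) in modes.items():
    correct, errors = dictation_throughput(wpm, accuracy)
    print(f"{name}: {correct:.1f} correct words/min, {errors:.1f} to fix/min")
```

On these figures, discrete speech yields about 58 correct words and about 2 corrections per minute, while continuous speech yields about 133 correct words and about 7 corrections per minute. Continuous speech is much faster overall, but it also produces more errors per minute that must be corrected.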
An active vocabulary is one that resides in RAM (random access memory). The total vocabulary is the entire dictionary available on your hard disk. Total vocabularies can run up to 230,000 words, with add-on vocabularies and foreign languages available to accommodate specialized industries and international uses. Current continuous dictation packages also include utilities and tools to help build custom vocabularies for industry-specific needs. The continuous software "reviews" the documents that you instruct it to process and produces a word list, which you then edit to remove unwanted words or terminology. You then train any words for which the speech software doesn’t have a language model. Your accuracy should be extremely high when you finish enrollment and vocabulary building.
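The document review step described above can be pictured as a simple word count against the existing vocabulary. The sketch below is an assumption about the general idea, not the method used by any particular package; the function name candidate_words, the sample vocabulary, and the sample documents are all made up for illustration.

```python
import re
from collections import Counter


def candidate_words(documents, known_vocabulary):
    """Count words in the documents that are missing from the vocabulary."""
    known = {word.lower() for word in known_vocabulary}
    counts = Counter()
    for text in documents:
        # Treat runs of letters (with internal apostrophes/hyphens) as words.
        for word in re.findall(r"[A-Za-z][A-Za-z'-]*", text):
            lowered = word.lower()
            if lowered not in known:
                counts[lowered] += 1
    # Most frequent first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical existing vocabulary and user documents.
known_vocabulary = ["the", "patient", "has", "a", "of", "history", "and"]
documents = [
    "The patient has a history of tendinitis.",
    "Tendinitis and bursitis history.",
]

for word, count in candidate_words(documents, known_vocabulary):
    print(word, count)
```

The resulting list corresponds to the word list that the user edits by hand: unwanted entries are removed, and the remaining words are trained by voice.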
For optimum performance of today’s speech recognition packages, a Pentium II/350 should be considered the minimum. A minimum of 64 MB of RAM is recommended; however, 128 MB is preferable because today’s applications are extremely RAM intensive. All the software packages require a SoundBlaster 16 (or compatible) or higher sound card. The packages support different operating systems, including Windows 3.x (discrete only), Windows 95 and 98, and Windows NT 4.0. All the continuous packages are 32-bit applications and can operate in NT 4.0; however, extremely limited command and control is available in NT 4.0 with any of the speech recognition packages. Macros can be developed to allow hands-free use of NT 4.0, but it takes time to develop them.
Certainly, hands-free operation of a PC is one of the major benefits of speech recognition software. This type of software can potentially reduce the chances of Repetitive Stress Injury (RSI) or give relief to someone already suffering from RSI. For those who do not suffer from RSI, an increase in productivity is possible because of the formatting, editing, and navigation controls available. For those who cannot type or who type slowly, continuous speech allows faster and more accurate text entry. Finally, speech recognition allows access to jobs previously denied to disabled people who cannot use their hands to operate a computer.
Discrete speech requires an individual to pause between each spoken word, and this can cause frustration when learning. Most of us talk at a rate in excess of 150 words per minute and therefore feel slowed down by discrete speech. With the introduction of continuous speech products, there seems to be less frustration and less strain on the vocal cords. None of the current software is 100% accurate. If an individual has a speaking disability, the discrete packages are often necessary for recognition. However, the continuous speech software is improving in this area.
Voice strain is a potential problem when using speech recognition, especially for individuals with RSI or hand disabilities, who will use their voice much more than someone who can combine mouse and keyboard with speech recognition. Attention should be paid to the individual's voice to identify any signs of strain and, if necessary, an appointment should be made with a technical voice instructor to learn proper breathing and speaking techniques. Continuous speech recognition software may help relieve this problem.
Remember, it takes tremendous motivation and patience to be successful with speech recognition software. Many injured individuals spend between 30 and 50 hours working with the software to establish a good voice profile and create the macros needed to become productive. It also takes the software time to learn how an individual speaks. This is the most frustrating part. Constant attention and stringent correction are required to get excellent recognition accuracy and speed. If the individual doesn’t invest this preliminary time, he or she may be frustrated by a low accuracy rate. Even for people who are not injured and are using continuous speech packages, the initial investment of time will be somewhere around 12 hours to establish a comprehensive vocabulary and high accuracy.
For those who have little or no computer experience, it is absolutely imperative that they learn how to operate a computer using speech recognition from the beginning. This applies, for example, to an individual who previously performed a job that did not involve a computer and is being retrained for a computer position. This individual will require classes in speech recognition, Windows 95/98, and any other software package required to perform the specific job.
Our view is that even the experienced computer user, familiar and comfortable operating a PC in the Windows environment, should take a minimum of nine hours of speech recognition training. The first six hours consist of an overview of the speech recognition software, proper voice dictation techniques, program commands, simple voice macros and establishing a solid and accurate voice profile.
The final three hours of training help develop experience and proficiency in the use and development of more complex voice macros for a specific work situation. This training applies to both discrete and continuous packages for individuals with hand injuries or disabilities. For non-injured individuals, six hours is usually enough to get someone going successfully with continuous speech.
Knowing how to use the speech recognition software is only half the battle. Knowing how to integrate speech recognition with existing applications is the other half. An experienced speech recognition instructor can help the individual create macros that lead to productivity more quickly and with less frustration.
The speech recognition companies offer a variety of technical support options. Most of them offer 90 days of free technical support from the date of purchase. After the initial 90-day period, individuals can purchase support contracts directly from the manufacturers. In addition to this technical support, many speech resellers offer custom support packages. Most of the problems associated with implementing speech recognition are resolved within the 90-day period, particularly if a professional evaluation has been performed and proper training has been received.
This technology can be used to help return injured workers to their computer jobs, potentially reduce the risk of RSI, give disabled or technology-displaced individuals access to a computer, and increase productivity for the non-injured worker. Speech recognition is not just for the disabled anymore!