2004 Conference Proceedings



Ted Wattenberg, M.S., C.R.C.
Adaptive Technology and Access Coordinator
Adjunct Faculty in assistive technology
San Joaquin Delta College
Stockton, California
Phone: (209) 954-5382
Email: twattenberg@deltacollege.edu

Doctoral student in Computing Technology in Education
Nova Southeastern University
Department of Computer and Information Science
Fort Lauderdale, Florida


Text-to-speech technology can serve as an assistive technology not only for people with low vision but also for people with limited vision, learning disabilities, mobility impairments, or reading delays, as well as for people learning English as a second language and for the elderly (Mull & Sitlington, 2003; Lines & Hone, 2002). An estimated five to twenty percent of postsecondary students in the United States have learning disabilities, and these students make up more than one-third of the postsecondary students who have disclosed a disability in order to receive support services (Harrison, 2003). As the number of college students whose reading problems can be assisted by screen reader technology grows, educators responsible for evaluating, assessing, and teaching assistive technologies in the postsecondary environment need to understand the usability problems associated with these tools in order to provide effective instruction (Scherer, 2000).


The purpose of this usability evaluation was twofold: to test the intelligibility, comprehensibility, and user affect of improved synthesized voices, which more closely resemble real human speech, as used in screen readers and MP3 text-conversion devices; and to explore instructional strategies for teaching people to use screen readers in a postsecondary environment.


Synthesized text-to-speech technology can serve two main functions for students whose disabilities delay their reading: supporting comprehension of the reading content, and aiding the improvement of neuro-cognitive processing that promotes independent speech and higher-order cognitive processing (Blischak, 1999; Higginbotham & Baird, 1995; Mull & Sitlington, 2003). The learning ability of a person using synthesized text-to-speech is affected by three constructs: the intelligibility of words, the understanding of phrases, and the person's affect toward the synthesized voice (Higginbotham & Baird, 1995). Learning outcomes have been found to be significantly better with recorded human voices than with computer-generated synthesized voices that resemble human speech. The higher-level cognitive and perceptual processing needed for reading comprehension is more difficult when a listener must interpret synthesized speech than when the listener processes a natural human voice. Improvements have also been observed in the speech performance of people with severe speech impairments, and reduced cognitive overload in people with learning disabilities, after continued use of synthesized speech technologies (Blischak, 1999). These improvements are greater when the synthesized voice is perceived as more natural and more closely resembles real human speech.

The testing took place in the High Technology Classroom at San Joaquin Delta College, Stockton, California, on June 18, 2003. The classroom is equipped with twenty accessible student workstations running on 800 MHz Pentium III computers. Sound boarding installed on three of the exterior walls reduces classroom noise caused by other students or computer equipment. Each workstation is supplied with Labtec headphones that can be used with either screen readers or voice-activation applications.

ReadPleasePlus 2003 is a low-cost screen reader designed for people with learning disabilities, mobility impairments, and low vision. Although it lacks the dictionaries, thesauruses, word prediction, and other useful learning aids offered by the most advanced screen readers on the market today, ReadPleasePlus is widely used because of its low cost. ReadPleasePlus 2003 offers a low-cost voice upgrade to the new AT&T 16 kHz Natural Voices (AT&T Corporation, 2001). These products are readily available to people with disabilities, and the voices can be transferred to other screen reader products.

Learning Theory

Two approaches used in special education are based on Vygotskian learning theory and cognitive learning theory (Jamieson, 1994; Lyon & Krasnegor, 1999). There are many similarities between these two approaches. Both divide the learning process into three or four sequential steps that, if disrupted, can lower the retention of information, hinder logical processing, or prevent creativity. The three steps of the learning process are: 1) the intake of information that reflects real and culturally derived ideas that the learner can understand; 2) the transfer and processing of information between memory locations in the brain, often supported by collaborative and shared learning; and 3) the reflective process that transforms information into individual ideas and creative forms that can be shared with others. The use of screen readers has been shown to affect all three learning stages: the intake of information, neuro-processing, and the communication of ideas.

Synthesized Speech and Learning

Understanding and comprehending human speech depends on two cognitive processes: the listener's ability to understand individual words and the listener's ability to comprehend the meaning of spoken phrases (Cahn, 1990; Morton & Tatham, 1996; Lai, Wood, & Considine, 2000). Cahn (1990) termed the capacity of synthesized speech to render understandable words intelligibility, and its capacity to produce comprehensible phrases comprehension. These terms have been used extensively in studies as metrics for determining the usability of synthesized speech products.
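The paper does not say how intelligibility was scored in this evaluation. As an illustration only, word-level intelligibility is often operationalized in listening studies as the proportion of target words a listener transcribes correctly. The Python sketch below assumes that scoring rule; the function names (`normalize`, `word_intelligibility`) and the word alignment via `difflib.SequenceMatcher` are choices made for this example, not part of the study.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_intelligibility(target: str, transcript: str) -> float:
    """Proportion of target words the listener reproduced, in order.

    Assumed scoring rule for illustration. Words are aligned with
    SequenceMatcher, so one substituted or missing word does not shift
    the scoring of the words that follow it.
    """
    target_words = normalize(target)
    heard_words = normalize(transcript)
    if not target_words:
        return 0.0
    matcher = SequenceMatcher(None, target_words, heard_words, autojunk=False)
    # Sum the lengths of all aligned runs of identical words.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(target_words)
```

For example, a listener who hears "the cat sat on the mat" and writes "the cat sat on a mat" matches five of six target words, a score of about 0.83. Phrase-level comprehension would need a separate measure, such as questions about the content of a passage.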

Intelligibility is based on the correct phonetic representation of all of the sounds of a particular language and on the ability of the software algorithm to combine them correctly into understandable words (Lai, Wood, & Considine, 2000). Text-to-speech algorithms are rule-based conversions performed before the oral rendition is delivered; they are typically formant-based, articulation-based, or concatenation-based, or a combination of these. Such conversion systems can render an unlimited amount of text aloud without requiring large amounts of storage and processing memory, and they are used for reading e-mail and other electronic text materials.
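To make the concatenation-based approach concrete, the toy Python sketch below joins stored sound units end to end and smooths each seam with a short linear crossfade. It is a hypothetical illustration, not how ReadPleasePlus or AT&T Natural Voices work: the sine-tone "units", the two-word inventory, the 16,000 samples-per-second rate, and the 160-sample crossfade are all assumptions, and real systems add text normalization, unit selection, and prosody control.

```python
import math

SAMPLE_RATE = 16_000  # assumed samples per second


def tone(freq_hz: float, dur_s: float) -> list[float]:
    """Placeholder unit: a sine tone standing in for a recorded speech segment."""
    n = round(SAMPLE_RATE * dur_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical unit inventory. A real concatenative system would store recorded
# diphones, syllables, or words drawn from a speech database.
UNITS: dict[str, list[float]] = {
    "hello": tone(220.0, 0.30),
    "world": tone(330.0, 0.25),
}


def concatenate(units: list[list[float]], overlap: int = 160) -> list[float]:
    """Join units end to end, linearly crossfading up to `overlap` samples per seam."""
    out: list[float] = []
    for unit in units:
        if not out:
            out.extend(unit)
            continue
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight of the incoming unit rises across the seam
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out


def synthesize(text: str) -> list[float]:
    """Look up each word in the inventory and concatenate the stored units."""
    words = text.lower().split()
    missing = [w for w in words if w not in UNITS]
    if missing:
        raise KeyError(f"no stored unit for: {missing}")
    return concatenate([UNITS[w] for w in words])
```

For example, `synthesize("hello world")` returns 8,640 samples: 4,800 plus 4,000, minus the 160 samples shared at the crossfade. Storing units and joining them is what distinguishes this family from formant- and articulation-based synthesis, which generate the waveform from acoustic or articulatory models instead.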

Comprehension of phrases produced by synthetic speech relies on the listener's ability to understand words and combine them into an affective message (Cahn, 1990). The flow of spoken phrases consists of intonation, rhythm, and syntax parameters that the listener learns from hearing a new voice. The listener's affect is the degree to which the listener is able to learn to interpret the orally produced sequences well enough to comprehend both their content and their emotional message. Intelligibility and comprehension of short phrases must be achieved quickly enough to prevent cognitive overload within working memory (Lai, Wood, & Considine, 2000). Over longer periods of reading, which involve two or more phrases, the listener's affect must persist over time: comprehended information must be stored in and retrieved from long-term memory, processed within working memory, and then stored again in long-term memory. Strong correlations between intelligibility, comprehension, and persistency have been found in the use of synthesized speech products (Reynolds, Isaacs-Duvall, & Haddox, 2002). Additionally, comprehension and persistency have been measured as higher for real human voices than for synthesized voices. The more natural and real the listener perceives the synthesized voice to be, the greater the comprehension and persistency.


The results of this report identify several factors that should be included in assistive technology evaluations and assessments when a student may require a screen reader to assist in reading.

  1. Dominant learning modalities of the student must be measured and assessed to see if an oral modality is compatible.

  2. Even when an oral learning style immediately suits the student, a learning curve of several weeks to months should be expected.

  3. An assessment of prior successful learning strategies should be completed to see whether the student already knows how to learn or whether previous attempts at learning have been unsuccessful.

  4. Oral vocabulary levels should be determined to ascertain if intelligibility is immediately possible.

  5. Attention and memory disabilities should be identified, and it should be determined whether the oral modality will support or confuse the student.

The results of this report identify several usability problems that can be lessened or eliminated by training and practice with curriculum in the following areas:

  1. Sufficient time should be allotted for the student to become familiar with the synthesized voice to be able to reach higher levels of affect.

  2. The student should be informed that learning to use a screen reader takes time and told to expect a learning curve of between a few weeks and several months.

  3. Initial oral vocabulary building is essential to increase intelligibility.

  4. Practicing with short phrases that can be understood within working memory can strengthen comprehension.

  5. At first, factual content is easier to comprehend than emotional content. Begin with factual phrases and short passages, and progress over time to passages that include emotional content.

  6. Greater levels of affect can be reached when the speed of speech is increased, also reducing cognitive overload. The reading speed should be slowly increased in proportion to the improvement in intelligibility and comprehension.

  7. Teaching strategies should focus on integrating oral learning into the student's learning style and learning strategies.
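Recommendation 6 can be expressed as a simple rule for adjusting the speaking rate between practice sessions. The Python sketch below is a hypothetical illustration, not a procedure used in this evaluation: it assumes intelligibility and comprehension are each scored as proportions from 0 to 1, treats the smaller of the two session-to-session gains as the improvement, and uses an assumed gain of 100 words per minute per unit of improvement with an assumed ceiling of 300 words per minute.

```python
def next_speech_rate(
    current_wpm: float,
    prev_scores: tuple[float, float],
    new_scores: tuple[float, float],
    gain_wpm: float = 100.0,  # assumed wpm added per unit of improvement
    max_wpm: float = 300.0,   # assumed upper limit on speaking rate
) -> float:
    """Return the speaking rate (words per minute) for the next session.

    Scores are (intelligibility, comprehension) proportions in [0, 1].
    The improvement is the smaller of the two gains, so the rate rises only
    when both measures improve; this rule never lowers the rate.
    """
    intelligibility_gain = new_scores[0] - prev_scores[0]
    comprehension_gain = new_scores[1] - prev_scores[1]
    improvement = min(intelligibility_gain, comprehension_gain)
    if improvement <= 0:
        return current_wpm
    # Increase in proportion to the improvement, capped at the ceiling.
    return min(max_wpm, current_wpm + gain_wpm * improvement)
```

With a current rate of 150 words per minute, an intelligibility gain from 0.70 to 0.80, and a comprehension gain from 0.60 to 0.65, the smaller gain (0.05) raises the rate to about 155 words per minute. Using the smaller gain keeps the rate from outpacing whichever skill is lagging.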

Further research

The use of screen readers to assist people with disabilities can help many people complete college. Further research is needed on how to integrate oral learning modalities with a student's learning strategies, even if oral learning is not presently one of the student's dominant learning modalities. Additionally, more needs to be known about how oral reading affects higher-order cognitive thinking, creativity, and imagination.

References

AT&T Corporation. (2001). AT&T Labs Natural Voices text-to-speech engines. Retrieved June 18, 2003, from http://www.naturalvoices.att.com

Blischak, D. M. (1999, Fall). Increases in natural speech production following experiences with synthetic speech. Journal of Special Education, 14(2), 44-53.

Cahn, J. E. (1990, July). The generation of affect in synthesized speech. Journal of the American Voice I/O Society, 8(1), 1-19.

Harrison, S. (2003, Spring). Creating a successful learning environment for postsecondary students with learning disabilities: Policy and Practice. Journal of College Reading and Learning, 33(2), 131-145.

Higginbotham, J., & Baird, E. (1995, June). Analysis of listeners' summaries of synthesized speech passages. AAC: Augmentative and Alternative Communication, 11(1), 101-112.

Jamieson, J. R. (1994, March-April). Teaching as transaction: Vygotskian perspectives on deafness and mother-child interaction. Exceptional Children, 60(5), 434-450.

Lai, J., Wood, D., & Considine, M. (2000). The effect of task conditions on comprehensibility of synthetic speech. In Proceedings of the ACM CHI'2000 Conference on Human Factors in Computing Systems (The Hague, The Netherlands, April 1-6, 2000) (pp. 321-328). New York, NY: ACM Press.

Lines, L., & Hone, K. S. (2002). Older adults' evaluation of speech output. In J. A. Jacko (Ed.), Proceedings of the ACM Assets 2002 Conference on Assistive Technology (Edinburgh, Scotland, July 8-10, 2002) (pp. 170-176). New York, NY: ACM Press.

Lyon, G. R., & Krasnegor, N. A. (1999). Attention, Memory, and Executive Function (2nd ed.). Baltimore, Maryland: Paul H. Brookes Publishing Company.

Morton, K. & Tatham, M. (1996). Natural voice output in interactive information systems. Proceedings of the Institute of Acoustics, 18(1), 1-6.

Mull, C. A., & Sitlington, P. L. (2003, Spring). The role of technology in the transition to postsecondary education of students with learning disabilities. The Journal of Special Education, 37(1), 26-32.

Reynolds, M. E., Isaacs-Duvall, C., & Haddox, M. L. (2002, August). A comparison of learning curves in natural and synthesized speech. Journal of Speech, Language, and Hearing Research, 45(4), 802-811.

Scherer, M. J. (2000). Living in the State of Stuck (3rd ed.). Cambridge, MA: Brookline Books.


Reprinted with author(s) permission. Author(s) retain copyright.