2004 Conference Proceedings



Marikka Rypa, Ph.D.
Kevin Erler, Ph.D.
Brent Robertson, M.A.Sc., M.B.A.
Automatic Sync Technologies
Email: Info@automaticsync.com

1 Closed Captioning: Addressing the Need

Closed captioning of popular media, including video-based and web-based instructional content and televised broadcasting, has been a great boon to the hearing-impaired community. The benefits of closed captioning have also been shown for other disadvantaged groups, such as limited English proficiency (LEP) populations, as well as for adults and children learning to read.[1] Although federal regulations have continued to foster access to media for the disabled (e.g., the Americans with Disabilities Act of 1990 and the Telecommunications Act of 1996[2]), obstacles to full captioning coverage and availability remain. High costs and prohibitive turnaround times for closed captioning services have hampered efforts to provide the hearing-impaired with universal access to video resources.

New and innovative techniques for automating the production of closed captioning show great promise in eliminating these obstacles. Speech processing techniques can be harnessed to build an automated web-based system that accepts the electronic submission of pre-recorded video or webcast content and automatically returns closed captioning results. By automating a process that is still largely manual, such a system can drastically reduce turnaround times and significantly decrease costs.

2 Reaching a Wider Community of Users

2.1 The Hearing-Impaired Population

According to a report by the National Center for Health Statistics[3], there are currently over 34 million Americans who are hearing-impaired. Individuals with hearing problems are limited in their access to a wide range of informational, educational, and entertainment resources.

Access to training, employment, and career advancement opportunities is similarly limited. Furthermore, as a result of the aging of the population, it is estimated that between 1990 and 2050 the number of hearing-impaired Americans will grow faster than the population as a whole.[4]

2.2 The Educational Benefits of Closed Captioning

2.2.1 Students with Learning Differences

Video and captioning are powerful educational tools for students with disabilities when effectively integrated into instruction. Increasingly, educators are experimenting with video and captioning techniques to bolster literacy skills in students who are deaf and hard of hearing and/or who have learning disabilities.[5]

Closed captioning also benefits a wider audience; research cited by the National Center to Improve Practice in Special Education[6], conducted with deaf students, students reading below grade level, and students learning English as a second language, has shown that captions play a strong role in improving language skills for all three groups.

2.2.2 Universal Design and Online Learning

Captioning is also pivotal in the concept of universal design[7] in education, which seeks to make educational environments as usable as possible by as many people as possible regardless of age, ability, or situation.

Universal design does not imply one optimal solution for everyone, but rather reflects an awareness of the unique nature of each learner and the need to accommodate differences, creating learning experiences that suit the learner and maximize his or her ability to progress. Universal design shifts previous assumptions about teaching and learning in four fundamental ways:[8]

  1. Students with disabilities fall along a continuum of learner differences rather than constituting a separate category.
  2. Adjustments for learner differences should occur for all students, not just those with disabilities.
  3. Curriculum materials should be varied and diverse, including digital and online resources, rather than centering on a single resource.
  4. Instead of remediating students so they can learn from a set curriculum, the curriculum should be made flexible to accommodate learner differences.

Captioning technology can promote the inclusion of the increasing number of learners whose backgrounds, skills, abilities/disabilities, and interests do not fit traditional "mainstream" models of learning. By using captioning technology originally conceived to better accommodate students with hearing disabilities, we can enhance the online learning experience for all students.

In offering an alternate representation of core material and a very thorough and convenient way to search for key information, captioning broadens the options for learners in how they choose to interact with information. Automated captioning allows content creators to provide effective methods of high-level semantic querying for research and retrieval of relevant learning material.
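
To make the retrieval idea concrete, the sketch below (a hypothetical illustration, not part of the AST system) indexes timed captions as (start, end, text) triples and returns the points in a program at which a query term is spoken:

```python
# Hypothetical illustration: searching timed captions for a keyword.
# Each caption is a (start_seconds, end_seconds, text) triple.

def search_captions(captions, query):
    """Return (start, text) pairs for captions containing the query term."""
    q = query.lower()
    return [(start, text) for start, end, text in captions
            if q in text.lower()]

captions = [
    (12.0, 14.5, "Today we introduce closed captioning."),
    (14.5, 17.0, "Captioning benefits a wide range of learners."),
    (17.0, 20.0, "Next, the history of the technology."),
]

for start, text in search_captions(captions, "captioning"):
    print(f"{start:6.1f}s  {text}")
```

Because each caption carries its own timestamps, a match immediately yields the playback position of the relevant passage, which is what makes caption data useful for research and review as well as for accessibility.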

3 Speech Processing and Closed Captioning

3.1 The Role of Speech Technology

Advances in automatic speech recognition (ASR) software over the past ten years have allowed researchers to begin to apply ASR technology to the problem of automating the captioning process. Almost all of these attempts[9] use commercially available software to perform recognition on a program audio stream, and still require a constrained speaker or linguistic domain to function at an acceptable level.

As a result, previous efforts have not provided fully automated captioning systems. Instead, they have largely resulted in assistive devices that can help speed the captioning process; the process itself remains essentially manual.

3.2 AST: Advances in Automated Closed Captioning

3.2.1 The Current Process

The following steps represent a brief summary of the process of pre-recorded captioning:

  1. Review the recording and produce a program transcript if necessary.
  2. Segment the transcript into individual captions, with correct timing and positioning.
  3. Review the end result for quality and accuracy.
  4. Encode the result into the final media (Beta master tape, DVD, MPEG file, etc.).

A range of commercial software is available to assist in the transcribing and captioning process, but caption editing and timing (Step 2 above) must still be done manually. Using a caption preparation workstation, the caption editor watches and listens to a prerecorded program and then breaks the text into discrete captions; s/he assigns appropriate screen placement to each caption and times the appearance and disappearance of each.
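
The timing half of Step 2 is what alignment technology can remove from the editor's workload: if a start and end time were available for every word of the transcript (as in the AST approach described in the next section), each caption's in and out times would reduce to the boundary times of its words. A minimal sketch, with invented word timings standing in for forced-alignment output:

```python
# Sketch: deriving caption in/out times from per-word alignment times.
# word_times stands in for forced-alignment output: (word, start_s, end_s).

def time_captions(word_times, captions):
    """Assign each caption the time span covered by its words."""
    timed = []
    i = 0
    for text in captions:
        n = len(text.split())
        words = word_times[i:i + n]
        # Caption starts with its first word and ends with its last.
        timed.append((words[0][1], words[-1][2], text))
        i += n
    return timed

word_times = [
    ("welcome", 0.0, 0.4), ("to", 0.4, 0.5), ("class", 0.5, 1.0),
    ("today", 1.4, 1.8), ("we", 1.8, 1.9), ("discuss", 1.9, 2.5),
    ("captions", 2.5, 3.1),
]
captions = ["welcome to class", "today we discuss captions"]
print(time_captions(word_times, captions))
```

Note how the pause between "class" and "today" (1.0s to 1.4s) is reflected automatically in the caption boundaries, something the manual process achieves only by an editor timing each caption by hand.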

3.2.2 AST Advances

Steps 1 and 4 above are both relatively straightforward and are undertaken no matter what process is employed in captioning. Step 2, the segmentation of the text and the proper alignment of audio and video, represents the most time-consuming and expensive part of the process.

The AST system addresses the effective automation of this part of the process; we have chosen pre-recorded, or off-line, captioning as the immediate focus of our automatic closed captioning work in order to leverage transcription in segmenting and aligning the captions with the audio.

Our system has been designed to accept the electronic submission of the program and its transcript as generated in step 1. It returns a standard-format caption file (for traditional videotape, DVD authoring, or webcasting), which can then be used for subsequent quality review and encoding. In order to create such a system, four major areas of research and development must be addressed.
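
The paper does not name the standard caption format emitted; as one concrete illustration, timed captions could be serialized in the SubRip (.srt) style, a simple text format that many webcast players accept:

```python
# Illustrative only: rendering (start, end, text) captions in
# SubRip (.srt) style. The actual AST output format is not
# specified in the paper.

def to_srt_time(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(timed_captions):
    """Render timed captions as numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(timed_captions, 1):
        blocks.append(
            f"{i}\n{to_srt_time(start)} --> {to_srt_time(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(12.0, 14.5, "Today we introduce closed captioning.")]))
```

Other targets (broadcast videotape, DVD authoring) use their own encodings, such as CEA-608 line-21 data, but all consume the same underlying timed-text information.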

3.2.3 Four Areas of Technology Focus

There are four key areas of technology focus in our automated captioning system:

  1. Interface and accessibility: major advantages of an automated captioning system lie in speed and convenience. Turnaround time can be dramatically reduced, allowing producers to caption content that is delivered at the last minute. An automated system must be designed to allow users to easily access the system and conveniently retrieve the results of captioning. AST provides web-based interfaces that support simple and rapid access to the automated captioning system.
  2. Caption timing: automatically determining the exact time at which each word in the program occurs is pivotal. Currently available speech recognition engines do not generate the timing data necessary for audio-text alignment, nor do they adequately capture the unconstrained, natural language of general instructional material and television programming. We designed our speech technology from the ground up to address the alignment and segmentation requirements of captioning.
  3. Caption parsing: an automated captioning process must break the text into presentable pieces that are semantically complete and support easy cognitive processing. AST automated captioning presents material in the standard two-line format, incorporating the kinds of linguistic guidelines applied in captioning today.
  4. Positioning and formatting: positioning captions correctly on the screen also contributes greatly to understandability, for example by visually separating the turns of a multi-person dialog; placement and font choices matter as well. There is a trade-off between immediate widespread accessibility and more highly polished formatting; our current focus is to foster wider access in response to the range of present needs, but work in this area continues.
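
As a rough sketch of the parsing step above, a baseline breaker might greedily pack words into two-line captions; the 32-character line limit here is the conventional broadcast caption width, not a figure from the paper, and a production parser would prefer linguistically motivated break points (clause and phrase boundaries) over pure character counts:

```python
# Rough sketch of caption parsing: greedily pack words into
# two-line captions, each line at most MAX_CHARS characters.
# 32 characters is the conventional broadcast line width.

MAX_CHARS = 32

def break_lines(words):
    """Greedily pack words into lines of at most MAX_CHARS characters."""
    lines, current = [], ""
    for w in words:
        candidate = f"{current} {w}".strip()
        if len(candidate) <= MAX_CHARS:
            current = candidate
        else:
            lines.append(current)
            current = w
    if current:
        lines.append(current)
    return lines

def parse_captions(text):
    """Split text into captions of at most two lines each."""
    lines = break_lines(text.split())
    return ["\n".join(lines[i:i + 2]) for i in range(0, len(lines), 2)]

text = ("Universal design does not imply one optimal solution for "
        "everyone but reflects the unique nature of each learner")
for cap in parse_captions(text):
    print(cap)
    print("---")
```

A greedy packer of this kind can split a clause mid-phrase, which is exactly why linguistic segmentation guidelines matter for readability; the sketch only fixes the mechanical constraints (line width, two lines per caption).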

4 Current Capabilities: Examples and a Demonstration

To conclude our presentation, we will show examples to illustrate the range of our captioning work with educational institutions and broadcast television. This includes videos produced by educational television, videos produced by the schools themselves, traditional television content, and webcast classroom lectures. To respond to the needs of various institutions, AST automated captioning can readily move among various types of media, from traditional videotapes to DVDs to streaming electronic media for webcasting.

[1] National Center to Improve Practice in Special Education website.

[2] See http://www.fcc.gov/cgb/dro/ccrules.html

[3] National Center for Health Statistics report, 1997.

[4] Wisconsin Self Help for Hard of Hearing People, Inc. Mission Statement. http://www.wi.sshh.org

[5] Koskinen, P.S., Wilson, R.M., Gambrell, L.B. & Neuman, S.B. (1993). Captioned video and vocabulary learning: an innovative practice in literacy instruction. The Reading Teacher, 47(1), 36-43.

[6] National Center to Improve Practice in Special Education website.

[7] E.g., http://www.cast.org/ and www.design.ncsu.edu/cud/

[8] http://www.cast.org/

[9] http://www.robson.org/gary/writing/cr-speechrecognition.html


Reprinted with author(s) permission. Author(s) retain copyright.