2004 Conference Proceedings


VOICE RECOGNITION TECHNOLOGY, PART FIVE: ADVANCES AND NEW DIFFICULTIES

Presenter(s)
Dr. Robert H. Paine
Professor of Chemistry
Department of Chemistry
College of Science, Rochester Institute of Technology
85 Lomb Memorial Drive
Rochester, New York 14623-5603
Phone: 585-475-2516
Fax: 585-475-7800
Email: RHPSCH@RIT.EDU

This presentation describes the experiences and challenges encountered while using voice recognition technology (VRT) to create instantaneous captioning on videotapes of chemistry lectures given in a live, face-to-face (F2F) classroom setting; the tapes are to be used for distance-learning (DL) courses. Parts One through Four were presented at these conferences in 1999-2002, respectively. The following describes the work that has been accomplished in the past few months; also discussed is the interaction of F2F pedagogy with DL methodology (often under the umbrella of On Line Learning), a combination now bearing the name "blended" learning.

Distance Learning as a methodology has been in vogue at Rochester Institute of Technology for more than 23 years and has gained wide acceptance throughout educational institutions globally. Because of Rochester Institute of Technology's early entry into this innovative pedagogy, videotapes and compact discs have become a welcome, non-intimidating means of presenting lecture material in Chemistry and other courses. The intent of this work is to add instantaneous captioning to this method of presentation and to further Rochester Institute of Technology's leadership in Distance Learning pedagogy and practices.

Distance Learning is often described as education "anywhere, any time," which usually means bringing the course material to the student, rather than the student to the course. Today and well into the future, this particular definition suits the part-time student who is gainfully employed and is striving to gain his/her degree concurrently. Since this work began, the DL mode of presentation has been embraced by full-time students as well, solving many course scheduling problems. Videotapes, CDs and ancillary functions make this quite feasible; the student is better able to fit study hours to his/her own schedule rather than to a rigorous class schedule. However, as we view these techniques, it is readily apparent that the methodology can be utilized by all students (full-time, part-time, co-op) anywhere in the world. The financial pressures and burdens of collegiate education have presented us with a situation where more than 60% of full-time students are working part-time to help pay for their college education. Therefore, as the evolving technologies now embrace significantly larger groups, I would like to suggest the following somewhat broader definition for Distance Learning: "Distance Learning embraces those technologies which are utilized for educational processes whenever the student and the professor are separated by distance and/or time."

Our DL programs have been and continue to be successful student learning experiences for the following reasons:

  1. The student can study any time he/she wishes.
  2. The tapes can be rerun, stopped or repeated to match the students' note taking and comprehension rates.
  3. Videotape material is not as intimidating as a formal lecture might be.

Rochester Institute of Technology (RIT) is a comprehensive, independent technological university and predominantly a teaching institution. It enrolls over 17,000 students in a wide range of undergraduate and graduate programs. Founded in 1829, Rochester Institute of Technology has throughout its history been known as a university committed to technology-based education that has meaningful application to industry and the community at large.

The University is comprised of seven colleges: the College of Applied Science and Technology, the College of Business, the newly formed B. Thomas Golisano College of Computing and Information Sciences, the Kate Gleason College of Engineering, the College of Imaging Arts and Sciences, the College of Liberal Arts, the College of Science, and the National Technical Institute for the Deaf (NTID). For many years the National Technical Institute for the Deaf has been situated at Rochester Institute of Technology, and Rochester Institute of Technology is considered a world leader in educational support of the hearing impaired. More than 1,100 deaf students from across the United States, as well as from several U.S. territories and other countries, study and reside at Rochester Institute of Technology each year. (Rochester, NY represents the largest collected population of deaf and hearing-impaired people in the world.) The National Technical Institute for the Deaf provides Rochester Institute of Technology's deaf students with technical and professional training in over 30 programs. This education prepares students for technical careers in areas such as applied accounting, applied art and computer graphics, applied computer technology, engineering technologies, ophthalmic optical finishing technology and photomedia technologies, to name a few. Most of these programs require proficiency in fundamental chemistry.

Since 1977, Rochester Institute of Technology has been offering a variety of college courses for Distance Learning students, employing multi-media techniques, including creation and presentation of lecture materials via videotapes. Beginning in 1992, the Department of Chemistry has prepared Chemistry courses (Chemical Principles I & II, Fundamentals of Chemistry, Introduction to Chemistry of Materials, Introduction to Organic Chemistry, Organic Chemistry I, Biochemistry I, Biochemistry: Conformation and Dynamics and Biochemistry: Metabolism, and most recently, College Chemistry) for use in various Distance Learning and campus (F2F) programs. This is an expanding effort, and more Chemistry courses offered by this methodology are planned. Let me digress for a moment. The most often asked question is "How do you handle labs with a DL type course?" Whether the particular course under consideration is a campus course or one being offered at some other site, the students enroll in the desired "Chemistry 'power' Lab". The same lab program that is usually offered weekly is offered as a two-and-a-half-day sequence, e.g. Friday and Saturday from 8 a.m. to 4 p.m. and the following Monday from 8 a.m. to 12 noon. These labs are offered on the RIT campus as listed in our course instructional bulletin.

Many National Technical Institute for the Deaf students, as well as hearing-impaired students throughout the world, desire and need courses in College Chemistry (listed above). Since there has been no closed captioning on these tapes, a student who wishes to take a Distance Learning course in Chemistry presently has two alternatives: the student must hire an interpreter to translate the audio portion of each tape; or exact scripts for closed captions must be prepared well in advance and executed or, if prepared after the fact, synchronized exactly with each tape. Both of these methods are expensive and lengthy. To overcome these disadvantages, efforts have been directed toward conversion of voice signals to printed alphanumeric text. This project will examine the voice recognition technologies available (IBM et al.) to convert voice signals into visual words which appear instantaneously on a television monitor and are simultaneously printed on the videotape (with streaming video as a future next step) as they are produced. Successful accomplishment of this means that all study materials will be available to hearing-impaired students right away, rather than after a two- to three-week delay. These techniques will also be utilized to close caption Chemistry tapes which have already been produced.

It is recognized that the Science of Chemistry, like all mini-cultures, has developed an argot which may require specialized software preparation to complete these goals. For example, the spoken syllables "aitch two ess oh four" will have to be translated and printed as "H2SO4"; "see aa see ell two" as "CaCl2"; and so forth.
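The translation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used in this project: the lookup tables, the function name spoken_to_formula, and the spelling "oh" for the letter O are all hypothetical, and only a handful of elements are included.

```python
# Hypothetical lookup tables (not from the project): how a recognizer
# might spell out letter names and digits that it hears.
LETTER_NAMES = {"aitch": "H", "ess": "S", "oh": "O", "see": "C",
                "aa": "A", "ell": "L", "en": "N"}
DIGIT_NAMES = {"two": "2", "three": "3", "four": "4"}

# A deliberately tiny set of element symbols, for illustration only.
ELEMENTS = {"H", "C", "N", "O", "S", "Ca", "Cl", "Na"}


def spoken_to_formula(phrase):
    """Convert spelled-out speech such as 'aitch two ess oh four' to 'H2SO4'."""
    # Step 1: map each spoken token to a single letter or digit.
    chars = []
    for token in phrase.lower().split():
        if token in LETTER_NAMES:
            chars.append(LETTER_NAMES[token])
        elif token in DIGIT_NAMES:
            chars.append(DIGIT_NAMES[token])
        else:
            raise ValueError(f"unrecognized token: {token!r}")

    # Step 2: group letters into element symbols, trying a two-letter
    # symbol (e.g. 'Ca') before falling back to a one-letter symbol.
    # Greedy matching is ambiguous in general ('CO' vs. 'Co'), so a real
    # system would need context to decide.
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch.isdigit():
            out.append(ch)
            i += 1
            continue
        pair = None
        if i + 1 < len(chars) and chars[i + 1].isalpha():
            pair = ch + chars[i + 1].lower()
        if pair in ELEMENTS:
            out.append(pair)
            i += 2
        elif ch in ELEMENTS:
            out.append(ch)
            i += 1
        else:
            raise ValueError(f"no known element symbol starts with {ch!r}")
    return "".join(out)


print(spoken_to_formula("aitch two ess oh four"))
print(spoken_to_formula("see aa see ell two"))
```

With these tables the two phrases yield the sulfuric acid and calcium chloride formulas as plain text; rendering the digits as subscripts on a caption line would be a separate step.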

The Department of Chemistry here at RIT initiated DL with foundation chemistry courses supporting the Electrical/Mechanical Technology BS degree program in 1992.

The success of this program allowed us to add it to the RIT Summer program; in this instance, students rented the tapes for viewing at home, and came to campus for weekly recitations. The following year these programs were offered at several community college sites in western New York. As the enrollment of these classes grew, it was inevitable that some hearing-impaired students would register; captioning for this portion of our student body became requisite. The usual methods of scripting and captioning were both time consuming and expensive.

The original project, instantaneous closed captioning by voice recognition technology of the videotapes used in Distance Learning Chemistry courses, was conceived to save both time and money while maintaining high quality productions. As the initial work has proceeded, some unexpected pedagogic synergism has appeared.

One goal of the Department of Chemistry at Rochester Institute of Technology, and of the Institute itself, is the advancement of Chemistry as a major science throughout the world. As part of this goal, this paper describes a logical but distinctive step in improving the educational effectiveness of Distance Learning courses in Chemistry.

Over the past several years, some modest experimentation has proceeded with great success, delight and some revelations. One such experiment is described here. A group of students (four hearing-impaired students and sixteen others) was gathered in a TV studio/lecture room and told of the experiment. The room was prepared for Chemistry Lecture Demonstrations and all proceedings were recorded on videotape. A large TV monitor was available so that students could see the live action with the captioning superimposed; the captioning was also recorded on the tape. Two phone lines were available, linked to a stenotypist-captioner 40 miles distant. The first phone line carried the voice of the Professor (RHP) to the captioner; the second carried the returning captioning. One half hour before the start, a list of technical chemical terms was sent by fax to the typist-captioner. An American Sign Language interpreter was also present. With only two mistakes, this hour-long experiment succeeded beyond expectations.

The return of the captioning was almost instantaneous and none of the students watched the Professor or the interpreter; all were glued to their images on the screen! The students asked if they could expect this every session! (Total costs for this hour: $400.) What we observed in this live presentation is that hearing-impaired students (as well as all others present) no longer had to divide their attention: glance at the interpreter, then glance at the instructor - back and forth - through the entire lecture. The students' attention was now directed to one single place - the TV monitor - where words and actions were simultaneously displayed. These students exhibited a degree of concentration not previously seen in our "regular" presentations. And further, as our original project planned, there will be significant time and dollar savings. However, our delight is that we see a significant improvement in teaching effectiveness, and frustration and intimidation have been minimized.

Further examination of these techniques reveals that the audience served by closed captioning is much greater and includes:

  1. Hearing-impaired students
  2. Dyslexic students
  3. Students who are slow readers or slow note takers
  4. Students for whom English is a second language

This presentation will describe the efforts and successes in instantaneous captioning that have occurred in the last twelve months. Following this, I will speculate on our future course of action and efforts. Many areas are now engaged in Automatic Speech Recognition (ASR). Some applications are simple, such as short verbal commands spoken into a telephone or slightly longer commands used to control computer functions; in 2001, Cadillac owners will be able to push a button and then use voice-activated commands to access their e-mail. On the other hand, some companies have retailed very extensive software designed to recognize many spoken words, often to print them out on a computer screen. These programs are very helpful for users in the categories described earlier. However, less effort is being focused on converting the output of these software programs into the printed word, and more specifically into captions on tapes.

After examining the available voice-recognition software, the Dragon Naturally Speaking software, Professional Version, was selected. One chief advantage of this software was its extensive vocabulary (more than 20,000 words). As a portion of our work, we look at all new software versions to see if we are using the best for the instantaneous captioning (patent pending). All the courses mentioned earlier have been created in a recording studio, with direct connections to the digitizing and captioning equipment. Our newest trials have been to take these techniques away from the studio and into a lecture hall. A video camera is placed toward the rear of the hall; the speaker wears a boom-style cordless microphone; and, where appropriate, an opaque viewer (Elmo) is used. All inputs are simultaneously displayed through an overhead imaging system and also sent by fiber optics back to the studio control room. The audio and video signals are separated and digitized; the audio signal proceeds through the captioning device, is reunited with the video, and is returned to be displayed on the screen in the lecture hall, with a delay of about 1-2 seconds. The current course, Chemical Principles I, is being used for these trials. The class consists of 75 hearing students and 5 hearing-impaired students; an ASL interpreter is present. The results of these trials will be presented, as well as student evaluations of these experiments in Chemistry pedagogy. They will include the expected, the unexpected and the thrills which occurred during these courses. Following this, I will speculate on our future course of action and efforts.




Reprinted with author(s) permission. Author(s) retain copyright.