2000 Conference Proceedings



Dr Mike Wald
Director Southern Higher Education Consortium
University of Southampton,
New College
The Avenue
Southampton SO17 1BG

Introduction

Deaf and hard of hearing students may benefit from technologies that help them receive and understand the information transmitted by speech in lectures, classes and seminars. This information can be useful both in ‘real time’ and as a record after the event. Providing lecture handouts, including diagrams, in advance helps the student to follow the lecture and helps the student and the transcriber or note taker to prepare (e.g. by building individual dictionaries for technical terms, names etc.). If diagrams are not provided in the handouts they will need to be drawn by hand. Remote transcription or interpreting reduces cost and makes the most of a scarce resource, as it can be booked for short periods and involves no travel costs or time. It does, however, incur phone charges and needs good quality transmission of the signals.

Real-time verbatim speech-to-text systems produce the most accurate transcriptions, although summarised speech-to-text systems are capable of providing language modification if required.

This paper will review the issues involved in the use of the various technologies and report on current investigations and trials.

Electronic Text Communications

The digital transmission of text, asynchronously through electronic mail or synchronously through textphones, internet chat etc., provides communication opportunities for Deaf people unable to communicate through speech over the telephone. It is also possible to have email read out automatically over the telephone to communicate with hearing people who don’t use email. Textphones allow text-to-text communication as well as text-to-speech communication through the text relay system. Real-time chat over networks can also allow text communication for groups. Unlike full-duplex textphones, chat systems usually require the user to complete their message before it is sent and appears on the other person’s computer. This prevents the natural ‘interruptions’ that can speed up conversation.

Fax allows graphical communication to take place using text, handwriting and pictures.

Remote Sign Language Interpreting

The use of high quality real-time digital video communications on digital ISDN lines or computer networks allows Deaf people to communicate through sign language. Systems are already being trialled that use video over the cellular network for wire-free communication. Sign language interpreters are a scarce resource, and few are qualified to work at degree and postgraduate levels with an understanding of the subject at the level at which it is being discussed. Using high quality videoconferencing it is feasible for the interpreter to work remotely. A minimum of ISDN2 (128 kbit/s) is generally required, with higher quality (e.g. ISDN6 at 384 kbit/s) necessary for remote lip-reading because of the smaller movements involved.
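The two line rates quoted above follow from the 64 kbit/s carried by a single ISDN bearer (B) channel: ISDN2 bonds two such channels and ISDN6 bonds six. A minimal sketch of that arithmetic is given below; the function and constant names are illustrative only and are not taken from any ISDN software.

```python
B_CHANNEL_KBITS = 64  # one ISDN bearer (B) channel carries 64 kbit/s


def isdn_rate_kbits(channels: int) -> int:
    """Return the aggregate rate, in kbit/s, of `channels` bonded B channels."""
    if channels < 1:
        raise ValueError("at least one B channel is required")
    return channels * B_CHANNEL_KBITS


for name, channels in (("ISDN2", 2), ("ISDN6", 6)):
    print(f"{name}: {isdn_rate_kbits(channels)} kbit/s")
# prints "ISDN2: 128 kbit/s" then "ISDN6: 384 kbit/s"
```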

Computer Generated Signs

Computer generated sign language is based on a recording of the digitised face, arm, hand and finger movements of a person using sign language. Building up a word-to-sign dictionary enables word-for-word signing to be generated automatically from captions. This approach, however, does not take sign language grammar into consideration. Automatic computer based translation into British Sign Language (BSL) is a very much more difficult task.
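The word-for-word approach can be sketched as a simple dictionary lookup. The example below is an illustration under stated assumptions rather than a description of any existing system: the dictionary entries, the clip names and the fingerspelling fallback for unknown words are all hypothetical.

```python
from typing import List

# Hypothetical word-to-sign dictionary: each entry names a recorded sign clip.
SIGN_CLIPS = {
    "deaf": "clip_deaf",
    "student": "clip_student",
    "lecture": "clip_lecture",
}


def caption_to_signs(caption: str) -> List[str]:
    """Map each caption word to a sign clip, keeping the spoken word order.

    Words missing from the dictionary are fingerspelled letter by letter
    (an assumed fallback). No reordering is attempted, so sign language
    grammar is ignored, as noted in the text.
    """
    signs: List[str] = []
    for raw in caption.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if not word:
            continue
        if word in SIGN_CLIPS:
            signs.append(SIGN_CLIPS[word])
        else:
            signs.extend(f"fingerspell_{ch}" for ch in word if ch.isalpha())
    return signs


print(caption_to_signs("Deaf student."))
# prints ['clip_deaf', 'clip_student']
```

Because the output follows English word order exactly, the sketch also shows why this method falls short of translation into BSL.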

Real Time Verbatim Transcription Using a Standard Keyboard

It is not possible to produce verbatim transcription at normal speaking rates using handwriting or a standard keyboard, so some sort of summarisation must take place even if abbreviation expansion is used to speed up text entry. Those summarising need to understand the topic at least as well as the students they are transcribing for. Summarising requires training to record everything rather than only the things the note taker or summariser thinks are important to remember. The telephone text relay system normally uses a standard keyboard, with a trained intermediary changing speech into text and text into speech so that a conversation can take place between a hearing person using speech and a deaf person using a textphone (wire-free cellular textphones can also be used). Since the relay operator enters text in normal spelling on a standard keyboard, entry is too slow for real-time transcription and the conversation must take place at a slower than normal rate.
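Abbreviation expansion, mentioned above as a way of speeding up text entry, can be sketched as a table lookup applied to each typed word. The abbreviation table and function name below are hypothetical; in practice each note taker would build a table suited to the subject being taught.

```python
import re

# Hypothetical abbreviation table prepared before the lecture.
ABBREVIATIONS = {
    "sr": "speech recognition",
    "bsl": "British Sign Language",
    "w": "with",
}


def expand_abbreviations(typed: str, table=ABBREVIATIONS) -> str:
    """Replace every typed word found in `table` with its expansion.

    Matching ignores case; words not in the table are left unchanged.
    """
    def replace(match):
        word = match.group(0)
        return table.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, typed)


typed = "sr w BSL users"
expanded = expand_abbreviations(typed)
print(expanded)
# prints "speech recognition with British Sign Language users"
print(len(typed), "characters typed,", len(expanded), "characters displayed")
```

Even with such savings, the text explains that keyboard entry remains too slow for verbatim work, which is why summarisation is still needed.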

Real Time Verbatim Computer Aided Transcription

Deaf and hard of hearing students can be supported using Palantype computer aided transcription, which translates into normal spelling the verbatim transcript made by the Palantypist (or stenotypist). A skilled operator using this technology can produce an accurate, readable real-time text display that enables a deaf person to follow live conversation. Palantype was developed at Southampton University in the 1970s and uses a special phonetic keyboard to provide real-time verbatim transcription at speeds of up to 240 words per minute for meetings, conferences, lectures and television captioning. Remote transcription is feasible using combined voice & data modems over standard telephone lines, or wire-free using two cellular phone lines for voice and data. Although using one line should be feasible, having two lines has the advantage of not requiring radio microphones etc. for the speaker.

Automatic & Computer Aided Speech Recognition Text Transcription

There has been rapid development of automatic speech-to-text transcription, and accurate large vocabulary continuous speech recognition is now feasible with training of both speaker and system. Training all lecturers to change the way they speak in order to improve recognition rates would not, however, appear to be a realistic option. Systems now need to develop speaker independence with even better accuracy so that transcription can occur without human intervention & correction through shadowing, summarising or real-time editing.

Human Intervention & Correction To Improve Accuracy

As long as speech recognition systems are seen to display 'absurd' errors to the reader, 'real time' users will be anxious that they are being 'misinformed' and will need to concentrate on trying to interpret errors that are causing confusion rather than on the subject matter of the lecture. Speech recognition systems will therefore need to provide further information to improve the user’s confidence in the accuracy of the transcription. Corrections are difficult to make in real time without the facility to bring up the correction window as soon as a recognition error has occurred. Having to select and repeat the utterance before it will appear in the correction window is a much slower and less reliable approach, and it is also confusing to the speaker, who may try to repeat the utterance as it appears on the screen. In many cases the correct version will appear in the correction list, and selecting it will further train the computer to recognise the way the lecturer normally speaks.

Verbatim Repetition/Shadowing To Allow Speaker Independent Recognition

Using somebody to shadow or repeat what is being said allows the system to be expertly trained to provide fewer errors. This helps provide speaker independence and means that the lecturer need not be too concerned about their speech. There is still little time for corrections if attempting verbatim transcription although this could be undertaken by yet another person. Some verbatim speech-to-text court reporters use speech recognition to give accurate real time transcriptions with minimal editing. It will be some time however before speech recognition will provide as accurate a transcript in all environments as a verbatim Palantype reporter.

Summarising can be undertaken using speech recognition rather than a keyboard, and once again the summariser needs to understand the topic and be able to summarise. A shorter summary will have more omissions but can allow more time to ensure fewer errors, while a longer summary with fewer omissions may contain more errors. Summarising using speech recognition systems can be faster than summarising using standard keyboards and so can provide a more detailed transcription.

Remote transcription, where the person doing the shadowing is in a different location to the speaker and student, is possible over computer networks, telephone lines or wire free cellular networks. This approach has the advantage that no noise-reducing mask is required, there is a minimum of disruption to the class or session and a more powerful desktop computer can be used rather than a laptop.

Two Way Communication

Technologies should also allow two-way communication so that deaf and hard of hearing students can communicate and participate fully in educational activities.

Quality of Transcription

Factors affecting the quality of the transcription include speed, delay time, accuracy, readability, wrong words, missed words, punctuation and indication of change of speaker.
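Some of these factors can be counted if a reference transcript is available. The sketch below aligns a transcript against such a reference using word-level edit distance and counts wrong, missed and extra words. This is one common way of measuring accuracy and is offered as an assumption; it is not a method described in this paper, and it does not measure speed, delay, readability, punctuation or speaker changes.

```python
from typing import Dict, List


def word_errors(reference: str, transcript: str) -> Dict[str, int]:
    """Count wrong, missed and extra words in `transcript` against `reference`."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = transcript.lower().split()

    # cost[i][j] = fewest word edits needed to turn ref[:i] into hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = i
    for j in range(1, len(hyp) + 1):
        cost[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + (0 if same else 1),  # match or wrong word
                cost[i - 1][j] + 1,                       # missed word
                cost[i][j - 1] + 1,                       # extra word
            )

    # Walk back through the table to label each edit on one optimal path.
    counts = {"wrong": 0, "missed": 0, "extra": 0}
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + 1:
            counts["wrong"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["missed"] += 1
            i -= 1
        else:
            counts["extra"] += 1
            j -= 1
    return counts


print(word_errors("the lecture starts at ten", "the lecture start ten"))
# prints {'wrong': 1, 'missed': 1, 'extra': 0}
```

Dividing the total number of errors by the number of words in the reference gives a single error rate, although the separate counts are more informative when comparing verbatim and summarised transcription.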

Training and Language Issues

Those involved in computer-aided speech to text transcription require training to use appropriate technologies and understand the language requirements of deaf students.


Reprinted with author(s) permission. Author(s) retain copyright.