1993 VR Conference Proceedings

The Enhancement of Interaction for People with Severe Speech and Physical Impairment through the Computer Recognition of Gesture and Manipulation

David M. Roy 1,2.
Marilyn Panayi 3.
William S. Harwin 1.
Robert Fawcus 2.

1. Applied Science and Engineering
Laboratories, A. I. duPont
Institute/University of Delaware, USA.
2. City University, London, UK.
3. Arts and Educational Consultant,
London, UK.

David M. Roy
Applied Science and Engineering
A. I. duPont Institute
P.O. Box 269
Wilmington, Delaware USA
Tel 302 651 6830
Fax 302 651 6895
E-mail: roy@asel.udel.edu


The research involves using drama and mime to elicit intentional behaviors, such as gesture and manipulation, from people who cannot access a keyboard directly because of physical impairment. Although the resulting movements are often imprecise and idiosyncratic, attempts are being made to recognize them by computer. Sensors are being used to monitor continuous signals from the body. Feature extraction and classification techniques involving artificial neural networks will be used to recognize intention. It is envisioned that the final system could be used to facilitate interaction with a selection of multi-sensory real and virtual worlds. As part of a carefully planned early intervention strategy, the system could give the child the opportunity to learn through physical interaction and exploration.


"Great is the human who has not lost his childlike heart." -Mencius (Meng-Tse), 4th Century B.C.

Research Objectives:

The objective of the research is to develop new methods to enhance the expressive communication of people with severe speech and motor impairment due to cerebral palsy. Existing methods of augmentative and alternative communication (AAC) harness only a small fraction of the multi-modal expressive ability observed in this population.

Attempts are being made to elicit intentional behavior such as gesture and manipulation using drama and mime. Computer recognition will be used to facilitate a new method of human-machine interaction that closely matches the abilities of the target population (1).

What is being developed?

i) A prototype technique for the elicitation of intentional behaviors including gesture and manipulation.

ii) A prototype system that will use multiple biomechanical and bioelectrical sensors together with feature extraction and pattern recognition to identify intentional behavior.

Cerebral Palsy:

People with severe speech and motor impairment due to cerebral palsy have limited and imprecise motor control, and each member of this population is likely to have an idiosyncratic range of physical abilities. Cerebral palsy results from damage to the developing brain, usually during or near birth. The damage causes a loss of control over voluntary muscle action together with inappropriate muscle tone (2). If the condition is severe, precise targeting and activities involving fine motor control are very difficult. Existing electronic assistive devices for communication or computer access rely on finding parts of the body (e.g. head, elbow, hand, knee, or foot) that can reliably target one to five large electromechanical switches. There is therefore a mismatch between the user's abilities and the method of device access.

Investigating The Abilities Of The Target Population

Familiar communication partners are frequently able to discern intention visually from imprecise and idiosyncratic gestures. Our findings agree with the work of others in the field of AAC, which suggests that this population has multi-modal communication skills that are not currently harnessed by their communication devices (3,4).


Preliminary observations were conducted during classroom and therapy sessions while students followed their normal school schedules. The twelve students who participated ranged in age from 8 to 17 years. The students' communication was often multi-modal, involving a combination of facial expression, eye gaze, vocalization, dysarthric speech, and upper-body gestures of the head, arm, hand, and upper torso. The AAC systems used by the students included a variety of language boards and/or electronic devices with synthetic speech. The students in this study lacked naturally occurring structured environments in which they could develop gestural skills.

Elicitation Of Intentional Behavior

Why Use Drama and Mime?

The rationale for using drama and mime was based on the need to create a framework for the production of intentional behavior that had no immediate function. These media are conducive to a child-centered approach where the student's imagination and creativity can be engaged. Age-appropriate dramatic devices can be selected to match the abilities of the students. These devices include puppetry, story scenarios, character formation, and object visualization. The value of adapted play and use of creative arts in educational and therapeutic settings has been highlighted in the work of Musselwhite et al. in the USA (5), and work documented by Segal in Europe (6).

Elicitation of Gesture and Manipulation:

Each student participated in a gesture elicitation session lasting an average of 40 minutes. The student formed a team with their therapist in a game of charades. The investigator played the role of a "game-show host" and asked the student to produce a mime in response to words or phrases drawn from a pool of 120 mimes. Examples include ice-cream, ironing, violin, spank, stroke the cat, helicopter, heavy weight, light feather, mosquito bite, and rainbow. Where appropriate, a puppet was used to engage the younger students. The "rules" of the game were designed so that any response was acceptable as long as the student had an intended behavior in mind. Feedback to the student was always positive, which ensured that the student was not exposed to a feeling of failure at any point in the session. Verbal questioning was used to elicit the student's yes/no responses, which told the investigator the nature of the intended behavior, in this case the mime. Sessions were video-recorded for subsequent gesture analysis.

Texturally interesting objects and toys were used to augment the elicitation technique with students who had exhibited manipulative ability during the initial drama and mime sessions. The investigators are developing manipulative interfaces that may be able to harness these behaviors using strain gauge sensors and force-sensitive resistors (7).

Preliminary Findings

The students produced gestures that were consistent in concept across sessions. For example, the mime for umbrella involved holding the hand stationary at or above head level; the mime for ice-cream involved bringing the hand close to the mouth combined with a licking action. Only minimal prompting, in the form of clues from the therapist or the investigator, was necessary. The ease of elicitation and the consistency of concept over time suggested that existing kinaesthetic abilities were being harnessed with a low cognitive load. The students were able to maintain motivation throughout the 40-minute session. Mimes were often enacted spontaneously, with a sophisticated and creative appreciation of movement in time and space. The students were able to convey concepts of weight (8), emotion, character formation, and object visualization. Frequently, mimes and gestures from students at different schools in separate states appeared to originate from a common concept, e.g. the mimes for rainbow and snake. Therapists reported that the students had little or no recent exposure to activities likely to elicit similar behaviors, which suggested that these abilities had been acquired without practice or training.

Sensing And Recording Of Biomechanical And Bioelectric Signals

The authors' approach to harnessing intentional behavior is to use multiple sensors that output continuous signals.

The rationale for using multiple sensors is as follows:

  1. It may be possible to increase recognition accuracy if signals that contain information about the user's intention are available from more than one source.
  2. The approach facilitates multi-modal access.
  3. A single sensor design is unlikely to be appropriate for all users due to the heterogeneity of the target population.

Physiological Parameters Useful for Computer Recognition of Gesture:

For the purposes of this investigation, physiological parameters can be grouped into two categories:
  1. Parameters that vary as the intentional behavior is produced, e.g. position, velocity, acceleration, force, electromyogram (EMG), electroencephalogram (EEG).
  2. Parameters that can be used to predict the "quality" of production of the gesture from the physiological state, e.g. measures of emotional and fatigue state such as heart rate, respiration rate, blood pressure, galvanic skin resistance, or chemical composition of perspiration.
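
Velocity and acceleration, listed in the first category, need not be sensed separately; they can be estimated from sampled position. The following is a minimal sketch, assuming one position coordinate sampled at a uniform rate (100 records per second, the rate used in the data collection described below) and using central differences, a common choice rather than one the authors specify. The function names are hypothetical. Because differentiation amplifies sensor noise, the position signal would in practice probably be smoothed first.

```python
from typing import List, Sequence, Tuple


def finite_difference(samples: Sequence[float], rate_hz: float) -> List[float]:
    """Estimate the time derivative of a uniformly sampled signal.

    Central differences are used for interior samples and one-sided
    differences at the two ends, so the output matches the input length.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / rate_hz
    n = len(samples)
    out = [0.0] * n
    out[0] = (samples[1] - samples[0]) / dt          # forward difference
    out[-1] = (samples[-1] - samples[-2]) / dt       # backward difference
    for i in range(1, n - 1):
        out[i] = (samples[i + 1] - samples[i - 1]) / (2 * dt)
    return out


def velocity_and_acceleration(
    position: Sequence[float], rate_hz: float = 100.0
) -> Tuple[List[float], List[float]]:
    """Differentiate position once for velocity and twice for acceleration."""
    velocity = finite_difference(position, rate_hz)
    acceleration = finite_difference(velocity, rate_hz)
    return velocity, acceleration
```

Accuracy is lowest at the two ends, where only one-sided differences are available.
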

Figure 1 illustrates the concept of sensor fusion. The diagram shows a stylized human body with arrow labels indicating some of the sources of information from the body that can be monitored using sensors: sensors on a head-band and wrist-band record head and arm position and orientation; surface electrodes on the scalp record EEG; surface electrodes on the arm record EMG; a strain-gauge sensor measures grip force; a microphone captures vocalizations; a camera monitors facial expression; and surface electrodes capture pulse rate.
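
In the simplest form of feature-level fusion, features from each sensor are brought to a comparable scale and concatenated into one vector for the classifier. The sketch below assumes z-score scaling with statistics estimated from training examples only; the sensor names ("tracker", "emg") and the helper functions are hypothetical illustrations, not part of the authors' system.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: each example maps a sensor name to its feature list,
# e.g. {"tracker": [...], "emg": [...]}.
Features = Dict[str, List[float]]


def fit_scalers(training: List[Features]) -> Dict[str, List[Tuple[float, float]]]:
    """Estimate (mean, std) for every feature of every sensor from training data."""
    scalers = {}
    for sensor in training[0]:
        # Transpose: one tuple per feature, holding its value in every example.
        columns = zip(*(example[sensor] for example in training))
        # A constant feature gets std 1.0 so it scales to 0 instead of dividing by 0.
        scalers[sensor] = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return scalers


def fuse(example: Features, scalers, sensors: Sequence[str]) -> List[float]:
    """Concatenate z-scored features from the chosen sensors into one vector."""
    vector = []
    for sensor in sensors:
        for value, (mu, sigma) in zip(example[sensor], scalers[sensor]):
            vector.append((value - mu) / sigma)
    return vector
```

Passing a different `sensors` list to `fuse` makes it easy to try combinations of sensors, which suits a heterogeneous population in which one sensor may work well for one user and poorly for another.
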

Gestural Data From A Single Upper Extremity

Five students were selected for sessions involving data collection with sensors. Gestures were selected on the basis that they a) reflected the students' physical abilities and b) involved one arm as the principal component of the mime. A subset of twelve gestures was chosen for one student, and a subset of 27 gestures for each of the remaining four students.

Arm motion was monitored using the "Bird" magnetic tracker from Ascension Technology, California, USA. The system tracked position and orientation in three dimensions (six degrees of freedom). The receiver was attached to the wrist on an elastic wristband. This device monitored position and movement of the "mime" for one arm.

Surface EMG electrodes connected to a Physiotech 4000 system recorded muscle activity as a measure of the "weight" of the mime (equivalent to agonist-antagonist muscle co-activation). The muscles monitored were the biceps brachii, triceps brachii, deltoid, and pectoralis major. The data were recorded at 100 records per second on the hard disk of a 50 MHz 486-based PC with 20 MB of RAM.
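
One way to turn the EMG channels into a single "weight" measure is to compute an envelope for each muscle of an agonist-antagonist pair (for example biceps and triceps brachii) and then a co-activation index. The sketch below assumes each channel arrives as a signed, zero-mean signal; if the recording system already delivers a rectified, smoothed envelope, the envelope step can be skipped. The 0.1 s smoothing window and the index 2 * sum(min(a, b)) / sum(a + b), one of several co-contraction measures in use, are assumptions rather than the authors' choices, and the function names are hypothetical.

```python
from typing import List, Sequence


def envelope(raw: Sequence[float], window: int = 10) -> List[float]:
    """Full-wave rectify, then smooth with a trailing moving average.

    At 100 records per second, window=10 averages over the most recent 0.1 s.
    The first few outputs average over fewer samples until the window fills.
    """
    rectified = [abs(x) for x in raw]
    out = []
    running = 0.0
    for i, x in enumerate(rectified):
        running += x
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def coactivation_index(agonist: Sequence[float], antagonist: Sequence[float]) -> float:
    """2 * sum(min(a, b)) / sum(a + b) over paired envelope samples.

    0.0 means the two muscles were never active together; 1.0 means their
    envelopes were identical (co-contraction at matched levels throughout).
    """
    shared = sum(min(a, b) for a, b in zip(agonist, antagonist))
    total = sum(a + b for a, b in zip(agonist, antagonist))
    return 0.0 if total == 0 else 2.0 * shared / total
```

A mime of a "heavy weight" would be expected to produce a higher index than one of a "light feather" if the student braces the arm, but that expectation would need to be checked against the recorded data.
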

The position of each student's wheelchair relative to the "Bird" transmitter was kept constant across sessions using a timber rig designed to accommodate a range of wheelchair sizes. Data were collected from each student during three separate 40-minute sessions within a five-day period. Each session was divided into approximately four ten-minute blocks. A minimum of twenty examples of each gesture was recorded, and the student's fatigue was accommodated where necessary. Each session was videotaped.

Figure 2 shows a student performing gestures wearing position and EMG sensors on one arm.

Proposed System Architecture

It is envisaged that a single computer recognition system, following Roy et al. (7), could accommodate a range of sensors. Figure 3 shows a schematic of the proposed system architecture. The recognition system will consist of a preprocessor, an artificial neural network, and a supervisory level. The preprocessor reduces the data and enhances the signals; the extracted features are presented to the artificial neural network classifier, whose outputs serve as the controlling inputs of the application. The human-machine interface will include visual, auditory, and tactile feedback to help the user control the system; where appropriate, this may include a direct representation of physiological parameters. The supervisor will continuously monitor the user's performance with the device, adapt to the user's fatigue state, and adjust its parameters accordingly. Periodically it will report the recognition rate to the user and recommend retraining to compensate for long-term changes in user performance. Data recorded during recent use will be used to retrain the recognizer.
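
The supervisory level's role of reporting the recognition rate and recommending retraining can be sketched as a rolling window of trial outcomes plus a buffer of recently recorded examples. In this minimal sketch the class name, window length, threshold, and buffer size are all hypothetical, and the intended gesture for each trial is assumed to come from a user confirmation such as the yes/no responses used during elicitation. Fatigue adaptation is not modeled.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class Supervisor:
    """Tracks recent recognition outcomes and flags when retraining looks worthwhile.

    window, threshold and buffer_size are hypothetical tuning parameters.
    """

    def __init__(self, window: int = 50, threshold: float = 0.8, buffer_size: int = 500):
        self.outcomes: Deque[bool] = deque(maxlen=window)  # True = recognized correctly
        self.recent_data: Deque[Tuple[List[float], str]] = deque(maxlen=buffer_size)
        self.threshold = threshold

    def record(self, features: List[float], predicted: str, intended: str) -> None:
        """Log one trial; 'intended' is assumed to come from the user's confirmation."""
        self.outcomes.append(predicted == intended)
        self.recent_data.append((features, intended))

    def recognition_rate(self) -> Optional[float]:
        """Fraction of correct recognitions in the current window, or None if empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self) -> bool:
        """Recommend retraining only once the window is full and the rate has dropped."""
        full = len(self.outcomes) == self.outcomes.maxlen
        rate = self.recognition_rate()
        return full and rate is not None and rate < self.threshold

    def retraining_set(self) -> List[Tuple[List[float], str]]:
        """Most recent labeled examples, oldest first, for retraining the recognizer."""
        return list(self.recent_data)
```

Requiring a full window before recommending retraining avoids reacting to a few early errors; a real supervisor would also need a policy for how much older data to retain alongside the recent buffer.
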

Discussion/Future Work

Elicitation of Intentional Behavior:

Preliminary findings presented in this paper suggest that kinaesthetic intelligence as described by Gardner (9) is present in this group of students despite indicators that they have limited access to environments in which they can express this intelligence.

The importance of providing creative, stimulating, and interactive environments for young children as a way of enhancing pre-language skills including gesture is well documented by Piaget (10).

The existence of gestures at neo-natal stages of development suggests that gestural human-machine interaction may offer the opportunity to develop early intervention strategies (11,12). Such interactions may contribute to the enhancement of physical and cognitive skills of children with severe speech and motor impairment.

Computer Recognition:

Although the field of computer recognition of gesture is still in its infancy, research in this area is encouraging. Harwin and Jackson developed a system that recognized head gestures of people with cerebral palsy using a six-dimensional position tracker (13). The system used the concept of a "virtual head stick" and used hidden Markov models as the recognition method. Their work indicated that one of the major challenges is in determining the start and end of each gesture.
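
The start/end problem can be illustrated with a simple heuristic that is not Harwin and Jackson's hidden Markov model approach: threshold the speed of the tracked body part (for example, wrist speed derived from the six-degree-of-freedom tracker) using two thresholds. A segment opens when speed rises above the higher threshold and closes when it falls below the lower one; this hysteresis keeps small fluctuations near a single threshold from splitting one movement in two. The function name and threshold values are hypothetical and would need tuning for each user, given how idiosyncratic the movements of this population are.

```python
from typing import List, Optional, Sequence, Tuple


def segment_gestures(
    speed: Sequence[float], start_thresh: float, stop_thresh: float, min_len: int
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of candidate gestures; end is exclusive.

    A segment opens when speed rises above start_thresh and closes when it
    falls below stop_thresh. Segments shorter than min_len samples are
    discarded as likely noise or involuntary movement.
    """
    if stop_thresh > start_thresh:
        raise ValueError("stop_thresh must not exceed start_thresh")
    segments = []
    start: Optional[int] = None
    for i, s in enumerate(speed):
        if start is None and s > start_thresh:
            start = i  # movement begins
        elif start is not None and s < stop_thresh:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # movement ends, keep or discard it
    # A movement still in progress at the end of the recording is closed there.
    if start is not None and len(speed) - start >= min_len:
        segments.append((start, len(speed)))
    return segments
```

For example, with `start_thresh=4`, `stop_thresh=1` and `min_len=2`, a brief single-sample spike is rejected while sustained movements are kept.
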

The collected data will be used to develop gesture recognition algorithms based on artificial neural networks (ANN). Equally important at this stage is the examination of the information content of the gestures themselves. The aim will be to use ANNs to:

  1. Identify the importance of parameters in the recognition of gestures from people with cerebral palsy.
  2. Examine the value of sensor integration e.g. the combination of EMG and position sensors.
  3. Guide the future selection of gesture sub-sets in order to obtain a high rate of recognition.
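
The second aim, examining the value of sensor integration, amounts to comparing recognition accuracy on different sensor subsets. The sketch below substitutes a nearest-centroid classifier with leave-one-out evaluation for the planned artificial neural network, purely to keep the example short; the data layout, sensor names, function names, and toy data are hypothetical. In the invented data, "heavy weight" and "light feather" share a trajectory and differ only in EMG.

```python
from typing import Dict, List, Sequence, Tuple

# (features per sensor, gesture label); a hypothetical layout.
Example = Tuple[Dict[str, List[float]], str]


def _vector(features: Dict[str, List[float]], sensors: Sequence[str]) -> List[float]:
    """Concatenate the features of the selected sensors."""
    return [v for s in sensors for v in features[s]]


def _sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(examples: List[Example], sensors: Sequence[str]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on the given sensors."""
    correct = 0
    for i, (features, label) in enumerate(examples):
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for j, (f, lab) in enumerate(examples):
            if j == i:
                continue  # hold out the example being classified
            v = _vector(f, sensors)
            if lab not in sums:
                sums[lab] = [0.0] * len(v)
                counts[lab] = 0
            sums[lab] = [s + x for s, x in zip(sums[lab], v)]
            counts[lab] += 1
        centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}
        query = _vector(features, sensors)
        predicted = min(centroids, key=lambda lab: _sq_dist(query, centroids[lab]))
        correct += predicted == label
    return correct / len(examples)


if __name__ == "__main__":
    # Invented toy data: same arm trajectory, different muscle activity.
    toy = [
        ({"position": [0.0], "emg": [0.9]}, "heavy weight"),
        ({"position": [0.1], "emg": [0.8]}, "heavy weight"),
        ({"position": [0.0], "emg": [0.1]}, "light feather"),
        ({"position": [0.1], "emg": [0.2]}, "light feather"),
    ]
    print(loo_accuracy(toy, ["position"]))         # 0.0
    print(loo_accuracy(toy, ["position", "emg"]))  # 1.0
```

The same comparison, run with a trained network in place of the centroid classifier, would speak to both the first and second aims: a parameter whose removal barely changes accuracy carries little information for that user.
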

Interaction in Real and Virtual Worlds:

The elicitation process has primarily consisted of interactions between the student, therapist and investigator. The authors intend to explore ways in which this process can be enhanced by providing interactive artificial environments using non-immersive artificial reality to elicit intentional behaviors.

Recognition of intentional behaviors in an artificial environment may offer an opportunity for early exploration in a world that is not restricted by a child's physical limitations. Authoring systems could be used to create artificial worlds containing eliciting agents that stimulate educational, therapeutic, and creative behaviors.

The authors aim to extend the techniques under development to other populations, in particular students with traumatic brain injury and students who are cognitively challenged. The motivating elements of a multisensory artificial environment may be particularly useful for engaging the interest of these students.

It is envisaged that the techniques described in this paper will contribute towards the development of a method of human-machine interaction that is more physically and sensually appropriate for certain populations of people with disability. The preliminary findings presented in this paper demonstrate that kinaesthetic intelligence is present in a group of students that have severe speech and motor impairment despite indications that they have limited access to environments in which they can express and develop this intelligence. Gestural human-machine interaction in real and virtual worlds may provide an opportunity for physical self-expression that is not available through other traditional access methods.

"Those who dream by day are cognizant of many things which escape those who dream only by night." -Edgar Allan Poe, "Eleonora"

Financial Support

The authors would like to gratefully acknowledge the financial support of the Nemours Foundation, USA, the Science and Engineering Research Council, UK, and the Disability Program of the British Computer Society.


Special thanks to Dr. Michael Floyd, Professor Ewart Carson of Systems Science, City University, UK, Dr. Richard Foulds of the Applied Science and Engineering Laboratories, USA for their continued support and encouragement.

Particular thanks to student collaborators, their care-givers, and staff of John G. Leach School A. I. duPont Hospital School, Delaware, HMS School for Children with Cerebral Palsy, and Widener Memorial School, Philadelphia, PA for their interest and commitment.


Reprinted with author(s) permission. Author(s) retain copyright.