2002 Conference Proceedings



Ewa Dominowska,
Deb Roy,
Rupal Patel*,
MIT Media Laboratory
20 Ames Street, Cambridge, MA 02142
* Teachers College Columbia University
525 West 120th Street, New York, NY 10027

We describe an adaptive communication aid that uses contextual information from the user's environment to predict vocabulary and automate symbol layout.

1. Introduction

Natural speech is not a viable method of communication for individuals with severe speech and motor impairments (Beukelman & Mirenda, 1992). Current technological advances provide new opportunities for building communication aids that facilitate natural, efficient and personalized interactions. Large individual differences in the nature and degree of communication impairment, however, require that systems be sufficiently general while allowing individuals to adapt key features to accommodate their preferences and abilities (Patel & Roy, 1998).

Most augmentative and alternative communication (AAC) systems employ static layouts to display vocabulary. Users construct messages by selecting a series of icons. These methods are physically demanding and can be on the order of ten times slower than natural communication (Vanderheiden, 1985). In addition, most systems rely on human experts to customize the interface to meet each individual user's needs. This labor-intensive process is replicated numerous times throughout the user's life to accommodate changing communication needs.

We envision a multimodal, adaptive, context-sensitive communication aid which would address many of the shortcomings of current approaches. This paper describes our initial steps towards this goal.

2. A Prototype Adaptive Interface

We are designing an active picture communication display. The focus of our effort is to automate the layout of symbols dynamically, depending on the needs of individual users. For example, consider a system that can sense the location of the user using the global positioning system (GPS). When the user is in a restaurant, the system should make food-related symbols easier to access. Various environmental cues in addition to location may be leveraged to predict and facilitate message construction.

Although our ultimate goal is to build a portable device, we are currently working on a touch-screen interface which enables the creation of messages in a virtual chat room environment. Using the interface, the user may navigate between rooms within the virtual environment and exchange text messages with others who are present. The virtual world enables us to test concepts underlying the context-sensitive system in a controlled environment. Contextual cues are derived from room locations and the identities of other people in the rooms.

2.1 Picture Vocabulary

The device is modeled around a picture communication book, which is the most commonly used display for preliterate individuals (Beukelman & Mirenda, 1992). Black and white line drawings from the Picture Communication Symbols (Mayer-Johnson, Inc.) and Elephant's Memory (Ingen Housz, 2001) have been selected as an initial symbol set. These symbols are highly pictorial and transparent in their meaning.

The initial vocabulary is selected based on a corpus of the words most commonly used by individuals with severe speech and motor impairments. Ogden's Basic English vocabulary was compared to this corpus, and items that occurred in both were used as an initial core vocabulary. To test the initial prototype we have selected a restaurant scenario and a zoo scenario in a virtual environment. We therefore added fringe vocabulary items to ensure adequate communication in each scenario.
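The selection procedure described above amounts to a set intersection followed by a per-scenario union. The following is a minimal sketch of that idea; the function names and the toy word lists are hypothetical and stand in for the actual corpora, which are not reproduced here.

```python
def core_vocabulary(aac_corpus_words, basic_english_words):
    """Return the words that occur in both lists, lowercased and sorted."""
    aac = {w.lower() for w in aac_corpus_words}
    basic = {w.lower() for w in basic_english_words}
    return sorted(aac & basic)


def scenario_vocabulary(core, fringe):
    """Core words plus scenario-specific fringe words, without duplicates."""
    return sorted(set(core) | {w.lower() for w in fringe})


# Hypothetical toy word lists, for illustration only.
aac_words = ["want", "eat", "drink", "more", "Mom"]
basic_words = ["drink", "eat", "food", "more", "go"]

core = core_vocabulary(aac_words, basic_words)
restaurant = scenario_vocabulary(core, ["menu", "waiter"])
```

With these toy lists, `core` holds the three shared words and `restaurant` extends it with the two fringe items.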

Fig.1 Interface layout. Top: current selection frame with slots; Left: sentence template frames; Center: category array; Right: lexical array.

2.2 Message Construction

Message construction using conventional picture communication books consists of sequentially selecting the subject, verb and object from an array of symbols. In contrast, in our system the user first selects a sentence frame and then proceeds to fill in the slots with appropriate vocabulary. The display has four major parts (Figure 1). The uppermost horizontal part of the screen is dedicated to visualizing the message being constructed. The remainder of the interface is dedicated to displaying the vocabulary which is further divided into three sections: sentence template array (left column), category array (middle columns) and lexical array (rightmost columns).

Filling in the slots of the sentence template is accomplished through selecting vocabulary from the category items or lexical items. When a category icon is selected, the lexical array is updated accordingly. For example, selecting the "animal" category would update the lexical array to include icons for "dog", "cat", "horse", etc. The lexical array is also updated by the user's context. For example, if she is in a restaurant, food items would be called up into part or all of the lexical array.
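The update behavior described above can be sketched as a single function that rebuilds the lexical array from the selected category and the current context. This is an illustrative sketch only: the dictionaries, the function name `fill_lexical_array`, the array size, and the choice to place context words ahead of category words are all assumptions, not details of the prototype.

```python
# Hypothetical category -> words table.
LEXICON = {
    "animal": ["dog", "cat", "horse"],
    "food": ["pizza", "soup", "apple"],
}

# Hypothetical location -> words table.
CONTEXT_VOCAB = {
    "restaurant": ["menu", "pizza", "soup"],
    "zoo": ["lion", "monkey"],
}


def fill_lexical_array(category=None, location=None, size=6):
    """Build the lexical array contents for the current selection.

    Assumption: context words are listed first, followed by words from
    the selected category; duplicates are shown only once.
    """
    candidates = CONTEXT_VOCAB.get(location, []) + LEXICON.get(category, [])
    seen = set()
    items = []
    for word in candidates:
        if word not in seen:
            seen.add(word)
            items.append(word)
    return items[:size]


animals = fill_lexical_array(category="animal")
dining = fill_lexical_array(category="food", location="restaurant", size=4)
```

Here `animals` contains only the category words, while `dining` mixes restaurant context words with the remaining food items, so the context occupies part of the array rather than all of it.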

Due to display size limitations and in order to facilitate easier access, the system does not simultaneously display the complete vocabulary. Instead, the lexicon is hierarchically organized. Most words will belong to only one category, but some will span several categories. For example, fruits may be categorized as food or, more specifically, as dessert. In order to maximize search efficiency, the structure is intended to form a well-balanced tree.
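A hierarchical lexicon in which a word may be filed under more than one category can be sketched with nested dictionaries. The tree contents and the helper names below are hypothetical. The second helper illustrates why balance matters: in a perfectly balanced tree with b icons per screen, a vocabulary of N words is reachable in roughly log_b(N) selections.

```python
# Hypothetical lexicon: inner dicts are subcategories, lists are words.
LEXICON_TREE = {
    "food": {
        "fruit": ["apple", "banana"],
        "dessert": ["apple", "cake"],
    },
    "animal": {
        "pet": ["dog", "cat"],
        "farm": ["horse", "cow"],
    },
}


def paths_to(word, node=LEXICON_TREE, prefix=()):
    """Return every category path under which `word` is filed."""
    paths = []
    for name, child in node.items():
        if isinstance(child, dict):
            paths.extend(paths_to(word, child, prefix + (name,)))
        elif word in child:
            paths.append(prefix + (name,))
    return paths


def selections_needed(vocabulary_size, icons_per_screen):
    """Number of levels a user steps through in a perfectly balanced tree."""
    if icons_per_screen < 2:
        raise ValueError("need at least two icons per screen")
    depth = 1
    while icons_per_screen ** depth < vocabulary_size:
        depth += 1
    return depth


apple_paths = paths_to("apple")
levels = selections_needed(500, 8)
```

In this toy tree "apple" is reachable along two paths, and a 500-word vocabulary shown eight icons at a time needs three selections if the tree is balanced.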

2.3 Semantic Templates

Many sentences can be strongly constrained by the verb or verb-noun combination that they use. For example, a sentence that describes the action of giving will most likely have a giver, a receiver and an object that is being transferred. We have developed a system in which semantic themes are first selected and then completion is achieved by filling appropriate lexical items into a set of semantic slots. Not every slot needs to be filled to create a complete sentence. Optional slots are provided for count nouns, modifiers, etc. Other slots are required in order to communicate the message. For example, a sentence built with the 'give' frame is not permissible unless it names the item that is transferred. Each semantic frame also has a set of default words that can be deselected. For example, the negative voice is a default for the frames in Figure 2. The user must deselect these negative markers to construct a message with a positive voice.
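One way to represent such a frame is a small data structure that tracks required slots, optional slots, and pre-filled defaults that the user may deselect. The sketch below is an assumption about a possible representation, not the prototype's implementation; the class name `Frame`, the slot names, and the choice of which 'give' slots are required (other than the transferred object) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Frame:
    verb: str
    required: Set[str]
    optional: Set[str] = field(default_factory=set)
    # Default words are pre-filled and can be deselected by the user.
    defaults: Dict[str, str] = field(default_factory=dict)
    filled: Dict[str, str] = field(init=False)

    def __post_init__(self) -> None:
        self.filled = dict(self.defaults)

    def fill(self, slot: str, word: str) -> None:
        if slot not in self.required | self.optional:
            raise KeyError(f"frame '{self.verb}' has no slot '{slot}'")
        self.filled[slot] = word

    def deselect(self, slot: str) -> None:
        self.filled.pop(slot, None)

    def missing(self) -> List[str]:
        """Required slots that are still empty."""
        return sorted(self.required - set(self.filled))

    def is_complete(self) -> bool:
        return not self.missing()


# Hypothetical 'give' frame with a default negative marker.
give = Frame(
    "give",
    required={"giver", "object"},
    optional={"receiver", "count", "modifier", "negation"},
    defaults={"negation": "not"},
)
give.fill("giver", "I")
missing_before = give.missing()  # the transferred object is still absent
give.fill("object", "apple")
give.deselect("negation")        # switch to a positive message
```

The frame only counts as complete once every required slot is filled, while optional slots such as the receiver may stay empty.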

Fig.2 Examples of frames. Starting at the top: "like" frame, "have" frame, "drink" frame and a custom frame. The custom frame can be used with any verb, and is more flexible than the verb-driven frames. The drink frame has been filled out with sample content.

2.4 Output

Once a message has been constructed using the interface, a set of rules is used to map the filled semantic frame into a text message, which is displayed in the virtual chat room. Communication partners can in turn respond. In the future we intend to use the system for face-to-face communication, so the user would have the option of using either the text output (i.e., for silent message construction) or synthetic speech.
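The rule set itself is not specified here, so the following is only a toy illustration of mapping a filled frame to text. It assumes a fixed slot order and a single rewrite rule for the negative marker; the slot names and the function `render` are hypothetical, and subject-verb agreement is deliberately not handled.

```python
# Assumed surface order of slots in the output sentence.
SLOT_ORDER = ["subject", "negation", "verb", "count", "modifier", "object"]


def render(filled):
    """Turn a dict of filled slots into a simple English sentence."""
    words = []
    for slot in SLOT_ORDER:
        word = filled.get(slot)
        if word:
            # Rule: the negative marker surfaces as "do not" before the verb.
            words.append("do not" if slot == "negation" else word)
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


negative = render({"subject": "I", "negation": "not", "verb": "drink", "object": "juice"})
positive = render({"subject": "I", "verb": "like", "modifier": "red", "object": "apples"})
```

A fuller rule set would also need to handle agreement, articles, and frame-specific word order.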

Fig.3 On the left we can see a generic vocabulary that is provided when no context information is available. On the right we can see vocabulary that is based on the current topic of conversation (small talk).

3. Context Awareness

We anticipate the largest gain in communication efficiency and naturalness to come from the addition of context awareness. In the current chat room interface, the system detects the identity of the room and loads location-specific vocabulary in the lexical array. We are implementing new features which will enable the system to map environment parameters such as location, past conversations (possibly influenced by speaker identity) and time of day to the vocabulary that is most appropriate for that context. A default set of contexts will be preprogrammed into the system to bootstrap the learning of personalized contexts. Statistical learning methods will be used to map contextual cues to vocabulary choices. We have also begun investigating how real-world signals from GPS, clocks, and the speech of communication partners can be integrated into the system.
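The specific statistical learning method is not stated, so the sketch below substitutes a simple count-based model purely for illustration. It assumes that each contextual cue (for example location or time of day) keeps usage counts per word, that preprogrammed default contexts enter as pseudo-counts to bootstrap learning, and that cues are combined by summing relative frequencies. The class name, the `prior_weight` parameter, and the example cues are hypothetical.

```python
from collections import Counter, defaultdict


class ContextVocabularyModel:
    def __init__(self, defaults=None, prior_weight=2):
        # counts[(cue_name, cue_value)][word] = how often the word was used
        self.counts = defaultdict(Counter)
        # Default contexts act as pseudo-counts that bootstrap learning.
        for cue, words in (defaults or {}).items():
            for word in words:
                self.counts[cue][word] += prior_weight

    def observe(self, context, word):
        """Record that `word` was used while `context` was active."""
        for cue in context.items():
            self.counts[cue][word] += 1

    def predict(self, context, k=3):
        """Return the k highest-scoring words for the given context."""
        scores = Counter()
        for cue in context.items():
            cue_counts = self.counts.get(cue, Counter())
            total = sum(cue_counts.values())
            for word, n in cue_counts.items():
                scores[word] += n / total
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [word for word, _ in ranked[:k]]


model = ContextVocabularyModel(defaults={
    ("location", "zoo"): ["lion", "monkey"],
    ("location", "diner"): ["menu", "coffee"],
})
before = model.predict({"location": "zoo"}, k=2)

# The user repeatedly asks about tickets at the zoo in the morning.
for _ in range(3):
    model.observe({"location": "zoo", "time": "morning"}, "ticket")
after = model.predict({"location": "zoo", "time": "morning"}, k=2)
```

Before any usage is observed the predictions come entirely from the default contexts; after a few observations the personalized word moves to the top of the list.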

Fig.4 Location-influenced context. On the left we can see context vocabulary influenced by a diner location, and on the right the context vocabulary is influenced by being in a zoo.

4. Conclusion and Future Directions

By leveraging techniques in machine learning, pattern recognition and knowledge representation, it may be possible to improve communication efficiency and naturalness for individuals who require an alternative method of communication. Our next step is to test several aspects of this prototype with different user populations. We are conducting usability tests to assess the cost/benefit of message construction using semantic frames; the cognitive load of altering the displayed vocabulary conditional on the user's context; and the comprehensibility of category and lexical organization. We are also continuing to develop the system by moving outside the virtual world scenario to model the user's context in real-world situations. For example, we are collecting data with global positioning sensors and investigating the vocal control abilities of individuals with severe speech impairments (Patel, 2000). In subsequent versions of the prototype we hope to include voice as an additional input modality and to use signals from the physical environment to predict context.

5. Acknowledgments

This paper is based upon work supported by the National Science Foundation under grant No. 0083032. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

6. References

Beukelman, D. R., & Mirenda, P. (1992). Augmentative and alternative communication: management of severe communication disorders in children and adults. Baltimore: Paul H. Brookes.

Ingen Housz, T. (2001). Elephant's Memory. http://www.khm.de/~timot/PageElephant.html

Patel, R. (2000). Identifying information-bearing prosodic parameters in severely dysarthric speech. Doctoral dissertation, University of Toronto.

Patel, R., & Roy, D. (1998). Teachable interfaces for individuals with dysarthric speech and severe physical disabilities. Proceedings of the AAAI Workshop on Integrating Artificial Intelligence and Assistive Technology, 40-47.

Vanderheiden, P. J. (1985). Writing aids. In J. G. Webster, A. M. Cook, W. J. Tompkins, & G. C. Vanderheiden (Eds.), Electronic aids for rehabilitation (pp. 262-282). London: Chapman and Hall.


Reprinted with author(s) permission. Author(s) retain copyright.