2002 Conference Proceedings



LIMITS OF HUMAN WORD PREDICTION PERFORMANCE

Gregory W. Lesher, Ph.D.
Enkidu Research, Inc.
247 Pine Hill Road
Spencerport, NY 14559
Phone: 716-352-0507
Fax: 716-352-0508
Email: lesher@enkidu.net

Bryan J. Moulton
Enkidu Research, Inc.
Email: moulton@enkidu.net

D. Jeffery Higginbotham, Ph.D.
Department of Communicative Disorders and Sciences
University at Buffalo
122 Cary Hall, 3435 Main Street
Buffalo, NY 14214-3005
Phone: 716-829-2797 ext. 635
Email: cdsjeff@acsu.buffalo.edu

Brenna Alsofrom
Kornreich Technology Center
National Center for Disability Services
201 I.U. Willets Road
Albertson, NY 11507-1599
Phone: 516-465-1462
Email: balsofrom@ncds.org

INTRODUCTION

For more than 20 years, word prediction has been an important technique for augmentative communication. Traditional prediction systems have used word frequency lists to complete words already started by the user. In the last few years, however, statistical relations between word sequences have been exploited to improve predictive accuracy. In exploring more advanced statistical prediction techniques originally formulated for the speech recognition field, our research group found it increasingly difficult to improve predictive accuracy once keystroke savings exceeded 55% (Lesher, 2001). Before continuing our investigations, we thought it prudent to establish that better word prediction was theoretically possible. Given the long history of interactive, assisted communication in AAC, we naturally turned to the best word predictors we know: humans.
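To make the contrast between frequency lists and word-sequence statistics concrete, here is a minimal sketch, assuming a simple bigram model with a unigram frequency fallback. It is a hypothetical illustration of the general idea, not the prediction engine described in Lesher (2001); the function names `train` and `predict` are invented for this example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train(tokens: List[str]) -> Tuple[Counter, Dict[str, Counter]]:
    """Count single words (unigrams) and adjacent word pairs (bigrams)."""
    unigrams = Counter(tokens)
    bigrams: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1
    return unigrams, bigrams


def predict(unigrams: Counter, bigrams: Dict[str, Counter],
            prev_word: str, prefix: str = "", k: int = 6) -> List[str]:
    """Return up to k words starting with prefix.

    Words seen after prev_word (bigram evidence) are ranked first; remaining
    slots are filled from the plain frequency list (unigram fallback).
    """
    ranked: List[str] = []
    for word, _ in bigrams.get(prev_word, Counter()).most_common():
        if word.startswith(prefix) and word not in ranked:
            ranked.append(word)
    for word, _ in unigrams.most_common():
        if word.startswith(prefix) and word not in ranked:
            ranked.append(word)
    return ranked[:k]


uni, bi = train("the cat sat on the mat the cat ran".split())
suggestions = predict(uni, bi, prev_word="the")  # bigram hits "cat", "mat" come first
```

A pure frequency-list system corresponds to ignoring `bigrams` entirely; the sequence statistics only reorder the list so that words likely to follow the previous word rise to the top.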

The predictability of letter sequences in English has long been known. In an influential paper on computational linguistics, Shannon (1951) demonstrated that written English has a very high degree of redundancy, quantifying that redundancy by estimating the language's predictability. Using Shannon's numbers, we can calculate that an AAC system taking full advantage of the predictability of English could achieve keystroke savings in excess of 85% with no more keys than are on a conventional QWERTY keyboard. Unfortunately, such a system would be impractical: it would require a constantly changing array of characters, strings, and words that would confound even the most focused user.
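One way to turn a per-character entropy estimate into an idealized keystroke-savings bound is sketched below. The reasoning is an assumption for illustration: if each keystroke could carry at most log2(N) bits with N keys, and text requires h bits per character, then at least h / log2(N) keystrokes per character are needed, versus one keystroke per character unaided. The input values shown (about 1 bit per character, 48 keys) are assumed; the paper does not state which of Shannon's estimates or which key count produced its 85% figure.

```python
import math


def ideal_keystroke_savings(bits_per_char: float, num_keys: int) -> float:
    """Idealized keystroke-savings bound from a per-character entropy estimate.

    Baseline: one keystroke per character. Each keystroke chooses among
    num_keys options, so it conveys at most log2(num_keys) bits; the text
    therefore needs at least bits_per_char / log2(num_keys) keystrokes per
    character.
    """
    if num_keys < 2:
        raise ValueError("need at least two keys")
    return 1.0 - bits_per_char / math.log2(num_keys)


# Illustrative (assumed) inputs: about 1 bit per character, 48 keys.
print(round(ideal_keystroke_savings(1.0, 48), 3))  # 0.821
```

Lower entropy estimates or more keys push the bound higher, which is why the result is sensitive to the assumed inputs. The bound also ignores the cost of scanning a constantly changing key layout, the practical obstacle the paragraph above describes.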

Shannon's predictability estimates are of little use in determining the upper bounds on word prediction within a conventional AAC system: the structure of the problem is too complex for a simple conversion from character predictability to keystroke savings. The most accurate method of establishing performance bounds in an AAC framework is to literally replace the machine prediction engine with a human predictor. People can bring to bear an impressive array of techniques for word prediction, including analysis of syntactic, grammatical, and semantic relations, resolution of underlying context and implicit references, and identification of idiomatic phrases. Machine-based prediction approaches can employ only a small subset of these techniques, albeit in a much more precise manner. We therefore developed an experiment designed to quantify human prediction performance.

METHODS

We programmed a graphical interface to automate the administration of the prediction experiment. Subjects were presented with a few sentences of initial context (in a text window) from one of several testing texts. Each subject was then asked to generate a list of the six words he or she judged most likely to appear next in the text. Once the list had been generated, the subject clicked a "Check" button in the interface. If the next word of the testing text appeared in the list, it was added to the text window with the initial context, and the subject was prompted to generate a new prediction list for the next word. If the word was not in the list, only the first letter of the word was added to the text window, and the subject was prompted to generate a new list given this word-initial character. If the word was not in this subsequent list, an additional character was added to the text window. This iterative process was repeated until the entire testing text had been generated.
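The Check-button loop can be sketched as a replay of the testing text against any predictor, human or machine. This is a minimal sketch under stated assumptions: `Predictor`, `run_trial`, and `toy_predictor` are hypothetical names, and charging one extra keystroke for the space after a word that is spelled out completely without ever appearing in the list is an assumption the paper does not address.

```python
from typing import Callable, List, Tuple

# Hypothetical predictor interface: (words entered so far, letters revealed for
# the current word) -> ranked guesses. A human or a machine could sit behind it.
Predictor = Callable[[List[str], str], List[str]]


def run_trial(text_words: List[str], predict: Predictor,
              list_size: int = 6) -> Tuple[int, int]:
    """Replay a testing text through the Check-button loop.

    Returns (character_selections, list_selections).
    """
    char_selections = 0
    list_selections = 0
    context: List[str] = []
    for word in text_words:
        prefix = ""
        while True:
            guesses = predict(context, prefix)[:list_size]
            if word in guesses:
                # Hit: one list selection enters the whole word.
                list_selections += 1
                break
            if prefix == word:
                # Word fully revealed without a hit; assume one extra key for the space.
                char_selections += 1
                break
            # Miss: reveal one more character, costing one character selection.
            prefix = word[: len(prefix) + 1]
            char_selections += 1
        context.append(word)
    return char_selections, list_selections


# Usage: a toy predictor that ignores context and ranks a fixed word list.
def toy_predictor(context: List[str], prefix: str) -> List[str]:
    return [w for w in ["the", "a", "cat", "dog"] if w.startswith(prefix)]


chars, picks = run_trial(["the", "cat", "sat"], toy_predictor)  # chars == 4, picks == 2
```

Here "the" and "cat" are found in the first list, while "sat" is never predicted and must be revealed letter by letter.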

The experimental process is analogous to a human user generating text on a conventional AAC system. Instead of the computer predicting the word list, as in a traditional system, the human subject does the predicting. And instead of the human selecting characters and words to produce a message, the computer determines which characters and words must be selected to reproduce the pre-defined testing text. The system records the number of keystrokes (individual character selections plus word list selections) and computes a keystroke savings for the subject, a quantitative performance measure of the human predictor.
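Keystroke savings compares the keystrokes actually used with the keystrokes needed to type the text letter by letter. The paper does not spell out its baseline, so the sketch below assumes a common convention: each word costs its letters plus one space. The function name `keystroke_savings` is hypothetical.

```python
from typing import List


def keystroke_savings(text_words: List[str], char_selections: int,
                      list_selections: int) -> float:
    """Keystroke savings relative to typing every character of the text.

    Assumed baseline: each word costs its letters plus one space.
    """
    baseline = sum(len(w) + 1 for w in text_words)
    if baseline == 0:
        raise ValueError("text must contain at least one word")
    used = char_selections + list_selections
    return 1.0 - used / baseline


# 4 character selections + 2 list selections against a 12-keystroke baseline.
print(keystroke_savings(["the", "cat", "sat"], 4, 2))  # 0.5
```

A savings of 0 means prediction saved nothing; values reported in the paper, such as 54%, correspond to 0.54 on this scale.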

In pilot experiments, we found that humans were good at predicting words when they had adequate word and character context, but were very poor when provided with insufficient context. We therefore provided two of the three human subject groups with assistance, in the form of statistical predictions provided by the computer. Members of the first group received no automated assistance (the "no help" group). Members of the second group were provided with a list of frequency-ordered words that matched the starting letters of the initial context, which they could use to supplement or replace their own choices (the "simple help" group). Members of the third group were provided with a supplemental list of words generated and sorted using more advanced statistical methods (the "advanced help" group). In all cases, subjects were provided with as much time as necessary to complete the prediction task.
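The "simple help" list can be approximated by a frequency-ordered vocabulary filtered by the letters revealed so far. This is a hypothetical sketch, assuming frequencies are counted from some training corpus and ties are broken alphabetically; the paper does not describe its word list, corpus, or tie-breaking, and the "advanced help" methods are not shown here.

```python
from collections import Counter
from typing import Iterable, List


def build_frequency_list(corpus_words: Iterable[str]) -> List[str]:
    """Vocabulary sorted by descending frequency (ties broken alphabetically)."""
    counts = Counter(w.lower() for w in corpus_words)
    return sorted(counts, key=lambda w: (-counts[w], w))


def simple_help(freq_list: List[str], prefix: str, k: int = 6) -> List[str]:
    """Return the k most frequent words that begin with the letters revealed so far."""
    prefix = prefix.lower()
    return [w for w in freq_list if w.startswith(prefix)][:k]


freq_list = build_frequency_list("the cat and the dog and the bird".split())
```

With an empty prefix this list ignores context entirely, which matches the pilot observation that such help is most useful once some letters of the word are known.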

To broaden the range of material behind the analyses, each subject was randomly assigned 4 of 9 different testing paragraphs to transcribe. Text material was selected to be as homogeneous as possible with respect to word length (number of characters per word), sentence length (number of words per sentence), and type-token ratio (number of word types / number of word tokens).
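The three homogeneity measures can be computed along the following lines. The tokenization rules here (lowercase letters and apostrophes form words; `.`, `!`, and `?` end sentences) are simplifying assumptions, not the procedure used to select the testing paragraphs, and `text_profile` is a hypothetical name.

```python
import re
from typing import Dict


def text_profile(text: str) -> Dict[str, float]:
    """Mean word length, mean sentence length, and type-token ratio of a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    types = set(tokens)
    return {
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),  # characters per word
        "mean_sentence_length": len(tokens) / len(sentences),           # words per sentence
        "type_token_ratio": len(types) / len(tokens),                   # distinct words / all words
    }


profile = text_profile("The cat sat. The cat ran!")
```

For this two-sentence example there are 6 tokens but only 4 types ("the", "cat", "sat", "ran"), so the type-token ratio is 4/6; a lower ratio indicates more word repetition, which tends to make a text easier to predict.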

RESULTS

Table 1 summarizes the results of this experiment, along with the keystroke savings that would be provided by completely automated word prediction. Keystroke savings were averaged across texts and subjects. On average, human subjects with no prediction support performed at least as well as the simple machine prediction process. With comparable prediction support, human predictors exceeded the prediction capabilities of their machine counterparts by at least 5 percentage points. This result was consistent across all of the individual testing texts except one (in which keystroke savings were comparable). This finding is important because it indicates that, when provided with prediction support similar to that of a machine, the human participants made strategic prediction decisions differently, and more effectively, than the statistical machine prediction processes.

Condition                      Average Keystroke Savings
Human: No Help                 49%
Human: Simple Help             54%
Human: Advanced Help           59%
Machine: Simple Prediction     48%
Machine: Advanced Prediction   54%

Table 1: Unassisted and assisted human prediction performance, as compared with machine prediction performance.
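The human-versus-machine margins follow directly from Table 1 by pairing each level of human assistance with the machine predictor of the same kind. The short calculation below uses only the table's values; the dictionary keys are labels chosen for this example.

```python
# Average keystroke savings (%) from Table 1.
human = {"none": 49, "simple": 54, "advanced": 59}
machine = {"simple": 48, "advanced": 54}

# Margin of assisted humans over the machine method of matching sophistication.
gaps = {level: human[level] - machine[level] for level in machine}  # {'simple': 6, 'advanced': 5}

# Unassisted humans compared with simple machine prediction.
unassisted_margin = human["none"] - machine["simple"]  # 1
```

The smallest matched margin, 5 percentage points with advanced help, is the "at least 5" figure; the margins are differences of percentages, so they are percentage points rather than relative percentages.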

It is also interesting to note that the best subject in the experiment achieved an average keystroke savings of 64% (not shown in Table 1), a full 10 percentage points better than the best machine prediction. On some individual texts, keystroke savings neared 70% for this subject. With training and practice, the best subjects might realize even larger differences. Because we are only trying to establish that automated prediction can, in principle, be improved, even the performance of outlier subjects is extremely valuable.

SUMMARY

In an attempt to establish that word prediction performance has not reached a fundamental limit, we performed an experiment designed to measure the performance of human word predictors. That the average human subject could outperform statistical techniques by a substantial margin, and that the best human subject could outperform the automated methods by 10 percentage points, are unequivocal indicators that there is ample room for improvement in machine word prediction. Such a gain, if realizable by an automated system, might help to counteract the cognitive loads typically associated with word prediction (Horstmann Koester & Levine, 1996), thereby producing significant increases in communication rate.

The experiment described in this paper is non-constructive - it provides no guidance as to how we might improve automated word prediction, only that a sizeable improvement is theoretically possible. We have therefore designed and executed a follow-on experiment to reveal the particular strategies utilized by the best human predictors. We are currently analyzing the results of this study. While some of these strategies involve high-level cognitive schemes that we could not hope to emulate electronically, we are investigating whether we can incorporate some of the simpler human prediction methods into an automated word prediction system.

REFERENCES

Horstmann Koester, H. & Levine, S.P. (1996). Effect of a word prediction feature on user performance. Augmentative and Alternative Communication, 12, 155-168.

Lesher, G.W. (2001). Advanced Prediction Techniques for Augmentative Communications. Phase II SBIR Final Report, ED-98-CO-0031.

Shannon, C.E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30, 50-64.

ACKNOWLEDGEMENTS

The authors wish to acknowledge support from the U.S. Department of Education under contract RW97076002.

Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views or policies of the Department of Education.




Reprinted with author(s) permission. Author(s) retain copyright.