1999 Conference Proceedings



INTELLIGENT PERSONAL ASSISTANTS WITH ADVANCED SPEECH AND GESTURE RECOGNITION - HELPING USERS CONTROL APPLICATIONS, NAVIGATE THE WEB, AND GET INFORMATION

Tim Musgrove
Natural Language Technologies
tmusgrove@mindmaker.com 

Peter Ridge
Architecture & Design
pridge@mindmaker.com 

Jeff Savage
Intelligent Personal Assistants
jsavage@mindmaker.com 

Mindmaker, Inc.
224 Airport Pkwy. #550
San Jose, CA 95110
(408) 467-9200
http://www.mindmaker.com

Overview

This intelligent personal assistant combines advanced artificial intelligence, speech and image processing technology, and multimedia content in a Web-enabled, animated character for PC users, helping them work with applications, navigate the web, manage their schedule, get information, connect with friends and colleagues over the Internet, and play games.

Intelligent Assistant 2.0

Mindmaker's Intelligent Assistant 2.0 is an advanced personal assistant for both business and home use. The Assistant talks and interacts with users in the form of an animated character on the desktop, for example, a parrot named "Prody." Users can ask Prody to navigate PC programs, look up stocks, notify them when e-mail has arrived, browse the Web by voice, play games, and call friends and associates over the Internet using his built-in Internet telephone. Prody Parrot can act autonomously on the user's behalf, taking messages, reminding the user about meetings, warning the user of unexpected events on the stock market, and reporting news events or weather information. Users do not have to know how to use complicated applications in order to do these things. They merely give a command such as "Make a call" and then answer Prody's common-sense questions, like "Whom do you want to call?"
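The command-then-question interaction described above can be illustrated with a small sketch. This is not Mindmaker's implementation; it is a minimal example that assumes each command declares the details it still needs and the question used to obtain each one. The command table, the `Dialog` class, and the "set a reminder" prompts are hypothetical; only "Make a call" and "Whom do you want to call?" come from the text.

```python
# Hypothetical command table: each command lists the details it still needs
# and the question the assistant asks to obtain each one.
COMMANDS = {
    "make a call": [("callee", "Whom do you want to call?")],
    "set a reminder": [("event", "What should I remind you about?"),
                       ("time", "When should I remind you?")],
}


class Dialog:
    """Tracks one command at a time and asks for missing details in order."""

    def __init__(self, commands=COMMANDS):
        self.commands = commands
        self.active = None   # command currently being completed
        self.pending = []    # (detail name, question) pairs still unanswered
        self.answers = {}    # details collected so far

    def hear(self, utterance):
        text = utterance.strip()
        if self.active is None:
            # No command in progress: treat the utterance as a new command.
            key = text.lower()
            if key not in self.commands:
                return ("unknown", text)
            self.active = key
            self.pending = list(self.commands[key])
            self.answers = {}
        else:
            # A command is in progress: the utterance answers the last question.
            detail, _question = self.pending.pop(0)
            self.answers[detail] = text
        if self.pending:
            return ("ask", self.pending[0][1])
        # Every detail is known, so the command can be carried out.
        command, answers = self.active, self.answers
        self.active = None
        return ("run", command, answers)
```

In this sketch, hearing "Make a call" yields the question about the callee, and the next utterance completes the command. The user never has to know which application places the call.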

Assistant 2.0 is also a new medium through which webmasters may present their content, with special features that support e-branding and on-line customer service through the Web. Each web site can have its own custom, branded character, based on Mindmaker's Assistant 2.0 framework and components, to act as a virtual tour guide of their website.

Assistant 2.0 also provides many invaluable assistant services such as reminding the user of important dates, alerting the user when a stock in his or her portfolio drops in price suddenly, notifying the user that a very important e-mail has just arrived, or retrieving news, weather, and sports information on demand. He can also connect the user to friends and colleagues using his built-in Internet telephone, and if the intended callee is not available, will gladly take a voice message.

In addition, Assistant 2.0 is an educational and entertaining companion who will play games, give trivia quizzes, and participate in prompted conversations with the user about numerous popular topics such as computers, movies, art, music, science, geography and history.

To accommodate a variety of usage scenarios, we found that multiple input methods were needed. Recognizing that speech is not always the most convenient method of input (for example, when sitting in a meeting), we looked for other ways to relieve users of the tedium of memorizing keystroke commands or menu hierarchies. The most distinctive method we devised applies proprietary machine learning technology to mouse gesture recognition: users teach the assistant which gesture should represent a command, for example, drawing an "X" to represent the "Close" command. Other input methods include dragging and dropping icons, or clicking "hot spots" on the screen. The assistant can even "eavesdrop" on the words that a user speaks or types in another application, and then respond to special words or phrases. All in all, there are seven different input methods. This variety matters because, in any situation, the assistant can receive any command from the user without depending on one particular input device or method. It also allows the assistant to respond to unintentional input and to offer suggestions proactively when the user is in trouble.
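To make the idea of user-taught mouse gestures concrete, here is a minimal sketch. The recognition technology in the Assistant is proprietary and is not described in this paper, so the sketch substitutes a simple, well-known stand-in: nearest-template matching on single strokes that have been resampled and normalized for position and size. The class name `GestureBook`, the point count, and the threshold value are all assumptions made for illustration.

```python
import math


def resample(points, n=32):
    """Resample a stroke to n points spaced evenly along its path."""
    pts = [(float(x), float(y)) for x, y in points]
    path_len = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    if path_len == 0:
        return [pts[0]] * n
    step = path_len / (n - 1)
    out, prev, acc, i = [pts[0]], pts[0], 0.0, 1
    while i < len(pts) and len(out) < n:
        cur = pts[i]
        d = math.dist(prev, cur)
        if d > 0 and acc + d >= step:
            # Place a new point exactly one step along the path.
            t = (step - acc) / d
            prev = (prev[0] + t * (cur[0] - prev[0]),
                    prev[1] + t * (cur[1] - prev[1]))
            out.append(prev)
            acc = 0.0
        else:
            acc += d
            prev = cur
            i += 1
    while len(out) < n:  # guard against floating-point shortfall
        out.append(pts[-1])
    return out


def normalize(points, n=32):
    """Center the stroke on its centroid and scale it to a unit box."""
    pts = resample(points, n)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    size = max(max(abs(x - cx), abs(y - cy)) for x, y in pts) or 1.0
    return [((x - cx) / size, (y - cy) / size) for x, y in pts]


def stroke_distance(a, b):
    """Mean distance between corresponding points of two normalized strokes."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


class GestureBook:
    """Stores gestures taught by the user and matches new strokes to them."""

    def __init__(self, threshold=0.35):  # threshold is an illustrative guess
        self.templates = []              # (command, normalized stroke) pairs
        self.threshold = threshold

    def teach(self, command, stroke):
        self.templates.append((command, normalize(stroke)))

    def recognize(self, stroke):
        if not self.templates:
            return None
        probe = normalize(stroke)
        command, score = min(
            ((c, stroke_distance(probe, t)) for c, t in self.templates),
            key=lambda pair: pair[1],
        )
        return command if score <= self.threshold else None
```

A user could teach an "X" drawn in one stroke as "Close" with `book.teach("Close", [(0, 0), (10, 10), (10, 0), (0, 10)])`. The same shape drawn larger or elsewhere on the screen then maps to the same command, while a stroke that resembles no taught gesture returns `None`.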

Users have embraced the innovation wholeheartedly. One professional reviewer said "I browsed the web and checked my mail for three and a half hours without once touching the keyboard." Another user wrote "I love it, it is very helpful and fun. I use it 99% of the time I am on my PC." Hundreds of thousands of copies of the Assistant have been shipped with Creative Labs SB Live! audio cards, creating a very large installed base, and user reaction has been markedly positive.

[Figure: Prody Parrot With User]

Architecture & Design

In integrating so many technology components, we had to learn (and in some cases invent) multiple programming interfaces for the various components, and make several different technologies work together. In addition, we had to support two different web browsers (Internet Explorer and Netscape) and two e-mail protocols (POP3 and MAPI), create a new script language for agent behavior (so that webmasters could customize the assistant's behavior when the user visits their web site), and extend HTML so that these behaviors could be embedded as hidden codes within a web page. Multiple input/output modes were also needed for the Assistant to be a truly general-purpose application for the end user. We believe that Assistant 2.0 combines more technologies in one unified application than nearly any other PC software application in existence.
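The paper does not give the syntax of the script language or of the HTML extension, so the following is only a sketch of how behaviors hidden in a web page might be collected. It assumes, purely for illustration, that each behavior is placed in an HTML comment beginning with the marker `assistant:` followed by an action name and an argument. The marker, the action names `say` and `goto`, and the function `extract_behaviors` are hypothetical.

```python
from html.parser import HTMLParser

PREFIX = "assistant:"  # hypothetical marker, not Mindmaker's actual syntax


class BehaviorExtractor(HTMLParser):
    """Collects (action, argument) pairs from specially marked HTML comments."""

    def __init__(self):
        super().__init__()
        self.behaviors = []

    def handle_comment(self, data):
        text = data.strip()
        if not text.lower().startswith(PREFIX):
            return  # an ordinary comment, invisible to the assistant as well
        body = text[len(PREFIX):].strip()
        action, _, argument = body.partition(" ")
        self.behaviors.append((action.lower(), argument.strip().strip('"')))


def extract_behaviors(html):
    """Return the assistant behaviors embedded in a page, in document order."""
    parser = BehaviorExtractor()
    parser.feed(html)
    parser.close()
    return parser.behaviors
```

Because browsers ignore comments, a page marked up this way would render normally for visitors without the assistant, while a visitor's assistant could read the same page and act as a tour guide.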

In fact, so many technologies are combined that our biggest challenge was to keep the product from becoming too complex, or too difficult to use or configure. We addressed these problems by (1) determining through extensive user testing what the best default settings and options were, so that most users would not have to change the configuration; (2) making a simplified Assistant Control Panel with easy one-button access to the most often-changed settings, while hiding seldom-changed "power-user" settings in other menus; and (3) adopting a consistent grammar, logic, and style in how the Assistant communicates with the user throughout all his various functions and behaviors, even though these functions were developed by several different groups within an overall team of nearly eighty engineers.

Current Status

The Assistant has been fully accepted by Creative Labs, whose install base of many millions demands thorough usability testing and requires that the average consumer be able to use software effectively without needing a lot of support. In addition, some underlying technology components were proven to be solid by the most demanding customers in their respective fields. These include Yahoo!, the number one search service on the Internet, which tested and implemented our search enhancement technology, and Prime Recognition, winner of many benchmarks as the most accurate English-language OCR (optical character recognition) system, which is used by the U.S. Patent Office and other organizations for very-large-scale document imaging systems. Our work with such customers convinced us that our agent "building blocks" were robust and stable enough to be launched into a large installed base.

In the future, we plan to work more closely with other companies to help them accomplish e-branding and superior customer service using their own custom personal assistants, which they can give or sell to their customers. We will add many downloadable skill sets (and enhance existing ones), such as increased facility for gathering information from the Internet and support for free-form dictation of text. We will also create more assistants for the existing ones to collaborate with. Some of these will be specialized and will be "called in" when needed by the user's general-purpose assistant.

Conclusion

Assistant 2.0 has proven to be an effective personal "companion" who helps PC users work with every application on their PC and every web site that they visit in their browser. He spares novice users from remembering complex nested menus or keystroke combinations by accepting natural, continuous voice commands or quick, easy mouse gestures. He supplies information about weather, sports, news and stocks, and can provide educational and entertaining diversions in the form of conversational games and quizzes.

In the future, more skills and knowledge will be added to Assistant 2.0 in order to provide increasingly human-like behavior. Many of these add-ons will be downloadable, and most will be self-updating via the Web. Assistant 2.0 will therefore evolve into a rich platform for the development of assistant-centric, Web-enabled applications.




Reprinted with author(s) permission. Author(s) retain copyright.