ICARE INTERACTION ASSISTANT: A WEARABLE FACE RECOGNITION DEVICE TO FACILITATE SOCIAL INTERACTION
In the last decade, digital camera technology has transformed photography, making it possible to capture and process images in real time. In addition, the size and weight of digital cameras have been shrinking drastically, allowing the design of small, unobtrusive assistive devices for people who are blind or visually impaired. Some researchers, such as those at the Kyoto Institute of Technology, have created wearable devices to help people who are blind navigate streets. However, the consumers we have consulted have told us that they already have canes and dogs to help them navigate, and have encouraged us to work on other problems. One of the problems discussed in our focus groups was day-to-day social encounters. Social protocols dictate that a person greet a friend or acquaintance when meeting them unexpectedly. However, a person who is blind might not be immediately aware of such an encounter, and must rely upon the other person to initiate social contact.
After the initial contact is made and a conversation is started, a person who is blind does not have access to many non-verbal communication cues (such as facial expression or eye contact) that sighted people take for granted. It is with these problems in mind that we have undertaken the development of the iCARE Interaction Assistant, a novel camera-based wearable device designed specifically to facilitate the social interactions of users who are blind, by allowing them to more readily initiate social interactions and to perceive non-verbal cues during subsequent verbal interactions.
The iCARE Interaction Assistant hardware includes a tiny analog camera with a 1/3-inch CCD that has a light sensitivity of 0.2 lux. The camera's 92-degree field of view provides good coverage of the space in front of the user. This camera (which is mounted in the nosepiece of a pair of glasses worn by the user) is powered by a 9V battery and generates an NTSC analog video signal. That signal is routed to an Adaptec(R) video digitizer, which converts the input signal into a compressed AVI video stream and transmits that stream over a USB cable. Because the digitizer output is compliant with the standard Windows Driver Model (WDM), it appears to an application programmer as a generic video capture device on the Windows(R) operating system.
A laptop computer (which can be carried in a backpack) then executes a face recognition algorithm. We used a tablet PC with an Intel(R) Centrino 1.5 GHz processor and 512 MB of RAM. (This particular laptop was chosen because of its compact form factor.)
The video captured by the camera might (or might not) contain any faces. Since the laptop computer provides a limited amount of processing power, it is important to identify which regions (if any) of the video frames contain a human face. Since the video frames must be scanned in real time, it is important that the method for doing this be highly optimized. We used an adaptive boost algorithm, which starts by quickly scanning an entire video frame (to rule out regions that are unlikely to contain faces) and then iteratively processes the remaining regions, gradually reducing the candidate regions until a final decision is made for each region.
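The coarse-to-fine cascade idea described above can be sketched as follows. This is an illustrative sketch only: the feature functions, thresholds, and region representation here are invented for clarity, whereas the actual device uses trained boosted classifiers over image features.

```python
# Sketch of a coarse-to-fine detection cascade: cheap tests run first over
# all candidate regions, and only survivors reach the more expensive tests.
# Both feature functions and both thresholds below are hypothetical.

def mean_intensity(region):
    """Cheap first-stage feature: average pixel value of a region."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels)

def intensity_variance(region):
    """More expensive second-stage feature: pixel variance (contrast)."""
    pixels = [p for row in region for p in row]
    mu = sum(pixels) / len(pixels)
    return sum((p - mu) ** 2 for p in pixels) / len(pixels)

def cascade_detect(regions, stages):
    """Run candidate regions through a cascade of (feature, threshold)
    stages; a region survives only if every stage accepts it."""
    survivors = list(regions)
    for feature, threshold in stages:
        survivors = [r for r in survivors if feature(r) >= threshold]
        if not survivors:  # early exit: nothing left worth examining
            break
    return survivors

# Two tiny 2x2 "regions": a flat dark patch and a high-contrast patch.
flat = [[10, 10], [10, 10]]
contrast = [[10, 200], [200, 10]]
stages = [(mean_intensity, 50), (intensity_variance, 100)]
print(cascade_detect([flat, contrast], stages))  # only `contrast` survives
```

The key property, mirroring the paper's description, is that most regions are rejected by the inexpensive first stage, so the costly later stages run on only a small fraction of the frame.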
Once a region in a video frame is identified as a face, all of the processing resources are focused on analyzing it, to recognize the person. In the iCARE Interaction Assistant we have employed two different face recognition algorithms: (1) a well-known method called Principal Component Analysis (PCA), and (2) our own novel method called Distinctive Feature Analysis (DFA), which recognizes each person based on distinctive facial features that distinguish him/her from others in the database. Both of these algorithms compare the face image captured by the video camera to face images in an onboard database, which were captured by that same video camera in the past. A similarity measure is used to compute which of the faces in the database are most similar to the newly captured face image, and the name of the person best matching the newly captured face is chosen.
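The PCA matching step can be sketched as below. This is a minimal eigenface-style illustration, not the device's implementation: the four-pixel "faces", the names, and the choice of Euclidean distance as the similarity measure are all assumptions made for the example.

```python
import numpy as np

def pca_basis(gallery, k):
    """Compute the mean face and top-k principal components of the
    gallery of stored face vectors (one row per face image)."""
    mean = gallery.mean(axis=0)
    centered = gallery - mean
    # SVD of the centered data: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def recognize(probe, gallery, names, k=2):
    """Project the probe and gallery faces into PCA space and return the
    name attached to the nearest gallery face (Euclidean similarity)."""
    mean, components = pca_basis(gallery, k)
    gallery_proj = (gallery - mean) @ components.T
    probe_proj = (probe - mean) @ components.T
    distances = np.linalg.norm(gallery_proj - probe_proj, axis=1)
    return names[int(np.argmin(distances))]

# Synthetic 4-pixel "faces" for two people (names are placeholders).
gallery = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.1, 0.9, 0.0]])
names = ["Alice", "Alice", "Bob", "Bob"]
probe = np.array([0.95, 0.05, 0.0, 0.0])  # closest to Alice's images
print(recognize(probe, gallery, names))
```

In a real system the gallery rows would be flattened face crops with thousands of pixels, and keeping multiple stored images per person (as the paper suggests for varying lighting) improves the nearest-neighbor match.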
The name of that person is then communicated to the user via synthesized speech, using the Microsoft Speech Engine. The speech signal is routed to a sound emitter in the earpiece of the glasses (rather than through an in-the-ear device) to avoid altering the user's perception of environmental audio.
Face recognition algorithms have historically had to deal with two persistent problems, neither of which has yet been fully solved. First, the illumination present when the device is trying to recognize a person may differ from the illumination under which the stored face images were captured. (Changes are especially great between indoor and outdoor environments.) This problem can be partially solved by populating the database over time with a diverse collection of face images for each person, taken under various lighting conditions. The second problem is that the person being recognized might not be looking directly at the camera, thus slightly altering the appearance of his/her face image, moment by moment.
While testing the Interaction Assistant, we found that, although a majority of the frames in a video stream might be identified correctly, these two types of variation sometimes caused the device to sporadically recognize people incorrectly. To minimize user confusion, the device is configured to wait until the face recognition algorithm recognizes the same person in five consecutive frames before it speaks the name of that person.
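The five-consecutive-frame rule can be sketched as a small filter over the per-frame recognition results. The class name, threshold parameter, and sample name stream are illustrative; only the rule itself comes from the text.

```python
# Sketch of the five-consecutive-frame rule: a name is announced only
# after the recognizer returns the same identity five frames in a row.

class ConsecutiveFrameFilter:
    def __init__(self, required=5):
        self.required = required  # consecutive matches needed to announce
        self.last = None          # identity seen in the previous frame
        self.count = 0            # length of the current streak

    def update(self, name):
        """Feed one per-frame recognition result; return the name once it
        completes a streak of `required` identical frames, else None."""
        if name == self.last:
            self.count += 1
        else:
            self.last = name
            self.count = 1
        if self.count == self.required:
            return name  # announce exactly once when the streak completes
        return None

f = ConsecutiveFrameFilter()
stream = ["Ann", "Ann", "Bob", "Ann", "Ann", "Ann", "Ann", "Ann"]
announced = [f.update(n) for n in stream]
print(announced)  # the lone "Bob" frame resets the streak
```

Note how the single misrecognized "Bob" frame resets the counter, which is exactly the behavior that suppresses the sporadic errors described above.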
The current Interaction Assistant prototype recognizes and speaks the names of people that it has previously stored in its database. The name of the person standing in front of the device is normally delivered discreetly to the user, but during demonstrations it is played through speakers to make it audible to the audience. The prototype can be configured to use either of two face recognition algorithms: Principal Component Analysis (PCA) or Distinctive Feature Analysis (DFA). The DFA algorithm is more reliable, but requires considerable processing time as each new face is added to the database. (Faces captured during a particular day might be added to the database overnight, to allow time for the intensive processing.) The PCA algorithm is typically used during demonstrations, because it permits members of the audience to come forward, have the device capture images of their faces, and then demonstrate that it can recognize them by speaking their names. (This "capture and learn" process takes about 30 seconds per person.)
In conclusion, the current implementation of the iCARE Interaction Assistant is aimed at recognizing faces to facilitate initial encounters, thus allowing a user to initiate social interactions (3). Ongoing work is aimed at facilitating subsequent verbal interactions by interpreting non-verbal cues, such as eye contact, facial expressions, and gestures. The Interaction Assistant is just one component of the larger iCARE project (4), which is expected to produce practical knowledge for the future design of assistive devices that go beyond navigational aids to facilitate learning, social interaction, and communication, all of which are vital to success in today's professional world.
4. http://cubic.asu.edu (Click on iCARE Projects)