ICARE INTERACTION ASSISTANT:
A WEARABLE FACE RECOGNITION DEVICE TO FACILITATE SOCIAL INTERACTION
Day Phone: 480-326-6334
Fax: 480-965-1885
Email: sreekar.krishna@asu.edu
Presenter #2
Day Phone: 480-727-7985
Fax: 480-965-1885
Email: john.biack@asu.edu
In the last decade, digital camera technology has transformed photography,
making it possible to capture and process images in real time. In addition, the
size and weight of digital cameras have been shrinking drastically, allowing the
design of small, unobtrusive assistive devices for people who are blind or
visually impaired. Some researchers, such as those at the Kyoto Institute of
Technology, have created wearable devices to help people who are blind
navigate streets [1]. However, the consumers we have consulted have told us that
they already have canes and dogs to help them navigate, and have encouraged us
to work on other problems. One problem discussed in our focus groups was
day-to-day social encounters. Social protocols dictate that a person greet a
friend or acquaintance when meeting them unexpectedly. However, a person who is
blind might not be immediately aware of such an encounter, and might need to
rely upon the other person to initiate social contact.

Even after the initial contact is made and a conversation has started, a person
who is blind does not have access to many non-verbal communication cues (such as
facial expression or eye contact) that sighted people take for granted. It is
with these problems in mind that we have undertaken the development of the
iCARE Interaction Assistant, a novel camera-based wearable device designed
specifically to facilitate the social interactions of users who are blind, by
allowing them to initiate social interactions more readily and to perceive
non-verbal cues during subsequent verbal interactions.

The iCARE Interaction Assistant hardware includes a tiny analog CCD camera with
a 1/3-inch CCD that has a light sensitivity of 0.2 lux. The camera’s 92-degree
field of view provides good coverage of the space in front of the user. This
camera (which is mounted in the nosepiece of a pair of glasses worn by the
user) is powered by a 9V battery, and generates an NTSC analog video signal.
That signal is routed to an Adaptec(R) video digitizer, which converts the
input signal into a compressed AVI video stream, and transmits that stream over
a USB cable. Because the digitizer output is compliant with the standard
Windows Driver Model (WDM), it appears to an application programmer as a
generic video capture device on the Windows(R) operating system.

A laptop computer (which can be carried in a backpack) then executes a face
recognition algorithm. We used a tablet PC with an Intel(R) Centrino 1.5 GHz
processor and 512 MB of RAM. (This particular laptop was chosen because of
its compact form factor.)

The video captured by the camera might or might not contain faces. Since the
laptop computer provides a limited amount of processing power, it is important
to first identify which regions (if any) of each video frame contain a human
face. Because the video frames must be scanned in real time, the method for
doing this must be highly optimized. We used an adaptive boosting (AdaBoost)
algorithm, which starts by quickly scanning an entire video frame (to rule out
regions that are unlikely to contain faces) and then iteratively processes the
remaining regions, gradually reducing the set of candidate regions until a
final decision is made for each region.
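
The coarse-to-fine rejection described above can be sketched as a cascade of
increasingly selective tests. This is only a minimal illustration, not the
detector used in the prototype: the two stage functions (`not_too_dark`,
`upper_half_darker`), their thresholds, and the square-window layout are
hypothetical stand-ins for classifiers that a boosting procedure would learn
from training data.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[Sequence[int]]          # grayscale pixels, row-major
Region = Tuple[int, int, int]            # (top, left, size) of a square window
Stage = Callable[[Frame, Region], bool]  # True means "might still be a face"


def sliding_windows(frame: Frame, size: int, step: int) -> List[Region]:
    """List every square window of the given size that fits in the frame."""
    rows, cols = len(frame), len(frame[0])
    return [(top, left, size)
            for top in range(0, rows - size + 1, step)
            for left in range(0, cols - size + 1, step)]


def band_sum(frame: Frame, top: int, left: int, height: int, width: int) -> int:
    """Sum the pixel values inside a rectangular band of the frame."""
    return sum(frame[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def not_too_dark(frame: Frame, region: Region) -> bool:
    """Hypothetical cheap first stage: discard nearly black windows."""
    top, left, size = region
    return band_sum(frame, top, left, size, size) / (size * size) > 40


def upper_half_darker(frame: Frame, region: Region) -> bool:
    """Hypothetical second stage: keep windows whose upper half is darker."""
    top, left, size = region
    half = size // 2
    upper = band_sum(frame, top, left, half, size)
    lower = band_sum(frame, top + half, left, half, size)
    return upper < lower


def run_cascade(frame: Frame, regions: Sequence[Region],
                stages: Sequence[Stage]) -> List[Region]:
    """Apply stages in order, keeping only regions that pass every stage."""
    candidates = list(regions)
    for stage in stages:
        candidates = [region for region in candidates if stage(frame, region)]
        if not candidates:
            break  # nothing left to examine in this frame
    return candidates
```

Ordering the stages from cheapest to most expensive means that most of a frame
is discarded after very little work, which is what makes this kind of scheme
attractive on a processor with limited capacity.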

Once a region in a video frame is identified as a face, all of the processing
resources are focused on analyzing it, to recognize the person. In the iCARE
Interaction Assistant we have employed two different face recognition
algorithms to recognize people: (1) a well-known method called Principal
Component Analysis (PCA) [2], and (2) our own novel method called Distinctive
Feature Analysis (DFA), which recognizes each person based on distinctive
facial features that distinguish him/her from others in the database. Both of
these algorithms compare the face image captured by the video camera to face
images in an onboard database, which were captured by that same video camera in
the past. A similarity measure is used to compute which of the faces in the
database are most similar to the newly captured face image, and the name of the
person who best matches the newly captured face is chosen.
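
The matching step can be sketched as a nearest-neighbor search over feature
vectors. The similarity measure is not specified here, so this sketch assumes
Euclidean distance (smaller means more similar); the feature vectors (which
could be, for example, PCA coefficients), the database layout, and the
`best_match` helper are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure: straight-line distance between vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(probe: Sequence[float],
               database: Dict[str, List[Sequence[float]]]) -> Tuple[str, float]:
    """Return (name, distance) for the stored face vector closest to probe.

    The database maps each person's name to one or more stored vectors,
    for instance images of that person taken at different times.
    """
    best_name: Optional[str] = None
    best_distance = math.inf
    for name, vectors in database.items():
        for vector in vectors:
            distance = euclidean_distance(probe, vector)
            if distance < best_distance:
                best_name, best_distance = name, distance
    if best_name is None:
        raise ValueError("database contains no face vectors")
    return best_name, best_distance
```

A practical system would probably also reject matches whose distance exceeds
some threshold, so that a stranger is not announced under the name of the
nearest enrolled person; that threshold is left out of this sketch.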

The name of that person is then communicated to the user using synthesized
speech: the Interaction Assistant speaks the name into the ear of the user,
using the Microsoft Speech Engine. The speech signal is routed to a sound
emitter in the earpiece of the glasses (rather than through an in-the-ear
device) to avoid altering the user's perception of environmental audio.

Face recognition algorithms have historically had to deal with two persistent
problems, neither of which has yet been fully solved. The first problem is that
the illumination present when the device is trying to recognize a person can
differ from the illumination under which that person's stored images were
captured. (Changes are especially great between indoor and outdoor
environments.) This problem can be partially solved by populating the database
over time with a diverse collection of face images for each person, taken under
various lighting conditions. The second problem is that the person being
recognized might not be looking directly at the camera, thus slightly altering
the appearance of his/her face image, moment by moment.

While testing the Interaction Assistant, we found that, although a majority of
the frames in a video stream might be identified correctly, these two types of
variation sometimes caused the device to sporadically recognize people
incorrectly. To minimize user confusion, the device is configured to wait until
the face recognition algorithm recognizes the same person in five consecutive
frames before it speaks the name of that person.
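
The five-consecutive-frame rule can be sketched as a small filter that sits
between the recognizer and the speech output. The class name
`ConsecutiveFrameFilter` and its interface are hypothetical, and the choice to
announce a person only once per uninterrupted run of frames is an assumption
of this sketch rather than documented behavior of the prototype.

```python
from typing import Optional


class ConsecutiveFrameFilter:
    """Release a name only after it is seen in `required` frames in a row."""

    def __init__(self, required: int = 5) -> None:
        self.required = required
        self._current: Optional[str] = None
        self._count = 0
        self._announced = False

    def update(self, name: Optional[str]) -> Optional[str]:
        """Feed one per-frame result (None if no face was recognized).

        Returns the name to speak, or None if nothing should be spoken yet.
        """
        if name is None or name != self._current:
            # The run was broken: start counting again from this frame.
            self._current = name
            self._count = 1 if name is not None else 0
            self._announced = False
        else:
            self._count += 1

        if (name is not None and self._count >= self.required
                and not self._announced):
            self._announced = True  # assumption: speak once per run
            return name
        return None
```

A single misrecognized frame resets the count, so a sporadic wrong result can
delay an announcement but cannot by itself cause a wrong name to be spoken.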

The current Interaction Assistant prototype device recognizes and speaks the
names of people that it has previously stored in its database. The name of the
person standing in front of the device is normally delivered discreetly to the
user, but during demonstrations it is played through speakers, to make it
audible to the audience. The prototype can be configured to use either of two
face recognition algorithms: Principal Component Analysis (PCA) or Distinctive
Feature Analysis (DFA). The DFA algorithm is more reliable, but requires
considerable processing time as each new face is added to the database. (Faces
captured during a particular day might be added to the database overnight, to
allow time for the intensive processing.) The PCA algorithm is typically used
during demonstrations, because it permits members of the audience to come
forward, have the device capture images of their faces, and then see the device
recognize them by speaking their names. (This “capture and learn” process takes
about 30 seconds per person.)

In conclusion, the current implementation of the iCARE Interaction Assistant is
aimed at recognizing faces to facilitate initial encounters, thus allowing a
user to initiate social interactions [3]. Ongoing work is aimed at facilitating
subsequent verbal interactions, by interpreting non-verbal cues, such as eye
contact, facial expressions, and gestures. The Interaction Assistant is just
one component in the larger iCARE project [4], which is expected to produce
relevant and practical knowledge for the future design of assistive devices
that go beyond navigational aids to the facilitation of learning, social
interaction, and communication, which are all vital to success in today’s
professional world.

References
4. http://cubic.asu.edu (Click on iCARE Projects)
Return to 2006 Table of Contents