1995 VR Conference Proceedings


Using Virtual and Augmented Reality to Control an Assistive Mobile Robot

Kristian T. Simsarian & Lennart E. Fahlén

Swedish Institute of Computer Science
S-16428 KISTA
Stockholm, Sweden
fax: +46 (8) 751-7230
tel: +46 (8) 752-1570


This paper describes an interface for controlling a mobile robot through virtual reality. The system combines graphical immersive environments with live video from a robot operating in a real environment. The interface can be used to control the robot in two principal ways. The first is to drive the robot as a vehicle through the virtual and real worlds, specifying robot movements in real time. The second is to use the virtual reality model of the real world to specify tasks for the robot to perform, such as pick-and-place manipulation and point-to-point navigation. The real and virtual worlds are synchronized and updated based on the operator's selections and commands and on the robot's actions. Since the robot can perform actions in the real world, the user can perform remote manipulations via the virtual reality. One application for such an assistive robot is working as an assistant for people with mobility disabilities.


In this paper we describe our current work on an application of distributed virtual worlds that combines robotics in both virtual and real worlds. The main motivation for this system comes from research in the fields of virtual and augmented reality, autonomous robotics, and computer vision, and from the desire to use state-of-the-art mobile robotics to build an assistive tool. The result is a system that combines real-world autonomous mobile robotics with a simulated virtual world acting as a high-level control device. The main applications for such a system are mobile robot operation in the broad area of hazardous or inaccessible environment exploration as well as assistance for those with disabilities.

Real-world high-level task planning for mobile robots is difficult. To aid in this we are investigating virtual reality as a potentially powerful tool for human-robot interaction. A virtual environment provides a method for a human to interact with a model of the robot's space and to specify tasks for the mobile robot.

Our system combines graphical immersive environments with live video from a real robot. The virtual environment is a coarse model of the world in which the real robot exists. The different spaces are synchronized and updated through a model of the world that exists in the virtual space. The model is detailed enough to specify high-level tasks, such as point-to-point navigation and pick-and-place manipulation, while the robot has the basic navigational skills to perform path planning and obstacle avoidance. Freed from low-level robot interaction, the operator then has the potential, with this interface, to interact with and specify tasks at a high level for a number of robots.

This paper describes the framework we are using to combine the virtual and real worlds and demonstrates the principles we are applying to performing remote tasks within a new immersive paradigm.

Application Example

The human operator is given access to an immersive environment which represents a simplified version of the remote real environment. Situated in the virtual environment is a mobile robot. The robot can be considered a vehicle that can carry the operator through both the virtual and the remote real worlds. The robot interface in the virtual world consists of a real-time camera view of the robot's real-world environment accompanied by an interface control panel. The control panel consists of buttons, displays, and various data to aid the interaction. Some of the available tools set robot speed or acceleration parameters, or command various image transforms. There are also displays giving the current state of the robot and its real-world environment, e.g. battery level or radioactivity level. The robot can move through the world and can be controlled at a high level by the operator through the interface. The operator gives simple instructions to navigate, such as "go there" accompanied by a pointing gesture in the virtual world. Alternatively, the user might give a more sophisticated command, such as "head toward that doorway there," again accompanied by a pointing gesture. Because the virtual model is at least roughly synchronized with the real world and information about specific doors is contained in the virtual environment model, this command can quickly be translated into navigational commands for the robot base.
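Resolving such a "go there" pointing gesture into a navigation goal can be sketched as a ray–floor intersection. The function name, the flat-floor assumption, and the coordinate conventions below are our own illustration, not part of the DIVE implementation:

```python
# Hypothetical sketch: turning a pointing gesture in the virtual world
# into a navigation goal for the robot base.  Assumes a flat floor at
# z = floor_z; names and conventions are ours, not DIVE's.

def gesture_to_goal(hand_pos, point_dir, floor_z=0.0):
    """Intersect the pointing ray with the floor plane z = floor_z.

    hand_pos  -- (x, y, z) of the user's hand in world coordinates
    point_dir -- unit vector of the pointing direction
    Returns the (x, y) goal point for the robot, or None when the
    ray points parallel to or away from the floor.
    """
    dz = point_dir[2]
    if dz >= 0:                       # ray does not reach the floor
        return None
    t = (floor_z - hand_pos[2]) / dz  # parametric distance along the ray
    return (hand_pos[0] + t * point_dir[0],
            hand_pos[1] + t * point_dir[1])
```

The resulting (x, y) point would then be handed to the robot's own path-planning and obstacle-avoidance layer rather than executed blindly.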

Alternatively, the operator can move around the virtual environment freely and specify tasks for the robot to perform. As the operator navigates through the virtual environment, point-to-point navigation tasks as well as pick-and-place manipulation tasks can be specified. These then become batch-like, higher-level goals for the robot's path-finding and grasping systems.

One important difference that enables this type of system to work is that, as the user interacts with this smart machine, the machine is permitted to say "I don't know." The robot does not have to make high-level decisions; instead it performs as well as it can and can always return to the user with questions. This relaxation releases the system from many of the hardest problems in AI while allowing us to build machines that can perform useful tasks, and it provides a novel platform for further research in autonomous robotics, sensing, man-machine interaction, and virtual environments.

Such a robotic aid would be extremely useful to a person with physical disabilities. For some people the simple act of going to another room to retrieve a book can be extremely difficult. A robot that performs such a physical task could be invaluable, and this sort of robot/human system would thus find utility as a handicap aid. We are currently researching how to design the interface and the robot tasks with the needs of the physically handicapped in mind. Likewise, the system could serve as a proxy for those operating in hazardous environments, where the ability to operate remotely would greatly increase job safety. The main application described here can be summarized as allowing an operator to perform remote operations with the robot, something we are calling virtual presence. In the next section we describe related work on this subject, followed by a description of our application, its individual components, interaction issues between the real and virtual worlds, and the work to date. We then conclude by describing future work with the system.

Related Research

It is easy to see how the capability to send autonomous robots into hazardous and remote environments (e.g. space, sea, volcanic or mine exploration, nuclear/chemical hazards) would be useful. Robots can be built to tolerate harsher environments than humans, they can be built to perform specialized tasks efficiently, and they are expendable. To this end there has been much research on fully autonomous robots that can navigate into an area, perform actions such as taking samples and performing manipulations, and return the information to base controllers without human assistance. At the other end of mobile robot research, investigators have worked on the interface between man and machine to enable an operator to control a robot remotely in space and battlefield applications, and have even used simulated environments for predictive task planning. However, the fields of autonomous and teleoperated robotics have remained relatively independent of each other.

We are using an immersive virtual environment as the interaction paradigm with the real world and the robot. Specifically, our work is an application in the SICS Distributed Interactive Virtual Environment (DIVE) system. We are incorporating on-board video from the robot into the virtual world. The video can subsequently be enhanced and augmented to communicate information to the operator. This is quite similar to work in augmented reality, which at its base is the concept of combining video with graphical overlays. Augmented reality techniques have been demonstrated in a variety of practical applications, such as displaying graphical information in printer repair operations or projecting information from a 3-D model base onto a real-world artifact. Other applications include virtual tape measurement on real scenes, graphical vehicle guidance, and enhanced displays for teleoperated control. All of these applications are relevant to our robot application. Additionally, the reverse operation can be performed: the virtual environment can be augmented in a similar fashion by real-world images.

In addition to the applications in hazardous and remote environments, we are also following previous work at SICS, in cooperation with the Swedish Handicap Institute, that investigated robotics as an aid for persons with physical disabilities. In that application the robot consisted of a robotic arm controlled through the DIVE interface.


The current state of virtual reality, robotics, and vision research is at a stage where their combination can produce useful tools. We are taking our work in virtual environments, robotics, and computer vision and combining them into a useful platform for a hybrid environment. In this interface a user is physically and virtually located in multiple places at once. The user's physical environment, the virtual environment, and the environment where the remote robot is located are combined into a novel interface for performing remote operations. The user enters a virtual world in which a robot vehicle is graphically depicted. This vehicle can be started and controlled, and the robot, which may be located in an entirely different physical space, can then be steered in the real world. The robot sends back video from the real world, which is displayed inside the virtual world. Thus, inside the virtual world, the operator has access to a window on the real world. As the operator makes selections and commands, the robot performs these actions and the virtual world is updated to reflect the current real-world and robot state. Virtual objects can be superimposed on the images of the real world, and real images can be superimposed onto virtual objects. In addition to manipulating virtual objects which exist inside the virtual environment, this interface allows the manipulation of real-world objects via the robot arm. Since a number of system components handle the various processing tasks, the system is distributed across different platforms. In this section we describe the system along with a number of powerful concepts regarding the interface and interaction with the real world via the virtual world.

System Overview

There are three main distributed computational systems: the robot system, the graphical object manipulation and rendering system, and the computer vision processing system. The information that is passed around can also be viewed as flowing between the real and virtual worlds via the camera, the robot, and the user.

It is important to emphasize that the agents will be navigating and interacting in two different worlds, the real and the virtual. The robot physically exists in the real world, while the virtual world contains an inexact representation of the robot's real-world model. Although the robot is endowed with a basic model of the environment, derived from an architectural drawing of the basic physical structure and artifacts, it can augment this model through movement and exploration with new objects, methods, and descriptions that are more useful for comparison with its sensor data. The video from the camera flows from the real world to the virtual world; these images represent the real world from a robot-centered perspective. The user sends commands to the robot via the interface, so these commands are made by the operator interacting with the virtual world. The commands may be as simple as updates to velocity and orientation, or they may be higher-level and more complex, involving path specification, navigational targets, and grasping tasks.
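The range of commands flowing from operator to robot can be sketched as a small set of message types; the class names and fields below are our own illustrative framing, not the actual DIVE/robot protocol:

```python
# Illustrative sketch of operator-to-robot commands, from low-level state
# updates to high-level tasks.  All names and fields are assumptions.
from dataclasses import dataclass, field

@dataclass
class VelocityCommand:          # "as simple as updates to velocity and orientation"
    speed: float                # metres per second
    heading: float              # radians

@dataclass
class NavigateCommand:          # higher-level navigational target;
    goal: tuple                 # path planning is left to the robot itself
    waypoints: list = field(default_factory=list)

@dataclass
class GraspCommand:             # pick-and-place manipulation task
    object_id: str

def dispatch(cmd):
    """The virtual world forwards a command to the robot subsystem;
    here we merely report which kind of command was routed."""
    return type(cmd).__name__
```

The point of the split is that only `VelocityCommand` demands real-time attention from the operator; the other two become batch-like goals the robot executes autonomously.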

The virtual world serves as the communication medium between the robot and the user. It is through the interaction of both the robot and the operator with the virtual environment that interaction between the operator and the robot takes place. Thus bi-directional communication and command specification are achieved via the virtual world. In complement to operator commands, the robot can make queries of the operator regarding task direction, as well as update the environment with objects and model features discovered in the course of exploration. Thus many issues related to user interaction with virtual worlds become important; these are briefly described in the next section.

Interface and Interaction

Inside the virtual environment that the DIVE system implements, there is a strong model of spatial interaction. This model provides a method of interaction for the operator, the robot, and the objects within the virtual and real worlds. In this section this spatial interaction model and the methods it suggests are described.


Here we summarize the key concepts which constitute the DIVE spatial model of interaction; the details of this model can be found elsewhere. The goal of the spatial model is to provide a small but powerful set of mechanisms for supporting the negotiation of interaction across shared virtual space. The spatial model, as its name suggests, uses the properties of space as the basis for mediating interaction. Below we briefly introduce the key abstractions of space, objects, aura, awareness, focus, nimbus, and boundaries which define part of the spatial model.

The most fundamental concept in the model is space itself. Virtual space may have any number of dimensions, where each dimension has a notion of 'spatial metrics.' Space is inhabited by objects which might represent people, agents, information, or other computer or real-world artifacts. Any interaction between objects occurs through a medium. A medium might represent a typical communication medium (e.g. audio, vision, a data socket, or text) or perhaps some other kind of object-specific interface.

The first problem in any large-scale environment is determining which objects are capable of interacting with which others at a given time. Aura is defined to be a sub-space which effectively bounds the presence of an object within a given medium and which acts as an enabler of potential interaction. Objects carry their auras with them when they move through space and when two auras collide, interaction between the objects in the medium becomes a possibility. It is the surrounding environment that monitors for aura collisions between objects. When such collisions occur, the environment takes the necessary steps to put the objects in contact with one another.
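Aura collision monitoring can be sketched as follows, assuming spherical auras; the class and function names are ours, not those of the DIVE implementation:

```python
# Minimal sketch of aura collision detection with spherical auras.
# Names and the sphere assumption are ours, not DIVE's.
import math

class SphericalAura:
    def __init__(self, owner, radius):
        self.owner, self.radius = owner, radius
        self.position = (0.0, 0.0, 0.0)   # carried along as the object moves

    def collides(self, other):
        """Two auras collide when their bounding spheres overlap."""
        return math.dist(self.position, other.position) <= self.radius + other.radius

def monitor(auras):
    """The surrounding environment, not the objects themselves, watches
    for aura collisions and puts colliding objects in contact."""
    contacts = []
    for i, a in enumerate(auras):
        for b in auras[i + 1:]:
            if a.collides(b):
                contacts.append((a.owner, b.owner))
    return contacts
```

Note that a collision only establishes the *possibility* of interaction; whether interaction actually occurs is then negotiated by the objects themselves, via focus and nimbus, as described next.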

Once aura has been used to determine the potential for object interactions, the objects themselves are subsequently responsible for controlling these interactions. This is achieved on the basis of quantifiable levels of awareness between them. Awareness between objects in a given medium is manipulated via focus and nimbus, further subspaces within which an object chooses to direct either its presence or its attention. More specifically, if you are an object in space: the more another object is within your focus, the more aware you are of it; and the more another object is within your nimbus, the more aware it is of you.

This notion of spatial focus as a way of directing attention, and hence filtering information, is intuitively familiar from our everyday experience (e.g. the concept of a visual focus). The notion of nimbus requires a little more explanation. In general terms, a nimbus is a sub-space in which an object makes some aspect of itself available to others. This could be its presence, identity, activity, or some combination of these. Nimbus allows objects to try to influence others (i.e. to be heard or seen). Nimbus is the necessary converse of focus, required to achieve a power balance in interaction.

Awareness levels are calculated from a combination of nimbus and focus. Aura, focus and nimbus may most often be implicitly manipulated through fundamental spatial actions such as movement and orientation. Additionally, aura, focus and nimbus may be manipulated through boundaries in space. Boundaries divide space into different areas and regions and provide mechanisms for marking territory, controlling movement and for influencing the interactional properties of space. More specifically, boundaries can be thought of as having four kinds of effects: effects on aura, effects on focus, effects on nimbus and effects on traversal.
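One simple way to realize this calculation is to treat focus and nimbus as spherical subspaces and combine containment tests; the 0/1 containment and the `min` combination rule below are simplifying assumptions of ours, not mandated by the spatial model:

```python
# Sketch of awareness computation: A's awareness of B combines whether B
# falls within A's focus and whether A falls within B's nimbus.  The
# spherical subspaces and min() rule are our own simplifications.
import math

def inside(subspace, point):
    """Treat a focus/nimbus subspace as a sphere: (centre, radius)."""
    centre, radius = subspace
    return 1.0 if math.dist(centre, point) <= radius else 0.0

def awareness(a_pos, a_focus, b_pos, b_nimbus):
    """A is fully aware of B only when B is in A's focus AND A is in B's nimbus."""
    return min(inside(a_focus, b_pos), inside(b_nimbus, a_pos))
```

A graded model would replace the 0/1 test with a function that falls off with distance, but the structure — awareness as a joint function of one object's focus and the other's nimbus — stays the same.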


We use this spatial model to create an interactive and informationally rich immersive environment that stores the methods that aid the robot's interaction in the real world. The concepts of aura, nimbus, and focus are key to the way the robot interacts with the virtual and real worlds. Using the concepts of spatial boundaries and auras we can define interaction mechanisms and methods for sharing information between the robot and the environment.

For example, using the concept of object aura we can define a means of transferring information for navigation and object identification. If the robot's aura collides with an object's aura, that object may then open up a channel (i.e. the robot focuses and the object projects nimbus), enabling the object to pass information to the robot that is pertinent to the mutual interaction. In this way each object stores information and methods about itself. This information can include the object's identity, its position, its graphical description, and methods for recognition, manipulation, and navigation.

These last three types of information deserve special mention. An object may store the actual methods with which to perform a local interaction such as recognition. Given that the positions of the object and of the robot are well known, these methods can be quite specific.

Likewise, using boundaries in space, various locations in the environment may store information and methods regarding navigation. For example, there may be certain areas of the environment where great care must be taken; crossing a boundary could then act like entering a "speed control zone" and thus negotiate control of the robot's velocity. Similarly, there could be areas of the environment where certain configurations or specific paths should be avoided or taken. Crossing a boundary into such an area would open up a channel to transfer specific navigational commands to the robot.
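A speed control zone of this kind can be sketched as a region lookup that the boundary crossing triggers; the zone layout and limits below are invented purely for illustration:

```python
# Hedged sketch of boundaries acting as "speed control zones": crossing
# into a marked region renegotiates the robot's velocity cap.  The region
# coordinates and limits are hypothetical examples.

ZONES = [
    # (xmin, ymin, xmax, ymax, speed_limit_m_s)
    (0.0, 0.0, 4.0, 4.0, 0.2),   # e.g. a cluttered corner needing great care
]
DEFAULT_LIMIT = 1.0              # m/s, in open areas

def speed_limit(x, y):
    """Return the velocity cap in force at position (x, y)."""
    for xmin, ymin, xmax, ymax, limit in ZONES:
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return limit
    return DEFAULT_LIMIT
```

The same region-lookup pattern would serve for the navigational case: instead of a speed limit, the zone entry could hand the robot a preferred path or a set of forbidden configurations.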

Using this model of interaction relieves the robot control process of the need to have knowledge about the entire environment at all times. Using the spatial model, we distribute the processes and specific information throughout the environment. This also makes it much less necessary for a robot to have extensive knowledge about a new environment before actually entering it: when the robot crosses the boundary into a new environment, it or the user is given all the necessary global information regarding that world.

Summary and Future Work

We have presented an overview of an application of immersive environments using techniques from virtual and augmented reality, robotics, and computer vision. This document reports on work in progress and is meant to convey ideas and issues important to this application. We currently have a remotely operated vehicle that can move in the physical world, with an on-board video camera sending real-time images back to the host graphics processor. The vehicle can be tracked and steered in both the virtual and real worlds, and the operator has access to the simulated remote vehicle and the real camera video. We have just begun work with our B21 robot and are transferring our previous work on navigational algorithms to this platform.

In addition to the visual cues and control methods available in DIVE, we are also planning to use the results of a recent effort in voice recognition within the interface. With voice commands in the virtual environment we expect to free some manual channels for interaction. The DIVE system platform already has the capability for limited voice command interaction. This work is in progress and should soon enable the operator to simply tell the robot what to do.


Reprinted with author(s) permission. Author(s) retain copyright.