Go to previous article
Go to next article
Return to 2005 Table of Contents
Presenter(s)
Emdad Khan, PhD
CEO, InternetSpeech Inc., San Jose, California, USA
Email: emdad@internetspeech.com
The Internet has proven itself to be a valuable communications medium and the most inclusive source of information and commerce the world has ever known. Yet, for all its value, access to this resource remains remarkably uneven: only those who can afford a PC and an Internet connection are able to take advantage of it. By most counts, the number of current users is very small compared with the number who would use it if they could. The good news is that a technology exists to remedy the situation, and its adoption is growing.
The Telephone as the Solution
Many more people in today's world have telephones than computers. The total number of phones in 2003 exceeded 2.1 billion, and this growth far outstrips the growth of PC use: today there are only about 400 million PC users. Since the computer remains the primary way to access the Internet, the Internet reaches only a small fraction of the world's population; the majority are left out. Moreover, not all computers are connected to the Internet, for reasons including the cost of access, not knowing how to use it, or being unable to use it because of a disability. This gap between those who can effectively use new information from the Internet and those who cannot is known as "the Digital Divide". Bridging the Digital Divide is the key to ensuring that most people in the world can access the Internet. Since many more people have phones than computers, and the number of phone users is growing very fast, accessing the Internet by telephone (the "phone enabled Internet") is the key to solving the Digital Divide problem.
Incidentally, bridging the Digital Divide will not be complete unless the "Language Divide" is bridged as well. Today, about 80% of Internet content is in English, so people in countries such as Japan and China, where English is not commonly spoken, are shut out of most of it. This divide is called the "Language Divide". Fortunately, this problem can also be addressed with existing technologies.
The phone enabled Internet has two basic forms: visual and audio/voice. The visual approach allows Web access using text on the small screens of compatible wireless phones, PDAs, and hand-held computers. This approach therefore needs a special device and a rewrite of the web site, e.g., in WML (Wireless Markup Language). Small-screen viewing, limited bandwidth, difficult text entry, and the need for a special phone are the major limitations of this approach. Moreover, it cannot be used in eyes-busy, hands-busy situations, or by blind and low-vision people.
Simply using one's voice and any standard wireline or wireless phone removes the limitations of small-screen wireless devices. Voice is also a more natural user interface, and it frees one's hands and eyes, making access truly mobile. Until now, however, proponents of the voice Web have conceded that the voice-based approach also requires rewriting: web sites must be voice-enabled using special voice markup languages, primarily VoiceXML or SALT. Since there are more than a billion web sites, rewriting them in VoiceXML (or SALT, or any other language) would be extremely costly and time-consuming, and hence prohibitive; voice access to the entire Internet cannot be achieved this way.
Providing True Voice Access to the World Wide Web - One Path Leads to the Solution
Bridging the Digital Divide requires a solution that provides access to the whole Internet. Voice (audio) Internet is such a solution: it provides access to the whole Internet using voice and any telephone. One can surf any website, search for any word(s), send and receive email, and conduct e-commerce. In addition, voice portal features such as news, weather, horoscopes and directions are available. Automation is the key to creating a voice Internet without rewriting web sites. It is achieved with an Intelligent Software Agent (IA) that dynamically translates (renders) existing web pages written in HTML, XML or WML into meaningful audio. Rendering consists of Page Highlights (a method of finding and speaking the key content on a page), finding the right, and only the relevant, content on a linked page, assembling that content from the linked page, and providing easy navigation. These steps rely on the information available in the visual web page itself, including text content, color, font size, links, paragraphs, and amount of text, together with algorithms that make proper use of all of it. Artificial Intelligence techniques are used in this automated rendering process, which resembles how the human brain renders a visual page: by selecting the information of interest and then reading it. Fig. 1 shows the IA, the available features, and how the user interacts with the IA and the Internet.
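The Page Highlights step can be illustrated with a minimal Python sketch built on an assumed scoring heuristic of our own, not the netECHO algorithm. The page is split into block-level chunks (headings, paragraphs, list items); each chunk is scored from its heading level (a stand-in for font size), its amount of text, and the share of its text that is link text; and the top k chunks are kept in page order. The weights, the `BlockExtractor` and `page_highlights` names, and the omission of color are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical weights: a larger heading counts more, like a larger font.
HEADING_WEIGHT = {"h1": 6.0, "h2": 5.0, "h3": 4.0, "h4": 3.0, "h5": 2.0, "h6": 1.0}
BLOCK_TAGS = set(HEADING_WEIGHT) | {"p", "li"}


@dataclass
class Block:
    tag: str         # block-level tag the text came from, e.g. "h2" or "p"
    text: str        # whitespace-normalized visible text
    link_chars: int  # characters of that text that sit inside <a> elements


class BlockExtractor(HTMLParser):
    """Splits a page into block-level text chunks, tracking how much is link text."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[Block] = []
        self._open_tag: str | None = None
        self._text_parts: list[str] = []
        self._link_chars = 0
        self._link_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()  # a new block implicitly ends any open one
            self._open_tag = tag
        elif tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self._flush()
        elif tag == "a" and self._link_depth:
            self._link_depth -= 1

    def handle_data(self, data):
        if self._open_tag is None:
            return  # text outside the tracked block tags is ignored
        self._text_parts.append(data)
        if self._link_depth:
            self._link_chars += len(data.strip())

    def close(self) -> None:
        super().close()
        self._flush()  # emit a block left open at end of input

    def _flush(self) -> None:
        text = " ".join("".join(self._text_parts).split())
        if self._open_tag and text:
            self.blocks.append(Block(self._open_tag, text, self._link_chars))
        self._open_tag, self._text_parts, self._link_chars = None, [], 0


def score(block: Block) -> float:
    """Hypothetical heuristic: headings and longer prose score up, link-heavy text down."""
    words = len(block.text.split())
    link_ratio = min(block.link_chars / len(block.text), 1.0)
    return (HEADING_WEIGHT.get(block.tag, 0.0) + min(words, 60) / 10) * (1.0 - link_ratio)


def page_highlights(html: str, k: int = 3) -> list[str]:
    """Return the k best-scoring blocks, kept in their original page order."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    blocks = parser.blocks
    top = sorted(range(len(blocks)), key=lambda i: score(blocks[i]), reverse=True)[:k]
    return [blocks[i].text for i in sorted(top)]
```

Returning the selected chunks in page order rather than score order is a design choice meant to keep the spoken summary coherent; a navigation bar made only of links scores zero and is never read as a highlight.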
It is very important that the user can easily interact with the IA and reach the desired web content. The IA ensures this by properly manipulating the information extracted from the web page. Key features for seamless navigation include Page Highlights, Traversing Links, Previous Page, Next Paragraph, and Skip Paragraph.
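To make the navigation features concrete, here is a hypothetical Python sketch of per-call browsing state covering Next Paragraph, Traversing Links and Previous Page. The `Page` structure, the injected `fetch` callable and the spoken responses are assumptions for illustration; a real system would drive a speech recognizer and synthesizer rather than return strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    url: str
    paragraphs: list[str]
    links: dict[str, str] = field(default_factory=dict)  # spoken label (lowercase) -> URL


class NavigationSession:
    """Hypothetical per-call state: current page, paragraph cursor, and a back stack."""

    def __init__(self, fetch: Callable[[str], Page], start_url: str) -> None:
        self._fetch = fetch
        self._history: list[tuple[Page, int]] = []
        self.page = fetch(start_url)
        self.cursor = -1  # index of the last paragraph read aloud; -1 means none yet

    def next_paragraph(self) -> str:
        """Next Paragraph: read the paragraph after the cursor."""
        if self.cursor + 1 >= len(self.page.paragraphs):
            return "End of page."
        self.cursor += 1
        return self.page.paragraphs[self.cursor]

    def follow_link(self, label: str) -> str:
        """Traversing Links: remember where we were, open the link, start reading."""
        url = self.page.links.get(label.lower())
        if url is None:
            return f"No link called {label}."
        self._history.append((self.page, self.cursor))
        self.page, self.cursor = self._fetch(url), -1
        return self.next_paragraph()

    def previous_page(self) -> str:
        """Previous Page: restore the last page and the reading position on it."""
        if not self._history:
            return "This is the first page."
        self.page, self.cursor = self._history.pop()
        return f"Back to {self.page.url}."


# Usage with an in-memory stand-in for the web.
pages = {
    "home": Page("home", ["Welcome.", "Top stories today."], {"weather": "wx"}),
    "wx": Page("wx", ["Sunny, 25 degrees."]),
}
session = NavigationSession(pages.__getitem__, "home")
print(session.next_paragraph())
print(session.follow_link("Weather"))
print(session.previous_page())
print(session.next_paragraph())
```

Saving the paragraph cursor along with each page on the back stack means Previous Page resumes reading where the caller left off instead of restarting the page.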
As mentioned before, language translation is essential to truly bridge the Digital Divide, and here, again, automation plays an important role. The IA includes a language translation engine that dynamically translates web content from one language into another in real time. Thus, a Chinese-speaking person can ask to surf an English website in Chinese: the IA accesses the English website, extracts its content, translates it into Chinese on the fly, and reads it back to the user in Chinese.
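The translation flow in this paragraph (access, extract, translate, read back) can be sketched as a chain of stages. In this hypothetical Python sketch every stage is an injected callable, so no particular fetcher, machine translation engine or speech synthesizer is assumed; the `VoicePipeline` name, the stage signatures and the toy glossary are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class VoicePipeline:
    """Hypothetical wiring: fetch a page, extract key text, translate it, synthesize speech."""

    fetch: Callable[[str], str]                # URL -> page HTML
    extract: Callable[[str], list[str]]        # HTML -> key text chunks (e.g. Page Highlights)
    translate: Callable[[str, str, str], str]  # (text, source_lang, target_lang) -> text
    speak: Callable[[str, str], bytes]         # (text, lang) -> audio

    def browse(self, url: str, source_lang: str, target_lang: str) -> Iterator[bytes]:
        # A generator, so each chunk is translated and voiced only when requested:
        # playback of the first chunk can begin before later chunks are processed.
        for chunk in self.extract(self.fetch(url)):
            yield self.speak(self.translate(chunk, source_lang, target_lang), target_lang)


# Usage with toy stand-ins for each stage.
glossary = {"news": "xinwen", "weather": "tianqi"}
pipeline = VoicePipeline(
    fetch=lambda url: "news|weather",
    extract=lambda html: html.split("|"),
    translate=lambda text, src, dst: glossary.get(text, text),
    speak=lambda text, lang: f"[{lang}] {text}".encode(),
)
print(list(pipeline.browse("http://example.com", "en", "zh")))
```

Keeping the stages separate lets the same extraction step feed speech in any target language the translation engine supports.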
How Well Does It Work?
To assess how well Voice Internet can provide meaningful content from today's Internet, we need to answer two questions:
(1) Can content really be provided from any web site on the Internet?
(2) Can existing Internet content be rendered so that the result is obtained in real time, and is short, precise, easy to navigate, meaningful in audio, and pleasant to listen to?
The answer to both questions is "yes". Depending on the site, it can be a very strong "yes", a strong "yes", or a weak "yes". A content-rich page with a small number of links makes rendering and navigation easy, since there are only a few choices and one can quickly select a particular topic or section. If the site is rich in content, links, and images/graphics, the problem is harder, but a good solution still exists through careful use of a built-in feature called "Page Highlights". The most difficult case is a page that is very rich in images/graphics and links. There, the main information lies several levels below the home page, so navigation is harder because one has to go through multiple levels. Using multi-level Page Highlights and customized Highlights, the content can still be rendered well, although it is not as easy to navigate as in the other two cases. Most Internet content falls into the first two categories.
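The three cases above can be approximated with a rough classifier over tag counts. This Python sketch is an assumption, not the article's method: it counts `<a>` and `<img>` tags and visible words, treats a page with few links as easy, and otherwise compares words per link or image against a threshold. Both thresholds (`few_links`, `words_per_item`) are hypothetical and would need tuning on real pages.

```python
from collections import Counter
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Counts tags and visible words, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Counter = Counter()
        self.words = 0
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())


def classify_page(html: str, few_links: int = 15, words_per_item: float = 5.0) -> str:
    """Return 'easy', 'moderate' or 'hard', following the three cases in the text."""
    stats = PageStats()
    stats.feed(html)
    stats.close()
    links, images = stats.tags["a"], stats.tags["img"]
    if links <= few_links:
        return "easy"      # content-rich page with few links: few choices to offer
    if stats.words / (links + images) >= words_per_item:
        return "moderate"  # rich in content, links and images: lean on Page Highlights
    return "hard"          # mostly images and links: multi-level Highlights needed
```

A service could use such a label to decide, for example, whether to read a single level of Page Highlights or to offer multi-level Highlights from the start.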
The Solution in Action
Here's how InternetSpeech has given the Internet a voice. Its Voice Internet technology, netECHO(r), uses an Intelligent Agent (IA), as described above, that turns an ordinary telephone into a high-tech tool for accessing the Internet. A caller to the service is greeted by the IA, which then offers a menu of items for choosing content from the Internet. The user can surf any website, search for sites or information using search word(s), send and receive email, and conduct e-commerce. In addition, common voice portal features, such as news, weather, horoscopes and the Bible, can be quickly accessed from the menu.
Users can give simple commands, such as "go to Yahoo" or "read my email", to reach the net-based information they want, and can quickly locate late-breaking news, traffic reports, or anything else of interest on the World Wide Web. The IA evaluates the site, determines which information is most useful and meaningful ("rendering"), and presents the content in easy-to-follow chunks using the "Page Highlights" feature. After hearing a short list of choices, the caller simply says which link he or she wants, and the system goes to the selected content on the linked page.
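The command handling and short link list described here might look like the following hypothetical Python sketch. It assumes a speech recognizer has already produced the text of the caller's utterance; the command patterns, the `SITE_DIRECTORY` mapping and the five-link limit are illustrative assumptions, not netECHO's grammar.

```python
from __future__ import annotations

import re

# Hypothetical directory from spoken site names to URLs.
SITE_DIRECTORY = {"yahoo": "http://www.yahoo.com", "google": "http://www.google.com"}
MAX_CHOICES = 5  # assumed length of the "short list" of links read to the caller


def route_command(utterance: str) -> tuple[str, str]:
    """Map recognized speech text to an (action, argument) pair."""
    text = " ".join(utterance.lower().split())
    if m := re.fullmatch(r"go to (.+)", text):
        site = m.group(1)
        if site in SITE_DIRECTORY:
            return ("open_url", SITE_DIRECTORY[site])
        return ("unknown_site", site)
    if text in ("read my email", "read my mail"):
        return ("read_email", "")
    if m := re.fullmatch(r"search (?:for )?(.+)", text):
        return ("search", m.group(1))
    return ("unknown", text)


def offer_links(labels: list[str]) -> str:
    """Prompt listing at most MAX_CHOICES link labels."""
    return "Say one of: " + ", ".join(labels[:MAX_CHOICES]) + "."


def pick_link(utterance: str, labels: list[str]) -> str | None:
    """Return the offered label the caller said, or None if it was not offered."""
    said = " ".join(utterance.lower().split())
    for label in labels[:MAX_CHOICES]:
        if label.lower() == said:
            return label
    return None
```

Matching the caller's reply only against the labels just offered keeps the recognition task small, which is one reason to present a short list rather than every link on the page.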
For more information on netECHO(r), visit www.internetspeech.com, call 877-312-4638, or send email to corporate@internetspeech.com.