2000 Conference Proceedings



VOICE WINDOWS AND VOICE MEADOW; MULTILINGUAL TEXT-TO-SPEECH SYSTEM FOR WINDOWS® 98

Takayuki Watanabe
Department of Information Science, Shonan Institute of Technology
1-1-25 Tsujido-NishiKaigan, Fujisawa, Kanagawa 251-8511, JAPAN
takayuki@la.shonan-it.ac.jp

Tuneyoshi Kamae
Department of Physics, School of Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, JAPAN
kamae@phys.s.u-tokyo.ac.jp

Tomio Koide
Create System Development Co. Ltd.

Hirohiko Honda
Institute of Space and Astronautical Sciences

Shinichiro Uno
Faculty of Social and Information Sciences, Nihon Fukushi University

Tooru Kurihara
Department of Information Sciences, Tsukuba College of Technology

Sawako Tajima
Internet Technology Research Committee

We are developing Voice Windows and Voice Meadow, a multilingual text-to-speech system for Windows 98 that enables Japanese visually disabled university students and professionals to use personal computers at more advanced levels. Many Windows software products on the market assist the visually disabled in word processing, web browsing, and screen navigation. In universities, however, students and teachers are required to advance to higher levels of computer use, which are not supported by the existing text-to-speech systems. In the United States, Dr. Raman developed Emacspeak, which opened a path for visually disabled Unix users to develop expertise in computer science, write application programs, and browse the Internet. The present work was motivated by his work. Voice Meadow is a voicified version of Meadow, a multilingual Emacs running in the Windows environment, and Voice Windows offers basic voicified components that are easily used from scripting languages such as Visual Basic Script. The characteristics of the current system are that 1) it accommodates multiple tasks, 2) it uses the Microsoft Speech API, 3) it can handle Japanese and English simultaneously, and 4) it is distributed as open source. Development is still in progress, but the essential functions have been implemented.

Introduction

When DOS was the operating system, operations were processed through character-based interfaces. Voicification with a speech synthesizer and presentation on a braille display were relatively straightforward; in fact, visually disabled programmers developed sophisticated applications for their own use. In Japan, Masao Saito, who lost his sight at pre-school age, wrote a widely used Japanese screen reader, VDM100 (ref. Saito1995).

The usefulness of personal computers was widely recognized by the visually disabled through the use of screen readers such as VDM100. Students learned programming on DOS, and visually disabled programmers wrote applications in Japan and elsewhere. This situation changed when Microsoft Windows became dominant on personal computers. The existing screen readers, which read the displayed characters, are not efficient with a GUI (Graphical User Interface) that uses windows, icons, menus, and a pointer. Sighted users benefited from the GUI and could use high-performance Windows computers; visually disabled users, however, could not. They were left with little choice but to stay in the DOS world. In Japan, Windows has been voicified by new screen readers, 95Reader and VDM100W-PC-Talker, which are not as useful as VDM100 was in the DOS environment.

In 1994, Dr. Raman built a new system, ASTER (Audio System for TEchnical Readings) (ref. Raman1994). ASTER formats the content of an application for audio presentation, using a variety of audio-formatting styles. Raman extended this approach to more general computing tasks and released a landmark speech interface, Emacspeak (ref. Emacspeak and Raman1997), as an Emacs subsystem. With Emacspeak, visually disabled students, teachers, and programmers who use Unix computers can develop expertise in computer science.

The goal of Voice Meadow is to extend the success of Emacspeak to the multilingual Windows world. Voice Windows includes Voice Meadow but is not restricted to Meadow. To achieve this goal, our text-to-speech (TTS) system must handle multiple languages.

Voice Windows

Our system consists of one TTS server and many TTS clients. Meadow is one such TTS client. Examples of other clients are general applications such as Internet Explorer and Word, a keyboard reader, and a client that monitors the conversion of ASCII characters into Kanji characters. Our TTS server can handle simultaneous inputs from multiple clients. The current system can also run multiple TTS servers when the voicification processes are independent.

Voicification with a multitasking OS

TTS systems running on a multitasking OS such as Windows should mix concurrent auditory outputs from multiple clients. In the Windows environment, Japanese input is handled by a Japanese input method editor, a front-end program that converts typed ASCII characters into appropriate Japanese characters such as Hiragana and Kanji. This conversion is carried out independently of the application that receives the Japanese input. Thus, the system must accept concurrent inputs from the input method editor and the application.

The current system uses two mechanisms to mix concurrent auditory outputs. One is DirectSound, and the other is audio drivers written for the Windows Driver Model (WDM). With WDM, multiple TTS systems or screen readers can work simultaneously.
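In the system described here, mixing is performed by DirectSound or by the WDM audio drivers, not by application code. The following is only a conceptual sketch of what mixing two concurrent outputs means, assuming signed 16-bit PCM samples; the function name `mix_pcm16` is hypothetical and does not correspond to any DirectSound or WDM call.

```python
def mix_pcm16(a, b):
    """Mix two sequences of signed 16-bit samples by clamped addition.

    Conceptual illustration only; the real mixing is done by the audio system.
    """
    length = max(len(a), len(b))
    mixed = []
    for i in range(length):
        # A stream that has already ended contributes silence (0).
        sample = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        # Clamp to the 16-bit range so that loud overlaps do not wrap around.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

For example, the output of a keyboard reader and the output of an application could be combined this way so that neither has to wait for the other.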

Windows Scripting Host based platform

Windows Scripting Host (WSH) is a language-independent scripting host for 32-bit Windows platforms. It hosts scripting languages such as Visual Basic Script and JavaScript (JScript). Through ActiveX interfaces, it can manipulate many applications and objects such as Word, Excel, Internet Explorer, the file system, and the network.

Voice Windows provides basic components that can be used from WSH. They have a dedicated GUI for low-sighted users as well as an auditory UI for the visually disabled. These components consist of three basic input/output functions of Visual Basic Script and one input/output object of WSH. Voice Windows will also support six common dialog boxes of Visual Basic. Using these components with WSH, visually disabled users can manipulate Windows and write applications.

Voice Meadow; Emacs based platform

Voice Meadow has two aims. The first is to offer a multilingual speech system that voicifies Meadow, a multilingual Emacs that runs on Windows. The second is to offer a speech server that satisfies requests from Emacspeak, so that Emacspeak can run in the Windows environment. Current work on Voice Meadow focuses on the first aim.

Voicification process

Emacs Lisp functions retrieve information from Meadow by means of `hook' and `advice' functions. TTS commands are then added to this information. For example, for Japanese text, a command that instructs the TTS server to use a Japanese TTS engine is added.
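As a rough illustration of adding an engine-selection command to retrieved text, the sketch below tags each line with a language. It is not the Emacs Lisp used in Voice Meadow: the `[lang=...]` command syntax is invented for this example, and a simple Unicode-range check stands in for Meadow's own language identification.

```python
def is_japanese_char(ch):
    """Return True for Hiragana, Katakana, or common Kanji code points."""
    code = ord(ch)
    return (
        0x3040 <= code <= 0x309F      # Hiragana
        or 0x30A0 <= code <= 0x30FF   # Katakana
        or 0x4E00 <= code <= 0x9FFF   # CJK Unified Ideographs (Kanji)
    )


def tag_line(line):
    """Prefix one line with a hypothetical engine-selection command."""
    # Assumption: a line containing any Japanese character goes to the
    # Japanese engine; every other line goes to the English engine.
    lang = "ja" if any(is_japanese_char(c) for c in line) else "en"
    return f"[lang={lang}] {line}"
```

A server receiving such a line would read the command first and then pass the rest of the line to the selected engine.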

The TTS server uses the Microsoft Speech API for voicification. It uses English TTS engines when speaking English and Japanese engines when speaking Japanese. The communication between the TTS server, which is a Windows program, and Meadow, a non-standard Windows program, is mediated by a TTS client, a simple console program. The TTS client is started as a child process of Meadow. Emacs Lisp functions in Meadow send strings of information and commands to the standard input of the TTS client. The TTS client forwards its standard input to the TTS server through a mailslot. The TTS server has a loop that receives messages from its mailslot and voicifies them according to the commands.
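The data flow described above (Meadow, then the TTS client, then the mailslot, then the server loop) can be sketched as follows. This is a simulation under stated assumptions, not the actual implementation: a `queue.Queue` stands in for the Windows mailslot, a list of strings stands in for the client's standard input, and the `STOP` sentinel and all function names are hypothetical.

```python
import queue
import threading

STOP = None  # hypothetical sentinel that tells the server loop to exit


def tts_client(lines, mailslot):
    """Forward each line received on 'standard input' to the server's mailslot."""
    for line in lines:
        mailslot.put(line.rstrip("\n"))
    mailslot.put(STOP)


def tts_server(mailslot, speak):
    """Receive messages in a loop and hand each one to the speech function."""
    while True:
        message = mailslot.get()  # blocks until a message arrives
        if message is STOP:
            break
        speak(message)


def run_pipeline(lines):
    """Run client and server together and return what the server 'spoke'."""
    mailslot = queue.Queue()
    spoken = []
    server = threading.Thread(target=tts_server, args=(mailslot, spoken.append))
    server.start()
    tts_client(lines, mailslot)
    server.join()
    return spoken
```

Calling `run_pipeline(["hello\n", "world\n"])` passes both lines through the queue in order. In the real system the `speak` step would call a Speech API engine instead of appending to a list.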

Important features for handling Japanese

  1. Voice Meadow must identify the language of its content. It uses the language identification of Meadow.
  2. Voice Meadow must identify the encoding scheme of Kanji. Otherwise, harmful sounds are output. It uses the identification made by Meadow and converts the information into Shift_JIS, the most common encoding scheme in Japanese Windows.
  3. Voice Meadow must switch between English TTS engines and Japanese TTS engines. Switching can be carried out word by word, line by line, or paragraph by paragraph. Alternatively, a region or the whole buffer can be voicified in a single language. Generally, line-by-line switching is natural for bilingual texts. In some cases it is more natural to pronounce English words with a Japanese-English pronunciation than with an exact English one.
  4. Voice Meadow must control all engines used to voicify the information as one object. For example, if a sentence is voicified with both Japanese and English engines, commands such as pause, resume, reset, or change of speed are issued to all of these engines simultaneously.
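Items 2 and 4 above can be illustrated with a small sketch, assuming two engines keyed by language. `FakeEngine` and `EngineGroup` are hypothetical names and do not represent the Microsoft Speech API; the fake engine only records what it is given so that the grouping behaviour can be checked.

```python
class FakeEngine:
    """Hypothetical stand-in for one TTS engine; it only records its state."""

    def __init__(self, language):
        self.language = language
        self.paused = False
        self.rate = 1.0
        self.spoken = []

    def speak(self, data):
        self.spoken.append(data)


class EngineGroup:
    """Treat several engines as one object for control commands."""

    def __init__(self, engines):
        self.engines = {engine.language: engine for engine in engines}

    def speak(self, language, text):
        # Japanese text is converted to Shift_JIS bytes before it is handed over;
        # characters outside Shift_JIS are replaced rather than raising an error.
        data = text.encode("shift_jis", errors="replace") if language == "ja" else text
        self.engines[language].speak(data)

    def pause(self):
        for engine in self.engines.values():
            engine.paused = True

    def resume(self):
        for engine in self.engines.values():
            engine.paused = False

    def set_rate(self, rate):
        for engine in self.engines.values():
            engine.rate = rate
```

With this arrangement a single `pause()` or `set_rate()` call reaches every engine, so a bilingual sentence is paused or sped up as a whole rather than engine by engine.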
Concluding Remarks

Voice Windows and Voice Meadow form a new multilingual text-to-speech system for visually disabled users who are required to advance to higher levels of computer use that are not supported by the existing systems. Voice Meadow voicifies the multilingual content of Meadow in Japanese and English, switching smoothly between the languages. Voice Windows, which provides components that can be used with Windows Scripting Host, offers a basic platform for visually disabled and low-sighted users to manipulate Windows and write applications. Development is still in progress, but the essential parts have been implemented.

References

(Saito1995)
M. Saito: "Software Production for the Visually Impaired", Journal of Information Processing Society of Japan, Vol. 36, No. 12, pp. 1116-1121 (1995) (in Japanese).
(Raman1994)
T.V. Raman: "Audio System for Technical Readings", Ph.D. thesis, Cornell University (1994).
(Emacspeak)
The Emacspeak package is found at http://www.cs.cornell.edu/Info/People/raman/emacspeak/emacspeak.html.
(Raman1997)
T.V. Raman: "Auditory User Interfaces -Toward the Speaking Computer-", Kluwer Academic Publishers (1997).



    Reprinted with author(s) permission. Author(s) retain copyright.