2000 Conference Proceedings


A Strategy and Information on Designing User-Friendly Automated Telephone Services Which Incorporate Text-To-Speech

Tim Noonan
SoftSpeak Computer Services &
Royal Blind Society of New South Wales

Automated telephone information systems have particularly important benefits for a range of people with disabilities. For people who are blind or vision impaired, or for those who cannot easily handle or manipulate books or paper, such services can present in audio information which might otherwise be available only on paper. For people who have difficulty travelling from their home or place of work, such services can provide information and allow them to conduct transactions which might otherwise involve travel to a bank or Government department. For people who are unable to read the native language of a country, such services may offer information in other languages, or may provide information in the native language more clearly than those individuals could read it in printed form. As more and more information becomes available online, many groups of people, including the older population, may be more comfortable using the humble telephone to access that information than they would be using a computer with all its software and required learning. People who are economically disadvantaged could use the telephone to access information, as they may not be able to justify the cost of purchasing a computer and online access.

Of course, one group of people with disabilities who do not generally benefit from such systems is people who are deaf or hard of hearing, unless the IVR service incorporates an equivalent TTY service.

This presentation is in three parts:

It begins by exploring the important area of sound user interface design for automated telephone information services, which are often termed Interactive Voice Response (IVR) systems. Some examples of good and bad user interface design principles will be provided, and standards and references will be discussed.

Next, the differences between, and the relative advantages of, human-recording-based and synthetic speech-based IVR systems will briefly be discussed, with emphasis on the advantages and corresponding increased challenges offered by synthetic speech telephone systems. Although the proportion of systems incorporating synthetic speech to date is very low, this is expected to change dramatically in the next one to two years.

The final section of the presentation demonstrates and describes a custom IVR development environment designed and developed by Tim Noonan and Aurora Information Technology Pty Ltd on behalf of Royal Blind Society, Australia. At the time of writing, this product is still in the late stages of development; a demonstration of its functionality and more details will be provided at the CSUN session.


The intent of this section is to present human factors principles applicable to IVR application design which can result in more intuitive, efficient and pleasant telephonic interactions for your IVR users.

The updated Australian and New Zealand Standard (AS/NZS 4263) on the user interface design of Interactive Voice Response (IVR) systems is a solid basis for developing your applications. However, this standard is relatively brief and has been written to cater for a diverse range of IVR applications, so it is general in its recommendations. Although predominantly developed in Australia, this standard is also highly applicable in most countries, as it is based on established conventions, guidelines, human factors experience and findings from a wide range of sources and countries.

In order to meet the wide variety of potential users' needs, an IVR system needs to be well planned, thoroughly user-tested and consistent with generally accepted user interface principles for telephone services. Designing an IVR system which is easy and intuitive to use relies much more on common sense than clever programming. If some of the suggestions in this paper seem like common sense, that is probably because they are. Nevertheless, many systems in current use, though technically commendable, are very difficult to use.

Since IVR systems are developed to serve their callers, they need to be optimised to human ways of thinking and responding. The best IVR systems are sculpted to fit their users; in the long run this extra effort is far easier, more affordable and more successful than trying to mould your callers to a cumbersome user interface. For this reason, developing early prototypes and enlisting the assistance of beta testers and focus groups are vital steps in producing quality, effective IVR services which meet all your users' needs. No matter how "clever" the application is, if it isn't consistent, easy to use, self-documenting, well structured and reliable, then users will shy away from its use.

If you have a large number of callers (many at different levels of ability), then offering different prompting levels may also be desirable. Novices need not be offered all choices, while advanced callers should be allowed to move rapidly about a relatively complex menu structure with very brief prompts.

Always allow users to make selections without having to listen to all messages. This allows experienced users to move through the system quickly and saves your resources.
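The two suggestions above (prompting levels, and letting callers select without hearing every message) can be illustrated with a minimal Python sketch. It is not taken from any product described in this paper: the names MenuOption, build_prompt and play_with_barge_in, the "For X, press N" wording and the simulated key presses are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class MenuOption:
    key: str                # telephone key the caller presses
    label: str              # what the option does
    advanced: bool = False  # hypothetical flag: hidden from novice callers


def build_prompt(options, level):
    """Return the menu prompt text for a caller at the given prompting level."""
    if level == "novice":
        # Novices hear full sentences and only the basic choices.
        visible = [o for o in options if not o.advanced]
        return " ".join(f"For {o.label}, press {o.key}." for o in visible)
    # Experienced callers hear every choice in a very brief form.
    return " ".join(f"{o.label.capitalize()}, {o.key}." for o in options)


def play_with_barge_in(sentences, pending_keys):
    """Simulate playback that stops as soon as the caller presses a key.

    pending_keys maps a sentence index to the key pressed while that
    sentence is playing. Returns (sentences started, key or None).
    """
    spoken = []
    for index, sentence in enumerate(sentences):
        spoken.append(sentence)
        if index in pending_keys:
            # The caller interrupted: skip the rest of the prompt.
            return spoken, pending_keys[index]
    return spoken, None


MAIN_MENU = [
    MenuOption("1", "news"),
    MenuOption("2", "sport"),
    MenuOption("9", "voice settings", advanced=True),
]

if __name__ == "__main__":
    print(build_prompt(MAIN_MENU, "novice"))
    print(build_prompt(MAIN_MENU, "expert"))
```

In a real service the interruption would come from the telephony hardware detecting a key tone during playback; the dictionary of pending keys here only stands in for that event.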

IVR systems need to be relatively simple in design if there are a large number of diverse callers.

There are fundamental differences between designing visually-oriented and speech-oriented applications. If you leave the task of developing an IVR to programmers and designers skilled in the design of screen-based applications, but without telephone applications experience, then the results may be far from satisfactory and inconsistent with the design principles found in well designed IVR services in use today.

In visual applications, the computer screen is two-dimensional. The user is able to look at any part of the information displayed on the screen at will. Highlighting, fonts and location convey the structure and relative importance of the different elements of material on the screen.

In contrast, an auditory interface is serial, rather than two-dimensional. Only one word can be heard at a time, and the order in which material is delivered is therefore very significant.

The following guidelines illustrate some of these differences:

Your system should be self-documenting through well-scripted prompts and online help. Callers might not have a manual for the system with them, or they may be doing something (such as driving) which means they cannot refer to printed documentation. Because preparing help material can be tedious, this also encourages designers to make the system more intuitive and less complex. By keeping to the guidelines just listed, your help scripts will be more straightforward.

Designing a system which is not reliant on printed information also means that your system is fully usable by people who are unable to read print, including people who are blind and those with other print disabilities. In addition, many people from a non-English-speaking background may be able to speak and understand English, but may be unable to read English comfortably.

Menu structure and scripting are arguably the most important parts of developing a successful IVR system. A few things to keep in mind when scripting include:

IVR developments are having a dramatic impact on business, sales and communications. By incorporating these principles and by planning and testing your system thoroughly, you can establish a quality service and, if applicable, gain a strong competitive advantage.

A more detailed treatment of user interface design issues is available in another paper by the author titled "Designing User-Friendly Voice Systems" available online at http://www.softspeak.com.au/ivrpap98.htm.


The vast majority of existing IVR applications are based on auditory presentation of a limited quantity of in-house information. Examples include telephone banking, phone bill-pay, movie-line, automated attendant systems and the like. But these systems can't deal with large quantities of information, and they have difficulty presenting information that is continually changing. Systems which read out share prices and other frequently updated information often do so in a stilted manner, making it difficult for the caller to understand and interpret the data.

Text-to-speech really comes into its own when an increased range of information needs to be accessed over the telephone - such as newspaper articles, catalogue entries, jobs-wanted advertisements, classified ads, web pages and e-mail messages – whereas a digitised speech application can’t deal with undefined information that hasn’t been read into the system ahead of time.
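The contrast between the two output modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Playback type, the plan_playback function, the sample sentences and the file name welcome.wav are hypothetical and do not come from any system described in this paper. The idea shown is simply that text recorded ahead of time can be played from a file, while anything not recorded in advance has to go to a synthesizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playback:
    mode: str     # "recording" or "tts"
    payload: str  # file name for a recording, raw text for the synthesizer


def plan_playback(utterances, recordings):
    """Decide, utterance by utterance, how each piece of text will be voiced.

    recordings maps exact prompt text to a pre-recorded audio file
    (a hypothetical lookup table for this sketch).
    """
    plan = []
    for text in utterances:
        filename = recordings.get(text)
        if filename is not None:
            # Fixed wording that was read into the system ahead of time.
            plan.append(Playback("recording", filename))
        else:
            # Undefined or changing content: only text-to-speech can voice it.
            plan.append(Playback("tts", text))
    return plan


if __name__ == "__main__":
    recorded = {"Welcome to the news service.": "welcome.wav"}
    script = ["Welcome to the news service.", "Today's headline: rain eases."]
    for step in plan_playback(script, recorded):
        print(step.mode, "->", step.payload)
```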

Another benefit of text-to-speech-based telephone services is that they can be adjusted by the user to meet their preferences. The personality of the voice, the speed and the volume can all be adjusted and saved (if this facility is offered by the application designer). In the applications developed by the author, different voice characteristics are used to represent different modes of the application. Thus the help voice, the menu voice and the article reading voice in our newspaper application can all be adjusted and saved separately from one another. This provides further context clues to the caller about where they are in the system, and what might be expected of them in different contexts.
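One way to picture separately adjustable voices for each mode is the following Python sketch. It is an assumption-laden illustration, not the author's implementation: the VoiceSettings fields, the mode names, the default values and the JSON storage format are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class VoiceSettings:
    voice: str = "default"  # hypothetical voice name
    rate_wpm: int = 180     # speaking rate in words per minute
    volume: int = 50        # 0 to 100


# Hypothetical starting values, one voice per application mode.
DEFAULTS = {
    "menu": VoiceSettings(),
    "help": VoiceSettings(voice="helper", rate_wpm=160),
    "article": VoiceSettings(voice="reader", rate_wpm=200),
}


class CallerProfile:
    """Holds one caller's voice preferences, kept separately for each mode."""

    def __init__(self):
        # Copy the defaults so one caller's changes never affect another's.
        self.modes = {m: VoiceSettings(**asdict(s)) for m, s in DEFAULTS.items()}

    def adjust(self, mode, **changes):
        settings = self.modes[mode]
        for name, value in changes.items():
            if not hasattr(settings, name):
                raise AttributeError(f"unknown voice setting: {name}")
            setattr(settings, name, value)

    def to_json(self):
        return json.dumps({m: asdict(s) for m, s in self.modes.items()}, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        profile = cls()
        for mode, values in json.loads(text).items():
            profile.modes[mode] = VoiceSettings(**values)
        return profile


if __name__ == "__main__":
    profile = CallerProfile()
    profile.adjust("article", rate_wpm=260)  # faster reading, menus unchanged
    print(profile.to_json())
```

Because each mode keeps its own settings, speeding up article reading leaves the menu and help voices untouched, which preserves the audible distinction between modes.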

A variety of challenges emerge when moving from human-recordings across to TTS-based IVR applications. However, as already covered, these challenges are accompanied by many side-effects and benefits which make such systems truly dynamic and customisable to meet user preferences and needs.

Many users don’t like text-to-speech when they first encounter it. Others find it difficult to understand some speech engines, while finding others quite clear and easier to attend to. It’s important to test the synthetic speech options with potential users before rolling out your application, to identify a voice manufacturer that your users find relatively acceptable. It’s also important to work with users to establish default voice settings which ensure clarity of information for the largest possible range of callers.

Although text-to-speech operates reliably in the PC-based world, running smoothly on Windows-compliant sound cards, the process of integrating specialized telephony hardware, such as Dialogic telephony boards, with software-based speech synthesis is a lot harder than one would expect.

In our experience, the DECTalk software speech system lends itself particularly well to telephony applications, because it doesn’t have the shortcomings of SAPI in controlling the speech and in making reliable use of the synthesizer's indexing capabilities. More work is needed to deal with pronunciation errors from the DECTalk, but this is offset by the richness of the DECTalk API and the extensive options available for tailoring the speech for specific content.

In order to present textual information optimally, it is necessary to analyse the text for potential problem content and perform substitutions to ensure it will be read aloud with clarity and accuracy. We have found Perl (Practical Extraction and Report Language) to be ideal for this task. Perl makes use of powerful pattern-matching expressions (termed regular expressions) to identify patterns in the input text and, based on the context in which those patterns occur, can selectively make substitutions to the text for improved output to the speech synthesizer.
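As a rough illustration of context-sensitive substitution, here is a minimal sketch written in Python, whose re module stands in for the Perl regular expressions described above. The three rules and the sample sentences are assumptions invented for this example; they are not the author's actual rule set, and a production system would need many more rules and careful ordering.

```python
import re

# Each rule pairs a compiled pattern with its replacement. Rules are applied
# in order, and every rule here is a hypothetical example.
RULES = [
    # "Dr" followed by a capitalised word is treated as a title.
    (re.compile(r"\bDr\.?\s+(?=[A-Z])"), "Doctor "),
    # "Dr" preceded by a lower-case letter and a space is treated as a street type.
    (re.compile(r"(?<=[a-z]\s)Dr\.?(?=\s|$)"), "Drive"),
    # Dollar amounts are expanded so digits and symbols are read naturally.
    (
        re.compile(r"\$(\d+)\.(\d{2})\b"),
        lambda m: f"{int(m.group(1))} dollars {int(m.group(2))} cents",
    ),
]


def normalise(text):
    """Apply every substitution rule in turn and return the speakable text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(normalise("Dr. Smith lives at 12 Park Dr. in Sydney"))
    print(normalise("Tickets cost $5.50 each"))
```

The same abbreviation is expanded differently depending on what surrounds it, which is the point of matching on context rather than on the abbreviation alone.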

Examples of where this approach has been invaluable include


At the time of writing, we are most-of-the-way through the development process for this product. As we embark on development of our telephone-based library catalogue, we will continue to make occasional enhancements to the IVR Script platform. At the presentation more detailed information and examples of the development environment and our phone-based applications will be given.

We have invested a great deal of time in identifying the best and most flexible platform for integrating digitised speech and text-to-speech audio into the one telephone application. Running under Windows NT, and using Dialogic hardware and third-party tools, we have been able to develop a powerful high-level scripting language, which we call IVR Script, for rapidly and easily developing capable and intuitive applications.

This system has been developed to enable rapid information service development, to enable novices to customise IVR services, and to seamlessly support systems incorporating any mix of synthetic speech and human recordings, using a high-level scripting language. Other off-the-shelf IVR development products are designed only for recorded speech-based applications, and are not at all oriented to text-to-speech telephone applications. Our product aims to overcome this limitation, putting both output modes on an equal footing, leading to more dynamic and information-rich automated information services. Our high-level scripting language and development environment offer many unique advantages including

Future versions of IVR Script may include a TTY output mode, allowing a text-to-speech application to transparently serve both standard telephone users and those using Baudot-capable TTY devices.


Although there are some added issues involved in developing a robust, responsive and effective text-to-speech-based interactive voice response system, the benefits of access to dynamic and undefined information far outweigh those disadvantages. The Royal Blind Society IVR Script development environment strives to take the complexity, the uncertainty and the lengthy application development time out of new IVR development. Its aim is to allow the developer to use a commonsense approach to system design and customisation, without being distracted by the inherent complexities and low-level programming considerations commonly associated with IVR development.

Please contact Tim Noonan (the author) at tnoonan@softspeak.com.au if you would like additional information regarding the IVR Script development environment, or if you wish to inquire about obtaining advice and input on designing user-oriented interactive voice response systems.


Reprinted with author(s) permission. Author(s) retain copyright.