2003 Conference Proceedings



HIGH SPEED CONVERSION OF PRINT MATERIAL TO ELECTRONIC TEXT AND AUDIO FILES

Presenter
Margaret Londergan
Indiana University
2711 East 10th Street
Bloomington, IN 47408
Phone: 812-856-4112
Email: londerga@indiana.edu

For two years, students who are blind or who have low vision, dyslexia, or traumatic brain injury have been able to get needed print materials rapidly and accurately converted to electronic text. The electronic text created by the conversion process can be used with a wide variety of applications, such as Kurzweil 1000, Kurzweil 3000, and ZoomText. Additionally, the converted texts can be turned into audio files, such as MP3 or .WAV files, for students to use. The goal of this paper is twofold. First, the high-speed scanning process as developed and refined at Indiana University will be described. Second, student response to having the files in electronic format for use with applications such as Kurzweil 1000 and Kurzweil 3000 will be discussed.

When I first developed the Adaptive Technology Center for Indiana University two and a half years ago, students were able to scan documents for text conversion using flatbed scanners. Students with dyslexia or low vision were then able to use programs like the Kurzweil reading assistance program, and students who were blind used Kurzweil 1000 to access the electronic files of their materials. Initially the students were very excited by the advantages of using programs to provide access to materials. However, the tedium of the flatbed scanning process soon caused many to abandon the conversion process. A poor technological solution (slow flatbed scanning) prevented the use of really helpful, sophisticated technology. The frustration I witnessed as students abandoned the poor technology that was their gateway to the useful technology caused me to focus on a better solution at the text conversion level.

After researching the availability of high-speed scanners, I selected the Canon DR5080C duplexing scanner, which has a throughput of upwards of 70 pages per minute in duplex mode. I combined this scanner with a suite of three computers on a local area network for file sharing to manage the four main parts of the text conversion process: scanning, optical character recognition, file storage, and media production. Furthermore, I had a program written to automate part of the scanning process and to manage the files that were created.

Although a single computer could manage these processes, the heavy demands for production of electronic text at certain times of the year make distributing the processes over several computers more efficient. The scanning and OCR processing computers have 1.4 GHz processors, a gigabyte of memory and multiple 160 MB hard drives. The large amounts of memory ensure fast processing, especially for the OCR process. Files are scanned on the Canon DR5080C scanner using the utility that comes with the scanner. To prepare books and materials for scanning, the covers are removed from the books and the backs of the books are cut off using a guillotine chopping device, which the library has given to the Adaptive Technology Center. (This device came from the preservation department of the library when they acquired a newer, more sophisticated device.) Scanning is done chapter by chapter, as this produces the most manageable format for students. Materials are fed into the scanner as the scanning specialists prepare them, so scanning can be a continuous process. This works well because the specialists are familiar with the nuances of font types, paper types, and other document features that affect scanning. This completes step one of the process.

Step two of the process is optical character recognition (OCR), which translates the scanned material into text that can be read by a text-to-speech program. This process is managed transparently by the second computer, which processes the files after they are moved from the scanning computer to the OCR computer. Using a program written in-house called OCR Rocket, the scanned files are automatically opened, recognized, and saved as chapters in the directory that is created for each book. The process is initiated when the scanning specialist begins scanning, and OCR Rocket runs until it finds there are no more files to process. OCR Rocket can also be set to run overnight and process all files that have been created but not yet OCRed by the time staff leave at the end of the day. The program also reports any files that encountered problems during the OCR process. In this way the scanning specialists are notified of scans that need to be redone due to missing pages, poor scan quality, etc.
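The batch behavior described above (find scans that have not yet been recognized, save one text file per chapter in a directory per book, and report problem files) can be sketched roughly as follows. This is not the OCR Rocket source, which is not published here. The function name `process_pending`, the `.tif` scan suffix, the directory layout, and the `recognize` callable standing in for the OCR engine are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def process_pending(
    scan_root: Path,
    text_root: Path,
    recognize: Callable[[Path], str],  # hypothetical stand-in for the OCR engine
    scan_suffix: str = ".tif",         # assumed scan file type
) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Run OCR on every scanned chapter that has no text output yet.

    Assumed layout: scan_root/<book>/<chapter>.tif is written to
    text_root/<book>/<chapter>.txt. Returns the text files created and a
    list of (scan, reason) pairs for scans that need to be redone.
    """
    done: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    for scan in sorted(scan_root.rglob(f"*{scan_suffix}")):
        out = (text_root / scan.relative_to(scan_root)).with_suffix(".txt")
        if out.exists():
            continue  # already recognized on an earlier run
        try:
            text = recognize(scan)
            if not text.strip():
                raise ValueError("no text recognized")
        except Exception as exc:
            # Collect the problem instead of stopping the whole batch.
            failed.append((scan, str(exc)))
            continue
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(text, encoding="utf-8")
        done.append(out)
    return done, failed
```

Because finished chapters are skipped, the same function could serve both a daytime run that follows the scanner and an unattended overnight run; the `failed` list plays the role of the problem report given to the scanning specialists.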

Step three involves storage of the scanned text documents. At present over 1,300 books and course packs have been converted to electronic text. These files are currently archived across several workstations and a server, where file backup is provided to prevent data loss. (The scanned files represent approximately half a year's work at forty hours per week dedicated only to print to electronic text conversion.) Better mass storage options are currently being explored and should be ready for discussion by the time of the CSUN conference.
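The figures in this step imply a rough conversion rate. The short calculation below is only a back-of-the-envelope sketch: it assumes "half a year" means 26 weeks and treats "over 1,300" as exactly 1,300, so the real rate is somewhat higher. The function name `conversion_rate` is illustrative.

```python
def conversion_rate(items: int, weeks: int, hours_per_week: int):
    """Return (total staff hours, items per hour, minutes per item)."""
    total_hours = weeks * hours_per_week
    items_per_hour = items / total_hours
    minutes_per_item = 60 / items_per_hour
    return total_hours, items_per_hour, minutes_per_item


# Assumed inputs: 1,300 items, 26 weeks, 40 hours per week.
hours, per_hour, minutes = conversion_rate(1300, 26, 40)
print(hours, per_hour, minutes)  # 1040 1.25 48.0
```

Under these assumptions, each book or course pack takes a little under an hour of dedicated staff time on average.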

Step four of the process involves providing the electronic documents to the users. At present all documents are provided in the format that suits the application the student will use to access them. These formats include .txt, .kes for use with Kurzweil applications, .rtf, and others. Electronic files are provided to students on CDs, which are free to students using the service.

Additionally, students are shown how to make audio files for use with MP3 players or other audio devices. Cool Edit and other applications are used to convert electronic text to audio files. At present we have the students do this text-to-audio conversion for their portable devices themselves, so that they can convert only the portions of text that they feel are important to have in this format. Using this technique, students who use, for example, Kurzweil 3000 can extract their notes or highlighted portions of text and convert what they feel is most important to audio format for review and enhanced learning.

Texts converted from print to electronic text are between 90 and 97% accurate. This is approximately the same accuracy as scans created using flatbed scanners.
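The paper does not say how the accuracy figure was measured. One common way to arrive at a number of this kind is to compare the OCR output against a hand-corrected copy of the same passage, character by character. The sketch below is an assumption about how such a check could be done, not the Center's method; `char_accuracy` is an illustrative name, and the matching uses the standard library's `difflib`.

```python
from difflib import SequenceMatcher


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Fraction of reference characters that the OCR output reproduces in order."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    # autojunk=False keeps frequent characters (spaces, 'e') from being ignored.
    matcher = SequenceMatcher(None, reference, ocr_output, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# A classic OCR confusion: "m" read as "rn".
print(round(char_accuracy("modern", "rnodern"), 3))  # 0.833
```

On a full page, a score of 0.97 by this measure would mean that roughly three characters in every hundred were misread or dropped.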

Students' reports on the benefits of high-speed scanning and of applications like Kurzweil 3000 are enthusiastic. One student charted the change in his grades after high-speed scanning provided him his texts in a timely manner. He proudly reported making the Dean's list. Another student reported that, because of access to reading assistance programs and electronic text, graduate school had become a possibility. Another student said, "Prior to having the reading assistance program (Kurzweil 3000) and high speed creation of my texts, my life was drudgery. I was barely able to keep up. Now, I not only can keep up and do well, I have time for my other interests as well."

Issues of copyright are managed by ensuring that each student owns a copy of the book that is provided to them electronically. Books that are chopped apart can be rebound with comb bindings or three-hole punched at any of a variety of copy stores, such as Kinko's. The charge for rebinding is usually no more than three dollars per book regardless of size.

The scanning process is managed by the team of scanning specialists whose work is devoted to conversion of print to electronic text. Students are responsible for bringing their books for processing and picking them up along with the electronic copies of their books which have been burned to CDs.




Reprinted with author(s) permission. Author(s) retain copyright.