T2RERC  


DP2: Demand Pull Program

Needed Technology

Hand-Held Video Camera for Optical Character Recognition: Problem Statement

 


Abstract

Technology has shown great promise in providing access to textual information for people with low vision and blindness. Optical character recognition (OCR) allows people with visual impairments and blindness to read volumes of typewritten documents with the help of flatbed scanners and OCR software. Digital video magnifiers enable people with visual impairments to receive a magnified image of text and environmental elements via camera technology. A marriage of these devices would permit access to printed text in a way that has been impossible to achieve in the past.


Business Opportunity

Printed textual information permeates every facet of daily life, whether or not it can be seen or read. Signage and labeling are routinely used to convey information in a multitude of environments. The inability to make use of this printed textual information puts people at a distinct disadvantage when attempting to function in society. An estimated 7.7 million people in the United States alone have difficulty reading ordinary text in a newspaper, even with corrective lenses. Additionally, 1.8 million people cannot read the text in an ordinary newspaper at all (McNeil, 2001). An affordable and truly usable technology, with the potential to eliminate these barriers to access, has not yet come to fruition in the marketplace.

Reading a newspaper headline, a product label, or a price tag while shopping can be difficult unless the visually impaired consumer can use a magnification device and the object can be brought close to the eye. There is a range of assistive technology products that people with visual impairments can use to facilitate the reading process, including optical magnifiers, digital video magnifiers, monoculars, and telescopes. Using these technologies can be incredibly time-consuming and stigmatizing for people with low vision. For example, grocery shopping involves scanning the shelf to determine the name of the desired product and a glance at the price of that item. People with visual impairments would have to locate and view each item separately in order to determine the item name and price. This process would greatly extend the time necessary to shop for even a few items. Reading a blackboard or multimedia presentation in a work or school environment requires more advanced and expensive technologies, such as the Joint Optical Reflective Display (JORDY) or the Low Vision Enhancement System (LVES). These technologies raise cosmetic issues as they are both head-borne technologies housed within a visor.

People who are blind employ a variety of technologies and strategies to access environmental text including Braille labeling, tactile labeling, and flatbed scanners with OCR capabilities coupled with text-to-speech software. Accommodations include personal organizing schemes (e.g., clothes in closet, cans in cupboard) and asking for help from sighted individuals. Technologies and accommodations constitute a patchwork solution that is short of ideal, leaving the 7.7 million people at a distinct disadvantage when attempting to function independently in their homes and communities (McNeil, 2001).

In order to provide equal access to educational, employment, and community environments, a device that can provide access to all environmental text is required. A hand-held device that can magnify, recognize and transform text into speech or tactile output would address the critical needs of people with visual impairments. Independent functioning in all environments is critical to real participation in community life. The proposed device would allow people who are visually impaired to access text on consumer products in their homes and in the community for wayfinding activities. It would also level the playing field in educational and employment environments.


Current Technology

The LVES was manufactured and sold by Visionics Inc., based in Minneapolis, MN, beginning in 1994. It cost over $5000, with an additional cost of $1200 for fitting and training. The LVES is beneficial for people with remaining vision of approximately 20/100 to 20/800. It features a 50 degree field of view, variable magnification (1.5 to 12 times), contrast enhancement and reversal, and direct video input for television. Major problems that are cited by users include weight, cost, limited usability across environments, inability to provide color, and use of low cost cathode ray tubes (CRT) rather than the smaller and lighter flat panel displays (Dagnelie, 1997). The LVES had limited sales and was pulled from the market in 1997 (National Aeronautics and Space Administration Science and Technical Information (NASA STI), 2003).

Market potential for an LVES-like device is suggested by a telephone poll conducted by Johns Hopkins University showing that 200 of the 400 people who purchased LVES are still using it today (Dagnelie, 1997). The LVES is reported to be most useful for people who require a system that can readily adapt to different working distances and who need less than 8 times magnification (Weckerle, Trauzettel-Klosinski, Kamin, and Zrenner, 2000). As always, cosmetics created issues for many users.

The Joint Optical Reflective Display (JORDY™ v.2) is a head-worn device that allows people with low vision to view objects and text in their environment at varying distances with up to 50 times magnification (Enhanced Vision, 2002). JORDY™ v.2 allows users to complete stationary vision tasks: hobbies, crafts, reading, writing, watching television, and recognizing faces at a distance. The JORDY™ v.2 does not allow users to view objects while walking or driving (ABLEDATA, n.d.). It weighs approximately eight ounces and features auto focus, focus lock, digital zoom, multiple viewing modes (i.e., full color, black and white, high contrast positive, and negative), image stabilization, built-in lighting, and an object locator. A docking station accessory transforms JORDY™ v.2 into a stationary video magnifier (Enhanced Vision, 2002). The JORDY™ v.2 retails for approximately $2,700 ($300 for the docking station). NASA reports on its Scientific and Technical Information (STI) webpage that a new system designated as JORDY™ v.3 will weigh less than 2 ounces (NASA STI, 2003).

A number of technologies are used by people with low vision to perform single tasks. For close reading tasks, hand-held magnifiers are often used to magnify text. These devices can be placed over an object or text to enlarge or magnify it. These magnifiers are generally very easy to manipulate which enables users to adjust the working distance easily (Levack, 1994). The portability of these magnifiers makes them an attractive option for many people with visual impairments. A stand magnifier, which sits on a base or has a clamp with an adjustable or flexible arm is also common. Stand magnifiers are an option when both hands are needed to perform the task or when motor control is not optimal.

In order to identify the denominations of cash to pay for items in the community, money identifiers are often used. These are small devices that provide speech output identification of paper money. The money can be inserted and read in any orientation. The volume is adjustable and standard headphone jacks are available to provide privacy. Money identifiers are typically programmed to identify currency in one country.

The information on prescription labels must be followed explicitly to ensure that optimal benefits are achieved. In order to ensure that happens, talking medicine bottles have been introduced to the marketplace for people with vision impairments. This simple technology allows the instructions on a medicine bottle to be read, including information on why and when medicine should be taken and how many refills are left.

Scanners and optical character recognition (OCR) software allow people with visual impairments to scan and read mail, office memos, magazine articles and other documents on their computer or download them to a note taker for portable reading. Optical character recognition involves the reading of text from paper and translating the images into a form that the computer can manipulate (Computer Digital Expo, 2003). A device optically analyzes printed text, recognizes the letters or other characters, and stores this information as a computer text file. OCR systems are usually limited to recognizing the styles and sizes of type for which they are programmed (Texas School for the Blind and Visually Impaired (TSBVI), 2002). Unfortunately, the majority of these devices are not highly portable as they consist of a desktop scanner and computer. There are OCR devices that are very light and highly portable, such as the Reading Pen by WIZCOM (http://www.wizcomtech.com), but they are designed for people with learning disabilities. As a result, they do not provide text location assistance or guides to ensure proper scanning of text. They also require exact placement and positioning to operate properly.
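
The recognition step described above can be illustrated with a minimal sketch in Python. It assumes the simplest possible approach, comparing each scanned glyph against stored bitmaps (template matching); the 3x5 glyph bitmaps, the function names, and the mismatch threshold are hypothetical and are not taken from any product discussed here. It also shows why such a system is limited to the type styles it has been programmed with: a glyph that matches no stored template closely enough is reported as unknown.

```python
# Toy template-matching recognizer. The 3x5 glyph bitmaps below are
# hypothetical illustrations, not taken from any real OCR product.
TEMPLATES = {
    "I": ("###",
          ".#.",
          ".#.",
          ".#.",
          "###"),
    "L": ("#..",
          "#..",
          "#..",
          "#..",
          "###"),
    "T": ("###",
          ".#.",
          ".#.",
          ".#.",
          ".#."),
}


def mismatch(glyph, template):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(
        g != t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )


def recognize(glyph, max_mismatch=2):
    """Return the best-matching character, or None if nothing is close."""
    best_char, best_score = None, None
    for char, template in TEMPLATES.items():
        score = mismatch(glyph, template)
        if best_score is None or score < best_score:
            best_char, best_score = char, score
    return best_char if best_score <= max_mismatch else None


def read_line(glyphs):
    """Recognize a sequence of glyph bitmaps; unknown glyphs become '?'."""
    return "".join(recognize(g) or "?" for g in glyphs)
```

A slightly noisy "T" is still recognized because only one pixel differs from its template, while a solid block of pixels is rejected as unknown. Practical OCR software uses far more robust methods; this sketch only makes the stored-template limitation concrete.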

Closed-circuit televisions (CCTVs) and digital video magnifiers are also commonly used to access printed text. Digital video magnifiers use a video camera to project a magnified image onto a video monitor, computer monitor or TV screen. They are used to enlarge written materials and small objects, enabling a person with low vision to read and write. Currently available devices are not portable.
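
As a rough illustration of what a digital video magnifier does to each camera frame, the following Python sketch applies magnification, contrast reversal, and contrast enhancement to a small grayscale image. It assumes the image is a list of rows of integer pixel values from 0 to 255; the function names and the choice of pixel replication and linear stretching are assumptions made for illustration, not a description of how any device named above is implemented.

```python
def magnify(image, factor):
    """Enlarge a grayscale image by repeating each pixel `factor` times
    horizontally and each row `factor` times vertically."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


def reverse_contrast(image, max_value=255):
    """Swap light and dark, e.g. black-on-white text becomes white-on-black."""
    return [[max_value - pixel for pixel in row] for row in image]


def stretch_contrast(image, max_value=255):
    """Linearly rescale so the darkest pixel maps to 0 and the brightest
    maps to max_value. A uniform image is returned unchanged."""
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    if lo == hi:
        return [list(row) for row in image]
    return [[(pixel - lo) * max_value // (hi - lo) for pixel in row]
            for row in image]
```

For example, `magnify(frame, 2)` doubles the width and height of `frame`, and `reverse_contrast(stretch_contrast(frame))` produces a high-contrast negative view. A real device would perform these steps in hardware or optimized code on every video frame.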

While some of these devices offer audio output, the utility is limited to a single application. Tactile output is not an option on these technologies. The popularity of these devices certainly indicates a viable market for a multi-purpose textual access technology. There is currently no comprehensive


Technology Requirements

Hand-held Video Camera for Optical Character Recognition (OCR) Application: generally a hand-held OCR device or camera for digital displays (such as those found on appliances), or a hand-held camera that can extract numeric information, whether printed or electronic, and provide audio and/or visual feedback. Forum participants discussed adding a digital video magnifier to this system to create a universal tool that could access text in the environment and also serve as a tool for wayfinding. For additional information, please see wayfinding data.

Need Areas:
  • users should include people with low vision, blindness, deafness-blindness, cognitive impairments and multiple disabilities;
  • able to read arbitrary text in multiple environments (e.g., signs, books, cans, bottles, dry cleaning receipts, etc.);
  • able to read street signs or signage (separate text information from arbitrary background information);
  • able to localize and identify text in arbitrary environments (very advanced capability);
  • probably based on a digital camera with OCR capabilities;
  • probably requires a dense matrix (high resolution) digital camera;
  • probably requires advanced algorithms (signal processing, artificial intelligence, neural networks, etc);
  • probably requires very high computational power (e.g., parallel processing);
  • low text recognition error rates;
  • able to filter out irrelevant information (background, images, graphics, etc);
  • able to read arbitrary text colors;
  • able to read text on arbitrary background colors;
  • able to read all font sizes;
  • ability to extract text from graphic (including labels);
  • able to recognize text at arbitrary distance and orientations;
  • able to recognize text on arbitrary surfaces (texture, background colors);
  • able to generate useful information quickly (requires efficient algorithms, good processing power);
  • control options for persons with multiple disabilities should include: eye-gaze, head-tracking, speech recognition, combined speech and gesture/eye motion, switch access, and sip-and-puff;
  • controls for people who are deaf-blind should be integrated into a refreshable tactile (and Braille) display;
  • output options should include enhanced text, tactile, and audio;
  • output options for people who are deaf-blind should include a 2-D tactile and refreshable Braille display large enough to accommodate their specific needs;
  • able to re-render text into formats accessible to persons with low vision (font size, style, contrast, color, etc);
  • able to customize text enhancements to meet specific user needs (e.g., color substitution, contrast enhancement, edge detection, etc.);
  • must be portable (hand held, light weight, wearable);
  • must be compact (ideal size of the device for people who are partially sighted would be comparable to a pair of eyeglasses with a hip or back-mounted system for processing);
  • should have portable, un-tethered power supply (except perhaps when charging);
  • must have long battery life (preferably rechargeable);
  • should offer a sleep mode (power conservation);
  • must have good reliability;
  • must have good durability;
  • offer network connectivity for outside assistance (wireless access to remote databases or real-time help should be available via wireless connection with a 3G minimum for speed).
Tactile Braille and Graphic Computer specific requirements:
  • stand alone laptop computer for blind and visually impaired users;
  • computer capabilities (internet access, software storage, documenting, printing);
  • refreshable high resolution tactile interface for Braille and graphics;
  • tactile interface that has input and output capabilities;
  • user interface must accommodate different input modalities (Braille, speech, tactile);
  • user interface should be command line format, either typed or spoken;
  • provide auditory output (tones, speech);
  • as a user passes his hand over the tactile display, he should receive both auditory and tactile feedback in real time (immediate);
  • large package of software applications;
  • perform different tasks depending on the application that is loaded;
  • should be a tool for creating tactile images;
  • built in capabilities for user training;
  • ability to fold;
  • portable, but larger than pocketsize;
  • affordable.


References

  1. ABLEDATA (n.d.). JORDY™. Retrieved January 12, 2004, from http://www.abledata.com/

  2. Computer Digital EXPO (2003). Optical character recognition. Retrieved March 18, 2003, from http://www.webopedia.com/TERM/o/optical_character_recognition.html

  3. Dagnelie, G. (1997). The Low Vision Enhancement System: Hype or help for low vision? Focus Newsletter, 2(3). Retrieved January 9, 2004, from http://www.focusnewsletter.org/lves.htm

  4. Enhanced Vision (2002). JORDY™. Retrieved January 12, 2004, from http://www.enhancedvision.com/jordy.php

  5. Levack, N. (1994). Low vision: a resource guide with adaptations for students with visual impairments. Austin, TX: Texas School for the Blind and Visually Impaired.

  6. McNeil, J. M. (2001). Household economic studies: Current population reports: Americans with disabilities: 1997. Retrieved January 23, 2004, from http://www.census.gov/prod/2001pubs/p70-73.pdf

  7. National Aeronautics and Space Administration Science and Technical Information (NASA STI) (2003). Improving vision. Retrieved January 12, 2004, from http://www.sti.nasa.gov/tto/spinoff2003/hm_7.html

  8. Texas School for the Blind and Visually Impaired (TSBVI) (2002). Common acronyms used when speaking about accessible textbooks. Retrieved March 17, 2003, from http://www.tsbvi.edu/textbooks/afb/acronyms.htm

  9. Weckerle, P., Trauzettel-Klosinski, S., Kamin, G., & Zrenner, E. (2000). Task performance with the Low Vision Enhancement System. Visual Impairment Research, 2(3), 155-162.
