Xu Liu
liuxu@cs.umd.edu
Department of Computer Science
University of Maryland, College Park
Fig. 1 Mobile Currency Reader
Dedicated devices such as the "Kurzweil reader"[3] have been introduced to help read currency, but they are often bulky and expensive. Novel systems such as iCare[4] have also been developed to help visually impaired people with pattern recognition. iCare uses a wearable camera for imaging and a PC for computation. We propose an alternative solution that employs the ubiquitous camera phone[10] to identify different denominations in an instant and inexpensive way. The combined imaging and computational power of new devices has inspired us to embed image processing and computer vision algorithms into the devices. Although this project targets currency reading, the designed framework can be extended to help visually impaired users identify other objects as well.
Classic pattern recognition algorithms usually include feature extraction and feature classification. Widely used features such as SIFT[5] or SIFT-like features[6,7] have high repeatability. SVMs[8] and neural networks[9] can be trained to achieve high accuracy given enough time and space. However, these classic pattern recognition approaches cannot be ported directly to mobile devices. Implementing pattern recognition on mobile devices poses three major challenges: 1) the limited processing power of the device, 2) the fact that the captured scene may contain a complex background, resulting in false positives that must be eliminated, and 3) the expectation of the user, who typically wants instant feedback and requires online (real-time) recognition.
These three challenges are all related to the speed of the algorithm. The algorithm must be efficient enough to fit on a lightweight device and must quickly discard images or pixels that are not of interest, so that more time can be allocated to images that contain objects to be recognized. Ideally, when an algorithm is efficient enough to run in real time, recognition can be performed on the camera's video stream, and the user does not have to press a key to capture an image. We choose to process the image in real time, which provides a much smoother user experience and avoids the motion blur caused by "click-to-capture," but, as noted earlier, it must typically deal with lower quality data.
Fig. 2 An example of random pixel pair
Fig. 3 Positive and negative samples
Fig. 4 Maximize margin between positive and negative samples
In order to detect and recognize a bill, we first binarize the image and remove irrelevant background. Black pixels touching the boundary of the image are regarded as background, since the bill always has a white boundary along its edge. After removing the background, some noise might still exist. We further refine the location of the bill by running a breadth-first search (BFS) from the image's center to remove the remaining noise. The complexity of this step is linear in the number of pixels in the image, and after processing we know the exact position of the feature area. We then normalize the extracted area to a rectangle with an aspect ratio of 4:1 for recognition.
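The two-stage cleanup described above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes the binarized image is a list of rows with 1 for white and 0 for black, uses 4-connectivity, and reports the located region as a bounding box. The names `remove_background` and `locate_bill` are hypothetical.

```python
from collections import deque

BLACK, WHITE = 0, 1  # assumed encoding of the binarized image


def _neighbors(r, c, h, w):
    """Yield the 4-connected neighbours of (r, c) that lie inside the image."""
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < h and 0 <= nc < w:
            yield nr, nc


def remove_background(img):
    """Mark every black pixel connected to the image border as background."""
    h, w = len(img), len(img[0])
    background = [[False] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            on_border = r in (0, h - 1) or c in (0, w - 1)
            if on_border and img[r][c] == BLACK:
                background[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in _neighbors(r, c, h, w):
            if img[nr][nc] == BLACK and not background[nr][nc]:
                background[nr][nc] = True
                queue.append((nr, nc))
    return background


def locate_bill(img):
    """BFS from the image centre over non-background pixels.

    Returns the bounding box (top, left, bottom, right) of the region
    reached, or None if the centre itself is background. Isolated noise
    that is not connected to the centre is never visited.
    """
    h, w = len(img), len(img[0])
    background = remove_background(img)
    start_r, start_c = h // 2, w // 2
    if background[start_r][start_c]:
        return None
    seen = [[False] * w for _ in range(h)]
    seen[start_r][start_c] = True
    queue = deque([(start_r, start_c)])
    top = bottom = start_r
    left = right = start_c
    while queue:
        r, c = queue.popleft()
        top, bottom = min(top, r), max(bottom, r)
        left, right = min(left, c), max(right, c)
        for nr, nc in _neighbors(r, c, h, w):
            if not background[nr][nc] and not seen[nr][nc]:
                seen[nr][nc] = True
                queue.append((nr, nc))
    return top, left, bottom, right
```

Each pixel is enqueued at most once per pass, so both passes are linear in the number of pixels, consistent with the complexity stated above. The returned box would then be resampled to the 4:1 rectangle before recognition.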
We collected 1000 samples of captured currency for each side of the most common U.S. bills. Each bill has four sides, two front and two back. We also collected 10000 samples of general scenes that are not currency. For each side of a given bill, we use AdaBoost[13] to train a strong classifier from a set of weak classifiers. The weak classifiers must be computationally efficient because hundreds of them must be computed in less than 0.1 second.
We define a weak classifier using 32 random pairs of pixels in the image. A weak classifier gives a positive result if more than 20 pairs are satisfied and a negative result otherwise. Ten weak classifiers selected using AdaBoost form a strong classifier that identifies a bill as long as it appears in the image. To recognize a bill we need only 32 × 10 = 320 pair-wise comparisons of pixels. Our system is trained to read $1, $5, $10, $20, $50 and $100 bills and can process 10 frames/second on a Windows Mobile (iMate Jamin) phone at a false positive rate below 10^-4. It should be pointed out that this framework is general, so new notes (e.g. $2) can easily be added to the system.
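The evaluation step of such a classifier might look like the sketch below. It covers inference only, not the AdaBoost training. Two points are assumptions made for illustration, since the text does not spell them out: a pair counts as "satisfied" when its first pixel is brighter than its second, and the strong classifier is a weighted vote of the weak ones with weights `alphas`. All function names are hypothetical.

```python
import random

PAIRS_PER_WEAK = 32   # pixel pairs per weak classifier (from the text)
VOTES_NEEDED = 20     # positive if MORE than this many pairs are satisfied


def make_random_pairs(width, height, rng, n=PAIRS_PER_WEAK):
    """Draw n random pairs of pixel coordinates ((x1, y1), (x2, y2))."""
    return [
        ((rng.randrange(width), rng.randrange(height)),
         (rng.randrange(width), rng.randrange(height)))
        for _ in range(n)
    ]


def weak_classify(img, pairs):
    """Return True if more than VOTES_NEEDED pairs are satisfied.

    ASSUMPTION: a pair is satisfied when the first pixel is brighter
    than the second; img is indexed as img[y][x].
    """
    satisfied = sum(
        1 for (x1, y1), (x2, y2) in pairs if img[y1][x1] > img[y2][x2]
    )
    return satisfied > VOTES_NEEDED


def strong_classify(img, weak_classifiers, alphas):
    """AdaBoost-style weighted vote over the weak classifiers.

    ASSUMPTION: alphas are per-classifier weights produced by training,
    and a positive total score means the bill is present.
    """
    score = sum(
        alpha * (1 if weak_classify(img, pairs) else -1)
        for pairs, alpha in zip(weak_classifiers, alphas)
    )
    return score > 0
```

With ten weak classifiers of 32 pairs each, one call to `strong_classify` performs the 320 pixel comparisons mentioned above, with no feature extraction beyond reading pixel values.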
Fig. 5 Integral Image
Although the initial design of the currency reader satisfies our primary requirements of real-time recognition and high accuracy, an experimental study of its practical use showed that it could be further improved. Users with visual disabilities identified two major disadvantages of the initial design. First, it required coverage of the entire right-hand side of the bill, i.e. the upper right and bottom right of the bill had to be captured at the same time. However, such coverage may be difficult to accomplish without a great deal of practice. Second, users with visual disabilities like to fold bills in different ways to distinguish among denominations, but folding can change the shape of the right-hand side of a bill and may disturb the recognition.
This suggests using a smaller feature area for recognition, because it is easier to capture and less likely to be disturbed by folding. We have refined our currency reader to identify a feature area containing the numeric denomination, as shown in Figure 6. Feature areas are first detected using a fast pre-classifier and then identified using a strong classifier based on random local pixel pairs, as described in Section 3.1.
To detect an object in an image, an exhaustive search is usually inefficient because most areas of the image do not contain the object of interest. A pre-classification step that filters out these areas is therefore important and can speed up detection by a factor of ten or more. In our research we found that a box filter computed using an integral image is very efficient and can be applied on mobile devices. In an integral image, the value at each pixel is the sum of all pixels above and to the left of the current position. The sum of the pixels within any rectangle can then be computed in constant time with four table lookups on the integral image, as shown in Figure 5. If we also build an integral image of the squared gray-scale value at each pixel, we can compute the standard deviation (second order moment) within any rectangle in O(1) time. More generally, a moment of any order can be computed in O(1) time using an integral image of the corresponding power of the pixel values. Both the Viola-Jones face detector[9] and SURF[7] benefit from the speedup of the box filter and integral image.
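A minimal sketch of the integral image and the constant-time box statistics follows. It assumes a gray-scale image stored as a list of rows and uses a table padded with one extra row and column of zeros, a common convention that avoids boundary checks; the function names are illustrative only.

```python
import math


def integral_image(img, power=1):
    """Build a padded (h+1) x (w+1) integral image.

    ii[r][c] holds the sum of img[y][x] ** power over all y < r, x < c.
    Use power=2 to obtain the table of squared gray-scale values.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c] ** power
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1.

    Uses exactly four table lookups, independent of the box size.
    """
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def box_std(ii1, ii2, top, left, bottom, right):
    """Standard deviation inside a box from the first- and second-power tables."""
    n = (bottom - top) * (right - left)
    mean = box_sum(ii1, top, left, bottom, right) / n
    mean_sq = box_sum(ii2, top, left, bottom, right) / n
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))
```

Both tables are built once per frame in time linear in the number of pixels; after that, the standard deviation of any candidate window costs eight lookups regardless of the window's size.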
Fig. 6 Standard deviation of 6 sub-images at 3 corners of a 20 dollar bill
We found that the standard deviation (STD) in a sub-image of an object is relatively stable and that a combination of STDs can be used as a cue to filter out image areas of no interest. In Figure 6, we divide the feature area of a twenty dollar bill into 6 boxes (3 vertical and 3 horizontal). The STD in each sub-window falls in a relatively stable range, and we search only within these ranges for potential corner patterns to recognize. In each sub-window, the STD range spans at most 50% (red) or even less (blue) of the possible STD values. Assuming the STDs in the sub-windows are independent and uniformly distributed in an arbitrary scene, the box filter can eliminate 1 - (1/2)^6 ≈ 98.4% of the computation by discarding low-probability regions. In our experiments we found that the pre-classification speeds up the algorithm by a factor of 20 on a camera phone.
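The range test and the rejection estimate above can be written out as a short sketch. The STD values and ranges in the usage note are made-up placeholders, not measured data, and the function names are hypothetical.

```python
def passes_prefilter(stds, ranges):
    """Return True only if every sub-window STD lies inside its accepted range.

    stds   -- STD measured in each sub-window of the candidate area
    ranges -- one (low, high) range per sub-window, learned from samples
    """
    return all(low <= s <= high for s, (low, high) in zip(stds, ranges))


def expected_rejection_rate(range_fractions):
    """Fraction of candidate windows rejected by the pre-filter.

    ASSUMPTION (as in the text): sub-window STDs are independent and
    uniformly distributed, so a window survives with probability equal
    to the product of the fractions of the STD scale each range covers.
    """
    keep = 1.0
    for fraction in range_fractions:
        keep *= fraction
    return 1.0 - keep
```

For six sub-windows whose ranges each cover half of the STD scale, `expected_rejection_rate([0.5] * 6)` evaluates 1 - (1/2)^6 = 1 - 1/64, about 98.4%. A candidate such as `passes_prefilter([10, 30], [(5, 15), (18, 25)])` is rejected as soon as one sub-window falls outside its range, so the expensive pixel-pair classifier never runs on it.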
To meet the requirements of users with visual disabilities, we pay special attention to the details of the user interface. Every operation of the software is guided by a voice message. Activating the camera requires two key presses, to prevent accidental activation. The software automatically exits after being idle for two minutes to save battery power. The user has the option to "force" recognition of a bill by pressing the center button. In "forced" recognition, the software searches additional scales and positions for the feature area.
Fig. 7 User evaluation of mobile currency reader