Friday, 3 March 2017

Noise filter for QR-Code detection in scanned documents

I am using zxing in a project to detect QR codes within scanned documents. The goal is to achieve almost 100% recognition, but there were some issues to solve:

  1. zxing does not find small codes within a full document page. Since the QR code stickers are attached to the documents manually, the user has to place the sticker in one of the 4 corners.
    The processor then cuts out corner by corner and searches for the code there.
  2. Unfortunately, some codes were still not recognized. Detection relies on the printing quality of the stickers, which may not be good enough in every situation.
    I did some tests and corrected non-recognized codes with GIMP until they worked. I came to the conclusion that a filter is needed to eliminate false pixels as well as possible.
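The corner-by-corner search from point 1 could be sketched roughly like this. It is only a minimal illustration, not the code from the project: the class name CornerCropper and the fraction parameter (how much of the page each corner covers) are assumptions, and the actual zxing decoding call is left out.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class CornerCropper {

    // Returns the four corner regions of a page in the order
    // top-left, top-right, bottom-left, bottom-right.
    // 'fraction' is a hypothetical parameter in (0, 1]: the share of the
    // page width and height that each corner region covers.
    static List<BufferedImage> corners(BufferedImage page, double fraction) {
        int w = Math.max(1, (int) (page.getWidth() * fraction));
        int h = Math.max(1, (int) (page.getHeight() * fraction));
        int right = page.getWidth() - w;    // x offset of the right-hand corners
        int bottom = page.getHeight() - h;  // y offset of the bottom corners

        List<BufferedImage> result = new ArrayList<>();
        result.add(page.getSubimage(0, 0, w, h));
        result.add(page.getSubimage(right, 0, w, h));
        result.add(page.getSubimage(0, bottom, w, h));
        result.add(page.getSubimage(right, bottom, w, h));
        return result;
    }
}
```

Each returned region would then be handed to the decoder one after the other until a code is found.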

I spent an evening on that and finally found a specialized solution. Take a look at the sample images:

Left: original, Right: filter applied

The result is amazing, isn't it? zxing is now able to recognize the code.

How does it work?

My first idea was to use OpenCV to implement the filter, but I then tried a very simple "self-made" algorithm:

  1. The input has to be black/white pixel data already
  2. Iterate over the pixels (by rows and columns)
  3. Leave white pixels as they are
  4. For each black pixel, count the black pixels in the surrounding 7x7 square
  5. Calculate the ratio: black pixels in the 7x7 square / 49
  6. If the ratio is less than 0.4 -> set the pixel to white

It is important to work on a copy of the input data. The filter must not analyse pixels which have already been modified by the algorithm.

Since QR codes consist of rectangular patterns, the filter destroys almost no real data as long as the stickers are attached reasonably straight.
Typical noise from bad printers or scanning failures is reduced very well.

Close to the image borders there is no full 7x7 square available. It would be possible to skip those areas. I decided to shrink the square according to the position and process the data in the same way.
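The steps above, including the copy and the shrinking square at the borders, might be sketched as follows. This is my reading of the description and not the original implementation: the class name NoiseFilter, the boolean[][] representation (true = black) and the parameters radius and minRatio are assumptions. The current pixel is not counted, and the count is divided by the number of pixels in the square that is actually available (49 in the interior).

```java
final class NoiseFilter {

    // in[y][x] == true means black. radius 3 gives the 7x7 square,
    // minRatio 0.4 is the threshold from the list above.
    static boolean[][] clean(boolean[][] in, int radius, double minRatio) {
        int h = in.length;
        int w = in[0].length;

        // Work on a copy: all counting reads 'in', all changes go to 'out'.
        boolean[][] out = new boolean[h][];
        for (int y = 0; y < h; y++) {
            out[y] = in[y].clone();
        }

        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!in[y][x]) {
                    continue; // white pixels stay as they are
                }
                // Shrink the square at the image borders.
                int y0 = Math.max(0, y - radius), y1 = Math.min(h - 1, y + radius);
                int x0 = Math.max(0, x - radius), x1 = Math.min(w - 1, x + radius);

                int black = 0;
                for (int yy = y0; yy <= y1; yy++) {
                    for (int xx = x0; xx <= x1; xx++) {
                        boolean isCurrent = (yy == y && xx == x);
                        if (in[yy][xx] && !isCurrent) {
                            black++;
                        }
                    }
                }
                int area = (y1 - y0 + 1) * (x1 - x0 + 1);
                if ((double) black / area < minRatio) {
                    out[y][x] = false; // too few black neighbours: treat as noise
                }
            }
        }
        return out;
    }
}
```

A call like NoiseFilter.clean(pixels, 3, 0.4) corresponds to the 7x7 square and the 0.4 threshold used in this post.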

Of course, the size of the square (7x7 here) has to be adjusted to the QR code size and the scanning resolution.

The following illustration shows the 7x7 square around the current pixel. The black pixel count in that case equals 5 (current pixel not included). The current pixel will be erased and set to white.

Illustration 7x7 square

Make it simpler

A friend of mine pointed out that calculating the ratio is not necessary. The size of the square is always 7x7 = 49 pixels, so the threshold of 0.4 can be precalculated as 0.4 * 49 = 19.6, rounded to 20.

Exception: the border areas of the image. The square is shrunk there, but it is no problem to use the precalculated threshold anyway. The algorithm is then a little more "aggressive" at the image borders (first 3 pixels).
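To get a feeling for how much more "aggressive" the fixed threshold is at the borders, here is a small sketch that computes the size of the available square and the share of it that has to be black for a pixel to survive. The class and method names are made up for this example, and the 100x100 image size is just a sample value.

```java
final class BorderThreshold {

    // Number of pixels of the (2 * radius + 1) square around (x, y)
    // that lie inside an image of the given width and height.
    static int windowArea(int x, int y, int width, int height, int radius) {
        int x0 = Math.max(0, x - radius), x1 = Math.min(width - 1, x + radius);
        int y0 = Math.max(0, y - radius), y1 = Math.min(height - 1, y + radius);
        return (x1 - x0 + 1) * (y1 - y0 + 1);
    }

    // Share of the available square that must be black when a fixed
    // pixel-count threshold is used instead of a ratio.
    static double effectiveRatio(int threshold, int area) {
        return (double) threshold / area;
    }

    public static void main(String[] args) {
        int threshold = 20; // precalculated from 0.4 * 49
        int[][] positions = { {50, 50}, {0, 50}, {0, 0} }; // interior, edge, corner
        for (int[] p : positions) {
            int area = windowArea(p[0], p[1], 100, 100, 3);
            System.out.printf("(%d,%d): area=%d, effective ratio=%.2f%n",
                    p[0], p[1], area, effectiveRatio(threshold, area));
        }
    }
}
```

In the interior the square has 49 pixels and the effective ratio is about 0.41. Directly at an edge only 4x7 = 28 pixels are left, so about 0.71 of them must be black. In the very corner the square has just 16 pixels, fewer than the threshold of 20, so with a "less than 20" comparison a black pixel there can never survive.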

Close areas

The next step is to use the same algorithm to close areas: if the black pixel count in the square is greater than 40, the pixel is set to black.
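A possible sketch of this closing step is shown below. Several details are assumptions on my side: that only white pixels are candidates, that the current pixel is not counted, that the step reads from a copy just like the clearing step, and the names AreaCloser, radius and minBlack.

```java
final class AreaCloser {

    // in[y][x] == true means black. With radius 3 and minBlack 40, a white
    // pixel becomes black if more than 40 pixels around it are black.
    static boolean[][] close(boolean[][] in, int radius, int minBlack) {
        int h = in.length;
        int w = in[0].length;

        // Same rule as for clearing: count on the input, write to a copy.
        boolean[][] out = new boolean[h][];
        for (int y = 0; y < h; y++) {
            out[y] = in[y].clone();
        }

        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (in[y][x]) {
                    continue; // black pixels stay as they are
                }
                int black = 0;
                for (int yy = Math.max(0, y - radius); yy <= Math.min(h - 1, y + radius); yy++) {
                    for (int xx = Math.max(0, x - radius); xx <= Math.min(w - 1, x + radius); xx++) {
                        if (in[yy][xx]) {
                            black++; // the current pixel is white, so it never counts
                        }
                    }
                }
                if (black > minBlack) {
                    out[y][x] = true; // white hole inside a mostly black area
                }
            }
        }
        return out;
    }
}
```

With a fixed count of 40, white pixels close to the image border are never closed, because the shrunk square has fewer than 41 pixels there.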

The following image shows the progress. Please enlarge the picture and compare the middle and right samples. You will see that some white pixels in the data blocks have been closed.

Left: original, middle: cleared, right: closed