Character recognition explained

About one and a half years ago, someone in our WhatsApp programming group asked how we can extract text from an image. One guy replied, "Bro, that will be possible in 2080" (ROFL). Two or three other people also joined the conversation. I opened my browser and started googling to collect some good information for answering the question (honestly, when you don't know about something, quickly learning just enough to explain it is a smart way to work). After getting some information from Google, I replied to the second guy, "Haha bro, actually you are living in 1980," and I also answered the question. It was a small conversation, but it made me curious about OCR. So, in this article I am going to share my knowledge about OCR in the following parts:
1.    What is OCR?
2.    How does it work?
3.    How do neural networks recognize a character?

1. What is OCR?

OCR stands for Optical Character Recognition. It is a technique for converting (or extracting) the text in an image into text that a computer understands, which means you can copy, paste, and edit the extracted text. The text in the image does not need to be in a particular font: with today's technology you can even recognize and extract handwritten characters. OCR tools support almost all image formats, such as PNG, JPG, and GIF. OCR makes our work easier. For example, suppose you work in a bank and someone comes with a check to transfer money into their account. You read the information on the check and then transfer the money, but what if the computer could do that for you? OCR is also useful when you have hundreds of handwritten pages to type into MS Word. Instead of typing them all by hand, you can let OCR do most of the work in minutes.

2. How does Optical Character Recognition (OCR) work?

Optical Character Recognition can be done using different techniques, such as neural networks (AI) and template matching. The camera translation feature in Google Translate is a well-known example of OCR using deep neural networks. Let me explain the recognition process with an example. We have an image of a mobile phone with a seven-segment display, as you can see in the image below:
We want to recognize the digits shown on the display. Right now they are just a combination of pixels, but we can convert them into real digits. The steps below explain how.

Step 1: Our image is in the RGB color space. To get better results, we first convert it to grayscale. The grayscale image may contain some noise that can cause problems in the later steps, so we remove it by applying a filter kernel (a blur). Here we apply a 5x5 Gaussian kernel. If you don't know what a kernel is or how it is applied for blurring, read our article on blurring.
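To make Step 1 concrete, here is a minimal pure-Python sketch (no OpenCV, so it runs anywhere). It converts an RGB image, stored as a list of rows of (R, G, B) tuples, to grayscale, and then blurs it with a 5x5 Gaussian kernel. The helper names are hypothetical. The sigma of 1.1 is an assumed value, and this sketch replicates edge pixels at the border. In OpenCV the same two operations are `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` and `cv2.GaussianBlur(gray, (5, 5), 0)`; note that OpenCV loads images in BGR order.

```python
import math

def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples to grayscale with the BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def gaussian_kernel(size=5, sigma=1.1):
    """Build a normalized size x size Gaussian kernel (sigma is an assumed value)."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]  # weights now sum to 1

def convolve(img, kernel):
    """Slide the kernel over img; out-of-range neighbours reuse the nearest edge pixel."""
    h, w = len(img), len(img[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(-half, half + 1):
                for kx in range(-half, half + 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    acc += img[yy][xx] * kernel[ky + half][kx + half]
            out[y][x] = acc
    return out
```

Because the kernel weights sum to 1, a flat image stays flat after blurring, while a single bright (noisy) pixel is spread over its 5x5 neighbourhood and toned down.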

Step 2: After getting the blurred image, we apply binarisation to it. Binarisation is the process of separating the foreground (the interesting part) from the background. There are several ways to do it, such as separating by color, by a threshold value, or by edges. We separated the foreground from the background using the Canny edge detection technique. Our edge image looks like this:
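Canny edge detection itself has several stages. The sketch below is a simplified pure-Python version and assumes the input is the already-blurred grayscale image from Step 1. It computes Sobel gradients, thins the edges with non-maximum suppression, and then applies a double threshold with hysteresis. You choose the `low` and `high` thresholds, and the function names are hypothetical. OpenCV provides the full algorithm as `cv2.Canny(image, threshold1, threshold2)`.

```python
import math

def sobel(img):
    """Horizontal (gx) and vertical (gy) Sobel gradients; border pixels stay 0."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy[y][x] = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return gx, gy

def canny(img, low, high):
    """Simplified Canny on a blurred grayscale image; returns a 0/255 edge map."""
    h, w = len(img), len(img[0])
    gx, gy = sobel(img)
    mag = [[math.hypot(gx[y][x], gy[y][x]) for x in range(w)] for y in range(h)]

    # Non-maximum suppression: keep a pixel only if it is not weaker than its
    # two neighbours along the gradient direction (ties are kept).
    thin = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            angle = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180
            if angle < 22.5 or angle >= 157.5:   # gradient points left/right
                n1, n2 = mag[y][x - 1], mag[y][x + 1]
            elif angle < 67.5:                   # gradient points down-right
                n1, n2 = mag[y - 1][x - 1], mag[y + 1][x + 1]
            elif angle < 112.5:                  # gradient points up/down
                n1, n2 = mag[y - 1][x], mag[y + 1][x]
            else:                                # gradient points down-left
                n1, n2 = mag[y - 1][x + 1], mag[y + 1][x - 1]
            if mag[y][x] >= n1 and mag[y][x] >= n2:
                thin[y][x] = mag[y][x]

    # Hysteresis: strong pixels (>= high) are edges; weak pixels (>= low)
    # become edges only if they are connected to a strong one.
    edges = [[0] * w for _ in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w) if thin[y][x] >= high]
    for y, x in stack:
        edges[y][x] = 255
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and thin[ny][nx] >= low):
                    edges[ny][nx] = 255
                    stack.append((ny, nx))
    return edges
```

On an image that is dark on the left and bright on the right, only the columns at the boundary are marked as edges. A flat image produces no edges at all.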

Step 3: Now it's time to find the display region, because all the digits are inside the display. To find it, we have to find the contour of the display. Contours are closed curves that can be obtained from edges, which is why we applied binarisation to our image. Our program gets all possible contours (closed curves) as an array, but we are only interested in the contour of the display. We calculate properties of each contour, such as its perimeter, area, and number of points, and use them to pick out the display contour. The display contour gives us the position, width, and height of a bounding rectangle that contains the display region. The position may not be perfect, but it is good enough for us. Using the position, width, and height, we can draw the bounding rectangle, as you can see below.
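The article doesn't say exactly how perimeter, area, and points are combined, so what follows is only one plausible approach, stated as an assumption. First, simplify each contour to its corner points (in OpenCV this is typically `cv2.approxPolyDP` with a tolerance proportional to `cv2.arcLength`). Then take the largest contour that has exactly four corners. This pure-Python sketch assumes the contours are already simplified lists of (x, y) points. The function names and the `min_area` cut-off are hypothetical.

```python
def contour_area(points):
    """Area of a closed polygon given as [(x, y), ...], via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def bounding_rect(points):
    """Axis-aligned (x, y, width, height), counting pixels inclusively."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

def find_display_rect(contours, min_area=100.0):
    """Return the bounding rectangle of the largest four-cornered contour, or None.

    Assumes each contour is already simplified to its corner points;
    min_area is an assumed cut-off for ignoring tiny shapes.
    """
    for points in sorted(contours, key=contour_area, reverse=True):
        if len(points) == 4 and contour_area(points) >= min_area:
            return bounding_rect(points)
    return None
```

Here a larger triangle is skipped because it doesn't have four corners, and a tiny square is skipped because it is below `min_area`.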

Step 4: Now we apply binarisation to our grayscale image again, but this time only inside the display region, and we use a threshold value to separate the foreground (the digits) from the background. We use 0 as the threshold value. This means pixels with a value of 0 (black) go to the foreground, and pixels with a value greater than 0 go to the background. The binarisation step then assigns white to the foreground pixels and black to the background pixels. Our foreground looks like this:
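Step 4 has two parts: cropping the display region, and thresholding it so that dark digit pixels become white foreground. A fixed threshold of exactly 0 only marks pure-black pixels as foreground, and real photos rarely contain them. In practice the threshold is therefore often picked automatically with Otsu's method. In OpenCV you do this by adding the `cv2.THRESH_OTSU` flag to `cv2.threshold`, in which case the threshold value you pass is ignored. The sketch below implements cropping, Otsu's method, and an inverted binary threshold in pure Python; the helper names are hypothetical.

```python
def crop(gray, x, y, w, h):
    """Return the w x h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in gray[y:y + h]]

def otsu_threshold(gray):
    """Otsu's method: the t that maximizes between-class variance (up to a constant).

    Expects intensities in 0..255; floats are truncated to ints for the histogram.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[int(v)] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]           # pixels <= t (candidate foreground)
        sum0 += t * hist[t]
        w1 = total - w0         # pixels > t (candidate background)
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

def binarize_inv(gray, t, maxval=255):
    """Inverted binary threshold: pixels <= t become maxval (white), the rest 0 (black)."""
    return [[maxval if v <= t else 0 for v in row] for row in gray]
```

You can chain them: compute `region = crop(gray, x, y, w, h)` with the rectangle from Step 3, then `binarize_inv(region, otsu_threshold(region))`. The result is white digits on a black background.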

Step 5: We have our foreground, but as you can see there is a useless vertical line on the left. We want to ignore that line and also find a bounding rectangle for each digit. To do that, we first find the bounding rectangles of all contours (closed curves) and keep only those whose width and height fall within a specific range. The remaining rectangles are the bounding rectangles of our digits, and we can draw them as you can see below.
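This sketch stands in for contour detection. It groups touching foreground pixels (8-connectivity) and records each group's bounding rectangle. It then keeps only the rectangles whose size falls within a chosen range and sorts them left to right, so the digits come out in reading order. The width and height limits are assumptions you would tune for your own display, and the function names are hypothetical. In OpenCV you would typically use `cv2.findContours` and `cv2.boundingRect` instead.

```python
def component_boxes(binary):
    """Bounding boxes (x, y, w, h) of 8-connected groups of nonzero pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            # Flood-fill one group, tracking its extreme coordinates.
            seen[sy][sx] = True
            stack = [(sy, sx)]
            min_x = max_x = sx
            min_y = max_y = sy
            while stack:
                y, x = stack.pop()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes

def digit_boxes(boxes, min_w, max_w, min_h, max_h):
    """Keep boxes whose size is in range, sorted left to right."""
    keep = [b for b in boxes if min_w <= b[2] <= max_w and min_h <= b[3] <= max_h]
    return sorted(keep, key=lambda b: b[0])
```

A thin vertical line narrower than `min_w`, like the one on the left of the display, is dropped by the size filter, while the digit-sized boxes are kept.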

Step 6: Now we know where our digits are and what they look like, but we still haven't turned them into actual digits. There are two ways to do it: we can use a neural network, or we can apply an algorithm designed specifically for seven-segment display recognition. Neural networks are the more modern and more flexible way to recognize characters, because they can recognize handwritten digits too.
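For the seven-segment route, one common approach works like this (an assumption here, not necessarily the author's algorithm). Inside a digit's bounding rectangle, check each of the seven segment areas. Mark a segment as lit when more than half of its pixels are foreground, then look up the resulting on/off pattern in a table. The segment geometry, the stroke thickness `t`, and the 0.5 ratio are all hypothetical parameters.

```python
# Segment order: top, top-left, top-right, center, bottom-left, bottom-right, bottom
DIGITS_LOOKUP = {
    (1, 1, 1, 0, 1, 1, 1): 0,
    (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 1, 0, 1): 2,
    (1, 0, 1, 1, 0, 1, 1): 3,
    (0, 1, 1, 1, 0, 1, 0): 4,
    (1, 1, 0, 1, 0, 1, 1): 5,
    (1, 1, 0, 1, 1, 1, 1): 6,
    (1, 0, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

def segment_regions(w, h, t):
    """(x0, y0, x1, y1) box of each segment for a w x h digit; corners excluded."""
    c0 = h // 2 - t // 2       # first row of the center bar
    c1 = c0 + t                # row just below the center bar
    return [
        (t, 0, w - t, t),          # top
        (0, t, t, c0),             # top-left
        (w - t, t, w, c0),         # top-right
        (t, c0, w - t, c1),        # center
        (0, c1, t, h - t),         # bottom-left
        (w - t, c1, w, h - t),     # bottom-right
        (t, h - t, w - t, h),      # bottom
    ]

def recognize_digit(roi, t, on_ratio=0.5):
    """Return the digit shown in a binary ROI, or None if the pattern is unknown."""
    h, w = len(roi), len(roi[0])
    states = []
    for x0, y0, x1, y1 in segment_regions(w, h, t):
        area = (x1 - x0) * (y1 - y0)
        lit = sum(1 for y in range(y0, y1) for x in range(x0, x1) if roi[y][x])
        states.append(1 if area and lit / area > on_ratio else 0)
    return DIGITS_LOOKUP.get(tuple(states))
```

The lookup table is what makes this approach fast, but it is also why it only works for seven-segment digits. Anything else, like handwriting, needs a more general recognizer.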

3. How do neural networks recognize a character?

Before going further, you need to understand what training a neural network means. Training is a process in which we give thousands of examples to our neural network (NN) and it learns from them. Learning simply means adjusting the network so that its error on those examples becomes as small as possible. The examples we give to our NN are called the training dataset. You can see some examples from a training dataset for OCR below.
After training, we give the image (or shape) of a character as input and the neural network recognizes it, meaning it gives the corresponding editable character as output.
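To make "learning means getting a minimum error" concrete, here is a toy one-layer neural network (softmax regression) trained with gradient descent on cross-entropy loss. The 3x5 bitmaps for "0", "1", and "7" are made up for illustration; a real OCR training set has thousands of images. After training, the hypothetical `recognize` function takes a bitmap and returns the character with the highest predicted probability.

```python
import math

# Hypothetical 3x5 bitmaps (row by row, "1" = ink) used as a tiny training dataset.
TRAINING_SET = [
    ("0", "111" "101" "101" "101" "111"),
    ("1", "010" "110" "010" "010" "111"),
    ("7", "111" "001" "010" "010" "010"),
]
LABELS = sorted({label for label, _ in TRAINING_SET})

def to_vector(bitmap):
    """Pixels as floats, plus a constant 1.0 that acts as the bias input."""
    return [float(c) for c in bitmap] + [1.0]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_probs(weights, x):
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def train(dataset, epochs=200, lr=0.5):
    n_inputs = len(to_vector(dataset[0][1]))
    weights = [[0.0] * n_inputs for _ in LABELS]  # one weight row per output class
    for _ in range(epochs):
        for label, bitmap in dataset:
            x = to_vector(bitmap)
            probs = predict_probs(weights, x)
            for k, cls in enumerate(LABELS):
                # Gradient of the cross-entropy error w.r.t. this class's score.
                err = probs[k] - (1.0 if cls == label else 0.0)
                for i in range(n_inputs):
                    weights[k][i] -= lr * err * x[i]
    return weights

def recognize(weights, bitmap):
    probs = predict_probs(weights, to_vector(bitmap))
    return LABELS[probs.index(max(probs))]
```

Real OCR models add hidden (often convolutional) layers, but the training loop has the same shape: predict, measure the error, and nudge the weights to reduce it.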

All of this can be done in code. OpenCV is a good library for it, and it is available for Python, C++, and Java.

Was this article helpful? Do you have any doubts? We are curious to know, so please tell us in the comments below.
