Gone are the days when one had to visit banks or lending organizations in person with all the documentation required for an elusive loan. Forget one document, and you had to make the trip all over again. Today, everything has moved digital: a simple upload of documents via a website, or even through WhatsApp, from the comfort of your home or workplace, and the rest is left to the financial institution to take care of and validate for authenticity.
But it’s not just a process improvement for users; financial institutions have also found ways to validate the authenticity of documents more quickly and efficiently. This has been achieved through advances in Artificial Intelligence and Machine Learning, which help process thousands of images received from customers without any human intervention. Aside from the significant improvement in processing times, this automation also reduced the probability of human error.
At Indifi, we had long been using such a document verification system from an external partner. But as a technology-first company that relies heavily on data for its credit decision-making, we felt we should develop such tools in-house and strengthen our capabilities as a technology-driven organization.
This is what led us to develop our own tool for Document verification purposes.
Starting with Tesseract
Our first document verification system was based on Tesseract.
About Tesseract:
Tesseract is an open-source optical character recognition (OCR) engine and one of the most popular and highest-quality OCR libraries available. OCR uses artificial intelligence to locate and recognize text in images. Tesseract finds patterns in pixels, letters, words, and sentences. It uses a two-step approach called adaptive recognition: a first pass recognizes characters, and a second pass fills in any letters it was unsure of with letters that fit the surrounding word or sentence context.
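As a minimal sketch of how a Tesseract-based pipeline is typically invoked from Python (assuming the Tesseract binary and the `pytesseract`/Pillow packages are installed; the file path and page-segmentation mode below are illustrative, not our production configuration):

```python
# Illustrative sketch of running Tesseract OCR on an uploaded document image.
# Assumes the Tesseract binary plus pytesseract and Pillow are installed;
# the file name and config flags are hypothetical examples.

def ocr_image(path, psm=6):
    """Return the raw text Tesseract extracts from the image at `path`."""
    import pytesseract          # imported lazily so the sketch loads without it
    from PIL import Image
    # --psm 6 assumes a single uniform block of text (a common choice for
    # scanned documents); --oem 1 selects Tesseract's LSTM engine.
    config = f"--oem 1 --psm {psm}"
    return pytesseract.image_to_string(Image.open(path), config=config)

def clean_ocr_text(raw):
    """Normalise OCR output: drop blank lines and surrounding whitespace."""
    return [line.strip() for line in raw.splitlines() if line.strip()]

if __name__ == "__main__":
    print(clean_ocr_text(ocr_image("document.jpg")))
```

The page-segmentation mode (`--psm`) matters in practice: the wrong mode on a skewed or cluttered photo is one common source of the garbled output described below.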
Why Tesseract didn’t work
Tesseract is a legacy technology that works very well on curated datasets captured under controlled conditions. Most such datasets do not resemble real-world data. In our case, the data can be very messy: a customer can upload an image of a document in many ways. It can be tilted at any angle, and the background can be as simple as a white wall or as complex as a bedsheet with handprints. All of these variations made it very difficult to reach the accuracy we were looking for. Even after extensive preprocessing, such as background removal using deep learning, colour-based clustering, and image skew correction, we were unable to get acceptable results.
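To give a flavour of one preprocessing step mentioned above, skew correction, here is a simplified sketch that estimates the dominant text angle from the foreground pixel coordinates via PCA. This is an illustration of the idea, not our production deskewing code, which had to be considerably more robust:

```python
import numpy as np

def estimate_skew_angle(binary):
    """Estimate the skew angle (in degrees) of a binary image whose
    foreground (text) pixels are non-zero, using PCA of pixel coordinates.
    Simplified illustration; production deskewing is more involved."""
    ys, xs = np.nonzero(binary)
    coords = np.column_stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=0)
    # The principal axis of the pixel cloud approximates the text baseline.
    cov = np.cov(coords, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    vx, vy = eigvecs[:, np.argmax(eigvals)]   # dominant direction
    angle = np.degrees(np.arctan2(vy, vx))
    # Collapse the eigenvector's sign ambiguity into the range (-90, 90].
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return float(angle)
```

Once the angle is estimated, the image can be rotated back by that amount before OCR; on cluttered backgrounds like the bedsheet example, even this step needs the foreground to be segmented first.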
This is where most AI projects with unbounded user input struggle. Here is a wonderful article from A16Z, a leading Silicon Valley venture firm, on how the economics of AI differ from regular software development, and how most AI problems have a really long tail that increases the cost of developing solutions for them. Article Link
How does AWS make it so easy?
As we struggled with Tesseract, we also burnt our fingers trying a custom model trained on 200 self-labelled images (with bounding boxes). This was clearly not enough data to train a robust model. That is when we came across a solution AWS provides for this very problem.
Introduction to AWS Rekognition
Amazon Rekognition makes it easy to add image and video analysis to your applications using proven, highly scalable, deep learning technology that requires no machine learning expertise to use. With Amazon Rekognition, you can identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content. Amazon Rekognition also provides highly accurate facial analysis and facial search capabilities that you can use to detect, analyze, and compare faces for a wide variety of user verification, people counting, and public safety use cases.
Text Detection on AWS Rekognition
In photos and videos, text appears very differently than neat words on a printed page. Amazon Rekognition can read skewed and distorted text to capture information like store names, forced narratives overlaid on media, street signs, and text on product packaging.
Amazon Rekognition text detection can detect text in images and videos and convert it into machine-readable text. You can use the detected text to implement solutions such as visual search, content insights, and navigation.
Because AWS Rekognition is trained on millions of images, it removed the need for custom training on our end. All we needed to build was a custom text parser and processing engine to extract the relevant information based on the document passed to it. Our data already being stored in AWS S3 made the process even more seamless.
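The parsing layer can be sketched as a small function over Rekognition's `DetectText` response, which returns a `TextDetections` list of LINE- and WORD-level entries. The confidence threshold and the S3 bucket/key in the comment are assumptions for illustration, not our actual values:

```python
# Sketch of the text-parsing step on top of Rekognition's DetectText output.
# In production this would be fed by something like:
#   client = boto3.client("rekognition")
#   response = client.detect_text(
#       Image={"S3Object": {"Bucket": "my-bucket", "Name": "doc.jpg"}})
# The bucket and object names above are hypothetical.

def extract_lines(response, min_confidence=80.0):
    """Return LINE-level strings above a confidence threshold from a
    Rekognition DetectText response dict."""
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d["Type"] == "LINE" and d["Confidence"] >= min_confidence
    ]

# Example with a mocked response shaped like Rekognition's output:
mock = {"TextDetections": [
    {"DetectedText": "INCOME TAX DEPARTMENT", "Type": "LINE", "Confidence": 99.2},
    {"DetectedText": "INCOME", "Type": "WORD", "Confidence": 99.2},
    {"DetectedText": "blurry stamp", "Type": "LINE", "Confidence": 41.0},
]}
print(extract_lines(mock))  # low-confidence and WORD entries are dropped
```

Document-specific logic (e.g., picking out an ID number from the surviving lines) then sits on top of this generic extraction step.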
Here is what we finally built out:
Our Tech Stack
- AWS Rekognition – For Detection and Recognition of Text in an Image [Pricing – $1 per 1000 images]
- Django and REST APIs – To save extracted text for new image (POST) and fetch extraction (GET)
- MongoDB – Database used to store outputs
Things we still need to Solve for
While AWS helped us build this low-cost document verification tool in-house, it does have certain limitations. The biggest are that it cannot understand the context of the text it extracts and that it cannot detect Hindi. Many of the documents we receive from our small and medium business customers contain text in Hindi and other Indian languages, which is what we need to solve for in the future.
It’s definitely an exciting time for image processing using Machine Learning, and our tech team will continue to utilise such technologies to bring more operational efficiencies.

By Manan Arora
Data Scientist @ Indifi Technologies