Extracting Data from Templatic Documents

Blog: Decision Management Community

In this article Google researches describe a novel approach using representation learning for tackling the problem of extracting structured information from templatic documents, such as receipts, bills, or insurance quotes. They “propose an extraction system that uses knowledge of the types of the target fields to generate extraction candidates, and a neural network architecture that learns a dense representation of each candidate based on neighboring words in the document.” Link