US 12,437,571 B2
Document information extraction without additional annotations
Shachar Klaiman, Heidelberg (DE); and Marius Lehne, Berlin (DE)
Assigned to SAP SE, Walldorf (DE)
Filed by SAP SE, Walldorf (DE)
Filed on Oct. 22, 2020, as Appl. No. 17/077,568.
Prior Publication US 2022/0129671 A1, Apr. 28, 2022
Int. Cl. G06V 30/414 (2022.01); G06F 18/21 (2023.01); G06F 18/25 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2023.01); G06V 30/10 (2022.01)

CPC G06V 30/414 (2022.01) [G06F 18/217 (2023.01); G06F 18/251 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01); G06V 30/10 (2022.01)]

20 Claims

1. A computer implemented method for extracting a field from a document, comprising:

receiving, by at least one processor, the document and a key, wherein the key specifies the field to extract from the document, the key is in textual format, the field is a plurality of characters, and the document is an image representing a plurality of fields;

processing, by a convolutional neural network (CNN) of an encoder, the document, thereby obtaining a feature map;

combining, by the encoder, the feature map with positional information for each feature in the feature map, thereby obtaining a spatial-aware feature map;

processing, using a recurrent neural network (RNN) of a decoder, the spatial-aware feature map and the key, thereby extracting the field in the document corresponding to the key, wherein the processing the spatial-aware feature map and the key comprises:

initializing, a first RNN state associated with the RNN of the decoder, a first set of attention weights for an attention layer, and a first output vector;

generating a second set of attention weights for the attention layer based on the spatial-aware feature map, the key, the first RNN state associated with the RNN, the first set of attention weights for the attention layer, and the first output vector;

generating a context vector based on the spatial-aware feature map and the second set of attention weights using the attention layer;

processing the context vector, the key, and an input vector using the RNN to obtain a second RNN state associated with the RNN;

generating a second output vector based on the second RNN state and the context vector using a projection layer;

storing the second output vector in a list of output vectors;

repeating, until the second output vector corresponds to an end token, the generating the second set of attention weights, the generating the context vector, the processing the context vector, the generating the second output vector, and the storing the second output vector with the second set of attention weights set to a value of the first set of attention weights, the second RNN state set to a value of the first RNN state, the second output vector set to a value of the first output vector, and the second output vector set to a value of the input vector; and

extracting only the field from the document based on the list of output vectors, wherein each output vector of the list of output vectors is derived from the key.
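
The decoding loop recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. Every name below (add_positions, decode, run_demo, the score_fn, rnn_fn and project_fn callables) is hypothetical, and the learned CNN, attention layer, RNN and projection layer are replaced by hand-written toy functions. Appending normalized coordinates to each feature is assumed here as one possible way of combining the feature map with positional information.

```python
import math

END = "<end>"  # hypothetical end token


def add_positions(feature_map):
    """Append normalized (row, col) coordinates to each feature and flatten.

    Assumption: concatenating coordinates stands in for "combining the
    feature map with positional information".
    """
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for r, row in enumerate(feature_map):
        for c, feat in enumerate(row):
            out.append(list(feat) + [r / max(h - 1, 1), c / max(w - 1, 1)])
    return out


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def decode(features, key, score_fn, rnn_fn, project_fn, start, max_steps=50):
    """Key-conditioned attention decoding loop following the claimed steps."""
    n, dim = len(features), len(features[0])
    state = 0                  # first RNN state (toy: a step counter)
    attn = [1.0 / n] * n       # first set of attention weights
    prev_out = inp = start     # first output vector and input vector
    outputs = []
    for _ in range(max_steps):
        # Second set of attention weights from features, key, state,
        # previous weights and previous output.
        scores = [score_fn(f, key, state, a, prev_out)
                  for f, a in zip(features, attn)]
        attn = softmax(scores)
        # Context vector: attention-weighted sum of the features.
        context = [sum(a * f[d] for a, f in zip(attn, features))
                   for d in range(dim)]
        state = rnn_fn(context, key, inp, state)   # second RNN state
        out = project_fn(state, context)           # second output vector
        outputs.append(out)
        if out == END:
            break
        prev_out = inp = out   # feed the output back for the next step
    return outputs


def run_demo(key):
    """Toy document: 2 fields (rows), each feature is [field_id, char_code]."""
    grid = [
        [[0, ord("4")], [0, ord("2")], [0, 0]],   # field 0 reads "42"
        [[1, ord("7")], [1, ord("9")], [1, 0]],   # field 1 reads "79"
    ]
    width = len(grid[0])
    features = add_positions(grid)

    def score_fn(f, k, state, prev_attn, prev_out):
        # Toy scoring: attend to the cell of field k at column == step.
        # A learned layer would also use prev_attn and prev_out.
        col = round(f[-1] * (width - 1))
        return 20.0 if f[0] == k and col == state else 0.0

    def rnn_fn(context, k, inp, state):
        return state + 1       # toy recurrence: count the steps

    def project_fn(state, context):
        code = round(context[1])   # toy projection: read the char code
        return END if code == 0 else chr(code)

    outputs = decode(features, key, score_fn, rnn_fn, project_fn, start="<s>")
    return "".join(o for o in outputs if o != END)
```

Calling run_demo(0) and run_demo(1) on the same toy document yields different fields, which mirrors the claimed behavior of extracting only the field that corresponds to the key. In a trained model the three callables would be learned layers rather than the fixed rules used here.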