US 12,450,438 B1
Relation extraction from text-based documents
Sonal R. Pardeshi, Redmond, CA (US); Vittorio Castelli, Croton-on-Hudson, NY (US); Bonan Min, Palo Alto, CA (US); Kishaloy Halder, Issaquah, WA (US); Yogarshi Paritosh Vyas, Brooklyn, NY (US); Venkatesh Nagapudi, San Jose, CA (US); and Kapil Singh Badesara, Seattle, WA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Sep. 29, 2023, as Appl. No. 18/478,616.
Int. Cl. G06F 40/295 (2020.01); G06F 40/109 (2020.01)
CPC G06F 40/295 (2020.01) [G06F 40/109 (2020.01)]
20 Claims
 
1. A system comprising:
a training subsystem comprising one or more computing devices, wherein the training subsystem is configured to:
receive, from a user computing device, a corpus of annotated text documents, wherein individual text documents of the corpus of annotated text documents are annotated to indicate text spans corresponding to entities, entity types associated with the entities, and relations between the entities; and
train an entity recognition model and a relation extraction model using the corpus of annotated text documents; and
a relation extraction subsystem comprising one or more computing devices, wherein the relation extraction subsystem is configured to:
generate a set of entity data items using the entity recognition model and text of a semi-structured document, wherein a first entity data item of the set of entity data items represents a first entity mention in the text, and wherein a second entity data item of the set of entity data items represents a second entity mention in the text;
generate augmentation data regarding one or more layout properties of the text;
generate a set of relation data items using the set of entity data items, the augmentation data, and the relation extraction model, where a first relation data item of the set of relation data items represents a relation between the first entity mention and the second entity mention; and
generate, using the set of relation data items, a user interface configured to present relations between entities in the text.
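
By way of illustration only, the annotated corpus recited in claim 1 (text spans corresponding to entities, entity types, and relations between the entities) might be represented as in the following Python sketch. The class names, field names, labels such as "PERSON" and "EMPLOYED_BY", and the validation step are assumptions made for this example and do not appear in the claims.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityAnnotation:
    start: int          # inclusive character offset of the text span
    end: int            # exclusive character offset of the text span
    entity_type: str    # hypothetical label, e.g. "PERSON"


@dataclass
class RelationAnnotation:
    head: int           # index into AnnotatedDocument.entities
    tail: int           # index into AnnotatedDocument.entities
    relation_type: str  # hypothetical label, e.g. "EMPLOYED_BY"


@dataclass
class AnnotatedDocument:
    text: str
    entities: List[EntityAnnotation] = field(default_factory=list)
    relations: List[RelationAnnotation] = field(default_factory=list)

    def span_text(self, index: int) -> str:
        """Return the document text covered by the entity at `index`."""
        entity = self.entities[index]
        return self.text[entity.start:entity.end]

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the annotations are consistent."""
        problems = []
        for i, entity in enumerate(self.entities):
            if not (0 <= entity.start < entity.end <= len(self.text)):
                problems.append(f"entity {i}: span out of bounds")
        for j, relation in enumerate(self.relations):
            for idx in (relation.head, relation.tail):
                if not (0 <= idx < len(self.entities)):
                    problems.append(f"relation {j}: unknown entity {idx}")
        return problems


doc = AnnotatedDocument(
    text="Jane Doe works at Acme Corp.",
    entities=[
        EntityAnnotation(0, 8, "PERSON"),
        EntityAnnotation(18, 27, "ORGANIZATION"),
    ],
    relations=[RelationAnnotation(0, 1, "EMPLOYED_BY")],
)
```

A training subsystem could run a check such as `doc.validate()` on each received document before training, so that malformed annotations are rejected early; whether the claimed system does so is not stated.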
 
5. A computer-implemented method comprising:
under control of a computing system comprising one or more computing devices configured to execute specific instructions:
generating a set of entity data items using an entity recognition model and text of a document, wherein a first entity data item of the set of entity data items represents a first entity mention in the text, and wherein a second entity data item of the set of entity data items represents a second entity mention in the text;
generating augmentation data regarding one or more layout properties of the text;
generating a set of relation data items using the set of entity data items, the augmentation data, and a relation extraction model, where a first relation data item of the set of relation data items represents a relation between the first entity mention and the second entity mention; and
generating, using the set of relation data items, a user interface configured to present relations between entities in the text.
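
By way of illustration only, the four steps of claim 5 might be chained as in the following Python sketch. The claims do not specify the models, the layout properties, or the form of the user interface. Here a regular expression stands in for the entity recognition model, line number and column stand in for the layout properties, a same-line rule stands in for the relation extraction model, and a plain-text rendering stands in for the user interface. All function names, type names, and labels are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EntityItem:
    text: str
    entity_type: str
    start: int  # character offset where the mention begins
    end: int    # character offset where the mention ends (exclusive)


@dataclass
class RelationItem:
    head: EntityItem
    tail: EntityItem
    relation_type: str


def recognize_entities(text: str) -> List[EntityItem]:
    """Stand-in for an entity recognition model: treats 'label: value' lines as two mentions."""
    items = []
    pattern = r"^[ \t]*([^:\n]+):[ \t]*(\S[^\n]*)$"
    for m in re.finditer(pattern, text, flags=re.MULTILINE):
        items.append(EntityItem(m.group(1), "FIELD_LABEL", m.start(1), m.end(1)))
        items.append(EntityItem(m.group(2), "FIELD_VALUE", m.start(2), m.end(2)))
    return items


def layout_properties(text: str, items: List[EntityItem]) -> Dict[int, Tuple[int, int]]:
    """Augmentation data: map each entity index to its (line number, column)."""
    props = {}
    for i, item in enumerate(items):
        line = text.count("\n", 0, item.start)
        line_start = text.rfind("\n", 0, item.start) + 1  # 0 when on the first line
        props[i] = (line, item.start - line_start)
    return props


def extract_relations(items: List[EntityItem],
                      layout: Dict[int, Tuple[int, int]]) -> List[RelationItem]:
    """Stand-in for a relation extraction model: link a label to a value to its right on the same line."""
    relations = []
    for i, head in enumerate(items):
        for j, tail in enumerate(items):
            same_line = layout[i][0] == layout[j][0]
            if (head.entity_type == "FIELD_LABEL" and tail.entity_type == "FIELD_VALUE"
                    and same_line and layout[i][1] < layout[j][1]):
                relations.append(RelationItem(head, tail, "HAS_VALUE"))
    return relations


def render(relations: List[RelationItem]) -> str:
    """Stand-in for the user interface: one line of text per relation."""
    return "\n".join(f"{r.head.text} --{r.relation_type}--> {r.tail.text}" for r in relations)


text = "Invoice No: 1042\n  Vendor: Acme Corp\nTotal: 310.00"
items = recognize_entities(text)
layout = layout_properties(text, items)
relations = extract_relations(items, layout)
print(render(relations))
```

The sketch passes the layout data to the relation step as a separate argument to mirror the claim language, in which the relation data items are generated from the entity data items, the augmentation data, and the relation extraction model together.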