US 12,001,802 B2
Training enrichment system for natural language processing
Tassilo Klein, Berlin (DE); and Moin Nabi, Berlin (DE)
Assigned to SAP SE, Walldorf (DE)
Filed by SAP SE, Walldorf (DE)
Filed on Jun. 3, 2021, as Appl. No. 17/337,835.
Prior Publication US 2022/0391592 A1, Dec. 8, 2022
Int. Cl. G06F 40/30 (2020.01); G06F 16/951 (2019.01); G06N 20/00 (2019.01)
CPC G06F 40/30 (2020.01) [G06F 16/951 (2019.01); G06N 20/00 (2019.01)] 20 Claims
 
8. A system, comprising:
a memory; and
at least one processor coupled to the memory and configured to perform instructions that cause the at least one processor to perform operations comprising:
identifying a natural language processor (NLP) trained on a first set of documents, wherein the NLP is trained to perform a set of functionality based on the first set of documents;
determining an industry in which the NLP is to be configured to perform the set of functionality;
identifying a set of words corresponding to the industry;
identifying a set of sentences including at least a subset of the set of words corresponding to the industry;
scoring the set of sentences based on a similarity to one another;
identifying a subset of the set of sentences that exceed a similarity threshold; and
training the NLP with the subset of the set of sentences that exceed the similarity threshold, wherein the trained NLP with the subset is configured to perform the set of functionality within the industry with a greater accuracy than an NLP trained on only the first set of documents.
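The selection steps of claim 8 can be illustrated with a minimal Python sketch. Those steps are identifying industry words, collecting sentences that contain them, scoring the sentences by their similarity to one another, and keeping the ones above a threshold. The claim does not specify a similarity measure, a threshold value, or how many industry words a sentence must contain, so the sketch makes several assumptions. Sentences are compared by cosine similarity of bag-of-words counts. Each sentence's score is its mean similarity to the other candidate sentences. The hypothetical parameter `min_hits` sets how many distinct industry words a sentence must include. The function names are also hypothetical. The final training step is not shown: the returned sentences would be passed to whatever fine-tuning routine the pretrained NLP uses.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_industry_sentences(corpus, industry_words, threshold, min_hits=1):
    """Hypothetical helper: return (sentence, score) pairs whose mean
    similarity to the other industry sentences exceeds `threshold`.

    Assumptions (not stated in the claim): bag-of-words cosine similarity,
    mean pairwise similarity as the score, and `min_hits` distinct
    industry words required per sentence.
    """
    words = {w.lower() for w in industry_words}

    # Keep only sentences that contain at least `min_hits` industry words.
    candidates = []
    for sentence in corpus:
        tokens = tokenize(sentence)
        if len(words.intersection(tokens)) >= min_hits:
            candidates.append((sentence, Counter(tokens)))

    # Score each candidate by its mean similarity to every other candidate,
    # then keep those whose score exceeds the threshold.
    selected = []
    for i, (sentence, vec) in enumerate(candidates):
        sims = [cosine(vec, other) for j, (_, other) in enumerate(candidates) if j != i]
        score = sum(sims) / len(sims) if sims else 0.0
        if score > threshold:
            selected.append((sentence, score))
    return selected
```

For example, take a small corpus of clinical and financial sentences and the word set {"patient", "hypertension", "blocker"}. The financial sentence is dropped at the word-matching step, and a threshold of 0.5 keeps only the two clinical sentences that closely paraphrase each other. Mean pairwise scoring is quadratic in the number of candidates. A larger system might instead compare sentence embeddings against a centroid, but that is also a design assumption rather than something the claim states.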