US 12,423,507 B2
Elucidated natural language artifact recombination with contextual awareness
Aaron K. Baughman, Cary, NC (US); Nicholas Michael Wilkin, Issaquah, WA (US); Gray Franklin Cannon, Atlanta, GA (US); and Christian Eggenberger, Wil (CH)
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Aug. 15, 2022, as Appl. No. 17/887,647.
Application 17/887,647 is a continuation of application No. 17/372,702, filed on Jul. 12, 2021, granted, now 11,475,211.
Prior Publication US 2024/0054282 A1, Feb. 15, 2024
Int. Cl. G06F 16/242 (2019.01); G06F 40/166 (2020.01); G06F 40/197 (2020.01); G06F 40/20 (2020.01); G06F 40/253 (2020.01); G06F 40/30 (2020.01); G06F 40/56 (2020.01); G06N 3/08 (2023.01)

CPC G06F 40/166 (2020.01) [G06F 16/242 (2019.01); G06F 40/197 (2020.01); G06F 40/20 (2020.01); G06F 40/253 (2020.01); G06F 40/30 (2020.01); G06F 40/56 (2020.01); G06N 3/08 (2013.01)]

21 Claims

1. A computer-implemented method comprising:

extracting, using a first NLP pipeline, a first digital content dataset from a first information source, wherein the first digital content dataset comprises a factoid;

extracting, using a second NLP pipeline, a second digital content dataset from a second information source, wherein the second digital content dataset comprises an insight;

loading a plurality of digital content datasets into memory, wherein the plurality of digital content datasets comprises at least the first digital content dataset and the second digital content dataset, wherein each digital content dataset in the plurality of digital content datasets relates to a content topic, and wherein each digital content dataset in the plurality of digital content datasets is optimized based on a joint probability;

constructing, for a textual content of a candidate textual item from among the plurality of digital content datasets, a feature vector for the candidate textual item;

computing, using the feature vector, a relevance score for the candidate textual item, the relevance score being indicative of a relevance of the candidate textual item to a subtopic of the content topic;

executing a set of instructions in a processor using the relevance score to perform a multiple knapsack problem (MSP) algorithm to include the candidate textual item in a group selected from a plurality of groups, each group in the plurality of groups being configured to comprise at least one candidate textual item;

training a first pre-trained encoder-decoder model using the group as a first designated group of candidate textual items, wherein the first pre-trained encoder-decoder model is pretrained to generate textual content according to a first style of writing; and

generating, utilizing the first pre-trained encoder-decoder model, machine-authored textual content in the first style of writing resulting in a first article about the subtopic based on the first designated group of candidate textual items, wherein the machine authored textual content in the first style is distinct from a machine authored textual content in a second style of writing in a second article, and wherein the first article and the second article pertain to the subtopic.
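
The feature-vector and relevance-score steps of claim 1 can be illustrated with a minimal sketch. The claim does not say how the feature vector is built or how the score is computed, so everything here is an assumption for illustration: the hypothetical `feature_vector` helper uses bag-of-words term counts, and the hypothetical `relevance_score` helper uses cosine similarity against a short description of the subtopic.

```python
import math
import re
from collections import Counter


def feature_vector(text):
    """Hypothetical featurizer: lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def relevance_score(item_vec, subtopic_vec):
    """Cosine similarity between two sparse count vectors (0.0 to 1.0).

    This is an assumed scoring function; the claim only requires that the
    score indicate relevance of the item to a subtopic.
    """
    # Counter returns 0 for terms missing from subtopic_vec.
    dot = sum(count * subtopic_vec[term] for term, count in item_vec.items())
    norm_item = math.sqrt(sum(c * c for c in item_vec.values()))
    norm_sub = math.sqrt(sum(c * c for c in subtopic_vec.values()))
    if norm_item == 0 or norm_sub == 0:
        return 0.0  # an empty text has no measurable relevance
    return dot / (norm_item * norm_sub)


# Example: score one candidate textual item against a subtopic description.
subtopic = feature_vector("first serve speed")
candidate = feature_vector("Her first serve averaged a higher speed this year")
score = relevance_score(candidate, subtopic)
```

A production system would more plausibly use learned embeddings than raw term counts; the sketch only shows the shape of the data flow from text to vector to score.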
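
The grouping step of claim 1 places candidate textual items into one of several groups by solving a multiple knapsack problem. The sketch below is a simple greedy heuristic, not an exact solver and not necessarily the method the patent uses. The item weights (for example, word counts) and the per-group capacities are hypothetical inputs that the claim does not specify, and `assign_to_groups` is an illustrative name.

```python
def assign_to_groups(items, capacities):
    """Greedy heuristic for a multiple knapsack style assignment.

    items: list of (name, relevance_score, weight) tuples, with weight > 0.
    capacities: list of per-group capacities (assumed, e.g. word budgets).
    Returns one list of item names per group; items that fit in no group
    are left unassigned.
    """
    remaining = list(capacities)
    groups = [[] for _ in capacities]
    # Consider items with the highest relevance per unit of weight first.
    ranked = sorted(items, key=lambda it: it[1] / it[2], reverse=True)
    for name, _score, weight in ranked:
        # Groups that can still hold this item.
        fitting = [g for g in range(len(remaining)) if remaining[g] >= weight]
        if not fitting:
            continue
        # Choose the fitting group with the most remaining capacity.
        best = max(fitting, key=lambda g: remaining[g])
        groups[best].append(name)
        remaining[best] -= weight
    return groups


# Example: three scored items, two groups with capacity 3 each.
example_items = [("a", 0.9, 3), ("b", 0.8, 2), ("c", 0.1, 4)]
example_groups = assign_to_groups(example_items, [3, 3])
# example_groups == [["b"], ["a"]]; "c" fits in neither group
```

A greedy pass keeps the example short. An exact formulation would typically be handed to an integer programming or constraint solver, which can find assignments the heuristic misses.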