US 12,147,776 B2
Method for extracting information from an unstructured data source
Raghavendran Pownraju Mahendravarman, Potsdam (DE); Berinike C. K. Tech, Berlin (DE); Arsen Hnatiuk, Berlin (DE); and Robin P. G. Tech, Berlin (DE)
Assigned to AtomLeap GmbH, Berlin (DE)
Filed by AtomLeap GmbH, Berlin (DE)
Filed on Apr. 11, 2022, as Appl. No. 17/717,514.
Prior Publication US 2023/0325606 A1, Oct. 12, 2023
Int. Cl. G06F 40/40 (2020.01); G06F 16/34 (2019.01); G06F 16/951 (2019.01); G06F 40/295 (2020.01)
CPC G06F 40/40 (2020.01) [G06F 16/345 (2019.01); G06F 16/951 (2019.01); G06F 40/295 (2020.01)]
16 Claims
 
1. A method for extracting information from an unstructured data source, the method comprising:
scraping, by at least one processor, a plurality of texts from the unstructured data source, the scraping comprising
obtaining an unstructured formatted text from the unstructured data source,
obtaining a title of the unstructured formatted text, based on a formatting of the title in the unstructured formatted text,
using a title classification model to classify the title as one of relevant and non-relevant, and
if the title is classified as relevant, parsing the unstructured formatted text and including the title in the plurality of texts;
extracting, by the at least one processor, from the plurality of texts a chunk of relevant text;
summarizing, by the at least one processor, using a pre-trained summarizer, the chunk of relevant text, each of the at least one processor to summarize the chunk of relevant text in parallel to obtain semi-structured information comprising a set of sentences that summarize the chunk of relevant text; and
postprocessing, by the at least one processor, the semi-structured information to obtain structured information.
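
The four steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the unstructured formatted text is HTML and that the title is marked up as an `<h1>` element. A keyword check (`is_relevant`, with the hypothetical vocabulary `RELEVANT_WORDS`) stands in for the title classification model, a first-sentence rule (`summarize`) stands in for the pre-trained summarizer, and a thread pool stands in for parallel summarization across processors. All function and variable names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
import re

# Hypothetical vocabulary; a real system would use a trained classifier.
RELEVANT_WORDS = {"funding", "acquisition", "startup"}


class TitleAndParagraphs(HTMLParser):
    """Collects the <h1> text as the title and each <p> as a paragraph."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._tag = None  # tag currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p"):
            self._tag = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "h1":
            self.title += data
        elif self._tag == "p":
            self.paragraphs[-1] += data


def is_relevant(text):
    """Stand-in for the title classification model (assumption)."""
    lowered = text.lower()
    return any(word in lowered for word in RELEVANT_WORDS)


def scrape(pages):
    """Keep only pages whose title, found via its formatting, is relevant."""
    texts = []
    for html in pages:
        parser = TitleAndParagraphs()
        parser.feed(html)
        title = parser.title.strip()
        if is_relevant(title):
            paragraphs = [p.strip() for p in parser.paragraphs]
            texts.append({"title": title, "paragraphs": paragraphs})
    return texts


def extract_chunks(texts):
    """Select the paragraphs treated as relevant text (assumed rule)."""
    return [p for t in texts for p in t["paragraphs"] if is_relevant(p)]


def summarize(chunk):
    """Stand-in for the pre-trained summarizer: keep the first sentence."""
    return re.split(r"(?<=[.!?])\s+", chunk.strip())[0]


def postprocess(sentences):
    """Turn the summary sentences into structured records."""
    return [{"id": i, "summary": s} for i, s in enumerate(sentences)]


def run_pipeline(pages, workers=2):
    texts = scrape(pages)
    chunks = extract_chunks(texts)
    # Summarize chunks concurrently; threads stand in for processors here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sentences = list(pool.map(summarize, chunks))
    return postprocess(sentences)
```

Calling `run_pipeline` on a list of HTML strings returns one record per relevant paragraph. Pages whose `<h1>` title fails the relevance check are dropped before any parsing output is kept, mirroring the conditional step in the scraping limitation.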