US 11,934,786 B2
Iterative training for text-image-layout data in natural language processing
Adam Dancewicz, Warsaw (PL); Filip Gralinski, Warsaw (PL); and Lukasz Konrad Borchmann, Warsaw (PL)
Assigned to APPLICA SP. Z O.O., Warsaw (PL)
Filed by APPLICA SP. Z O.O., Warsaw (PL)
Filed on Mar. 28, 2023, as Appl. No. 18/127,458.
Application 18/127,458 is a continuation of application No. 17/807,313, filed on Jun. 16, 2022, granted, now 11,620,451.
Application 17/807,313 is a continuation of application No. 17/651,313, filed on Feb. 16, 2022, granted, now 11,455,468.
Claims priority of provisional application 63/150,271, filed on Feb. 17, 2021.
Prior Publication US 2023/0259709 A1, Aug. 17, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 17/00 (2019.01); G06F 40/106 (2020.01); G06F 40/295 (2020.01); G06F 40/30 (2020.01); G06N 3/08 (2023.01); G06T 11/60 (2006.01)
CPC G06F 40/295 (2020.01) [G06F 40/106 (2020.01); G06F 40/30 (2020.01); G06N 3/08 (2013.01); G06T 11/60 (2013.01)]
30 Claims
 
1. A system comprising:
one or more processors of a machine; and
at least one memory storing instructions that, when executed by the one or more processors, cause the machine to perform operations comprising:
providing access to a cloud data platform including a machine learning model for performing a plurality of iterations to generate a Natural Language Processing (NLP) model, each iteration comprising:
receiving real-world documents;
enabling information retrieval from the real-world documents without annotated training data;
receiving data comprising text data, layout data, and image data;
analyzing the text data, the layout data, and the image data; and
generating one or more outputs from the machine learning model by applying the plurality of iterations on new data, based at least in part on the analyzing of the text data, the layout data, and the image data.
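
For illustration only, the data flow recited in claim 1 (receiving text data, layout data, and image data, analyzing them together, and repeating over a plurality of iterations) might be sketched as follows. This is a toy sketch under stated assumptions, not the claimed machine learning model: the names `Token`, `Document`, `region_mean`, `analyze`, and `run_iterations` are hypothetical, the page image is assumed to be a grayscale grid of 0..255 values, and the "analysis" is a simple per-token feature record rather than a trained NLP model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    """One word of text data with its layout data (hypothetical structure)."""
    text: str
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class Document:
    """A page: tokens carry text + layout; image carries the pixel data."""
    tokens: List[Token]
    image: List[List[int]]  # assumed grayscale rows of 0..255 values


def region_mean(image: List[List[int]], bbox: Tuple[int, int, int, int]) -> float:
    """Mean pixel intensity inside a bounding box (image signal for one token)."""
    x0, y0, x1, y1 = bbox
    pixels = [value for row in image[y0:y1] for value in row[x0:x1]]
    return sum(pixels) / len(pixels) if pixels else 0.0


def analyze(doc: Document) -> List[Dict[str, object]]:
    """Combine text, layout, and image signals into one record per token."""
    records = []
    for tok in doc.tokens:
        x0, y0, x1, y1 = tok.bbox
        records.append({
            "text": tok.text,                          # text data
            "center": ((x0 + x1) / 2, (y0 + y1) / 2),  # layout data
            "ink": region_mean(doc.image, tok.bbox),   # image data
        })
    return records


def run_iterations(documents: List[Document], n_iterations: int) -> Dict[str, int]:
    """Toy loop: every iteration receives the documents and analyzes all three
    modalities. No annotated labels are consumed anywhere in the loop."""
    state = {"seen_tokens": 0, "iterations": 0}
    for _ in range(n_iterations):
        for doc in documents:
            state["seen_tokens"] += len(analyze(doc))
        state["iterations"] += 1
    return state
```

In this sketch, a caller would build a `Document` from OCR output (tokens with bounding boxes) plus the page raster, then call `run_iterations` with the desired iteration count. A real system would replace the counter in `state` with model parameters updated on each pass.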