US 12,406,135 B2
Assisted review of text content using a machine learning model
Navita Goyal, College Park, MD (US); Ani Nenkova Nenkova, Philadelphia, PA (US); Natwar Modani, Bengaluru (IN); Ayush Maheshwari, Kota (IN); and Inderjeet Jayakumar Nair, Indore (IN)
Assigned to ADOBE INC., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Dec. 13, 2021, as Appl. No. 17/549,270.
Prior Publication US 2023/0186667 A1, Jun. 15, 2023
Int. Cl. G06F 40/205 (2020.01); G06F 40/20 (2020.01); G06F 40/279 (2020.01); G06N 20/00 (2019.01); G06N 3/044 (2023.01); G06N 3/08 (2023.01)
CPC G06F 40/205 (2020.01) [G06F 40/20 (2020.01); G06F 40/279 (2020.01); G06N 20/00 (2019.01)]
18 Claims
 
1. A method for assisted review of a document, the method comprising:
identifying two or more similar reference text segments, from a reference corpus of text content, that are similar to a text segment of the document by:
converting the text segment to a dense vector representation of the text segment after replacing numerical text in the text segment with a corresponding token representing the numerical text;
converting reference text segments from the reference corpus to corresponding dense vector representations of the reference text segments after replacing corresponding numerical text in the reference text segments with corresponding tokens representing the corresponding numerical text;
computing corresponding similarity scores between the dense vector representation of the text segment and the corresponding dense vector representations of the reference text segments using a machine learning model trained using the reference corpus to identify similar text segments; and
subsequent to determining the two or more similar reference text segments based on each corresponding similarity score between the dense vector representation to the corresponding dense vector representations above a threshold level of similarity:
accessing the corresponding numerical text from each of the two or more similar reference text segments; and
determining a computed numerical value from the two or more similar reference text segments based on computing at least one of an average value, a median value, minimum value, or a maximum value of the corresponding numerical text from each of the two or more similar reference text segments; and
providing information for the text segment for display on a user interface, the information including the computed numerical value from the two or more similar reference text segments.
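
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim calls for a machine learning model trained on the reference corpus, whereas the `embed` function below is a stand-in that hashes words into a fixed-size vector so the example runs with the standard library alone. The `<NUM>` token, the regular expression for numerical text, the vector size, the 0.8 threshold, and all function names are assumptions introduced for illustration.

```python
import math
import re
import statistics
import zlib

NUM_TOKEN = "<NUM>"                           # hypothetical placeholder token
NUM_PATTERN = re.compile(r"\d+(?:\.\d+)?")    # simplification: plain integers and decimals only
DIM = 256                                     # hypothetical vector size


def mask_numbers(text):
    """Replace numerical text with a token; return the masked text and the numbers found."""
    numbers = [float(m) for m in NUM_PATTERN.findall(text)]
    return NUM_PATTERN.sub(NUM_TOKEN, text), numbers


def embed(masked_text):
    """Stand-in for the trained model: hashed bag-of-words vector, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"<num>|[a-z]+", masked_text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


AGGREGATORS = {
    "average": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def review_segment(segment, corpus, threshold=0.8, how="average"):
    """Aggregate the numbers found in reference segments similar to `segment`.

    Returns None unless at least two reference segments score above the threshold.
    """
    masked, _ = mask_numbers(segment)
    query = embed(masked)
    matches, values = [], []
    for ref in corpus:
        ref_masked, ref_numbers = mask_numbers(ref)
        if ref_numbers and cosine(query, embed(ref_masked)) > threshold:
            matches.append(ref)
            values.extend(ref_numbers)
    if len(matches) < 2:
        return None
    return AGGREGATORS[how](values)


corpus = [
    "Payment is due within 30 days of invoice.",
    "Payment is due within 60 days of invoice.",
    "The agreement is governed by the laws of Delaware.",
]
typical = review_segment("Payment is due within 45 days of invoice.", corpus)
print(typical)
```

Because numbers are masked before embedding, the two payment clauses map to the same vector as the segment under review even though their day counts differ, and the value shown to the reviewer is computed only from the numbers in those matching clauses.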