US 12,230,049 B2
Multi-segment text search using machine learning model for text similarity
Bryant Lee, San Francisco, CA (US); Andrew Tjang, Moraga, CA (US); Andrew Perry Chu, San Francisco, CA (US); and Uday Pulleti, Hyderabad (IN)
Assigned to Cognition IP Technology Inc., San Francisco, CA (US)
Filed by Cognition IP Technology Inc., San Francisco, CA (US)
Filed on Jun. 5, 2023, as Appl. No. 18/205,867.
Application 18/205,867 is a continuation of application No. 17/723,449, filed on Apr. 18, 2022, granted, now Pat. No. 11,670,103.
Application 17/723,449 is a continuation of application No. 16/718,081, filed on Dec. 17, 2019, granted, now Pat. No. 11,308,320, issued on Apr. 19, 2022.
Claims priority of provisional application 62/787,640, filed on Jan. 2, 2019.
Claims priority of provisional application 62/780,582, filed on Dec. 17, 2018.
Prior Publication US 2023/0394863 A1, Dec. 7, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 17/00 (2019.01); G06F 40/226 (2020.01); G06V 30/413 (2022.01); G06V 30/414 (2022.01); G06V 30/418 (2022.01); G06F 16/93 (2019.01)
CPC G06V 30/414 (2022.01) [G06F 40/226 (2020.01); G06V 30/413 (2022.01); G06V 30/418 (2022.01); G06F 16/93 (2019.01)]
18 Claims
 
1. A computer-implemented method for performing text search using machine learning, comprising:
receiving an input text block, the input text block comprising a patent claim;
splitting the patent claim into clauses;
performing, by a first machine learning model, a document similarity matching between the input text block and a plurality of reference documents;
identifying, based on the document similarity matching, a first subset of documents;
performing, by a second machine learning model, a text similarity matching between each clause and a plurality of stored text portions for each document in the identified subset of documents, wherein the text similarity matching generates a plurality of similarity scores measuring the similarity between each clause and the plurality of stored text portions;
generating, for each clause, a clause subset of documents from the first subset of documents;
generating, for each clause, a ranked list of text segments from the clause subset, wherein the ranking is based on text similarity between the clause and the stored text portions of the documents in the clause subset;
identifying, based at least partly on the ranked list of text segments for each clause and one or more combination criteria, a combination of final documents and text segments of said final documents; and
displaying, for each clause, one or more text segments from one or more final documents based at least partly on the ranking of the text segments and the one or more combination criteria.
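
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is an assumption made for illustration: a bag-of-words cosine similarity stands in for both the first and the second machine learning model, clauses are split on semicolons, and the single "combination criterion" is to pick at most `max_final` documents whose best-matching segments give the highest total score across all clauses. The function names (`tokens`, `cosine`, `split_claim`, `search`) and parameters (`first_k`, `clause_k`, `max_final`) are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Bag-of-words vector; a stand-in for a learned text representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def split_claim(claim):
    """Assumed splitting rule: clauses are separated by semicolons."""
    return [c.strip() for c in claim.split(";") if c.strip()]


def search(claim, docs, first_k=3, clause_k=2, max_final=2):
    """docs maps a document id to its list of stored text portions.

    Returns {clause: [(doc_id, text_segment, score), ...]} restricted to
    the chosen final documents, best-scoring segment first.
    """
    clauses = split_claim(claim)

    # Stage 1: document-level similarity against the whole input text block.
    claim_vec = tokens(claim)
    doc_scores = {
        d: cosine(claim_vec, tokens(" ".join(parts))) for d, parts in docs.items()
    }
    first_subset = sorted(doc_scores, key=doc_scores.get, reverse=True)[:first_k]
    if not first_subset:
        return {c: [] for c in clauses}

    # Stage 2: clause-level similarity against each stored text portion.
    ranked = {}
    for clause in clauses:
        cvec = tokens(clause)
        scored = [
            (cosine(cvec, tokens(p)), d, p) for d in first_subset for p in docs[d]
        ]
        scored.sort(key=lambda s: s[0], reverse=True)

        # Clause subset: the documents owning the clause's best portions.
        clause_docs = []
        for _, d, _ in scored:
            if d not in clause_docs:
                clause_docs.append(d)
            if len(clause_docs) == clause_k:
                break

        # Ranked list of text segments drawn from the clause subset only.
        ranked[clause] = [s for s in scored if s[1] in clause_docs]

    # Assumed combination criterion: maximize the summed best score per clause.
    def coverage(combo):
        return sum(
            max((s for s, d, _ in ranked[c] if d in combo), default=0.0)
            for c in clauses
        )

    candidates = [
        combo
        for n in range(1, max_final + 1)
        for combo in combinations(first_subset, n)
    ]
    final_docs = max(candidates, key=coverage)

    # "Display" step: per clause, segments from final documents in rank order.
    return {
        c: [(d, p, s) for s, d, p in ranked[c] if d in final_docs] for c in clauses
    }
```

In this sketch the two stages differ only in granularity: the first compares the whole claim against whole documents to narrow the candidate set, and the second compares individual clauses against individual text portions within that set. A real system would presumably use separately trained models for each stage, as the claim recites.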