US 12,406,150 B2
Machine learning systems and methods for many-hop fact extraction and claim verification
Yichen Jiang, Seattle, WA (US); Shikha Bordia, Jersey City, NJ (US); Zheng Zhong, Seattle, WA (US); Charles Dognin, Rockville Centre, NY (US); Maneesh Kumar Singh, Princeton, NJ (US); and Mohit Bansal, Carrboro, NC (US)
Assigned to Insurance Services Office, Inc., Jersey City, NJ (US)
Filed by Insurance Services Office, Inc., Jersey City, NJ (US)
Filed on Nov. 24, 2021, as Appl. No. 17/534,899.
Claims priority of provisional application 63/118,074, filed on Nov. 25, 2020.
Prior Publication US 2022/0164546 A1, May 26, 2022
Int. Cl. G06F 40/40 (2020.01); G06F 40/226 (2020.01); G06F 40/295 (2020.01); G06N 5/04 (2023.01); G06F 16/35 (2019.01)
CPC G06F 40/40 (2020.01) [G06F 40/226 (2020.01); G06F 40/295 (2020.01); G06N 5/04 (2013.01); G06F 16/35 (2019.01)]
39 Claims
 
1. A machine learning system for fact extraction and claim verification, comprising:
a memory; and
a processor in communication with the memory, the processor:
receiving a claim comprising one or more sentences;
retrieving, based at least in part on one or more machine learning models, a document from a dataset, the document having a first relatedness score higher than a first threshold, wherein the first relatedness score indicates that the one or more machine learning models determine that the document is most likely to be relevant to the claim, wherein the dataset comprises a plurality of supporting documents and a plurality of claims, the plurality of claims comprising a first group of claims supported by facts from more than two supporting documents from the plurality of supporting documents and a second group of claims not supported by the plurality of supporting documents;
selecting, based at least in part on the one or more machine learning models, a set of sentences from the document, the set of sentences having second relatedness scores higher than a second threshold, wherein the second relatedness scores indicate that the one or more machine learning models determine that the set of sentences are most likely to be relevant to the claim; and
determining, based at least in part on the one or more machine learning models, whether the claim includes one or more facts from the set of sentences.
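
The three processor steps recited in claim 1 (document retrieval above a first threshold, sentence selection above a second threshold, and claim verification) can be illustrated with a minimal sketch. This is not the patented implementation: the token-overlap `relatedness` function is a toy stand-in for the one or more machine learning models, and the names `verify_claim`, `Verdict`, `doc_threshold`, `sent_threshold`, and `support_threshold`, along with all default values, are hypothetical choices made only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens (toy stand-in for a learned encoder)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relatedness(claim: str, text: str) -> float:
    """Fraction of claim tokens found in `text`, in [0.0, 1.0].

    Assumption: a real system would use a trained model here.
    """
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(text)) / len(claim_tokens)


@dataclass
class Verdict:
    document: Optional[str]   # retrieved document, or None if none passed
    sentences: List[str]      # sentences selected from that document
    supported: bool           # outcome of the verification step


def verify_claim(
    claim: str,
    documents: List[str],
    doc_threshold: float = 0.3,      # hypothetical "first threshold"
    sent_threshold: float = 0.3,     # hypothetical "second threshold"
    support_threshold: float = 0.8,  # hypothetical verification cutoff
) -> Verdict:
    # Step 1: keep documents scoring above the first threshold and
    # retrieve the highest-scoring one.
    scored = [(relatedness(claim, doc), doc) for doc in documents]
    candidates = [(score, doc) for score, doc in scored if score > doc_threshold]
    if not candidates:
        return Verdict(None, [], False)
    _, best_doc = max(candidates, key=lambda pair: pair[0])

    # Step 2: split the document into sentences and keep those scoring
    # above the second threshold.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", best_doc) if s.strip()]
    selected = [s for s in sentences if relatedness(claim, s) > sent_threshold]

    # Step 3: toy verification; the claim counts as supported when the
    # selected sentences jointly cover enough of its tokens.
    supported = bool(selected) and (
        relatedness(claim, " ".join(selected)) >= support_threshold
    )
    return Verdict(best_doc, selected, supported)
```

Calling `verify_claim(claim, documents)` with a claim string and a list of document strings returns a `Verdict` holding the retrieved document, the selected sentences, and a boolean verification result. When no document exceeds the first threshold, the sketch returns an empty verdict rather than proceeding to sentence selection.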