US 12,353,409 B2
Methods and systems for improved document processing and information retrieval
Steven John Rennie, Yorktown Heights, NY (US); Marie Wenzel Meteer, Arlington, MA (US); David Nahamoo, Great Neck, NY (US); Dominique O'Donnell, Portland, OR (US); Vaibhava Goel, Chappaqua, NY (US); Etienne Marcheret, White Plains, NY (US); Chul Sung, Fort Lee, NJ (US); Igor Roditis Jablokov, Raleigh, NC (US); Soonthorn Ativanichayaphong, New York, NY (US); Ajinkya Jitendra Zadbuke, Cambridge, MA (US); Carmi Rothberg, New York, NY (US); and Ellen Eide Kislal, Leawood, KS (US)
Assigned to Pryon Incorporated, Raleigh, NC (US)
Filed by Pryon Incorporated, Raleigh, NC (US)
Filed on Apr. 19, 2024, as Appl. No. 18/640,448.
Application 18/640,448 is a continuation of application No. PCT/US2023/027320, filed on Jul. 11, 2023.
Claims priority of provisional application 63/423,527, filed on Nov. 8, 2022.
Claims priority of provisional application 63/388,046, filed on Jul. 11, 2022.
Prior Publication US 2024/0265041 A1, Aug. 8, 2024
Int. Cl. G06F 17/00 (2019.01); G06F 16/2452 (2019.01); G06F 16/248 (2019.01); G06F 16/3329 (2025.01); G06F 40/30 (2020.01); G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06V 30/414 (2022.01)

CPC G06F 16/24522 (2019.01) [G06F 16/248 (2019.01); G06F 16/3329 (2019.01); G06F 40/30 (2020.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06V 30/414 (2022.01)]

19 Claims

1. A method for document processing, the method comprising:

obtaining a question dataset that comprises one or more source questions for document processing by a machine-learning question-and-answer system that provides answer data in response to question data submitted by a user,

analyzing the source question to determine a specificity level for the source question, wherein analyzing the source question comprises determining the source question is overly verbose based on a comparison of the determined specificity level to one or more specificity threshold values,

modifying the source question from the question dataset to generate one or more augmented questions that have equivalent semantic meanings as that of the source question,

processing a document with the one or more augmented questions,

wherein modifying the source question comprises: simplifying the source question, in response to a determination that the source question is overly verbose, to exclude one or more semantic elements of the source question to generate a terse question with an equivalent semantic meaning to the source question, and

adding the one or more augmented questions to an augmented question dataset.
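
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how the specificity level is computed or how a question is simplified, so the word-count measure, the threshold value, the filler-phrase list, and all function names below are assumptions made purely for illustration. A real system would presumably use a learned model and would need to verify that the terse question keeps the meaning of the source question, which this heuristic does not guarantee. Processing a document with the augmented questions is outside the scope of the sketch.

```python
import re

# Assumed filler phrases treated as removable semantic elements.
# The patent does not enumerate any; this list is hypothetical.
FILLER_PATTERNS = [
    r"\bcould you please tell me\b",
    r"\bi would like to know\b",
    r"\bif possible\b",
]

# Assumed specificity threshold: more words than this counts as overly verbose.
VERBOSITY_THRESHOLD = 8


def specificity_level(question: str) -> int:
    """Stand-in specificity measure: the number of word tokens."""
    return len(re.findall(r"[A-Za-z0-9']+", question))


def is_overly_verbose(question: str, threshold: int = VERBOSITY_THRESHOLD) -> bool:
    """Compare the specificity level against the threshold value."""
    return specificity_level(question) > threshold


def simplify(question: str) -> str:
    """Drop filler phrases to produce a terse variant of the question."""
    terse = question
    for pattern in FILLER_PATTERNS:
        terse = re.sub(pattern, "", terse, flags=re.IGNORECASE)
    # Collapse whitespace and trim the commas left behind by removed phrases.
    terse = re.sub(r"\s+", " ", terse).strip(" ,")
    return terse[:1].upper() + terse[1:]


def augment(question_dataset: list[str]) -> list[str]:
    """Build the augmented question dataset from the source questions."""
    augmented = []
    for source in question_dataset:
        if is_overly_verbose(source):
            terse = simplify(source)
            # Keep only variants that are non-empty and actually differ.
            if terse and terse != source:
                augmented.append(terse)
    return augmented


if __name__ == "__main__":
    questions = [
        "Could you please tell me, if possible, what is the warranty period for the device?",
        "What is the warranty period?",
    ]
    for q in augment(questions):
        print(q)
```

In this sketch only the first example question exceeds the assumed threshold, so only it yields a terse variant; the second is already short and contributes nothing to the augmented dataset.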