US 11,734,268 B2
Document pre-processing for question-and-answer searching
David Nahamoo, Great Neck, NY (US); Igor Roditis Jablokov, Raleigh, NC (US); Vaibhava Goel, Chappaqua, NY (US); Etienne Marcheret, White Plains, NY (US); Ellen Eide Kislal, Wellesley, MA (US); Steven John Rennie, Yorktown Heights, NY (US); Marie Wenzel Meteer, Arlington, MA (US); Neil Rohit Mallinar, Astoria, NY (US); Soonthorn Ativanichayaphong, New York, NY (US); Joseph Allen Pruitt, Sammamish, WA (US); John Pruitt, Seattle, WA (US); Bryan Dempsey, Raleigh, NC (US); and Chui Sung, Fort Lee, NJ (US)
Assigned to Pryon Incorporated, Raleigh, NC (US)
Filed by Pryon Incorporated, Raleigh, NC (US)
Filed on Jun. 25, 2021, as Appl. No. 17/358,114.
Claims priority of provisional application 63/043,906, filed on Jun. 25, 2020.
Prior Publication US 2021/0406264 A1, Dec. 30, 2021
Int. Cl. G06F 16/2452 (2019.01); G06F 40/151 (2020.01); G06F 40/137 (2020.01); G06F 16/957 (2019.01); G06F 16/332 (2019.01); G06F 16/338 (2019.01); G06F 16/335 (2019.01); G06F 16/9032 (2019.01); G06F 16/93 (2019.01); G06F 40/30 (2020.01); G06F 16/33 (2019.01); G06F 40/247 (2020.01); G06N 5/022 (2023.01); G06N 5/04 (2023.01); G06F 40/131 (2020.01); G06F 40/20 (2020.01); G06F 40/284 (2020.01); G06N 3/08 (2023.01); G06N 3/006 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)
CPC G06F 16/24522 (2019.01) [G06F 16/335 (2019.01); G06F 16/338 (2019.01); G06F 16/3328 (2019.01); G06F 16/3329 (2019.01); G06F 16/3349 (2019.01); G06F 16/9032 (2019.01); G06F 16/93 (2019.01); G06F 16/9574 (2019.01); G06F 40/131 (2020.01); G06F 40/137 (2020.01); G06F 40/151 (2020.01); G06F 40/20 (2020.01); G06F 40/247 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06N 5/022 (2013.01); G06N 5/04 (2013.01); G06N 3/006 (2013.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/08 (2013.01)] 23 Claims
OG exemplary drawing
 
1. A method comprising:
receiving a source document;
applying one or more pre-processes to the source document to produce document contextual information representative of a structure and content of the source document; and
transforming the source document, based on the document contextual information, to generate a question-and-answer searchable document;
wherein applying the one or more pre-processes comprises segmenting the source document into multiple document segments, wherein transforming the source document comprises transforming according to a vector transform applied to the document contextual information and content of the multiple document segments with a first contextual data element derived from a first segment of the multiple document segments being combined with content of a different segment of the multiple document segments, and wherein the method further comprises:
for at least one segment of the multiple document segments, identifying at least one segment descriptor comprising one or more of: at least one entity associated with the at least one segment, at least one task associated with the at least one segment, or a subject matter descriptor associated with the at least one segment;
tagging the at least one segment with the at least one segment descriptor;
receiving query data representative of a question from a user relating to the content of the source document;
determining sensor contextual information based on sensor data obtained by a sensor device associated with the user; and
searching a response to the query data from one or more of the multiple document segments with segment descriptors matching the determined sensor contextual information;
wherein determining the sensor contextual information comprises:
capturing sensor data by one or more sensors of an augmented reality system with which the user is interacting; and
determining an item or location identifiable from the sensor data being presented to the user through the augmented reality system;
wherein searching the response to the query data comprises searching the response to the query data from one or more of the multiple document segments based, at least in part, on the determined item or location identifiable from the sensor data.
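The pre-processing limitations of claim 1 (segmenting the source document, deriving a contextual data element from a first segment and combining it with the content of other segments, applying a vector transform, and tagging segments with entity, task, or subject-matter descriptors) can be illustrated with a minimal sketch. The following Python assumes paragraph-level segmentation, a bag-of-words transform, and a small keyword table for descriptors; these specifics, and all names such as DESCRIPTOR_KEYWORDS, are illustrative assumptions and do not appear in the patent.

import re
from collections import Counter

def segment_document(text):
    """Split the source document into paragraph-level segments."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def vector_transform(text):
    """Toy vector transform: lower-cased bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

# Hypothetical descriptor table: entity / task / subject-matter keywords.
DESCRIPTOR_KEYWORDS = {
    "entity:router_x100": {"x100", "router"},
    "task:installation": {"install", "mount", "installation"},
    "subject:networking": {"ethernet", "network", "lan"},
}

def tag_segment(segment):
    """Identify segment descriptors by simple keyword matching."""
    tokens = set(re.findall(r"[a-z0-9']+", segment.lower()))
    return [d for d, kws in DESCRIPTOR_KEYWORDS.items() if tokens & kws]

def preprocess(source_document):
    segments = segment_document(source_document)
    # A contextual data element derived from the first segment (here, its first
    # line, treated as a title) is combined with the content of the other segments.
    context_element = segments[0].splitlines()[0] if segments else ""
    indexed = []
    for i, seg in enumerate(segments):
        combined = seg if i == 0 else f"{context_element} {seg}"
        indexed.append({
            "text": seg,
            "vector": vector_transform(combined),
            "descriptors": tag_segment(seg),
        })
    return indexed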
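The query-time limitations (sensor contextual information from an augmented reality system, matching that information against segment descriptors, and searching a response from the matching segments) can be sketched the same way, reusing the indexed structure from the previous sketch. The sensor observations, the SENSOR_TO_DESCRIPTOR mapping, and the cosine-similarity ranking are assumptions for illustration only.

import math
import re
from collections import Counter

def vector_transform(text):
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical mapping from sensor observations (an item or location identified
# in the augmented-reality view) to the descriptors used at indexing time.
SENSOR_TO_DESCRIPTOR = {
    "camera_sees:x100_front_panel": "entity:router_x100",
    "location:server_room": "subject:networking",
}

def answer(query, indexed_segments, sensor_observations):
    """Search for a response among segments whose descriptors match the sensor context."""
    wanted = {SENSOR_TO_DESCRIPTOR[o] for o in sensor_observations
              if o in SENSOR_TO_DESCRIPTOR}
    candidates = [s for s in indexed_segments
                  if not wanted or wanted & set(s["descriptors"])]
    qvec = vector_transform(query)
    best = max(candidates, key=lambda s: cosine(qvec, s["vector"]), default=None)
    return best["text"] if best else None

# Example use (hypothetical data): answer("how do I install the router?",
# preprocess(manual_text), {"camera_sees:x100_front_panel"}) restricts the search
# to segments tagged with the entity the AR headset currently shows the user.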