US 11,934,793 B2
System and method for content comprehension and response
Ajay Divakaran, Monmouth Junction, NJ (US); Karan Sikka, Lawrenceville, NJ (US); Yi Yao, Princeton, NJ (US); Yunye Gong, West Windsor, NJ (US); Stephanie Nunn, Hopkins, MN (US); Pritish Sahu, Piscataway, NJ (US); Michael A. Cogswell, West Windsor, NJ (US); Jesse Hostetler, Boulder, CO (US); and Sara Rutherford-Quach, San Carlos, CA (US)
Assigned to SRI International, Menlo Park, CA (US)
Filed by SRI International, Menlo Park, CA (US)
Filed on Nov. 1, 2021, as Appl. No. 17/516,409.
Claims priority of provisional application 63/109,282, filed on Nov. 3, 2020.
Prior Publication US 2022/0138433 A1, May 5, 2022
Int. Cl. G06F 40/35 (2020.01); G06F 16/33 (2019.01); G06N 5/04 (2023.01)
CPC G06F 40/35 (2020.01) [G06F 16/3335 (2019.01); G06N 5/04 (2013.01)]
20 Claims
 
10. A method for content comprehension and response of content, comprising:
receiving a question directed to the content;
determining a question vector representation of the received question;
projecting the determined question vector representation into a trained common embedding space in which question vector representations and respective content vector representations that are related, are closer together in the common embedding space than unrelated question vector representations and content vector representations;
determining a distance measure between the determined question vector representations projected into the common embedding space and respective embedded question answer pair vector representations in the common embedding space using a distance function to identify content related to the received question;
wherein the common embedding space is trained by:
selecting a hierarchical taxonomy having at least two layers including respective words resulting in layers of varying complexity; and
for each layer of the hierarchical taxonomy:
determining a set of words associated with a layer of the hierarchical taxonomy;
determining a question answer pair based on a question generated using at least one word of the set of words and at least one content domain to which the question is applied;
determining a vector representation for the generated question answer pair and for content related to the at least one content domain of the question answer pair; and
embedding the vector representation determined for the generated question answer pair and the vector representations generated for the content related to the content domain into a common embedding space such that embedded vector representations for generated question answer pair and embedded vector representations for content related to the content domain that are related, are closer together in the common embedding space than unrelated embedded vector representations;
wherein the common embedding space comprises embedded question answer pairs for each of the at least two layers of the hierarchical taxonomy, such that a relationship between embedded question answer pairs of varying complexity can be determined.
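
Illustrative sketch (not part of the claim): the query-side steps of claim 10 might look roughly like the following. The claim does not specify an encoder, a projection, or a particular distance function, so everything concrete here is an assumption made for illustration: the linear projection matrix `PROJECTION`, the cosine distance, the `EmbeddedQA` record, the helper names, and the toy vectors and question answer pairs. Each embedded pair carries the number of the taxonomy layer it was generated from, so that nearby pairs of differing complexity can be compared.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EmbeddedQA:
    """Hypothetical record for a question answer pair already embedded in the common space."""
    question: str
    answer: str
    layer: int            # taxonomy layer the question was generated from
    vector: List[float]   # position in the common embedding space


def project(vec: List[float], matrix: List[List[float]]) -> List[float]:
    """Assumed linear projection: each output coordinate is a row of `matrix` dotted with `vec`."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine_distance(a: List[float], b: List[float]) -> float:
    """One possible distance function; the claim only requires 'a distance function'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def rank_related(question_vec: List[float],
                 projection: List[List[float]],
                 embedded_pairs: List[EmbeddedQA],
                 top_k: int = 2) -> List[Tuple[float, EmbeddedQA]]:
    """Project the question vector, then return the top_k closest embedded pairs."""
    q = project(question_vec, projection)
    scored = [(cosine_distance(q, pair.vector), pair) for pair in embedded_pairs]
    scored.sort(key=lambda item: item[0])  # smaller distance = more closely related
    return scored[:top_k]


# Toy stand-ins for a trained space: a 3-d question vector projected to 2-d.
PROJECTION = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
PAIRS = [
    EmbeddedQA("What is a cell?", "The basic unit of life.", 1, [1.0, 0.0]),
    EmbeddedQA("Why do cells divide?", "To grow and repair tissue.", 2, [0.9, 0.1]),
    EmbeddedQA("What is a volcano?", "A vent in the Earth's crust.", 1, [0.0, 1.0]),
]

results = rank_related([1.0, 0.05, 0.3], PROJECTION, PAIRS)
for dist, pair in results:
    print(f"layer {pair.layer}  distance={dist:.4f}  {pair.question}")
```

In this toy setup the two closest pairs come from different taxonomy layers, which loosely mirrors the final wherein clause: because pairs from every layer share one space, a relationship between questions of varying complexity can be read off their distances. The training steps (generating question answer pairs per layer and embedding them with related content) are not sketched here.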