US 11,797,590 B2
Generating structured data for rich experiences from unstructured data streams
Pranathi R. Tupakula, Seattle, WA (US); Aman Singhal, Bellevue, WA (US); Prithvishankar Srinivasan, Seattle, WA (US); and Marcelo M. Debarros, Redmond, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jan. 5, 2021, as Appl. No. 17/141,634.
Claims priority of provisional application 63/073,791, filed on Sep. 2, 2020.
Prior Publication US 2022/0067077 A1, Mar. 3, 2022
Int. Cl. G06N 20/00 (2019.01); G06F 16/951 (2019.01); G06F 16/34 (2019.01); G06F 40/295 (2020.01); G06F 3/0482 (2013.01); G06F 18/214 (2023.01)
CPC G06F 16/345 (2019.01) [G06F 3/0482 (2013.01); G06F 16/951 (2019.01); G06F 18/214 (2023.01); G06F 40/295 (2020.01); G06N 20/00 (2019.01)]
18 Claims
 
1. A method for generating structured content for an event, comprising:
obtaining a plurality of information items from a plurality of data sources, each information item including unstructured content about the event;
providing the plurality of information items to a trained machine learning model, wherein the model is trained with training data that includes information items and corresponding labeled entities for a plurality of historical events;
receiving a formatted request, wherein the formatted request is associated with one or more labeled entities associated with the trained machine learning model;
identifying, by the trained machine learning model, multiple entities from the unstructured content based on the formatted request associated with the one or more labeled entities;
combining the plurality of information items associated with an identified entity of the multiple identified entities into a summary information item;
splitting the summary information item into a plurality of text segments;
assembling one or more text segments of the plurality of text segments to generate a summary of the plurality of information items associated with the identified entity;
storing each identified entity of the identified multiple entities and the generated summary as structured content responsive to the formatted request when a number of matching identified entities for each identified entity of the identified multiple entities exceeds a threshold; and
including the structured content in a search index.
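
Illustrative sketch (not part of the claim). The steps recited in claim 1 can be read as a small pipeline: identify entities, combine the items for each entity, split the combined text into segments, assemble a summary, store it only above a threshold, and index the result. The Python below is a minimal sketch of that reading under stated assumptions, not the patented implementation. Every name in it is hypothetical: `identify_entities` is a plain keyword matcher standing in for the trained machine learning model, the "formatted request" is modeled as a list of entity labels, the threshold is interpreted as the number of information items in which an entity was identified, and the search index is an in-memory dictionary.

```python
import re
from collections import defaultdict


def identify_entities(item_text, requested_entities):
    """Hypothetical stand-in for the trained model: return each requested
    entity label that literally appears in the item text."""
    lowered = item_text.lower()
    return [e for e in requested_entities if e.lower() in lowered]


def build_structured_content(items, requested_entities, threshold, max_segments=2):
    """Return {entity: summary} for entities identified in more than
    `threshold` information items (an assumed reading of the claim)."""
    # Group the unstructured items by the entities identified in them.
    items_by_entity = defaultdict(list)
    for text in items:
        for entity in identify_entities(text, requested_entities):
            items_by_entity[entity].append(text)

    structured = {}
    for entity, texts in items_by_entity.items():
        # Store only when the match count exceeds the threshold.
        if len(texts) <= threshold:
            continue
        # Combine the items into a single "summary information item".
        combined = " ".join(texts)
        # Split the combined text into sentence-like text segments.
        segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", combined) if s.strip()]
        # Assemble the summary from distinct segments that mention the entity.
        chosen = []
        for segment in segments:
            if entity.lower() in segment.lower() and segment not in chosen:
                chosen.append(segment)
        structured[entity] = " ".join(chosen[:max_segments])
    return structured


def add_to_index(index, event_id, structured):
    """Add each entity and its summary to a toy in-memory search index."""
    for entity, summary in structured.items():
        index[entity.lower()].append(
            {"event": event_id, "entity": entity, "summary": summary}
        )
```

In this sketch, the summary step keeps only segments that mention the entity and caps their number with `max_segments`. That is one simple extractive choice among many; the claim itself does not say how segments are selected.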