US 11,778,277 B1
Digital item processing for video streams
Tony Norman Bryan, Lake Forest Park, WA (US); Paul Martin, Seattle, WA (US); Brent Allen Colson, Seattle, WA (US); and Samuel Leiber, Fairfax, VA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Jun. 11, 2021, as Appl. No. 17/345,264.
Int. Cl. H04N 5/445 (2011.01); G06F 13/00 (2006.01); G06F 3/00 (2006.01); H04N 21/478 (2011.01); G06N 20/00 (2019.01); G10L 15/22 (2006.01); G06F 16/783 (2019.01); G06F 16/78 (2019.01); H04N 21/422 (2011.01); H04N 21/44 (2011.01)
CPC H04N 21/47815 (2013.01) [G06F 16/785 (2019.01); G06F 16/7837 (2019.01); G06F 16/7854 (2019.01); G06F 16/7867 (2019.01); G06N 20/00 (2019.01); G10L 15/22 (2013.01); H04N 21/42203 (2013.01); H04N 21/44008 (2013.01); G10L 2015/223 (2013.01)]
20 Claims
 
1. A system comprising:
at least one processor; and
at least one non-transitory computer-readable storage medium having computer-executable instructions stored thereon which, when executed on the at least one processor, cause the system to perform operations comprising:
determining that a frame of a first media stream comprises a first representation of a candidate object;
determining a tag associated with the first media stream and the candidate object, the tag comprising a first stream identifier associated with the first media stream, the tag comprising a timestamp representing a first time in which the first representation is depicted within the first media stream, the first representation of the candidate object being visually depicted within the first media stream;
determining a second stream identifier associated with a second media stream, the second media stream comprising a second representation of the candidate object, the second representation of the candidate object being audibly referenced in the second media stream;
storing, in a database, data associated with the first media stream and the second media stream, the data comprising the first stream identifier, the timestamp, an object identifier, and the second stream identifier, the object identifier being associated with the candidate object;
determining supplemental information associated with the candidate object, based at least in part on the candidate object;
storing, in the database, the supplemental information;
receiving a first request to output the first media stream via a display device;
receiving, while the first media stream is being output by the display device, and from a voice-controlled device within a same environment as the display device, a second request associated with the first media stream, the second request being a natural language request captured by one or more microphones of the voice-controlled device;
processing the natural language request to be a natural language processed request referring to an item that is depicted in the first media stream;
comparing a second time in which the second request was received and a corresponding time in the first media stream to the first time;
determining that the second request is associated with the first representation of the candidate object, based at least in part on a result of the comparing the second time and the corresponding time to the first time; and
causing at least one of the object identifier or the supplemental information to be visually displayed via the display device.
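The time-matching steps of claim 1 can be illustrated with a short sketch. A stored tag pairs a stream identifier, a depiction timestamp (the "first time"), an object identifier, and the identifiers of streams that audibly reference the object. A voice request is mapped to a playback position in the stream and then compared against those timestamps. Everything named here is an assumption made for illustration and does not come from the patent: `Tag`, `TagStore`, `resolve`, `playback_position`, the in-memory dict standing in for the database, and the 10-second lookback window. The claim does not say how the comparison is thresholded.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tag:
    """Hypothetical record: one visual depiction of a candidate object in a stream."""
    stream_id: str            # first stream identifier
    timestamp_s: float        # "first time": seconds into the stream when the object is shown
    object_id: str            # identifier of the candidate object
    related_stream_ids: list[str] = field(default_factory=list)  # streams that audibly reference it
    supplemental_info: str = ""


class TagStore:
    """In-memory stand-in for the claimed database (assumption, not the patented design)."""

    def __init__(self) -> None:
        self._tags: dict[str, list[Tag]] = {}

    def add(self, tag: Tag) -> None:
        self._tags.setdefault(tag.stream_id, []).append(tag)

    def resolve(self, stream_id: str, playback_s: float,
                lookback_s: float = 10.0) -> Optional[Tag]:
        """Return the tag shown most recently at or before playback_s, within lookback_s.

        The hypothetical lookback window reflects that a viewer usually asks
        "what is that?" shortly after the item appears on screen.
        """
        candidates = [
            t for t in self._tags.get(stream_id, [])
            if 0.0 <= playback_s - t.timestamp_s <= lookback_s
        ]
        return max(candidates, key=lambda t: t.timestamp_s, default=None)


def playback_position(request_wallclock_s: float, stream_start_wallclock_s: float,
                      paused_s: float = 0.0) -> float:
    """Map the time the voice request was received to a position in the stream."""
    return request_wallclock_s - stream_start_wallclock_s - paused_s


# Usage with made-up identifiers and times.
store = TagStore()
store.add(Tag("show-42", 120.0, "jacket-7", ["podcast-9"], "Blue rain jacket"))
store.add(Tag("show-42", 300.0, "lamp-3"))
pos = playback_position(1_000_125.0, 1_000_000.0)  # request arrives 125 s into playback
tag = store.resolve("show-42", pos)
print(tag.object_id if tag else None)
```

In this sketch the request at 125 s resolves to the object tagged at 120 s, because that tag is the most recent one inside the lookback window. The object tagged at 300 s has not yet been shown at that point. A production system would also need to account for speech-processing latency and for seeking or pausing. The `paused_s` parameter is only a crude placeholder for that.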