US 11,715,302 B2
Automatic tagging of images using speech recognition
Ryan R. Fink, Vancouver, WA (US); and Sean M. Adkinson, North Plains, OR (US)
Assigned to STREEM, LLC, Portland, OR (US)
Filed by STREEM, INC., Portland, OR (US)
Filed on Aug. 21, 2019, as Appl. No. 16/547,391.
Claims priority of provisional application 62/745,092, filed on Oct. 12, 2018.
Claims priority of provisional application 62/720,741, filed on Aug. 21, 2018.
Prior Publication US 2020/0065589 A1, Feb. 27, 2020
Int. Cl. G06V 20/40 (2022.01); G10L 25/78 (2013.01); G10L 15/22 (2006.01)
CPC G06V 20/41 (2022.01) [G10L 15/22 (2013.01); G10L 25/78 (2013.01)] 16 Claims
 
1. A method, comprising:
performing automatic speech recognition upon an audio signal captured contemporaneously with one or more images to obtain one or more candidate words;
performing object detection upon the one or more images to obtain one or more detected objects;
determining an image context, wherein the image context includes the detected objects;
determining whether the one or more images includes one or more augmented reality objects; and
if there are no augmented reality objects in the one or more images:
selecting a first set of candidate words, based on the image context, from the obtained one or more candidate words, and
tagging each detected object with at least one of the first set of candidate words; or
if there is at least one augmented reality object in the one or more images:
selecting a second set of candidate words, based on the image context, from the obtained one or more candidate words, the second set of candidate words having at least one candidate word that is not in the first set of one or more candidate words, and
tagging each detected object with at least one of the second set of candidate words.
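The selection and tagging steps of claim 1 can be illustrated with a short sketch. It assumes that speech recognition and object detection have already run and that their outputs are plain lists: candidate words from the ASR step and text labels from the detector. It does not call any real ASR or vision engine. The names `OBJECT_VOCAB`, `AR_VOCAB`, `ImageContext`, `select_candidates`, and `tag_objects` are hypothetical, and the per-label vocabulary lookup is only a stand-in for whatever context-based selection the claim covers. It is not the patented implementation.

```python
from dataclasses import dataclass

# Hypothetical vocabulary: detected-object label -> words treated as relevant to it.
OBJECT_VOCAB: dict[str, set[str]] = {
    "sink": {"sink", "faucet", "leak", "drain"},
    "water heater": {"heater", "tank", "leak", "valve"},
}

# Hypothetical extra words admitted only when AR objects are in the image(s).
AR_VOCAB: set[str] = {"marker", "arrow", "measurement"}


@dataclass
class ImageContext:
    """Image context: the detected objects plus whether any AR objects are present."""
    detected_objects: list[str]
    has_ar_objects: bool


def select_candidates(candidates: list[str], ctx: ImageContext) -> list[str]:
    """Select candidate words that fit the image context (order kept, no duplicates)."""
    allowed: set[str] = set()
    for label in ctx.detected_objects:
        allowed |= OBJECT_VOCAB.get(label, set())
    if ctx.has_ar_objects:
        # With AR objects present, extra words are admitted, so this "second set"
        # can contain words that the no-AR "first set" never would.
        allowed |= AR_VOCAB
    selected: list[str] = []
    for word in candidates:
        w = word.lower()
        if w in allowed and w not in selected:
            selected.append(w)
    return selected


def tag_objects(candidates: list[str], ctx: ImageContext) -> dict[str, list[str]]:
    """Tag each detected object with at least one selected word, if any were selected."""
    selected = select_candidates(candidates, ctx)
    if not selected:
        return {}  # nothing survived selection, so no tags can be assigned
    tags: dict[str, list[str]] = {}
    for label in ctx.detected_objects:
        relevant = [w for w in selected if w in OBJECT_VOCAB.get(label, set())]
        # Fall back to the whole selected set so every object gets at least one tag.
        tags[label] = relevant or list(selected)
    return tags


if __name__ == "__main__":
    words = ["The", "sink", "has", "a", "leak", "near", "the", "arrow"]
    print(select_candidates(words, ImageContext(["sink"], False)))
    print(select_candidates(words, ImageContext(["sink"], True)))
    print(tag_objects(words, ImageContext(["sink"], False)))
```

With the sample transcript, the no-AR branch selects `["sink", "leak"]`, while the AR branch also admits `"arrow"`. This mirrors the claim's requirement that the second set contain at least one word absent from the first.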