US 12,014,548 B2
Hierarchical segmentation based on voice-activity
Hijung Shin, Arlington, MA (US); Xue Bai, Bellevue, WA (US); Aseem Agarwala, Seattle, WA (US); Joel R. Brandt, Venice, CA (US); Jovan Popović, Seattle, WA (US); Lubomira Dontcheva, Seattle, WA (US); Dingzeyu Li, Seattle, WA (US); Joy Oakyung Kim, Menlo Park, CA (US); and Seth Walker, Oakland, CA (US)
Assigned to Adobe Inc., San Jose, CA (US)
Filed by ADOBE INC., San Jose, CA (US)
Filed on Jun. 2, 2022, as Appl. No. 17/805,075.
Application 17/805,075 is a continuation of application No. 17/017,344, filed on Sep. 10, 2020, granted, now Pat. No. 11,450,112.
Prior Publication US 2022/0292830 A1, Sep. 15, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06V 20/40 (2022.01); G06F 18/231 (2023.01); G10L 25/78 (2013.01); G11B 27/00 (2006.01); G11B 27/10 (2006.01); G11B 27/19 (2006.01); G11B 27/36 (2006.01)

CPC G06V 20/49 (2022.01) [G06F 18/231 (2023.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G10L 25/78 (2013.01); G11B 27/002 (2013.01); G11B 27/19 (2013.01); G06V 20/44 (2022.01)]

20 Claims

1. A method comprising:

generating a representation of a hierarchical segmentation of a video timeline of a video based on adjusting locations of detected speech boundaries using voice-activity detection (VAD) scores of audio of the video to close a non-speech segment between two speech segments; and

providing at least one level of the hierarchical segmentation of the video timeline for presentation.
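
The two steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names `close_gaps` and `coarse_level`, the per-frame score layout, the `frame_s` and `max_gap_s` parameters, and the rule of cutting at the lowest-scoring frame are all assumptions made for illustration. The sketch moves the two boundaries around a short non-speech gap to a single cut point chosen from the VAD scores, which closes the gap, and then derives a coarser level by merging segments that now touch.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def close_gaps(
    segments: Sequence[Segment],
    vad: Sequence[float],
    frame_s: float = 0.01,    # assumed duration of one VAD frame
    max_gap_s: float = 0.5,   # assumed longest gap that gets closed
) -> List[Segment]:
    """Close short non-speech gaps between consecutive speech segments.

    Both boundaries around a short gap are moved to the frame inside the
    gap with the lowest VAD score (the least speech-like moment).
    """
    if not segments:
        return []
    out: List[Segment] = [segments[0]]
    for start, end in segments[1:]:
        prev_start, prev_end = out[-1]
        gap = start - prev_end
        if 0 < gap <= max_gap_s:
            lo = int(round(prev_end / frame_s))
            hi = min(int(round(start / frame_s)), len(vad) - 1)
            if lo <= hi:
                cut_frame = min(range(lo, hi + 1), key=lambda i: vad[i])
                cut = cut_frame * frame_s
            else:
                # No scores cover the gap: fall back to its midpoint.
                cut = (prev_end + start) / 2
            out[-1] = (prev_start, cut)
            out.append((cut, end))
        else:
            out.append((start, end))
    return out


def coarse_level(fine: Sequence[Segment]) -> List[Segment]:
    """Merge segments that touch, giving one coarser hierarchy level."""
    merged: List[Segment] = []
    for start, end in fine:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged


# Toy input: 40 frames of 0.1 s, with a dip in the scores around 1.1 s.
vad_scores = [0.9] * 40
vad_scores[10:13] = [0.4, 0.1, 0.3]
detected = [(0.0, 1.0), (1.2, 2.0), (3.0, 4.0)]

fine = close_gaps(detected, vad_scores, frame_s=0.1, max_gap_s=0.5)
levels = [fine, coarse_level(fine)]  # finest level first
```

In this toy input the 0.2 s gap between the first two segments is closed at the score minimum, while the 1.0 s gap before the third segment exceeds `max_gap_s` and is left open. Either entry of `levels` could then be provided for presentation.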