US 12,225,317 B2
Front-end clipping using visual cues
Joseph Sayer, Bury St Edmunds (GB); Andrew David Lyell, Winchester (GB); and Benjamin David Cox, Newbury (GB)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Mar. 3, 2022, as Appl. No. 17/653,287.
Prior Publication US 2023/0283740 A1, Sep. 7, 2023
Int. Cl. H04N 5/067 (2006.01); G10L 25/57 (2013.01); H04N 7/04 (2006.01)

CPC H04N 5/067 (2013.01) [G10L 25/57 (2013.01); H04N 7/04 (2013.01)]

20 Claims

1. A processor-implemented method, the method comprising:

capturing input, including at least one visual input and at least one audio input, to a first device;

training a machine learning model to recognize a visual cue indicative of a user desire to speak and predict a volume at which a user will speak based on a visual input from the at least one visual input and an audio input, synchronized to the visual input, from the at least one audio input;

marking one or more timestamps which are determined, using the model, to correspond to speech in the at least one audio input; and

transmitting an audio input from within the at least one audio input corresponding to the one or more marked timestamps from the first device to a second device.
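
The marking and transmitting steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Frame` structure, the `toy_model` predicate (a stand-in for the trained machine learning model), and the `send` callback (a stand-in for the link to the second device) are all hypothetical names introduced here for illustration. The training step and the volume prediction are not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Frame:
    """One captured unit of synchronized input (hypothetical structure)."""
    timestamp: float          # seconds since capture started
    visual: Sequence[float]   # assumed visual features, e.g. mouth openness
    audio: Sequence[float]    # audio samples synchronized to this frame


def mark_speech_timestamps(
    frames: Sequence[Frame], is_speech: Callable[[Frame], bool]
) -> List[float]:
    """Return the timestamps the model judges to correspond to speech."""
    return [f.timestamp for f in frames if is_speech(f)]


def select_audio(
    frames: Sequence[Frame], marked: Sequence[float]
) -> List[Tuple[float, List[float]]]:
    """Keep only the audio belonging to marked timestamps."""
    marked_set = set(marked)
    return [(f.timestamp, list(f.audio)) for f in frames if f.timestamp in marked_set]


def transmit_marked_audio(
    frames: Sequence[Frame],
    is_speech: Callable[[Frame], bool],
    send: Callable[[float, List[float]], None],
) -> List[float]:
    """Mark speech timestamps, then pass the matching audio to `send`."""
    marked = mark_speech_timestamps(frames, is_speech)
    for timestamp, samples in select_audio(frames, marked):
        send(timestamp, samples)  # assumption: `send` delivers to the second device
    return marked


def toy_model(frame: Frame) -> bool:
    """Stand-in for the trained model: treats an open mouth as a speech cue."""
    return frame.visual[0] > 0.5
```

For example, calling `transmit_marked_audio(frames, toy_model, send)` with three frames whose first visual feature is 0.1, 0.8, and 0.9 would forward only the audio of the second and third frames, so audio captured outside the marked timestamps never leaves the first device.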