US 11,875,816 B2
Electronic device and method for controlling thereof
Kangwook Lee, Suwon-si (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Appl. No. 17/422,020
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
PCT Filed Dec. 23, 2020, PCT No. PCT/KR2020/018985
§ 371(c)(1), (2) Date Jul. 9, 2021,
PCT Pub. No. WO2022/092440, PCT Pub. Date May 5, 2022.
Claims priority of application No. 10-2020-0139595 (KR), filed on Oct. 26, 2020.
Prior Publication US 2022/0343939 A1, Oct. 27, 2022
Int. Cl. G10L 25/30 (2013.01); G10L 13/08 (2013.01); G06N 3/02 (2006.01); G06F 40/284 (2020.01)
CPC G10L 25/30 (2013.01) [G06F 40/284 (2020.01); G06N 3/02 (2013.01); G10L 13/08 (2013.01)]
13 Claims
1. An electronic device comprising:
a memory storing a neural network model; and
a processor configured to input, to the neural network model, input data to obtain output data,
wherein, based on comparison between first output data based on input first modality and second output data based on input second modality, in response to the second modality being input, to output the first modality corresponding to the first output data based on the neural network model,
wherein the second modality comprises at least one masking element,
wherein the first modality comprises a first text and the second modality comprises a first audio signal, and
wherein the neural network model is configured to:
output a second audio signal corresponding to the first text and a second text corresponding to the first audio signal with the first text composed of a plurality of tokenized text elements and the first audio signal in which at least one of segmented plurality of elements is masked as input data, and
based on a first audio signal comprising the at least one masking element being input based on the comparison of the second audio signal and the second text, output a first text corresponding to the second audio signal.
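For illustration only, the input preparation recited in claim 1 (a first text composed of tokenized text elements, and a first audio signal whose segmented elements include at least one masked element) might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the whitespace tokenizer, the fixed segment length, the random choice of masked segments, and the names `tokenize`, `segment`, `mask_segments`, and `MASK` are all hypothetical and do not appear in the claim. The neural network model itself is not shown.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical marker standing in for a "masking element" in the audio input.
MASK = None


def tokenize(text: str) -> List[str]:
    """Split text on whitespace; a stand-in for whatever tokenizer is actually used."""
    return text.split()


def segment(signal: Sequence[float], segment_len: int) -> List[List[float]]:
    """Cut a signal into consecutive segments; the last one may be shorter."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(signal[i:i + segment_len]) for i in range(0, len(signal), segment_len)]


def mask_segments(
    segments: List[List[float]], num_masked: int, seed: int = 0
) -> List[Optional[List[float]]]:
    """Replace num_masked randomly chosen segments with MASK (assumed masking policy)."""
    if not 1 <= num_masked <= len(segments):
        raise ValueError("num_masked must be between 1 and the number of segments")
    rng = random.Random(seed)
    masked_idx = set(rng.sample(range(len(segments)), num_masked))
    return [MASK if i in masked_idx else seg for i, seg in enumerate(segments)]


# Toy inputs: a short text and a 10-sample synthetic "audio" signal.
tokens = tokenize("turn on the light")
segments = segment([0.1 * i for i in range(10)], segment_len=4)
masked = mask_segments(segments, num_masked=1)
```

Here `tokens` plays the role of the tokenized first text and `masked` the role of the first audio signal with one masked element; how the model consumes and compares them is left to the claim language above.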