US 12,267,556 B2
Device for detecting music data from video contents, and method for controlling same
Ilyoung Jeong, Seoul (KR); Hyungui Lim, Seoul (KR); Yoonchang Han, Seoul (KR); Subin Lee, Seoul (KR); Jeongsoo Park, Yongin-si (KR); and Donmoon Lee, Suwon-si (KR)
Assigned to Cochl Inc, Dover, DE (US)
Appl. No. 17/925,682
Filed by Cochl Inc, Dover, DE (US)
PCT Filed May 18, 2021, PCT No. PCT/KR2021/006244 § 371(c)(1), (2) Date Nov. 16, 2022, PCT Pub. No. WO2021/235846, PCT Pub. Date Nov. 25, 2021.
Claims priority of application No. 10-2020-0059429 (KR), filed on May 19, 2020.
Prior Publication US 2023/0217074 A1, Jul. 6, 2023
Int. Cl. H04N 21/462 (2011.01); G10L 15/06 (2013.01); G10L 21/028 (2013.01); G10L 25/57 (2013.01); G10L 25/81 (2013.01); H04N 21/439 (2011.01); H04N 21/4627 (2011.01)

CPC H04N 21/4627 (2013.01) [G10L 15/063 (2013.01); G10L 21/028 (2013.01); G10L 25/57 (2013.01); G10L 25/81 (2013.01); H04N 21/439 (2013.01)]

3 Claims

1. A data processing method comprising:

receiving video content including a video stream and an audio stream;

detecting music data from the audio stream; and

filtering the audio stream to remove the music data detected from the audio stream,

wherein the detecting of the music data from the audio stream comprises a division operation of dividing the audio stream into music data and voice data and a detection operation of detecting a section in which the music data exists from the audio stream,

wherein the division operation is performed by a first artificial intelligence (AI) model which is trained in advance,

wherein the first AI model, which is composed of an artificial neural network that performs deep learning or machine learning, is configured to perform learning using training data labeled as music or voice,

wherein the first AI model is configured to output a probability that each preset unit section of the audio stream corresponds to the music data and a probability that each preset unit section of the audio stream corresponds to the voice data,

wherein the detection operation is performed by a second artificial intelligence (AI) model which is trained in advance, and

wherein the second AI model is configured to perform learning using training data identified in advance as including music or not.
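The method of claim 1 can be illustrated with the minimal Python sketch below. It is not Cochl's implementation. The two pre-trained AI models are replaced by hypothetical callables (`division_model`, `detection_model`) and by toy placeholder functions (`toy_division`, `toy_detection`). Several details are assumptions made only for illustration, because the claim does not specify them: the fixed-length unit section (`unit_len`), the decision `threshold` of 0.5, and the filtering rule, which scales each detected music section by the first model's voice probability. The sketch shows how the first model's per-section music and voice probabilities and the second model's music detection could together drive the filtering step.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for the two pre-trained models in claim 1.
# division_model: unit section -> (p_music, p_voice)
DivisionModel = Callable[[Sequence[float]], Tuple[float, float]]
# detection_model: unit section -> probability that the section includes music
DetectionModel = Callable[[Sequence[float]], float]


@dataclass
class SectionResult:
    start: int       # index of the first sample in the unit section
    end: int         # one past the last sample
    p_music: float   # first model: probability the section is music data
    p_voice: float   # first model: probability the section is voice data
    has_music: bool  # second model: section detected as containing music


def split_units(n_samples: int, unit_len: int) -> List[Tuple[int, int]]:
    """Divide the stream into fixed-length unit sections (the last may be shorter)."""
    return [(s, min(s + unit_len, n_samples)) for s in range(0, n_samples, unit_len)]


def detect_music(audio: Sequence[float], unit_len: int,
                 division_model: DivisionModel, detection_model: DetectionModel,
                 threshold: float = 0.5) -> List[SectionResult]:
    """Run the division and detection operations on every unit section."""
    results = []
    for start, end in split_units(len(audio), unit_len):
        section = audio[start:end]
        p_music, p_voice = division_model(section)          # division operation
        has_music = detection_model(section) >= threshold   # detection operation
        results.append(SectionResult(start, end, p_music, p_voice, has_music))
    return results


def filter_music(audio: Sequence[float], results: List[SectionResult]) -> List[float]:
    """Assumed filtering rule: scale detected music sections by the voice probability."""
    out = list(audio)
    for r in results:
        if r.has_music:
            for i in range(r.start, r.end):
                out[i] = audio[i] * r.p_voice
    return out


# Toy placeholders, NOT trained models: loud sections are treated as music.
def toy_division(section: Sequence[float]) -> Tuple[float, float]:
    return (0.9, 0.1) if max(abs(x) for x in section) > 0.5 else (0.2, 0.8)


def toy_detection(section: Sequence[float]) -> float:
    return toy_division(section)[0]


if __name__ == "__main__":
    audio = [0.1] * 4 + [1.0] * 4 + [0.2] * 2
    results = detect_music(audio, 4, toy_division, toy_detection)
    print([r.has_music for r in results])
    print(filter_music(audio, results))
```

A real system would substitute the trained networks for the toy placeholders. It could also separate music from voice with a finer-grained mask than one gain per section. The section-level gain is used here only to keep the data flow among the three claimed steps visible: receiving, detecting, and filtering.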