US 12,206,955 B2
Multimedia data generating method, apparatus, electronic device, medium, and program product
Jiajin Cao, Beijing (CN)
Assigned to BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD., Beijing (CN)
Filed by Beijing Zitiao Network Technology Co., Ltd., Beijing (CN)
Filed on Dec. 13, 2023, as Appl. No. 18/538,703.
Application 18/538,703 is a continuation of application No. PCT/CN2022/127840, filed on Oct. 27, 2022.
Claims priority of application No. 202111266196.5 (CN), filed on Oct. 28, 2021.
Prior Publication US 2024/0114215 A1, Apr. 4, 2024
Int. Cl. H04N 21/85 (2011.01); G10L 13/02 (2013.01); H04N 21/488 (2011.01); H04N 21/81 (2011.01); H04N 21/84 (2011.01)

CPC H04N 21/8106 (2013.01) [G10L 13/02 (2013.01); H04N 21/4884 (2013.01)]

18 Claims

1. A multimedia data generating method, comprising:

receiving text information inputted by a user;

displaying, in response to a recording trigger operation for the text information, the text information and acquiring a first reading speech of the text information;

generating a first multimedia data based on the text information and the first reading speech and displaying the first multimedia data; and

marking, in a case of detecting that a match rate between a first target speech segment and a first target text segment is lower than a match rate threshold while acquiring the first reading speech, the first target speech segment, and the first target text segment,

wherein the first multimedia data comprise the first reading speech and a video image matched with the text information, the first multimedia data comprise a plurality of first multimedia segments, the plurality of first multimedia segments corresponding to a plurality of text segments included in the text information, respectively;

wherein a first target multimedia segment comprises a first target video segment and the first target speech segment, the first target multimedia segment referring to a first multimedia segment in the plurality of first multimedia segments corresponding to the first target text segment in the plurality of text segments, the first target video segment including a video image matched with the first target text segment, the first target speech segment including a reading speech of the first target text segment.
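
The marking step of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and the claim does not define how the match rate is computed. The sketch assumes that each speech segment has already been transcribed by some speech recognizer, that the match rate is a character-level similarity between the text segment and that transcript, and that the threshold is 0.8. The names MultimediaSegment, match_rate, mark_low_match_segments, and the image identifiers are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class MultimediaSegment:
    """One multimedia segment: a text segment plus its speech and video parts."""
    text_segment: str        # one text segment of the user's input text
    speech_transcript: str   # assumption: recognized words of the reading speech
    video_image: str         # assumption: identifier of the image matched to the text
    marked: bool = False     # set when the match rate falls below the threshold


def match_rate(text_segment: str, speech_transcript: str) -> float:
    """Assumed metric: case-insensitive character similarity in [0, 1]."""
    return SequenceMatcher(
        None, text_segment.lower(), speech_transcript.lower()
    ).ratio()


def mark_low_match_segments(segments, threshold=0.8):
    """Mark each segment whose match rate is lower than the threshold.

    Returns the marked segments so a caller could, for example,
    highlight them for re-recording.
    """
    for segment in segments:
        if match_rate(segment.text_segment, segment.speech_transcript) < threshold:
            segment.marked = True
    return [s for s in segments if s.marked]


# Two text segments; the second one was read aloud incorrectly.
segments = [
    MultimediaSegment("the weather is sunny today",
                      "the weather is sunny today", "img_sun"),
    MultimediaSegment("we will walk to the park",
                      "completely different words", "img_park"),
]
flagged = mark_low_match_segments(segments)
```

In this sketch, marking a segment marks its text segment and speech segment together, because both belong to the same multimedia segment. A real system would likely compare recognized words or phonemes rather than raw characters.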