US 12,444,421 B2
Computer-implemented method for punctuation of text from audio input
Ville Ruutu, Helsinki (FI); Jussi Ruutu, Helsinki (FI); and Honain Derrar, Helsinki (FI)
Assigned to Elisa Oyj, Helsinki (FI)
Appl. No. 18/842,368
Filed by Elisa Oyj, Helsinki (FI)
PCT Filed Apr. 14, 2023; PCT No. PCT/FI2023/050208; § 371(c)(1), (2) Date Aug. 28, 2024; PCT Pub. No. WO2023/209274; PCT Pub. Date Nov. 2, 2023.
Claims priority of application No. 20225351 (FI), filed on Apr. 27, 2022.
Prior Publication US 2025/0111854 A1, Apr. 3, 2025
Int. Cl. G10L 15/05 (2013.01); G10L 15/08 (2006.01); G10L 15/14 (2006.01); G10L 15/26 (2006.01); G10L 25/93 (2013.01); G10L 15/02 (2006.01); G10L 15/04 (2013.01); G10L 25/03 (2013.01); G10L 25/18 (2013.01); G10L 25/78 (2013.01); G10L 25/87 (2013.01)

CPC G10L 15/26 (2013.01) [G10L 15/05 (2013.01); G10L 15/08 (2013.01); G10L 15/14 (2013.01); G10L 25/93 (2013.01); G10L 15/02 (2013.01); G10L 15/04 (2013.01); G10L 25/03 (2013.01); G10L 25/18 (2013.01); G10L 25/78 (2013.01); G10L 25/87 (2013.01)]

11 Claims

1. A computer-implemented method for punctuation of text from an audio input, the method comprising:

obtaining an audio input comprising speech data;

identifying a plurality of silent sections in the audio input;

obtaining a type input indicating a type of the speech data in the audio input, wherein the type input indicates that the speech data is a customer service call, a public speech, or a lecture;

choosing an expected distribution, indicating an expected relative frequency of each group in a plurality of groups, according to the type input;

grouping the plurality of silent sections into the plurality of groups, wherein each group in the plurality of groups corresponds to a punctuation mark or a space without a punctuation mark, wherein the grouping the plurality of silent sections into the plurality of groups is done at least partially by using the expected distribution; and

associating each silent section in the plurality of silent sections with a punctuation mark or a space according to the group of the silent section, thus obtaining punctuation information,

performing a speech-to-text conversion on the audio input, thus obtaining a transcript of the speech data; and

punctuating the transcript according to the punctuation information by associating each silent section in the plurality of silent sections with the corresponding punctuation mark or the corresponding space without the punctuation mark;

wherein the expected distribution is based on statistical information about the plurality of silent sections between spoken words to apply punctuation.
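
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows one way the grouping step could use an expected distribution. The following are assumptions made for illustration and do not come from the claim: the names EXPECTED_DISTRIBUTIONS, group_silences and punctuate, the numeric shares for each speech type, the restriction to three groups (space, comma, full stop), and the rule that longer silences map to stronger punctuation. Silence detection and speech-to-text conversion are not shown; the sketch assumes they have already produced a word list and the duration of the silent section between each pair of consecutive words.

```python
from typing import Dict, List, Tuple

# Hypothetical expected distributions (illustrative shares, not from the patent).
# Each entry lists (separator, expected relative frequency), ordered from the
# group assumed to have the shortest silences to the one with the longest.
EXPECTED_DISTRIBUTIONS: Dict[str, List[Tuple[str, float]]] = {
    "customer_service_call": [(" ", 0.70), (", ", 0.15), (". ", 0.15)],
    "public_speech": [(" ", 0.60), (", ", 0.20), (". ", 0.20)],
    "lecture": [(" ", 0.50), (", ", 0.25), (". ", 0.25)],
}


def group_silences(durations: List[float], speech_type: str) -> List[str]:
    """Group silences so that group sizes follow the expected distribution."""
    distribution = EXPECTED_DISTRIBUTIONS[speech_type]
    total = len(durations)
    # Rank the silent sections from shortest to longest.
    order = sorted(range(total), key=lambda i: durations[i])
    # Default to the last group so rounding never leaves a silence unassigned.
    separators = [distribution[-1][0]] * total
    start = 0
    cumulative = 0.0
    for separator, share in distribution:
        cumulative += share
        end = min(total, round(cumulative * total))
        for i in order[start:end]:
            separators[i] = separator
        start = max(start, end)
    return separators


def punctuate(words: List[str], durations: List[float], speech_type: str) -> str:
    """Join transcript words using the separator chosen for each silence."""
    if not words:
        return ""
    if len(durations) != len(words) - 1:
        raise ValueError("expected one silence duration between each pair of words")
    separators = group_silences(durations, speech_type)
    parts = [word + sep for word, sep in zip(words, separators)]
    parts.append(words[-1])  # the final word has no following silence here
    return "".join(parts)
```

Under these assumptions, punctuate(["hello", "everyone", "today", "we", "continue"], [0.05, 0.9, 0.4, 0.1], "lecture") returns "hello everyone. today, we continue": the two shortest silences become plain spaces, the next becomes a comma, and the longest becomes a full stop.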