US 11,886,813 B2
Efficient automatic punctuation with robust inference
Maury Courtland, McLean, VA (US); Adam Faulkner, McLean, VA (US); and Gayle McElvain, McLean, VA (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Sep. 24, 2020, as Appl. No. 17/030,827.
Claims priority of provisional application 63/009,391, filed on Apr. 13, 2020.
Prior Publication US 2021/0319176 A1, Oct. 14, 2021
Int. Cl. G06F 40/253 (2020.01); G06F 40/30 (2020.01); G06F 40/284 (2020.01); G10L 15/22 (2006.01); G10L 15/02 (2006.01)
CPC G06F 40/253 (2020.01) [G06F 40/284 (2020.01); G06F 40/30 (2020.01); G10L 15/02 (2013.01); G10L 15/22 (2013.01)]
18 Claims
 
1. A computer-implemented method for automatically punctuating text, the method comprising:
(a) applying, by one or more computing devices, a text string to a first component of a non-recurrent neural network trained to generate one or more contextualized vectors representing a contextualized meaning for each word in the text string, and wherein the first component determines the contextualized vectors by processing each word in the text string in parallel with one another;
(b) generating, by the first component, the one or more contextualized vectors;
(c) applying, by the one or more computing devices, an output of the contextualized vectors by the first component to a second component of the non-recurrent neural network trained to generate a set of probability values for each word in the text string, wherein the set of probability values indicates a likelihood that a punctuation mark exists for each word in the text string, and wherein the second component determines the set of probability values by processing the contextualized vectors in parallel with one another;
(d) generating, by a first linear layer of the second component, a first set of vectors representing a compressed representation for each of the contextualized vectors output by the first component, wherein the first set of vectors is generated based on multiplying each of the contextualized vectors output by the first component with a first set of tunable parameters trained to predict the punctuation mark;
(e) concatenating, after the first linear layer, the first set of vectors into a further vector;
(f) generating, by the second component, the set of probability values;
(g) transmitting, by the one or more computing devices, the set of probability values to a text generation engine to generate a formatted text string based on the set of probability values, wherein the formatted text string includes the punctuation mark for each word of the text string, wherein the applying the text string to the first component, applying the contextualized vectors output, and the transmitting occur in real-time; and
(h) generating, by the text generation engine, the formatted text string.
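
The data flow recited in steps (a) through (h) can be illustrated with a minimal sketch. It is not the patented implementation: the weights are random and untrained, so the punctuation it chooses is arbitrary. A single self-attention pass stands in for the first component. The label set `PUNCT`, the dimensions `d_model` and `d_small`, the second weight matrix `W2`, the softmax normalization, and every function name are assumptions introduced for illustration only.

```python
import math
import random

PUNCT = ["", ",", ".", "?"]  # hypothetical label set; "" means "no mark"

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rows, cols, rng):
    """Untrained stand-in for a set of tunable parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def contextualize(embeddings):
    """Steps (a)-(b), first component stand-in: one self-attention pass.
    Each output row depends only on the inputs, never on another output,
    so all positions could be computed in parallel (no recurrence)."""
    d = len(embeddings[0])
    out = []
    for q in embeddings:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in embeddings]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, embeddings)) for j in range(d)])
    return out

def punctuation_probs(context, W1, W2):
    """Steps (c)-(f), second component stand-in."""
    compressed = [matvec(W1, c) for c in context]      # (d) compress each vector
    further = [x for vec in compressed for x in vec]   # (e) concatenate into one vector
    logits = matvec(W2, further)                       # assumed second linear map
    k = len(PUNCT)
    # (f) one probability distribution over marks per word
    return [softmax(logits[i * k:(i + 1) * k]) for i in range(len(context))]

def format_text(words, probs):
    """Steps (g)-(h), text generation engine stand-in: append the likeliest mark."""
    marks = [PUNCT[max(range(len(p)), key=p.__getitem__)] for p in probs]
    return " ".join(w + m for w, m in zip(words, marks))

def punctuate(words, seed=0, d_model=8, d_small=2):
    """Run the whole sketch on a list of words with random parameters."""
    rng = random.Random(seed)
    vocab = {}
    embeddings = [vocab.setdefault(w, [rng.uniform(-1, 1) for _ in range(d_model)])
                  for w in words]
    W1 = rand_matrix(d_small, d_model, rng)
    # Sized to this input for simplicity; a real model would use a fixed window.
    W2 = rand_matrix(len(words) * len(PUNCT), len(words) * d_small, rng)
    probs = punctuation_probs(contextualize(embeddings), W1, W2)
    return probs, format_text(words, probs)
```

Calling `punctuate("hello how are you".split())` returns one probability distribution per word together with the formatted string. Because the concatenated vector feeds a single matrix product, each word's scores can draw on the compressed representation of every other word, which is one plausible reading of why step (e) concatenates after the first linear layer.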