US 11,893,344 B2
Morpheme analysis learning device, morpheme analysis device, method, and program
Jun Suzuki, Tokyo (JP)
Assigned to NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
Appl. No. 16/970,890
Filed by NIPPON TELEGRAPH AND TELEPHONE CORPORATION, Tokyo (JP)
PCT Filed Feb. 19, 2019, PCT No. PCT/JP2019/006046
§ 371(c)(1), (2) Date Aug. 18, 2020,
PCT Pub. No. WO2019/163752, PCT Pub. Date Aug. 29, 2019.
Claims priority of application No. 2018-028303 (JP), filed on Feb. 20, 2018.
Prior Publication US 2020/0380207 A1, Dec. 3, 2020
Int. Cl. G06F 40/268 (2020.01); G06N 20/00 (2019.01); G06F 17/18 (2006.01); G06N 3/04 (2023.01); G06F 40/20 (2020.01)
CPC G06F 40/268 (2020.01) [G06F 17/18 (2013.01); G06F 40/20 (2020.01); G06N 3/04 (2013.01); G06N 20/00 (2019.01)]
20 Claims
8. A system for machine learning, the system comprising:
a processor; and
a memory storing computer-executable instructions that when executed by the processor cause the system to:
receive learning data, wherein the learning data includes a pair of input text and a morpheme label string as a morphological analysis result of the input text for learning, and wherein the morpheme label string includes a series of morpheme labels;
generate a series of characters based on the received input text;
generate a series of character vectors, wherein each character vector of the series of character vectors is a vector form of each character of the series of characters;
generate morpheme information for each of a plurality of character substrings of the input text, wherein each of the character substrings comprises one or more characters of the series of characters, and wherein the morpheme information for each character substring is matched to that character substring based on a morpheme dictionary;
generate a morpheme vector for each of the character substrings, the morpheme vector representing the generated morpheme information in a vector form;
determine, based on a probability value, whether to generate a plurality of resultant vectors based on a combination of one character vector of the series of character vectors and the morpheme information vector or to generate the plurality of resultant vectors without the morpheme information vector;
generate, based on the determination of generating the plurality of resultant vectors, a plurality of resultant vectors, wherein each of the plurality of resultant vectors is based on either one of:
the combination of the character vector of the series of character vectors and the morpheme information vector corresponding to each character, or
the one character vector of the series of character vectors corresponding to each character without combining the morpheme information vector;
specify a morpheme label to each of the plurality of resultant vectors; and
for each character in sequence in the input text, recursively train a parameter learning model for predicting a series of morpheme labels of the input text using the specified morpheme label to each of the plurality of resultant vectors.
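
The vector-generation steps of claim 8 can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the toy dictionary `MORPHEME_DICT`, the helper functions, the `drop_prob` parameter, and the use of deterministic pseudo-random vectors in place of learned embeddings. Two further points are assumptions, because the claim does not fix them: the "combination" is shown as an element-wise sum, and the probability-based determination is made once per input text. The label-assignment and recursive training steps are not shown.

```python
import random

# Hypothetical toy dictionary: surface form -> morpheme information (a POS tag here).
MORPHEME_DICT = {"東京": "NOUN", "京都": "NOUN", "都": "SUFFIX", "に": "PARTICLE"}


def match_substrings(text, dictionary, max_len=4):
    """Return (start, end, info) for every character substring found in the dictionary."""
    matches = []
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            info = dictionary.get(text[start:end])
            if info is not None:
                matches.append((start, end, info))
    return matches


def embed(key, dim, seed=0):
    """Deterministic pseudo-random vector standing in for a learned embedding."""
    rng = random.Random(f"{seed}:{key}")
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def morpheme_vectors(text, matches, dim):
    """For each character, sum the vectors of all dictionary entries covering it."""
    per_char = [[0.0] * dim for _ in text]
    for start, end, info in matches:
        vec = embed(info, dim, seed=1)
        for pos in range(start, end):
            per_char[pos] = [a + b for a, b in zip(per_char[pos], vec)]
    return per_char


def resultant_vectors(text, dictionary, dim=4, drop_prob=0.5, rng=random):
    """Build one resultant vector per character.

    With probability drop_prob the morpheme vectors are left out and the
    character vectors are returned alone (assumption: one decision per text).
    """
    char_vecs = [embed(c, dim) for c in text]
    use_morph = rng.random() >= drop_prob
    if not use_morph:
        return char_vecs, use_morph
    morph_vecs = morpheme_vectors(text, match_substrings(text, dictionary), dim)
    # Assumption: "combination" is an element-wise sum; concatenation would also fit.
    combined = [[c + m for c, m in zip(cv, mv)] for cv, mv in zip(char_vecs, morph_vecs)]
    return combined, use_morph


if __name__ == "__main__":
    vectors, used_dictionary = resultant_vectors("東京都に", MORPHEME_DICT)
    print(used_dictionary, len(vectors))
```

In this sketch, setting `drop_prob` to 1.0 always yields character-only vectors and setting it to 0.0 always includes the dictionary information. One plausible reading of the probability-based step is that a downstream label predictor trained on both kinds of input does not come to depend entirely on dictionary coverage.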