| CPC G06V 30/274 (2022.01) [G06V 10/82 (2022.01); G06V 30/347 (2022.01)] | 9 Claims |

|
1. A syntax-directed mathematical expression recognition system, comprising:
an online handwritten mathematical expression unit for receiving online handwritten mathematical expressions, which include strokes of mathematical symbols and related text of the mathematical expressions, so as to obtain coordinates of the strokes and program codes corresponding to the online handwritten mathematical expressions; the coordinates of the strokes and the program codes being formed as online handwritten samples, the online handwritten mathematical expression unit including an online database for storing a large number of online handwritten samples usable in training of a neural network;
an offline handwritten and printed mathematical expression unit for receiving offline handwritten and printed mathematical expressions with preset program codes, the offline handwritten and printed mathematical expressions and the related program codes being formed as offline handwritten and printed mathematical expression samples which are stored in an offline database;
a structured mathematical expression generator for generating structured mathematical expressions; the structured mathematical expression generator including a corpus and a mathematical grammar database; the structured mathematical expression generator collecting a large number of program codes of mathematical expressions which are then stored in the corpus; the mathematical grammar database storing grammars of specific mathematical program languages, the structured mathematical expression generator generating a large number of structured mathematical expressions by using the program codes of the mathematical expressions in the corpus based on the grammars in the mathematical grammar database;
a handwriting sample generator connected to the online handwritten mathematical expression unit, the offline handwritten and printed mathematical expression unit, and the structured mathematical expression generator; the handwriting sample generator serving to generate a large number of mathematical expression handwriting samples for the mathematical expressions in the online handwritten mathematical expression unit, the offline handwritten and printed mathematical expression unit, and the structured mathematical expression generator, for use in training of the neural network; and
a mathematical expression recognition neural network including an input interface and an output interface; the input interface including a plurality of input terminals and the output interface including a plurality of output terminals, the input interface being connected to the handwriting sample generator for receiving the mathematical expression handwriting samples; the output interface serving to receive the program codes corresponding to the mathematical expression handwriting samples inputted to the input interface; and, in a prediction stage, coordinates of strokes of a mathematical expression are inputted to the input interface of the mathematical expression recognition neural network for recognition; the output interface of the neural network outputs at least one program code which corresponds to the input mathematical expression;
wherein the mathematical expression handwriting samples are generated in the following ways:
(a) the coordinates of the strokes and the program codes of the online handwritten mathematical expressions in the online handwritten samples are directly formed as corresponding mathematical expression handwriting samples;
(b) the strokes of the offline handwritten and printed mathematical expressions in the offline handwritten and printed mathematical expression samples are obtained so as to acquire coordinates of the strokes; these coordinates and the preset program codes for the offline handwritten and printed mathematical expressions are formed as corresponding mathematical expression handwriting samples;
(c) mathematical expression handwriting samples for the structured mathematical expressions in the structured mathematical expression generator are acquired as follows:
(1) the structured mathematical expressions are converted into or rendered as printed-form mathematical expressions, and for each mathematical symbol in a printed-form mathematical expression, a minimum rectangular frame just containing the corresponding mathematical symbol is acquired;
(2) the printed-form mathematical expressions are divided into several sub-mathematical expressions for which identical mathematical expressions can be found in the online handwritten mathematical expression samples; any sub-mathematical expression for which no identical mathematical expression is found in the online handwritten mathematical expression samples is not used in the subsequent process; the minimum rectangular frame for each mathematical symbol is adjusted;
(3) coordinates of strokes which correspond to the strokes of each symbol in the sub-mathematical expressions are found in the online handwritten mathematical expression samples; an affine transformation is then performed on the found coordinates of the strokes to acquire transformed coordinates of the strokes of each symbol, and the transformed coordinates and the program codes of the structured mathematical expression are combined as a mathematical expression handwriting sample;
(4) the mathematical expression handwriting sample is randomly rotated or randomly resized so as to generate additional mathematical expression handwriting samples; and
wherein the mathematical expression recognition neural network includes an encoding neural network which includes the input interface, and a decoding neural network which includes the output interface; the encoding neural network serves to find features of the input mathematical expression, and these features are then inputted to the decoding neural network; the output of the decoding neural network includes parts of the program codes of the input mathematical expression, each indicated with a statistical reliability; these parts of the program codes and the reliabilities are fed back auto-regressively to the input of the decoding neural network for recognition again so as to acquire further parts of the program codes and reliabilities; these processes are repeated until a preset reliability is obtained or a preset number of auto-regression iterations is reached; the resulting final program codes and reliabilities are stored in a candidate unit.
|
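
For illustration only, steps (c)(3) and (c)(4) and the auto-regressive stopping rule of claim 1 may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: every function and parameter name (`bounding_box`, `fit_to_box`, `augment`, `decode`, `toy_step`, `target_reliability`, `max_rounds`) is hypothetical; the affine transformation is assumed to be a per-axis scale plus translation onto the symbol's minimum rectangular frame; the sketch applies rotation and resizing together, whereas the claim recites either one; and "a preset reliability is obtained" is read as the most recent reliability reaching a threshold. The decoding neural network is replaced by a toy stand-in.

```python
import math
import random

# A stroke is a list of (x, y) points; a symbol or sample is a list of strokes.
# All names here are hypothetical and chosen for illustration only.

def bounding_box(strokes):
    """Minimum axis-aligned rectangle (x0, y0, x1, y1) containing every point."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)

def fit_to_box(strokes, target):
    """Step (c)(3), assumed form: scale and translate strokes onto the target frame."""
    sx0, sy0, sx1, sy1 = bounding_box(strokes)
    tx0, ty0, tx1, ty1 = target
    # Zero-width or zero-height sources (e.g. a straight bar) keep unit scale.
    ax = (tx1 - tx0) / (sx1 - sx0) if sx1 != sx0 else 1.0
    ay = (ty1 - ty0) / (sy1 - sy0) if sy1 != sy0 else 1.0
    return [[(tx0 + (x - sx0) * ax, ty0 + (y - sy0) * ay) for x, y in stroke]
            for stroke in strokes]

def augment(strokes, rng, max_angle=math.radians(10), scale_range=(0.8, 1.2)):
    """Step (c)(4), assumed form: random rotation and uniform resizing about the centre."""
    x0, y0, x1, y1 = bounding_box(strokes)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    theta = rng.uniform(-max_angle, max_angle)
    k = rng.uniform(*scale_range)
    c, s = k * math.cos(theta), k * math.sin(theta)
    return [[(cx + (x - cx) * c - (y - cy) * s,
              cy + (x - cx) * s + (y - cy) * c) for x, y in stroke]
            for stroke in strokes]

def decode(features, step_fn, target_reliability=0.9, max_rounds=20):
    """Feed partial program codes and reliability back into the decoder step
    until the reliability threshold or the round limit is reached."""
    codes, reliability = [], 0.0
    for _ in range(max_rounds):
        new_codes, reliability = step_fn(features, codes, reliability)
        codes = codes + new_codes
        if reliability >= target_reliability:
            break
    return codes, reliability

def toy_step(features, codes, reliability):
    """Stand-in for the decoding network: emits one known token per round and
    reports high reliability only once the whole sequence has been emitted."""
    nxt = features[len(codes):len(codes) + 1]
    done = len(codes) + 1 >= len(features)
    return nxt, (0.95 if done else 0.5)
```

Setting `max_angle=0` or `scale_range=(1, 1)` in `augment` yields resizing only or rotation only, which corresponds more closely to the "or" recited in step (c)(4).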