| CPC G06F 40/30 (2020.01) [G06F 40/166 (2020.01); G06F 40/279 (2020.01); G06F 40/58 (2020.01); G06N 5/022 (2013.01); H04N 5/278 (2013.01)] | 15 Claims |

|
1. A method of training a semantic unit detection model, comprising:
acquiring a sample language sequence containing a plurality of elements, wherein each element of the plurality of elements has an original label indicating whether an element unit composed of that element and at least one element preceding that element is a semantic unit with semantics; and
training the semantic unit detection model using the sample language sequence and the original label for each element in the sample language sequence;
wherein the sample language sequence is a sample text sequence, each element in the sample language sequence is a word in the sample text sequence, and a length of the sample text sequence is L, wherein L is an integer greater than or equal to 1, the method further comprising:
acquiring a target text sequence corresponding to the sample text sequence, wherein the target text sequence is obtained by translating the sample text sequence;
translating the first i words in the sample text sequence into an initial target language segment, wherein i is an integer greater than or equal to 1 and less than or equal to L;
comparing the initial target language segment for the first i words in the sample text sequence with the target text sequence; and
determining an original label for an ith word in the sample text sequence according to a comparison result;
wherein the determining an original label for an ith word in the sample text sequence according to a comparison result comprises:
determining the original label for the ith word in the sample text sequence as a positive sample to indicate that a text unit composed of the first i words in the sample text sequence is a semantic unit with semantics, in response to the initial target language segment for the first i words in the sample text sequence being the same as a target language segment, in the target text sequence, corresponding to the first i words; and
determining the original label for the ith word in the sample text sequence as a negative sample to indicate that a text unit composed of the first i words in the sample text sequence is not a semantic unit with semantics, in response to the initial target language segment for the first i words in the sample text sequence being different from the target language segment, in the target text sequence, corresponding to the first i words.
|
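
The label-generation steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `label_prefixes`, `toy_translate`, and `PHRASES` are hypothetical, the phrase-table translator is a toy stand-in for a real translation model, and the sketch assumes that the target language segment "corresponding to the first i words" is simply the leading tokens of the full translation, which the claim does not specify.

```python
from typing import Callable, List

# Hypothetical toy phrase table standing in for a real translation model.
PHRASES = {
    ("guten", "morgen"): ["good", "morning"],
    ("guten",): ["well"],
    ("welt",): ["world"],
}


def toy_translate(words: List[str]) -> List[str]:
    """Greedy longest-match lookup in PHRASES; unknown words are copied through."""
    out: List[str] = []
    pos = 0
    while pos < len(words):
        for span in (2, 1):
            key = tuple(words[pos:pos + span])
            if key in PHRASES:
                out.extend(PHRASES[key])
                pos += len(key)
                break
        else:
            out.append(words[pos])
            pos += 1
    return out


def label_prefixes(
    words: List[str],
    translate: Callable[[List[str]], List[str]],
) -> List[int]:
    """Return one label per source word: 1 (positive sample) or 0 (negative sample)."""
    # Target text sequence: translation of the whole sample text sequence.
    target = translate(words)
    labels: List[int] = []
    for i in range(1, len(words) + 1):
        # Initial target language segment for the first i words.
        segment = translate(words[:i])
        # Assumption: the corresponding target segment is the leading
        # len(segment) tokens of the full translation.
        same = len(segment) > 0 and target[:len(segment)] == segment
        labels.append(1 if same else 0)
    return labels


print(label_prefixes(["guten", "morgen", "welt"], toy_translate))  # prints [0, 1, 1]
```

In this toy run, the one-word prefix translates differently on its own than it does inside the full sentence, so the first word receives a negative label, while the two-word and three-word prefixes match the full translation and receive positive labels. The resulting labels, paired with the sample text sequence, are the kind of supervision the claimed training step consumes.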