CPC G06F 40/284 (2020.01) [G06F 40/169 (2020.01); G06N 20/00 (2019.01)] | 15 Claims |
1. A method of identifying a speaker in a text-based work, executable by a processor, comprising:
extracting labeled instances and unlabeled instances corresponding to one or more speakers;
inferring pseudo-labels for the extracted unlabeled instances based on the labeled instances; and
labeling one or more of the unlabeled instances based on the inferred pseudo-labels,
wherein the labeled and unlabeled instances correspond to a class token, tokens in a first piece of text containing an utterance, a separator token, and tokens in a second piece of text that covers the first piece of text,
wherein constructing an input sequence comprises concatenating the class token, the tokens in the first piece of text containing the utterance, the separator token, and the tokens in the second piece of text that covers the first piece of text, and
wherein two vectors correspond to estimated probabilities of each of the tokens being a starting token or an ending token of an answer span that appears in the second piece of text, and the answer span includes a start offset and an end offset.
|
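For illustration only, the input construction and answer-span selection recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the token strings `[CLS]` and `[SEP]`, the function names, the product-of-probabilities span score, the `max_span_len` limit, and the confidence `threshold` used to accept a pseudo-label are all hypothetical choices that the claim does not specify. The two probability vectors are supplied by hand here; in practice they would come from a trained model.

```python
from typing import List, Optional, Sequence, Tuple

CLS = "[CLS]"  # class token (BERT-style spelling is an assumption)
SEP = "[SEP]"  # separator token (BERT-style spelling is an assumption)


def build_input_sequence(
    utterance_tokens: Sequence[str], context_tokens: Sequence[str]
) -> Tuple[List[str], int]:
    """Concatenate class token, utterance tokens, separator, context tokens.

    Returns the sequence and the index at which the context (the second
    piece of text, which covers the utterance) begins.
    """
    sequence = [CLS, *utterance_tokens, SEP, *context_tokens]
    context_start = len(utterance_tokens) + 2  # skip CLS, utterance, SEP
    return sequence, context_start


def best_answer_span(
    start_probs: Sequence[float],
    end_probs: Sequence[float],
    context_start: int,
    max_span_len: int = 8,  # hypothetical limit, not taken from the claim
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the (start, end) offsets inside the context with the highest
    start_probs[start] * end_probs[end], together with that score."""
    best_span: Optional[Tuple[int, int]] = None
    best_score = -1.0
    n = len(start_probs)
    for start in range(context_start, n):
        for end in range(start, min(start + max_span_len, n)):
            score = start_probs[start] * end_probs[end]
            if score > best_score:
                best_span, best_score = (start, end), score
    return best_span, best_score


def pseudo_label(
    sequence: Sequence[str],
    start_probs: Sequence[float],
    end_probs: Sequence[float],
    context_start: int,
    threshold: float = 0.5,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the span text as a pseudo-label, or None if the best span
    scores below the threshold (the instance then stays unlabeled)."""
    span, score = best_answer_span(start_probs, end_probs, context_start)
    if span is None or score < threshold:
        return None
    start, end = span
    return " ".join(sequence[start : end + 1])


# Usage with made-up tokens and hand-written probabilities.
utterance = ["Come", "here", "!"]
context = ["Come", "here", "!", "said", "Alice", "."]
sequence, context_start = build_input_sequence(utterance, context)

start_probs = [0.01] * len(sequence)
end_probs = [0.01] * len(sequence)
start_probs[9] = 0.9  # index 9 is the token "Alice"
end_probs[9] = 0.8

print(pseudo_label(sequence, start_probs, end_probs, context_start))
```

Restricting the search to indices at or after `context_start` reflects the claim language that the answer span appears in the second piece of text; the start and end offsets returned are positions in the concatenated sequence.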