| CPC G06F 40/279 (2020.01) [G06F 40/166 (2020.01); G06Q 10/063112 (2013.01)] | 20 Claims |

|
1. A system for characterizing natural language text units, the system comprising:
at least one processor programmed to perform operations comprising:
using a plurality of text units to train a bidirectional model to generate context vectors, the plurality of text units indicating job descriptions;
accessing a set of annotated text units from a corpus of text units describing job descriptions, a first annotated text unit of the set of annotated text units comprising:
a first span comprising a first set of ordered words from the first annotated text unit; and
first annotation data describing a job skill associated with the first span;
applying the bidirectional model to the set of annotated text units to generate a plurality of span context vectors;
using the plurality of span context vectors generated with the bidirectional model to train a span prediction model, the span prediction model comprising a first probability function configured to provide a probability that a span prediction model input indicates a first job skill and a second probability function to provide a probability that the span prediction model input indicates a second job skill; and
applying the span prediction model to at least a portion of the plurality of text units to generate a plurality of span characterizations, a first span characterization corresponding to a first span indicating that the first span describes the first job skill.
|
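The pipeline recited in claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the two skill labels, the toy text units, the helper `span_context_vector` (a simple count-based stand-in for the trained bidirectional model, which records the words to the left of, inside, and to the right of a span), and the choice of an independent logistic function as each per-skill probability function.

```python
import math

SKILLS = ["python", "welding"]  # hypothetical job-skill labels

def build_vocab(token_lists):
    """Map each distinct lowercase word to an integer index."""
    vocab = {}
    for words in token_lists:
        for word in words:
            vocab.setdefault(word.lower(), len(vocab))
    return vocab

def span_context_vector(words, start, end, vocab):
    """Toy stand-in for the bidirectional model (assumption): concatenated
    word counts for the left context, the span [start, end), and the right
    context, so the vector reflects both directions around the span."""
    n = len(vocab)
    vec = [0.0] * (3 * n)
    for i, word in enumerate(words):
        idx = vocab.get(word.lower())
        if idx is None:
            continue
        offset = 0 if i < start else (n if i < end else 2 * n)
        vec[offset + idx] += 1.0
    return vec

class SpanPredictionModel:
    """Holds one probability function per job skill (here, logistic)."""

    def __init__(self, dim, skills):
        self.skills = list(skills)
        self.weights = {s: [0.0] * dim for s in self.skills}
        self.bias = {s: 0.0 for s in self.skills}

    def probability(self, skill, vec):
        z = self.bias[skill] + sum(
            w * x for w, x in zip(self.weights[skill], vec))
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, examples, epochs=200, lr=0.5):
        # Plain stochastic gradient descent on the logistic loss,
        # run separately for each skill's probability function.
        for _ in range(epochs):
            for vec, label in examples:
                for s in self.skills:
                    target = 1.0 if s == label else 0.0
                    err = self.probability(s, vec) - target
                    self.bias[s] -= lr * err
                    self.weights[s] = [
                        w - lr * err * x
                        for w, x in zip(self.weights[s], vec)]

    def characterize(self, vec):
        """Return the skill whose probability function scores highest."""
        return max(self.skills, key=lambda s: self.probability(s, vec))

# Annotated text units: (words, span start, span end, job-skill annotation).
annotated = [
    ("must write python scripts daily".split(), 2, 4, "python"),
    ("experience with arc welding required".split(), 2, 4, "welding"),
    ("strong python programming background".split(), 1, 3, "python"),
    ("certified in pipe welding techniques".split(), 2, 4, "welding"),
]
# Unannotated text units with candidate spans to characterize.
unlabeled = [
    ("seeking python developer".split(), 1, 2),
    ("needs welding certification".split(), 1, 2),
]

vocab = build_vocab([a[0] for a in annotated] + [u[0] for u in unlabeled])
examples = [(span_context_vector(w, s, e, vocab), label)
            for w, s, e, label in annotated]

model = SpanPredictionModel(dim=3 * len(vocab), skills=SKILLS)
model.train(examples)

characterizations = [
    model.characterize(span_context_vector(w, s, e, vocab))
    for w, s, e in unlabeled]
print(characterizations)
```

In a realistic setting the count-based helper would be replaced by context vectors from a bidirectional model trained on the job-description text units; the sketch keeps only the overall shape of the steps: span context vectors in, one probability per job skill out, and a span characterization chosen from those probabilities.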