US 11,899,698 B2
Wordbreak algorithm with offset mapping
Manoj Gupta, Hyderabad (IN); and Kavin Motlani, Indore (IN)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Aug. 3, 2021, as Appl. No. 17/444,347.
Claims priority of application No. 202141023933 (IN), filed on May 28, 2021.
Prior Publication US 2022/0382789 A1, Dec. 1, 2022
Int. Cl. G06F 16/31 (2019.01)
CPC G06F 16/313 (2019.01)
20 Claims
 
1. A computer system, comprising:
a processor coupled to a mass storage device that stores instructions, which, upon execution by the processor, cause the processor to:
store an original string formed of a plurality of characters;
perform a wordbreak algorithm on the original string;
tokenize the original string to generate a processed string including a plurality of word tokens separated by one or more spaces;
generate an offset map between locations within the word tokens in the processed string and corresponding locations in the original string, the offset map including a mapping between a first data structure including character offset index values in the original string and a second data structure including character offset index values in the processed string;
classify a portion of the processed string as a target by determining a start character offset index value and end character offset index value of the target in the processed string from among the character offset index values in the second data structure;
identify target characters in the original string that correspond to the target using the mapping between the character offset values in the first and second data structures in the offset map; and
perform a predetermined action on the target characters in the original string.
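
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the regex-based wordbreak rule, the function names `tokenize_with_offsets` and `map_span_to_original`, the use of two parallel lists as the first and second data structures, the substring search standing in for classification, and uppercasing as the "predetermined action" are all assumptions made for illustration.

```python
import re


def tokenize_with_offsets(original):
    """Hypothetical wordbreak: split into word runs and single punctuation marks,
    join the tokens with one space, and record per-character offsets."""
    tokens = [(m.group(), m.start()) for m in re.finditer(r"\w+|[^\w\s]", original)]
    processed_chars = []
    orig_offsets = []  # first data structure: character offsets in the original string
    proc_offsets = []  # second data structure: character offsets in the processed string
    for i, (text, start) in enumerate(tokens):
        if i > 0:
            processed_chars.append(" ")  # separator space has no counterpart in the map
        for k, ch in enumerate(text):
            proc_offsets.append(len(processed_chars))
            orig_offsets.append(start + k)
            processed_chars.append(ch)
    return "".join(processed_chars), orig_offsets, proc_offsets


def map_span_to_original(proc_start, proc_end, orig_offsets, proc_offsets):
    """Translate an inclusive (start, end) span in the processed string
    into the corresponding inclusive span in the original string."""
    lookup = dict(zip(proc_offsets, orig_offsets))
    return lookup[proc_start], lookup[proc_end]


original = "Call Bob,now!"
processed, orig_offsets, proc_offsets = tokenize_with_offsets(original)

# Stand-in for the classifier: locate the target "now" in the processed string.
start = processed.find("now")
end = start + len("now") - 1

o_start, o_end = map_span_to_original(start, end, orig_offsets, proc_offsets)

# Stand-in for the predetermined action: uppercase the target characters in the original.
result = original[:o_start] + original[o_start:o_end + 1].upper() + original[o_end + 1:]
print(processed)  # Call Bob , now !
print(result)     # Call Bob,NOW!
```

In this sketch the target occupies offsets 11 to 13 in the processed string and offsets 9 to 11 in the original string. The difference comes from the spaces inserted around the comma during tokenization, which is the discrepancy the offset map exists to resolve.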