US 11,675,977 B2
Intelligent system that dynamically improves its knowledge and code-base for natural language understanding
Robert J. Munro, San Francisco, CA (US); Rob Voigt, Palo Alto, CA (US); Schuyler D. Erle, San Francisco, CA (US); Brendan D. Callahan, Philadelphia, PA (US); Gary C. King, Los Altos, CA (US); Jessica D. Long, San Francisco, CA (US); Jason Brenier, Oakland, CA (US); Tripti Saxena, Cupertino, CA (US); and Stefan Krawczyk, Menlo Park, CA (US)
Assigned to Daash Intelligence, Inc., Miami, FL (US)
Filed by Daash Intelligence, Inc., Miami, FL (US)
Filed on Mar. 27, 2020, as Appl. No. 16/832,632.
Application 16/832,632 is a continuation of application No. 16/056,263, filed on Aug. 6, 2018, abandoned.
Application 16/056,263 is a continuation of application No. 15/596,855, filed on May 16, 2017, abandoned.
Application 15/596,855 is a continuation of application No. 14/964,512, filed on Dec. 9, 2015, granted, now 9,965,458, issued on May 8, 2018.
Claims priority of provisional application 62/254,090, filed on Nov. 11, 2015.
Claims priority of provisional application 62/254,095, filed on Nov. 11, 2015.
Claims priority of provisional application 62/089,736, filed on Dec. 9, 2014.
Claims priority of provisional application 62/089,747, filed on Dec. 9, 2014.
Claims priority of provisional application 62/089,745, filed on Dec. 9, 2014.
Claims priority of provisional application 62/089,742, filed on Dec. 9, 2014.
Prior Publication US 2021/0157984 A1, May 27, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 40/289 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01); G06F 40/216 (2020.01)

CPC G06F 40/284 (2020.01) [G06F 40/216 (2020.01); G06F 40/30 (2020.01)]

20 Claims

1. A method for tokenizing text for natural language processing, the method comprising:

generating, by one or more processors in a natural language processing platform, and from a pool of documents, a set of statistical models comprising one or more entries each indicating a likelihood of appearance of a character/letter sequence in the pool of documents;

receiving, by the one or more processors, a set of rules comprising rules that identify character/letter sequences as valid tokens;

transforming, by the one or more processors, one or more entries in the statistical models into new rules that are added to the set of rules when the entries indicate a high likelihood;

receiving, by the one or more processors, a document to be processed;

dividing, by the one or more processors, the document to be processed into tokens based on the set of statistical models and the set of rules; and

outputting, by the one or more processors, the tokens for natural language processing.
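
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration only, not the patented implementation, and every name and design choice in it is an assumption: the functions build_model, promote_entries, and tokenize are hypothetical; "likelihood" is taken to be the fraction of pool documents that contain a sequence; "high likelihood" is taken to be a fixed threshold; and the dividing step is a greedy longest-match that consults the rules first and falls back to the statistical model.

```python
from collections import Counter


def build_model(pool, min_len=2, max_len=4):
    """Map each character sequence to the fraction of pool documents containing it.

    Assumption: document frequency stands in for the claim's "likelihood of appearance".
    """
    doc_freq = Counter()
    for doc in pool:
        seen = set()
        for chunk in doc.split():
            for n in range(min_len, max_len + 1):
                for i in range(len(chunk) - n + 1):
                    seen.add(chunk[i:i + n])
        doc_freq.update(seen)  # each sequence counts once per document
    return {seq: count / len(pool) for seq, count in doc_freq.items()}


def promote_entries(model, rules, threshold=1.0):
    """Return the rule set extended with every model entry at or above the threshold.

    Assumption: a fixed threshold is what makes an entry "high likelihood".
    """
    return set(rules) | {seq for seq, p in model.items() if p >= threshold}


def tokenize(document, model, rules, max_len=4):
    """Divide a document into tokens using the rules first, then the model."""
    tokens = []
    for chunk in document.split():
        i = 0
        while i < len(chunk):
            # Candidate sequences starting at position i, longest first.
            longest = min(max_len, len(chunk) - i)
            candidates = [chunk[i:i + n] for n in range(longest, 0, -1)]
            # Rules: take the longest candidate that a rule marks as a valid token.
            token = next((c for c in candidates if c in rules), None)
            if token is None:
                # Model fallback: most likely candidate; ties go to the longer one
                # because max() keeps the first maximal item.
                scored = [c for c in candidates if c in model]
                token = max(scored, key=lambda c: model[c]) if scored else chunk[i]
            tokens.append(token)
            i += len(token)
    return tokens


# Hypothetical unsegmented pool and seed rules, chosen only for illustration.
pool = ["thecatsat", "thecatran", "thedogsat"]
model = build_model(pool)
rules = promote_entries(model, {"cat", "dog", "sat"})

print(sorted(rules))  # ['at', 'cat', 'dog', 'he', 'sat', 'th', 'the']
print(tokenize("thedogsat", model, rules))  # ['the', 'dog', 'sat']
print(tokenize("thecatran", model, rules))  # ['the', 'cat', 'ran']
```

In this toy pool, "the" occurs in every document and is therefore promoted to a rule, while "ran" is never promoted and is recovered only through the statistical fallback. A real system would likely need smoothing, a less naive likelihood estimate, and a better search than greedy matching.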