US 12,277,615 B2
Detecting and validating improper residency status through data mining, natural language processing, and machine learning
Gregory Rose, San Diego, CA (US); Jessica Flanagan, Sydney (AU); and Anthony Moriarty, Sydney (AU)
Assigned to Deckard Technologies, Inc., La Jolla, CA (US)
Filed by Deckard Technologies, Inc., La Jolla, CA (US)
Filed on Apr. 26, 2019, as Appl. No. 16/396,584.
Claims priority of provisional application 62/748,991, filed on Oct. 22, 2018.
Claims priority of provisional application 62/718,751, filed on Aug. 14, 2018.
Claims priority of provisional application 62/695,564, filed on Jul. 9, 2018.
Claims priority of provisional application 62/671,957, filed on May 15, 2018.
Claims priority of provisional application 62/664,591, filed on Apr. 30, 2018.
Prior Publication US 2019/0333173 A1, Oct. 31, 2019
Int. Cl. G06Q 50/16 (2024.01); G06F 16/2458 (2019.01); G06F 16/29 (2019.01); G06F 40/268 (2020.01); G06F 40/279 (2020.01); G06F 40/30 (2020.01); G06N 5/048 (2023.01); G06N 20/00 (2019.01); G06Q 50/163 (2024.01); G06Q 50/26 (2024.01); H04L 67/10 (2022.01)
CPC G06Q 50/16 (2013.01) [G06F 16/2465 (2019.01); G06F 16/29 (2019.01); G06F 40/30 (2020.01); G06N 5/048 (2013.01); G06N 20/00 (2019.01); G06Q 50/163 (2013.01); G06Q 50/26 (2013.01); G06F 40/268 (2020.01); G06F 40/279 (2020.01); G06F 2216/03 (2013.01); H04L 67/10 (2013.01)] 26 Claims
 
1. A non-transitory computer-readable storage medium encoded with a computer program including instructions executable by a processor to create an application to train a neural network for detecting an improper residency tax status for a real estate property, the application performing at least the following:
a) collecting data from one or more data sources and determining one or more improper residency indicia from the data;
b) performing distant supervision training on a recurrent neural network with a first machine learning algorithm with a training dataset comprising at least a portion of the improper residency indicia, the first machine learning algorithm comprising one of a naïve Bayes classification, a random forest, and deep learning, wherein the distant supervision training comprises positive-unlabeled learning with the training dataset as the positive class;
c) running the first machine learning algorithm on a plurality of real estate documents to determine a plurality of commonly associated real estate properties for at least a portion of the plurality of real estate documents, wherein at least one of the commonly associated real estate properties comprises a primary residence;
d) training a second machine learning algorithm with regression modeling by constructing an initial model by assigning probability weights to predictor variables, wherein relationships between the predictor variables and dependent variables are determined and weighted based on the plurality of commonly associated real estate properties, and adjusting the probability weights based on verified data;
e) running the second machine learning algorithm on the plurality of commonly associated real estate properties; and
f) determining an improper residency tax status probability of the commonly associated real estate properties.
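The claim recites the steps but no implementation details. The sketch below illustrates steps (b), (d), and (f) under stated assumptions. The indicia are binary features with hypothetical names (`mail_addr_differs`, `rental_listing_found`, `other_exemption`). For step (b), it uses Bernoulli naive Bayes, one of the claimed options, in a positive-unlabeled setting with the Elkan-Noto rescaling. That is a published PU technique substituted here for illustration, not necessarily the patentee's method. For steps (d) and (f), a logistic regression starts from hand-assigned initial weights, adjusts them by stochastic gradient descent on verified labels, and outputs a probability. The recurrent network and the document-processing step (c) are not modeled, and all data are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical binary improper-residency indicia; these names are
# illustrative assumptions, not features disclosed in the patent.
FEATURES = ["mail_addr_differs", "rental_listing_found", "other_exemption"]

NBModel = Dict[int, Tuple[float, List[float]]]


def train_bernoulli_nb(rows: Sequence[Sequence[int]], labels: Sequence[int],
                       alpha: float = 1.0) -> NBModel:
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    model: NBModel = {}
    n_features = len(rows[0])
    for cls in (0, 1):
        subset = [r for r, y in zip(rows, labels) if y == cls]
        prior = (len(subset) + alpha) / (len(rows) + 2 * alpha)
        probs = [(sum(r[j] for r in subset) + alpha) / (len(subset) + 2 * alpha)
                 for j in range(n_features)]
        model[cls] = (prior, probs)
    return model


def nb_prob(model: NBModel, row: Sequence[int]) -> float:
    """Return P(class = 1 | row) under the naive Bayes model."""
    log_scores = {}
    for cls, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(row, probs):
            score += math.log(p if x else 1.0 - p)
        log_scores[cls] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    e0 = math.exp(log_scores[0] - top)
    e1 = math.exp(log_scores[1] - top)
    return e1 / (e0 + e1)


def pu_fit(positives, unlabeled) -> Tuple[NBModel, float]:
    """PU learning in the style of Elkan and Noto: train a labeled-vs-unlabeled
    classifier, then estimate c = P(labeled | positive) as its mean score on
    known positives. (Elkan and Noto use held-out positives; reusing the
    training positives here is a simplification.)"""
    rows = list(positives) + list(unlabeled)
    labels = [1] * len(positives) + [0] * len(unlabeled)
    model = train_bernoulli_nb(rows, labels)
    c = sum(nb_prob(model, r) for r in positives) / len(positives)
    return model, c


def pu_prob(model: NBModel, c: float, row: Sequence[int]) -> float:
    """Rescale the labeled-vs-unlabeled score to P(positive | row), clipped to 1."""
    return min(1.0, nb_prob(model, row) / c)


def sigmoid(z: float) -> float:
    # Branch on sign to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], bias: float, row: Sequence[int]) -> float:
    """Logistic-regression probability of improper residency for one property."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def adjust_weights(weights, bias, verified, lr=0.5, epochs=200):
    """Start from the initial (assigned) weights and adjust them by stochastic
    gradient descent on log-loss over verified (row, label) pairs."""
    w, b = list(weights), bias
    for _ in range(epochs):
        for row, y in verified:
            err = predict(w, b, row) - y  # gradient of log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, row)]
            b -= lr * err
    return w, b


if __name__ == "__main__":
    positives = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0]]  # known cases
    unlabeled = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
    model, c = pu_fit(positives, unlabeled)
    print("PU score for [1, 1, 1]:", pu_prob(model, c, [1, 1, 1]))
    verified = [([1, 1, 1], 1), ([1, 1, 0], 1), ([0, 0, 0], 0),
                ([0, 0, 1], 0), ([1, 0, 0], 0)]
    w, b = adjust_weights([0.5, 0.5, 0.5], 0.0, verified)
    print("Adjusted weights:", dict(zip(FEATURES, w)))
    print("Probability for [1, 0, 1]:", predict(w, b, [1, 0, 1]))
```

In this toy verified set, `rental_listing_found` alone separates the positive from the negative examples, so gradient descent grows its weight while the bias turns negative. With real data, the initial weights and the verified set would come from whatever sources step (a) collects, which the claim does not specify.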