US 12,125,067 B1
	Machine learning systems for automated database element processing and prediction output generation
David J. Fogarty, Old Greenwich, CT (US); Robert E. Chudzik, Marlborough, CT (US); Yogendra D. Bhosrekar, Bolton, CT (US); Stephanie C. Swain, Simsbury, CT (US); Man Tat Lam, Hong Kong (HK); Man Hin Wong, Hong Kong (HK); Margaret A. Shaw, East Hampton, CT (US); Yee Wah Eva Lee, South Windsor, CT (US); and Sourav Maharana, Bhubaneswar (IN)
Assigned to Cigna Intellectual Property, Inc., Wilmington, DE (US)
Filed by Cigna Intellectual Property, Inc., Wilmington, DE (US)
Filed on Dec. 29, 2020, as Appl. No. 17/136,466.
Application 17/136,466 is a continuation of application No. 17/136,395, filed on Dec. 29, 2020.
Claims priority of provisional application 62/955,006, filed on Dec. 30, 2019.
Int. Cl. G06Q 30/0251 (2023.01); G06F 18/2113 (2023.01); G06F 18/2415 (2023.01); G06N 20/20 (2019.01); G06Q 10/1057 (2023.01); G06Q 30/0201 (2023.01); G06Q 30/0204 (2023.01)

CPC G06Q 30/0269 (2013.01) [G06F 18/2113 (2023.01); G06F 18/24155 (2023.01); G06N 20/20 (2019.01); G06Q 10/1057 (2013.01); G06Q 30/0201 (2013.01); G06Q 30/0204 (2013.01)]

10 Claims

1. A computerized method of automatic distributed communication, the method comprising:

training a first machine learning model with historical feature vector inputs to generate a title score output using classifications for supervised learning, wherein:

the historical feature vector inputs include historical profile data structures specific to multiple historical entities,

the historical profile data structures include structured title data and structured response data, the structured title data including a job title matrix; and

training the first machine learning model includes:

classifying each one of the multiple historical entities as a decision entity or a non-decision entity according to the structured response data associated with the historical entity;

duplicating at least a portion of classified decision entity records in training data for the first machine learning model;

down-sampling at least a portion of classified non-decision entity records in the training data for the first machine learning model;

training a variable selection algorithm on the job title matrix to determine multiple significant keywords;

selecting a specified number of highest scoring ones of the determined multiple significant keywords; and

training a multinomial naive Bayes algorithm on a term frequency matrix of the selected specified number of keywords;

training a second machine learning model with the historical feature vector inputs to generate a background score output, wherein the historical profile data structures include structured background data, the structured background data includes a term frequency matrix, and training the second machine learning model includes:

duplicating at least a portion of classified decision entity records in training data for the second machine learning model;

down-sampling at least a portion of classified non-decision entity records in the training data for the second machine learning model; and

inputting the term frequency matrix and the structured background data into a binary classification algorithm;

obtaining a set of entities;

for each entity in the set of entities:

obtaining structured title data associated with the entity from a structured title database;

generating a title feature vector input according to the obtained structured title data;

processing, by the first machine learning model, the title feature vector input to generate the title score output, wherein the title score output is indicative of a likelihood that the entity is a decision entity according to the structured title data associated with the entity;

obtaining structured background data associated with the entity from a structured background database;

generating a background feature vector input according to the obtained structured background data;

processing, by the second machine learning model, the background feature vector input to generate the background score output, wherein the background score output is indicative of a likelihood that the entity is a decision entity according to the structured background data associated with the entity;

combining the generated background score output and the generated title score output to determine a decision score output;

selectively including the entity in a subset of entities based on a comparison of the decision score output to a threshold value; and

for each entity in the subset of entities, automatically distributing structured campaign data to the entity.
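The training steps that claim 1 recites for the first (title) model can be illustrated with the minimal Python sketch below. It is not the patentee's implementation, and it rests on several assumptions. Labels are taken as already derived from the structured response data (1 = decision entity, 0 = non-decision entity). The duplication factor and down-sampling fraction are arbitrary. A 2x2 chi-squared statistic stands in for the unspecified "variable selection algorithm". All function names (`rebalance`, `chi2_keywords`, `train_mnb`, `title_score`, `train_title_model`) are hypothetical. Each title is represented as a token list, so repeated tokens supply the term frequencies.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Tokens = List[str]
# (log prior per class, log P(keyword | class) per class)
Model = Tuple[Dict[int, float], Dict[int, Dict[str, float]]]


def rebalance(docs: Sequence[Tokens], labels: Sequence[int],
              dup_factor: int = 2, keep_fraction: float = 0.5,
              seed: int = 0) -> Tuple[List[Tokens], List[int]]:
    """Duplicate decision-entity records (label 1) and randomly down-sample
    non-decision records (label 0).

    ASSUMPTION: dup_factor and keep_fraction are illustrative; the claim
    does not say how much to duplicate or down-sample.
    """
    pos = [d for d, y in zip(docs, labels) if y == 1]
    neg = [d for d, y in zip(docs, labels) if y == 0]
    keep = min(len(neg), max(1, round(len(neg) * keep_fraction))) if neg else 0
    neg_kept = random.Random(seed).sample(neg, keep)
    out_docs = pos * dup_factor + neg_kept
    out_labels = [1] * (len(pos) * dup_factor) + [0] * len(neg_kept)
    return out_docs, out_labels


def chi2_keywords(docs: Sequence[Tokens], labels: Sequence[int], k: int) -> List[str]:
    """Score each keyword with a 2x2 chi-squared statistic (presence vs.
    label) and return the k highest scoring keywords.

    ASSUMPTION: chi-squared stands in for the unspecified
    "variable selection algorithm".
    """
    n, n_pos = len(docs), sum(labels)
    n_neg = n - n_pos
    in_pos: Counter = Counter()
    in_neg: Counter = Counter()
    for doc, y in zip(docs, labels):
        target = in_pos if y == 1 else in_neg
        for t in set(doc):
            target[t] += 1
    scores = {}
    for t in set(in_pos) | set(in_neg):
        a, b = in_pos[t], in_neg[t]          # docs containing t, by class
        c, d = n_pos - a, n_neg - b          # docs lacking t, by class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def train_mnb(docs: Sequence[Tokens], labels: Sequence[int],
              vocab: Sequence[str], alpha: float = 1.0) -> Model:
    """Multinomial naive Bayes with Laplace smoothing, fitted on term
    frequencies restricted to vocab (repeated tokens count repeatedly)."""
    prior: Dict[int, float] = {}
    loglik: Dict[int, Dict[str, float]] = {}
    for c in (0, 1):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d if t in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        loglik[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return prior, loglik


def title_score(model: Model, doc: Tokens) -> float:
    """Posterior probability that the title belongs to a decision entity."""
    prior, loglik = model
    logp = {c: prior[c] + sum(loglik[c][t] for t in doc if t in loglik[c])
            for c in prior}
    m = max(logp.values())                   # log-sum-exp for stability
    z = sum(math.exp(v - m) for v in logp.values())
    return math.exp(logp[1] - m) / z


def train_title_model(titles: Sequence[str], labels: Sequence[int], k: int = 20) -> Model:
    """Tokenize titles, rebalance classes, select k keywords, fit the model."""
    docs = [title.lower().split() for title in titles]
    docs, labels = rebalance(docs, labels)
    return train_mnb(docs, labels, chi2_keywords(docs, labels, k))
```

Each claimed training step maps to one function, which keeps the order explicit: rebalance first, then select keywords on the rebalanced data, then fit naive Bayes on the selected keywords only. `title_score` then returns a value between 0 and 1 for any tokenized title. Tokens outside the selected keywords are ignored, so such a title falls back to the class prior.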
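The per-entity scoring, selection, and distribution steps at the end of claim 1 can be sketched as follows. This is a hypothetical illustration, not the claimed system. The claim does not say how the title and background scores are combined, so a weighted average controlled by an assumed `title_weight` parameter is used. The title and background databases are modeled as plain dicts. Feature vectors are lowercase token lists, and the hypothetical `distribute` callback stands in for whatever channel delivers the structured campaign data. Whether the threshold comparison is inclusive is also an assumption (`>=` here).

```python
from typing import Callable, Dict, List

Tokens = List[str]


def select_and_distribute(
    entities: List[str],
    title_db: Dict[str, str],
    background_db: Dict[str, str],
    title_model: Callable[[Tokens], float],
    background_model: Callable[[Tokens], float],
    threshold: float = 0.5,
    title_weight: float = 0.5,
    distribute: Callable[[str], None] = lambda entity: None,
) -> List[str]:
    """Score every entity, keep those whose decision score reaches the
    threshold, and pass each kept entity to `distribute`.

    ASSUMPTION: the two scores are combined by a weighted average; the
    claim only says they are "combined".
    """
    subset: List[str] = []
    for entity in entities:
        # Feature vectors are modeled as lowercase token lists.
        title_vec = title_db[entity].lower().split()
        background_vec = background_db[entity].lower().split()
        title_score = title_model(title_vec)
        background_score = background_model(background_vec)
        decision_score = (title_weight * title_score
                          + (1.0 - title_weight) * background_score)
        if decision_score >= threshold:
            subset.append(entity)
    # Hand the (hypothetical) campaign payload to each selected entity.
    for entity in subset:
        distribute(entity)
    return subset
```

Taking the two models as callables lets any classifier that returns a probability be plugged in. Selection finishes before any distribution happens, which mirrors the order of the claimed steps.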