| CPC G06F 40/40 (2020.01) [G06F 16/383 (2019.01); G06F 40/284 (2020.01); G06N 20/20 (2019.01); G06F 16/3347 (2019.01); G06F 40/205 (2020.01); G06F 40/216 (2020.01); G06F 40/242 (2020.01); G06F 40/279 (2020.01); G06F 40/30 (2020.01)] | 14 Claims |

|
1. A non-transitory, machine-readable medium storing instructions that, when executed by one or more processors, effectuate operations comprising:
generating, by a computer system and for a first data object including a first set of data entries where each data entry of the first set of data entries includes text content associated with a time entry and each data entry is associated with a more expansive record of text content, a first data object score using the text content and the time entries included in the first set of data entries and using scoring parameters, wherein the generating the first data object score includes:
converting the text content of each data entry to text strings;
selecting text strings for each data entry that satisfy a frequency condition;
vectorizing the selected text strings for each data entry to generate a data entry vector representation for each data entry, wherein the vectorization is based on a data object type and wherein the vectorizing includes a semantic embedding natural language process to understand meaning of the text strings in the text content;
determining a recency weight for each data entry based on the time entry and the data entry vector representation, wherein the recency weight is determined by applying an exponential decay function to a delta of the time entry and a time associated with the first data object divided by a constant derived by a machine learning optimization technique for that data entry vector representation; and
inputting the recency weight and the data entry vector representation for each data entry into a gradient boosting machine learning model that is trained based on data entry vector representations and associated recency weights and how the data entry vector representations and the associated recency weights influence each other, wherein the gradient boosting machine learning model outputs the first data object score for the first data object;
determining, by the computer system, that the first data object score satisfies a data object score condition; and
storing, by the computer system, the first data object in a database in response to determining that the first data object score satisfies the data object score condition.
|
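For illustration only, the recency weight recited in claim 1 can be read as w = exp(-Δt / c), where Δt is the delta between the time entry and the time associated with the data object, and c is the constant. The Python sketch below follows that reading under stated assumptions: the function name `recency_weight`, the use of days as the time unit, the absolute value of the delta, and the fixed constant of 30 days are all hypothetical. The claim derives the constant by a machine learning optimization technique for each data entry vector representation, which this sketch does not attempt.

```python
import math
from datetime import datetime


def recency_weight(entry_time: datetime, object_time: datetime,
                   decay_constant_days: float) -> float:
    """Exponential decay of the time delta divided by a constant.

    Assumption: the delta is measured in days and taken as an absolute
    value, so the weight is 1.0 at zero delta and falls toward 0.0.
    """
    if decay_constant_days <= 0:
        raise ValueError("decay_constant_days must be positive")
    delta_days = abs((object_time - entry_time).total_seconds()) / 86400.0
    return math.exp(-delta_days / decay_constant_days)


# Hypothetical data: one data object time and two time entries.
object_time = datetime(2024, 1, 31)
entry_times = [datetime(2024, 1, 31), datetime(2024, 1, 1)]

# Assumed constant of 30 days; in the claim this value is learned.
weights = [recency_weight(t, object_time, 30.0) for t in entry_times]
```

Under these assumptions, an entry dated the same day as the data object receives a weight of 1.0, and older entries receive progressively smaller weights. Each weight would then be paired with its data entry vector representation as input to the gradient boosting model.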