US 12,482,000 B2
Generation of divergence distributions for automated data analysis
Marco Oliveira Pena Sampaio, Vila Nova de Gaia (PT); Pedro Cardoso Lessa e Silva, Oporto (PT); João Dias Conde Azevedo, Maia (PT); Ricardo Miguel De Oliveira Moreira, Lisbon (PT); João Tiago Barriga Negra Ascensão, Lisbon (PT); Pedro Gustavo Santos Rodrigues Bizarro, Lisbon (PT); Ana Sofia Leal Gomes, Lisbon (PT); and João Miguel Forte Oliveirinha, Loures (PT)
Assigned to Feedzai - Consultadoria e Inovação Tecnológica, S.A. (PT)
Filed by Feedzai - Consultadoria e Inovação Tecnológica, S.A., Coimbra (PT)
Filed on May 9, 2024, as Appl. No. 18/659,309.
Application 18/659,309 is a continuation of application No. 17/386,288, filed on Jul. 27, 2021, granted, now Pat. No. 12,020,256.
Claims priority of provisional application 63/135,314, filed on Jan. 8, 2021.
Claims priority of application No. 21187800 (EP), filed on Jul. 26, 2021; and application No. 117364 (PT), filed on Jul. 26, 2021.
Prior Publication US 2024/0346510 A1, Oct. 17, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06Q 20/40 (2012.01); H04L 9/40 (2022.01)
CPC G06Q 20/4016 (2013.01)
19 Claims
 
1. A method, comprising:
receiving, by a processor, a set of data elements, wherein the set of data elements includes a stream of events;
for each feature of a set of features, determining, by the processor, a corresponding reference distribution of the respective feature using the set of data elements, wherein the corresponding reference distribution characterizes a distribution of training data in a reference time period of the training data;
updating a histogram representing the corresponding reference distribution, wherein the update is constant in both time and memory with respect to a number of events in the stream of events contributing to the corresponding reference distribution;
for each feature of the set of features, determining, by the processor, one or more corresponding subset distributions for one or more subsets sampled from the set of data elements;
for each feature of the set of features, comparing, by the processor, the corresponding reference distribution with each of the one or more corresponding subset distributions to determine a corresponding distribution of divergences including by computing a divergence measure for each comparison of the corresponding reference distribution with the one or more corresponding subset distributions, wherein the divergence measure indicates a degree of difference between the corresponding reference distribution with the one or more corresponding subset distributions;
optimizing a memory usage and a computational cost associated with a retraining of a machine learning model including by determining an optimal time for the retraining of the machine learning model based on the degree of difference, wherein one or more features of the set of features are utilized by the machine learning model for predictive tasks; and
providing, by the processor, at least the determined distributions of divergences for the set of features associated with detection of unusual transactions for use in automated data analysis.
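
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name and parameter in it is hypothetical. It assumes a fixed-bin count histogram, so each update touches a single bin whatever the number of events already seen. It assumes the Jensen-Shannon divergence as the divergence measure, since the claim does not name one. It also assumes a simple quantile rule as one possible way to act on the degree of difference when deciding whether to retrain.

```python
import math
import random


class FixedBinHistogram:
    """Fixed-bin histogram (an assumption, not taken from the claim).

    Each update increments one bin, so its cost in time and memory does not
    depend on how many events have already contributed.
    """

    def __init__(self, low, high, n_bins):
        self.low, self.high, self.n_bins = low, high, n_bins
        self.counts = [0.0] * n_bins
        self.total = 0.0

    def update(self, value):
        # Values outside [low, high) are clamped into the edge bins.
        frac = (value - self.low) / (self.high - self.low)
        idx = min(self.n_bins - 1, max(0, int(frac * self.n_bins)))
        self.counts[idx] += 1.0
        self.total += 1.0

    def probabilities(self):
        if self.total == 0:
            raise ValueError("histogram is empty")
        return [c / self.total for c in self.counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical inputs, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def divergence_distribution(values, low, high, n_bins,
                            n_subsets, subset_size, seed=0):
    """Compare a reference histogram of one feature with sampled subsets.

    Returns the sorted divergences, one per sampled subset.
    """
    rng = random.Random(seed)
    reference = FixedBinHistogram(low, high, n_bins)
    for v in values:
        reference.update(v)
    ref_p = reference.probabilities()

    divergences = []
    for _ in range(n_subsets):
        subset = FixedBinHistogram(low, high, n_bins)
        for v in rng.sample(values, subset_size):
            subset.update(v)
        divergences.append(js_divergence(ref_p, subset.probabilities()))
    return sorted(divergences)


def exceeds_quantile(sorted_divergences, observed, q=0.95):
    """Hypothetical rule: flag an observed divergence above the q-quantile."""
    idx = min(len(sorted_divergences) - 1, int(q * len(sorted_divergences)))
    return observed > sorted_divergences[idx]
```

In this sketch, `divergence_distribution` would be run once per feature over the reference period. A divergence computed later on live data could then be passed to `exceeds_quantile` to decide whether the feature has drifted far enough to consider retraining.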