CPC G06F 16/258 (2019.01) [G06F 16/9024 (2019.01); G06F 17/18 (2013.01); G06N 3/045 (2023.01); G06N 3/047 (2023.01)]
20 Claims
1. A system for formatting data, the system comprising:
at least one memory storing instructions; and
one or more processors configured to execute the instructions to perform operations comprising:
generating a first probabilistic graph, the first probabilistic graph including a set of nodes corresponding to positions in received data value sequences, by iteratively:
determining conditional counts of occurrences of received data values at a subsequent node in the set of nodes, the conditional counts being based on counting instances of received data values at one or more preceding nodes in the set of nodes; and
determining conditional probabilities based on the conditional counts;
determining a similarity metric of a second probabilistic graph and the first probabilistic graph, the second probabilistic graph being generated by a machine learning model; and
training the machine learning model based on the similarity metric.
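
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything beyond the claim language is an assumption made for illustration: the names `build_graph` and `similarity` are hypothetical, each node is conditioned on only the single immediately preceding node, and the similarity metric is taken to be one minus the mean total variation distance. The second graph, which the claim says is generated by a machine learning model, is represented here by a hand-written dictionary, and the training step is omitted.

```python
from collections import Counter, defaultdict


def build_graph(sequences):
    """Return {position: {preceding_value: {value: probability}}}.

    Assumption: each node is conditioned on one preceding node only.
    Position 0 has no preceding node, so its context is None.
    """
    # Conditional counts: occurrences of a value at a position,
    # given the value seen at the preceding position.
    counts = defaultdict(lambda: defaultdict(Counter))
    for seq in sequences:
        prev = None
        for pos, value in enumerate(seq):
            counts[pos][prev][value] += 1
            prev = value

    # Conditional probabilities: normalize each count table.
    graph = {}
    for pos, by_context in counts.items():
        graph[pos] = {}
        for context, counter in by_context.items():
            total = sum(counter.values())
            graph[pos][context] = {v: c / total for v, c in counter.items()}
    return graph


def similarity(graph_a, graph_b):
    """Assumed metric: 1 minus the mean total variation distance.

    A (position, context) pair present in only one graph is scored
    as maximally distant (distance 1.0).
    """
    keys = {(p, c) for g in (graph_a, graph_b) for p in g for c in g[p]}
    if not keys:
        return 1.0
    total = 0.0
    for pos, context in keys:
        dist_a = graph_a.get(pos, {}).get(context)
        dist_b = graph_b.get(pos, {}).get(context)
        if not dist_a or not dist_b:
            total += 1.0
            continue
        values = set(dist_a) | set(dist_b)
        total += 0.5 * sum(
            abs(dist_a.get(v, 0.0) - dist_b.get(v, 0.0)) for v in values
        )
    return 1.0 - total / len(keys)


# Received data value sequences (illustrative data, not from the patent).
first_graph = build_graph(["123", "124", "153"])

# Stand-in for the graph a machine learning model would generate.
model_graph = {
    0: {None: {"1": 1.0}},
    1: {"1": {"2": 0.5, "5": 0.5}},
    2: {"2": {"3": 0.5, "4": 0.5}, "5": {"3": 1.0}},
}

score = similarity(model_graph, first_graph)
```

In a training loop, a quantity such as `1 - score` could serve as a loss to be reduced over successive updates of the model. That choice is likewise an assumption, since the claim requires only that training be based on the similarity metric.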