CPC G06Q 40/03 (2023.01) [G06F 16/2237 (2019.01); G06F 16/2282 (2019.01); G06F 16/2433 (2019.01); G06F 18/22 (2023.01); G06N 3/04 (2013.01)] | 20 Claims |
1. A method for grouping data files into related groups, said method comprising:
receiving a plurality of data files at a file input drop zone;
processing and formatting the plurality of data files;
loading the processed data files into hive tables;
combining files in the hive tables into common structured files;
providing prior stored base data hive tables that include matched groups of files from a previous time that a file matching process was performed;
performing a deterministic matching process that matches the common structured files with each other and with the prior stored matched groups of files into the related groups;
performing a machine learning based cosine similarity matching process that matches the common structured files with each other and with the prior stored matched groups of files into the related groups, said machine learning based cosine similarity matching process using at least one neural network;
combining the matched files from the deterministic matching process and the machine learning based cosine similarity matching process; and
providing the combined and matched files at a file output drop zone.
|
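The two matching steps of claim 1 and the step that combines their results can be illustrated with a minimal sketch. It is not the claimed implementation, and it rests on several assumptions: in-memory dictionaries stand in for the hive tables and drop zones; the record fields `id`, `name`, and `tax_id` are hypothetical; the deterministic rule (exact equality on `tax_id`) is invented for illustration; and a character-trigram count vector is swapped in for the neural-network embedding the claim requires, so only the cosine similarity comparison itself is shown. The function names and the 0.8 threshold are likewise illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def trigram_vector(text):
    """Stand-in embedding: character trigram counts (NOT a neural network)."""
    s = f"  {text.lower().strip()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def deterministic_pairs(records, key="tax_id"):
    """Pair records whose (hypothetical) key field is exactly equal."""
    by_key = {}
    for rec in records:
        if rec.get(key):
            by_key.setdefault(rec[key], []).append(rec["id"])
    pairs = set()
    for ids in by_key.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def similarity_pairs(records, field="name", threshold=0.8):
    """Pair records whose vectors meet an assumed cosine threshold."""
    vectors = {rec["id"]: trigram_vector(rec[field]) for rec in records}
    return {
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine_similarity(vectors[a], vectors[b]) >= threshold
    }


def group_records(records, pairs):
    """Merge matched pairs into related groups with union-find."""
    parent = {rec["id"]: rec["id"] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for rec_id in parent:
        groups.setdefault(find(rec_id), set()).add(rec_id)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# "old-7" plays the role of a record from the prior stored matched groups.
records = [
    {"id": "new-1", "name": "Acme Holdings LLC", "tax_id": "11-111"},
    {"id": "new-2", "name": "Acme Holdings, LLC", "tax_id": None},
    {"id": "old-7", "name": "ACME Hldg", "tax_id": "11-111"},
    {"id": "new-3", "name": "Zenith Bank", "tax_id": "22-222"},
]

# Combine the matches from both processes, then form the related groups.
matched = deterministic_pairs(records) | similarity_pairs(records)
groups = group_records(records, matched)
print(groups)
```

In this toy data the deterministic rule links `new-1` to `old-7` through the shared `tax_id`, while the similarity step links `new-1` to `new-2` through their near-identical names. Taking the union of both pair sets before grouping places all three records in one related group, which is the reason for combining the two processes rather than running either alone.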