CPC G06F 16/215 (2019.01) [G06F 16/248 (2019.01); G06F 16/24552 (2019.01); G06F 16/24573 (2019.01)] | 24 Claims
1. A method comprising:
building a bloom filter for each of a first set of files to be ingested into a data exchange to generate a set of bloom filters, wherein the data exchange includes a metadata storage where metadata including a list of files ingested is stored;
storing the set of bloom filters in the metadata storage of the data exchange; and
in response to receiving a set of candidate files to be ingested into the data exchange:
generating file loading metadata for the set of candidate files;
generating a reduced set of candidate files using minimum/maximum pruning based on the file loading metadata for the set of candidate files; and
in response to generating the reduced set of candidate files, removing from within the reduced set of candidate files, by a processing device, each candidate file that is duplicative of a file in the first set of files using the set of bloom filters to generate a further reduced set of candidate files.
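
The two-stage filtering recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. Several details are assumptions made only for illustration: the `FileMeta` record (a file name plus a checksum) stands in for the "file loading metadata", the min/max pruning is read as keeping only candidates whose name falls within the name range of the already ingested files (candidates outside that range cannot match any of them), and the `BloomFilter` class, its sizes, and the function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical file loading metadata: a name and a content checksum."""
    name: str
    checksum: str

    @property
    def key(self) -> str:
        return f"{self.name}|{self.checksum}"


def build_filters(ingested):
    """Build one Bloom filter per already ingested file."""
    filters = []
    for meta in ingested:
        bloom = BloomFilter()
        bloom.add(meta.key)
        filters.append(bloom)
    return filters


def deduplicate(candidates, ingested, filters):
    """Return (reduced, further_reduced) candidate lists."""
    if not ingested:
        return [], []
    lo = min(m.name for m in ingested)
    hi = max(m.name for m in ingested)
    # Min/max pruning (assumed semantics): only candidates whose name lies
    # in [lo, hi] can match an ingested file, so only those are checked.
    reduced = [c for c in candidates if lo <= c.name <= hi]
    # Bloom filter pass: drop every candidate that any filter reports as
    # possibly present. A false positive can drop a non-duplicate file.
    further_reduced = [
        c for c in reduced
        if not any(f.might_contain(c.key) for f in filters)
    ]
    return reduced, further_reduced
```

Under these assumptions, calling `deduplicate(candidates, ingested, build_filters(ingested))` first narrows the candidates to those inside the ingested name range, then drops every candidate whose key is reported by a filter. Because a Bloom filter has no false negatives, a true duplicate is always removed; a production system would likely confirm a positive hit against the stored file list before discarding a candidate.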