US 11,853,274 B2
Efficient deduplication of randomized file paths
Ganeshan Ramachandran Iyer, Redmond, WA (US); Raghav Ramachandran, Seattle, WA (US); and Subramanian Muralidhar, Mercer Island, WA (US)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by Snowflake Inc., Bozeman, MT (US)
Filed on Oct. 21, 2022, as Appl. No. 17/971,482.
Application 17/971,482 is a continuation of application No. 17/709,234, filed on Mar. 30, 2022, granted, now 11,494,352.
Prior Publication US 2023/0315700 A1, Oct. 5, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/00 (2019.01); G06F 16/215 (2019.01); G06F 16/248 (2019.01); G06F 16/2455 (2019.01); G06F 16/2457 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/248 (2019.01); G06F 16/24552 (2019.01); G06F 16/24573 (2019.01)]
24 Claims
1. A method comprising:
building a bloom filter for each of a first set of files to be ingested into a data exchange to generate a set of bloom filters, wherein the data exchange includes a metadata storage where metadata including a list of files ingested is stored;
storing the set of bloom filters in the metadata storage of the data exchange; and
in response to receiving a set of candidate files to be ingested into the data exchange:
generating file loading metadata for the set of candidate files;
generating a reduced set of candidate files using minimum/maximum pruning based on the file loading metadata for the set of candidate files; and
in response to generating the reduced set of candidate files, removing from within the reduced set of candidate files, by a processing device, each candidate file that is duplicative of a file in the first set of files using the set of bloom filters to generate a further reduced set of candidate files.
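Claim 1 does not say what each Bloom filter indexes or which fields the "file loading metadata" contains. The following is a minimal sketch under hypothetical assumptions, not Snowflake's implementation. It assumes that ingested files are recorded in batches, that each batch gets one Bloom filter over its file paths plus a min/max path range, and that the pruning key is the path string itself. A real system might prune on other loading metadata, such as timestamps. `BloomFilter`, `MetadataStore`, `record_batch`, and `dedupe_candidates` are all illustrative names.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Fixed-size Bloom filter over strings using k salted SHA-256 hashes."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


@dataclass
class BatchEntry:
    """Hypothetical metadata record: one Bloom filter plus min/max path per ingested batch."""
    bloom: BloomFilter
    min_path: str
    max_path: str


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the data exchange's metadata storage."""
    entries: list = field(default_factory=list)
    ingested_paths: set = field(default_factory=set)  # exact list of ingested files

    def record_batch(self, paths: list) -> None:
        # Build one Bloom filter for the batch and store it with the batch's min/max path.
        if not paths:
            return
        bloom = BloomFilter()
        for p in paths:
            bloom.add(p)
        self.entries.append(BatchEntry(bloom, min(paths), max(paths)))
        self.ingested_paths.update(paths)


def dedupe_candidates(store: MetadataStore, candidates: list) -> list:
    """Return candidate paths that still need ingesting, preserving input order."""
    to_ingest = []
    for path in candidates:
        # Min/max pruning: only batches whose [min, max] range covers the path can hold it.
        overlapping = [e for e in store.entries if e.min_path <= path <= e.max_path]
        if not overlapping:
            to_ingest.append(path)  # outside every range, so it cannot be a duplicate
            continue
        # Bloom filter check, run only against the batches that survived pruning.
        if any(e.bloom.might_contain(path) for e in overlapping):
            # A Bloom hit may be a false positive, so confirm against the exact list.
            if path in store.ingested_paths:
                continue  # genuine duplicate: drop it
        to_ingest.append(path)
    return to_ingest
```

Usage sketch: after `store.record_batch(["s3://bucket/a/1.csv", "s3://bucket/a/2.csv"])`, calling `dedupe_candidates(store, ["s3://bucket/a/1.csv", "s3://bucket/z/0.csv"])` drops the first path as already ingested and keeps the second. The min/max step avoids probing Bloom filters whose range cannot contain a path. The Bloom filter rules out most non-duplicates without consulting the exact file list. The exact-list confirmation here guards against Bloom false positives. In a production store that lookup would be the expensive step the filters are meant to avoid, so this in-memory set is only a stand-in.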