| CPC G06F 16/2282 (2019.01) [G06F 16/24573 (2019.01)] | 21 Claims |

|
1. A computer-implemented method for storing log data generated in a distributed computing environment, comprising:
receiving a data element from a log line, where the data element is associated with a given computing source at which the log line was produced;
applying a hash function to the data element to generate a hash value;
updating a listing of computing entities with the given computing source, where entries in the listing of computing entities can identify more than one computing source and each entry in the listing of computing entities specifies a unique set of computing sources;
storing the hash value, along with an address, in a token map table of a probabilistic data structure, where the address maps the hash value to an entry in the listing of computing entities; and
the addresses in the token map table and encoding the entries in the listing of computing entities, wherein the probabilistic data structure is stored in a file format having three sequential sections, where a first section of the file format contains a version number and information describing encoding steps applied to the data stored in the probabilistic data structure, a second section of the file format contains header information, and a third section of the file format contains the data stored in the probabilistic data structure.
|
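The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class name `LogTokenIndex`, the truncated BLAKE2b hash, the zlib-over-JSON encoding, and the byte layout of the three file sections are all assumptions chosen for illustration, since the claim does not specify any of them.

```python
import hashlib
import json
import struct
import zlib


class LogTokenIndex:
    """Toy index mapping hashed log tokens to sets of computing sources."""

    def __init__(self):
        self.entities = []      # listing of computing entities: unique source sets
        self.entity_addr = {}   # source set -> its address (position) in the listing
        self.token_map = {}     # hash value -> address into the listing

    @staticmethod
    def hash_token(token):
        # Assumption: 32-bit truncated BLAKE2b; the claim names no hash function.
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=4).digest()
        return int.from_bytes(digest, "big")

    def add(self, token, source):
        """Record that `token` appeared in a log line produced at `source`."""
        h = self.hash_token(token)
        addr = self.token_map.get(h)
        known = self.entities[addr] if addr is not None else frozenset()
        merged = known | {source}
        # Each listing entry is a unique set of sources, so reuse it if present.
        if merged not in self.entity_addr:
            self.entity_addr[merged] = len(self.entities)
            self.entities.append(merged)
        self.token_map[h] = self.entity_addr[merged]

    def sources_for(self, token):
        """Return the sources that may have produced `token`."""
        addr = self.token_map.get(self.hash_token(token))
        return set(self.entities[addr]) if addr is not None else set()

    def serialize(self):
        """Write three sequential sections (hypothetical layout)."""
        payload = json.dumps({
            "token_map": sorted(self.token_map.items()),
            "entities": [sorted(e) for e in self.entities],
        }).encode("utf-8")
        data = zlib.compress(payload)
        # Section 1: version number plus a description of the encoding applied.
        section1 = struct.pack("<H", 1) + b"zlib+json".ljust(16, b"\0")
        # Section 2: header information (token count, data length in bytes).
        section2 = struct.pack("<II", len(self.token_map), len(data))
        # Section 3: the data stored in the structure.
        return section1 + section2 + data


index = LogTokenIndex()
index.add("error", "host-a")
index.add("error", "host-b")
index.add("timeout", "host-a")
print(index.sources_for("error"))
print(len(index.serialize()))
```

In this sketch, two tokens whose hash values collide share one token map entry, so a lookup can report a source that never logged the token but will not miss one that did. That is the sense in which the structure is probabilistic. Listing entries that are no longer referenced are left in place here for simplicity.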