US 12,487,980 B2
Compact probabilistic data structure for storing log data
Julian Reichinger, Sankt Johann am Wimberg (AT); and Renee Trisberg, Muraste (EE)
Assigned to Dynatrace LLC, Boston, MA (US)
Filed by Dynatrace LLC, Waltham, MA (US)
Filed on Oct. 23, 2023, as Appl. No. 18/383,031.
Application 18/383,031 is a continuation in part of application No. 18/119,331, filed on Mar. 9, 2023, granted, now 12,229,107.
Claims priority of provisional application 63/437,865, filed on Jan. 9, 2023.
Prior Publication US 2024/0232158 A1, Jul. 11, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/22 (2019.01)
CPC G06F 16/2255 (2019.01)
10 Claims
1. A computer-implemented method for storing log data generated in a distributed computing environment, comprising:
receiving a log line;
applying a first tokenization rule to the log line to create a plurality of base tokens, where each base token is a sequence of successive characters in the log line having the same type;
applying a second tokenization rule to the log line to create a plurality of combination tokens, where each combination token is comprised of two or more base tokens appended together;
applying a third tokenization rule to the log line to create a plurality of n-gram tokens, where each n-gram token is an n-gram derived from a base token in the plurality of base tokens;
combining tokens from the plurality of base tokens, the plurality of combination tokens and the plurality of n-gram tokens to form a set of tokens;
for each token in the set of tokens, storing a given token by
applying a hash function to the given token to generate a hash value, where the given token is associated with a given software module at which the log line was produced;
updating a listing of software entities with the given software module, where entries in the listing of software entities can identify more than one software module and each entry in the listing of software entities specifies a unique set of software modules; and
storing the hash value, along with an address, in a token map table of a probabilistic data structure, where the address maps the hash value to an entry in the listing of software entities.
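
The three tokenization rules of claim 1 can be illustrated with a minimal Python sketch. The claim does not define the character types, how base tokens are grouped into combination tokens, or the n-gram length, so the following are assumptions: characters are classed as letters, digits, whitespace, or other symbols (whitespace runs are discarded); combination tokens are all runs of two or more adjacent base tokens within one whitespace-delimited chunk; and n-grams are character trigrams of base tokens longer than three characters. The function names are hypothetical.

```python
from itertools import groupby
from typing import List, Set


def char_type(c: str) -> str:
    """Assumed character classes; the claim only says "the same type"."""
    if c.isalpha():
        return "alpha"
    if c.isdigit():
        return "digit"
    if c.isspace():
        return "space"
    return "symbol"


def base_tokens(line: str) -> List[str]:
    """First rule: maximal runs of same-type characters, whitespace dropped."""
    return ["".join(run) for kind, run in groupby(line, key=char_type) if kind != "space"]


def combination_tokens(line: str) -> List[str]:
    """Second rule (assumed grouping): every run of two or more adjacent
    base tokens inside one whitespace-delimited chunk, appended together."""
    combos: List[str] = []
    for chunk in line.split():
        parts = base_tokens(chunk)
        for start in range(len(parts)):
            for end in range(start + 2, len(parts) + 1):
                combos.append("".join(parts[start:end]))
    return combos


def ngram_tokens(line: str, n: int = 3) -> List[str]:
    """Third rule (assumed n = 3): character n-grams of base tokens longer than n."""
    grams: List[str] = []
    for token in base_tokens(line):
        if len(token) > n:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def tokenize(line: str) -> Set[str]:
    """Combine all three token kinds into one de-duplicated set of tokens."""
    return set(base_tokens(line)) | set(combination_tokens(line)) | set(ngram_tokens(line))
```

For the log line `GET /api/v1 took 42ms`, this sketch yields base tokens such as `api` and `42`, combination tokens such as `/api/v1` and `42ms`, and the n-grams `too` and `ook` from `took`. Combination and n-gram tokens enlarge the token set so that lookups for whole identifiers or word fragments can hit a stored token, at the cost of more entries per log line.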
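
The storing steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the hash function is assumed to be a 64-bit BLAKE2b digest, the listing of software entities and the token map table are assumed to be a Python list and dict, and the "address" is assumed to be a list index. The names `TokenStore`, `token_hash`, and `modules_for` are hypothetical. On each store, the token's current set of modules is extended with the module that produced the log line and mapped to the listing entry holding exactly that set, which is appended only if no such entry exists yet, so each unique set of software modules appears once.

```python
import hashlib
from typing import Dict, FrozenSet, List


def token_hash(token: str) -> int:
    """Assumed hash function: 64-bit BLAKE2b digest of the UTF-8 token."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class TokenStore:
    """Hypothetical sketch of the token map table plus entity listing."""

    def __init__(self) -> None:
        # Listing of software entities: each entry is a unique set of modules.
        self.entities: List[FrozenSet[str]] = []
        # Reverse index so a given module set is appended only once.
        self._entity_address: Dict[FrozenSet[str], int] = {}
        # Token map table: hash value -> address (index into self.entities).
        self.token_map: Dict[int, int] = {}

    def _address_of(self, modules: FrozenSet[str]) -> int:
        """Return the address of the entry for this set, adding it if new."""
        addr = self._entity_address.get(modules)
        if addr is None:
            addr = len(self.entities)
            self.entities.append(modules)
            self._entity_address[modules] = addr
        return addr

    def store(self, token: str, module: str) -> None:
        """Hash the token, add the module to its set, and store hash + address."""
        h = token_hash(token)
        addr = self.token_map.get(h)
        known = self.entities[addr] if addr is not None else frozenset()
        self.token_map[h] = self._address_of(known | {module})

    def modules_for(self, token: str) -> FrozenSet[str]:
        """Illustrative lookup: the module set recorded for a token's hash."""
        addr = self.token_map.get(token_hash(token))
        return self.entities[addr] if addr is not None else frozenset()
```

Because only hash values are kept, two tokens whose hashes collide share one entry, so in this sketch a lookup can return extra modules (a false positive) but never omits a module that was stored for that token. When many tokens are produced by the same modules, they share a single listing entry, which keeps the listing small.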