US 12,086,021 B2
Clustering of structured log data by key-values
Udit Saxena, Mountain View, CA (US); Reetika Roy, Redwood City, CA (US); Ryley Higa, Honolulu, HI (US); David M. Andrzejewski, San Francisco, CA (US); and Bashyam Tca, Walnut Creek, CA (US)
Assigned to Sumo Logic, Inc., Redwood City, CA (US)
Filed by Sumo Logic, Inc., Redwood City, CA (US)
Filed on Apr. 12, 2023, as Appl. No. 18/299,218.
Application 18/299,218 is a continuation of application No. 17/009,649, filed on Sep. 1, 2020, granted, now 11,663,066.
Claims priority of provisional application 63/031,464, filed on May 28, 2020.
Prior Publication US 2023/0315558 A1, Oct. 5, 2023
Int. Cl. G06F 16/00 (2019.01); G06F 11/07 (2006.01); G06F 16/21 (2019.01); G06F 16/24 (2019.01); G06F 16/2455 (2019.01); G06F 16/25 (2019.01); G06F 16/35 (2019.01)
CPC G06F 11/0784 (2013.01) [G06F 11/0775 (2013.01); G06F 11/0781 (2013.01); G06F 11/0787 (2013.01); G06F 16/211 (2019.01); G06F 16/24 (2019.01); G06F 16/24553 (2019.01); G06F 16/258 (2019.01); G06F 16/358 (2019.01)] 20 Claims
1. A computer-implemented method comprising:
receiving a request to organize a plurality of log messages into clusters based on keys included in the plurality of log messages, each log message comprising machine data in a set of key-value pairs, each key-value pair comprising a key and a value for the key, each cluster being defined by a key schema that comprises a unique plurality of keys that are contained in each of the log messages of the cluster, wherein each cluster corresponds to a unique combination of keys;
for each log message from the plurality of log messages:
calculating a distance from the keys in the log message to keys in the log messages already in the clusters; and
assigning the log message to an existing cluster when the distance from the keys in the log message to the keys in any of the log messages in the existing cluster is less than or equal to a predetermined threshold, or creating a new cluster with the log message when the distance from the log message to any of the log messages already in clusters is greater than the predetermined threshold; and
presenting, in a user interface (UI), information about the clusters, the information comprising the unique plurality of keys for the log messages in each cluster.
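
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The claim does not name a distance metric, so Jaccard distance between key sets is assumed here. The names jaccard_distance, cluster_by_keys and summarize are hypothetical, as is the choice to join the nearest qualifying cluster when more than one is within the threshold. The summary function stands in for the UI step by returning, for each cluster, the keys common to all of its messages and the message count.

```python
from typing import Any, Dict, FrozenSet, List, Tuple


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed metric: 1 - |a & b| / |a | b|; 0.0 when both key sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_keys(
    messages: List[Dict[str, Any]], threshold: float
) -> List[Dict[str, Any]]:
    """Assign each message to a cluster using only its keys (values are ignored)."""
    clusters: List[Dict[str, Any]] = []
    for msg in messages:
        keys = frozenset(msg)
        best, best_dist = None, None
        for cluster in clusters:
            # Distance to the closest key set already present in this cluster.
            dist = min(jaccard_distance(keys, ks) for ks in cluster["key_sets"])
            if dist <= threshold and (best is None or dist < best_dist):
                best, best_dist = cluster, dist
        if best is None:
            # No existing cluster is within the threshold: start a new one.
            clusters.append({"key_sets": {keys}, "messages": [msg]})
        else:
            best["key_sets"].add(keys)
            best["messages"].append(msg)
    return clusters


def summarize(clusters: List[Dict[str, Any]]) -> List[Tuple[List[str], int]]:
    """Per cluster: sorted keys shared by every message, and the message count."""
    return [
        (sorted(frozenset.intersection(*c["key_sets"])), len(c["messages"]))
        for c in clusters
    ]


if __name__ == "__main__":
    logs = [
        {"user": "a", "status": 200},
        {"user": "b", "status": 404},
        {"host": "h1", "cpu": 0.5, "mem": 0.2},
        {"user": "c", "status": 500, "latency_ms": 12},
    ]
    for schema, count in summarize(cluster_by_keys(logs, threshold=0.0)):
        print(count, schema)
```

With a threshold of 0.0, only messages with identical key sets share a cluster, which matches the claim's "unique combination of keys". A larger threshold lets messages with slightly different key sets join the same cluster, in which case the reported schema is the subset of keys present in every member.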