US 12,333,448 B2
Machine-learned model for duplicate crash dump detection
Hao Yang, Xi'an (CN); Yang Xu, Xi'an (CN); Yong Li, Xi'an (CN); and Hyun Deok Choi, Seoul (KR)
Assigned to SAP SE, Walldorf (DE)
Filed by SAP SE, Walldorf (DE)
Filed on Mar. 1, 2021, as Appl. No. 17/188,256.
Claims priority of provisional application 63/086,403, filed on Oct. 1, 2020.
Prior Publication US 2022/0108191 A1, Apr. 7, 2022
Int. Cl. G06N 5/04 (2023.01); G06F 16/215 (2019.01); G06N 20/00 (2019.01)
CPC G06N 5/04 (2013.01) [G06F 16/215 (2019.01); G06N 20/00 (2019.01)] 20 Claims
 
1. A system comprising:
at least one hardware processor; and
a non-transitory computer-readable medium storing instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform operations comprising:
obtaining a plurality of crash dumps in a computer system, the crash dumps each comprising a separate file containing information about a crash failure that occurred during operation of a database, the information including one or more function names and one or more parameters;
preprocessing the plurality of crash dumps by removing the one or more parameters from each crash dump in the plurality of crash dumps;
using the one or more function names in each of the crash dumps to identify an individual component of the database for each of the one or more function names;
adding an indication of each of the identified individual components into the preprocessed plurality of crash dumps; and
training a first machine-learned model by feeding the preprocessed plurality of crash dumps and a coefficient indicating a level of component distance into a first machine learning algorithm, the first machine-learned model trained to output a similarity score between two crash dumps using a longest common subsequence from each crash dump, the first machine-learned model basing the similarity score on a coefficient indicating a number of components to consider from a top of a crash dump, component position in the crash dump, and the coefficient indicating the level of component distance, the level of component distance being a measurement of how similar two components are based on similarity of function names of the two components, each component being a software module upon which one or more functions are executed.
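The claimed pipeline (strip parameters from each frame, tag each function with its database component, then score pairs of dumps with a longest-common-subsequence similarity) can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation, because the claim gives no formula. The following are all hypothetical: the one-frame-per-line dump format, the rule that a component is the first `::` namespace of a function name, the 1/(1 + position) weighting, and the names `preprocess`, `frame_match`, `similarity`, `top_n`, and `component_weight`. `top_n` approximates the coefficient for how many entries to consider from the top of a dump. `component_weight` is a simplified stand-in for the component-distance coefficient: it gives partial credit to two different functions that belong to the same component.

```python
import re
from typing import List, Tuple

# Hypothetical representation of a preprocessed frame: (component, function name).
Frame = Tuple[str, str]

# Greedily strips everything from the first "(" to the last ")", i.e. the
# parameter list of a frame such as "ptime::Query::execute(ctx=0x1)".
PARAMS_RE = re.compile(r"\(.*\)")


def preprocess(dump_text: str, top_n: int) -> List[Frame]:
    """Drop parameters, tag each frame with a component, keep the top_n frames.

    Assumes one stack frame per line and that the component is the first
    "::"-separated namespace of the function name (a hypothetical rule).
    """
    frames: List[Frame] = []
    for line in dump_text.splitlines():
        func = PARAMS_RE.sub("", line).strip()
        if not func:
            continue
        component = func.split("::", 1)[0] if "::" in func else "unknown"
        frames.append((component, func))
        if len(frames) == top_n:
            break
    return frames


def frame_match(a: Frame, b: Frame, component_weight: float) -> float:
    """1.0 for the same function, component_weight for the same known component."""
    if a[1] == b[1]:
        return 1.0
    if a[0] == b[0] and a[0] != "unknown":
        return component_weight
    return 0.0


def similarity(a: List[Frame], b: List[Frame], component_weight: float = 0.5) -> float:
    """Position-weighted longest common subsequence, normalized to [0, 1]."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        return 0.0
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Frames nearer the top of the dump (index 0) carry more weight.
            pos_weight = 1.0 / (1 + min(i - 1, j - 1))
            match = frame_match(a[i - 1], b[j - 1], component_weight)
            diag = dp[i - 1][j - 1] + pos_weight * match
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], diag)
    # Normalize by the score of an identical stack as long as the longer input,
    # so identical dumps score 1.0 and length mismatches are penalized.
    norm = sum(1.0 / (1 + k) for k in range(max(n, m)))
    return dp[n][m] / norm


if __name__ == "__main__":
    dump_a = "ptime::Query::execute(ctx=0x1)\nptime::Plan::build(p=2)\nDataAccess::Page::load(id=7)"
    dump_b = "ptime::Query::execute(ctx=0x9)\nptime::Plan::optimize(p=5)\nDataAccess::Page::load(id=8)"
    a, b = preprocess(dump_a, top_n=10), preprocess(dump_b, top_n=10)
    print(round(similarity(a, b, component_weight=0.5), 4))  # 0.8636
```

Under these assumptions, the two example dumps differ only in parameter values and in one function within the shared `ptime` component. They therefore score about 0.86 rather than 1.0. The claim does not specify how the training step sets or learns these coefficients, so here they are plain inputs that could, for example, be tuned against pairs of dumps already labeled as duplicates.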