US 12,105,599 B2
Mitigating and automating backup failure recoveries in data protection policies
Pravin Ashok Kumar, Bangalore (IN); and Wei Wang, Chengdu (CN)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on May 6, 2022, as Appl. No. 17/739,001.
Application 17/739,001 is a continuation of application No. 17/188,073, filed on Mar. 1, 2021, granted, now 11,403,184.
Prior Publication US 2022/0276935 A1, Sep. 1, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 11/00 (2006.01); G06F 11/07 (2006.01); G06F 11/14 (2006.01)
CPC G06F 11/1461 (2013.01) [G06F 11/076 (2013.01); G06F 11/0769 (2013.01); G06F 11/0772 (2013.01); G06F 11/0775 (2013.01); G06F 11/0781 (2013.01); G06F 11/079 (2013.01); G06F 11/143 (2013.01); G06F 11/1458 (2013.01); G06F 2201/81 (2013.01)]
17 Claims
 
1. A method comprising:
receiving a plurality of assets to associate to a data protection policy;
receiving configuration information for the data protection policy, the configuration information comprising a data protection job to perform for the assets, and a schedule for the data protection job;
generating a shadow policy comprising the configuration information from the data protection policy, and a retry protocol;
performing the data protection job according to the schedule in the data protection policy;
detecting a failure of the data protection job for an asset associated with the data protection policy;
moving the asset out of the data protection policy and into the shadow policy;
retrying the data protection job for the asset according to the retry protocol in the shadow policy;
determining that a threshold number of retries has been reached without the data protection job having been successfully completed;
upon the determination, collecting a plurality of logs maintained by a plurality of services involved with the data protection job, the logs having recorded a set of events, timestamps when the events occurred, and severity levels for the events;
dividing a length of time over which the retries occurred into a plurality of time intervals;
forming a plurality of timeslots corresponding to the plurality of time intervals;
grouping the plurality of events into the plurality of timeslots based on the timestamps of when the events occurred;
generating a dataset by summing, for each particular timeslot and each particular severity level, a number of events that occurred in that particular timeslot and had that particular severity level;
applying k-means clustering to the dataset to generate first and second cluster sets;
identifying one of the first or second cluster sets as being a target log segment based on the one of the first or second cluster sets having a greater number of events with higher severity levels than another of the first or second cluster sets; and
reporting, to a user, the target log segment.
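
Illustrative sketch (not part of the claim). The log-analysis steps recited above (dividing the retry window into timeslots, counting events per timeslot and severity level, applying k-means with two clusters, and selecting the cluster with more high-severity events) might be sketched in Python roughly as follows. Everything named here is an assumption made for illustration: the severity labels and their numeric weights, the function names, the deterministic seeding of the two centroids, and the use of a severity-weighted sum to decide which cluster has "a greater number of events with higher severity levels". The patent record does not specify any of these details.

```python
from collections import Counter

# Hypothetical severity scale and weights; the claim only says that
# logged events carry severity levels.
SEVERITIES = ["INFO", "WARNING", "ERROR", "CRITICAL"]
WEIGHTS = {"INFO": 0, "WARNING": 1, "ERROR": 2, "CRITICAL": 3}


def build_dataset(events, start, end, num_slots):
    """Return one row per timeslot holding event counts per severity level.

    events is an iterable of (timestamp_seconds, severity) pairs.
    """
    width = (end - start) / num_slots
    counts = [Counter() for _ in range(num_slots)]
    for ts, sev in events:
        if not start <= ts <= end:
            continue  # ignore events outside the retry window
        slot = min(int((ts - start) / width), num_slots - 1)
        counts[slot][sev] += 1
    return [[c[s] for s in SEVERITIES] for c in counts]


def severity_score(row):
    """Severity-weighted event count of one row (assumed ranking measure)."""
    return sum(WEIGHTS[s] * n for s, n in zip(SEVERITIES, row))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans2(rows, iterations=20):
    """Minimal Lloyd's k-means with k=2.

    Seeding with the lowest- and highest-scoring rows is an assumption
    made to keep the sketch deterministic.
    """
    centroids = [list(min(rows, key=severity_score)),
                 list(max(rows, key=severity_score))]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: attach each row to its nearest centroid.
        labels = [
            0 if squared_distance(r, centroids[0]) <= squared_distance(r, centroids[1]) else 1
            for r in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def target_segment(rows, labels):
    """Return the timeslot indices of the cluster with the higher total score."""
    def cluster_score(k):
        return sum(severity_score(r) for r, lab in zip(rows, labels) if lab == k)
    target = max((0, 1), key=cluster_score)
    return [i for i, lab in enumerate(labels) if lab == target]


# Made-up events over a 600-second retry window, split into six timeslots.
sample_events = [
    (5, "INFO"), (50, "INFO"), (120, "INFO"), (130, "WARNING"),
    (210, "ERROR"), (220, "ERROR"), (230, "CRITICAL"), (250, "WARNING"),
    (310, "ERROR"), (320, "CRITICAL"), (330, "ERROR"),
    (410, "INFO"), (520, "INFO"), (590, "WARNING"),
]
sample_rows = build_dataset(sample_events, start=0, end=600, num_slots=6)
sample_labels = kmeans2(sample_rows)
print(target_segment(sample_rows, sample_labels))  # [2, 3]
```

In this made-up example the errors and critical events fall between 200 and 400 seconds, so timeslots 2 and 3 form the cluster that would be reported to the user as the target log segment. A production implementation would more likely rely on an established clustering library and a severity scale taken from the actual service logs.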