| CPC G06F 9/453 (2018.02) [G06F 3/0482 (2013.01); G06F 3/04847 (2013.01); G06F 11/323 (2013.01); H04L 41/06 (2013.01); H04L 43/0811 (2013.01); H04L 43/0876 (2013.01); H04L 51/046 (2013.01)] | 20 Claims |

|
1. A system comprising:
one or more processors; and
a non-transitory computer-readable storage medium having instructions stored thereupon which, when executed by the one or more processors, cause the system to:
provide, for display within a graphical user interface (GUI) of a computing device associated with a customer, a list of suggested alarms to utilize to detect an incident relating to one or more service interruptions;
receive, via the GUI and from the computing device, a selection of an alarm of the list of suggested alarms;
identify an occurrence of the alarm, wherein the alarm indicates a service interruption that impacts, for a plurality of customers of a service provider network, access to functionality of an application that is hosted by the service provider network and that is developed by the customer of the service provider network;
identify a group of users, assigned to resolve the service interruption, to notify, wherein the group of users is associated with the customer;
transmit an electronic message to the group of users, wherein the electronic message indicates the service interruption;
identify actions that are associated with resolving the service interruption, wherein at least a portion of the actions are based on first previous actions performed within the service provider network to resolve first previous service interruptions that are a same as the service interruption and second previous actions performed within the service provider network to resolve second previous service interruptions that are determined to be similar to the service interruption, but that are different than the service interruption, and that have yet to occur with respect to the plurality of customers, wherein the actions include predefined manual procedures and predefined automated procedures directed to resolving the service interruption, and wherein the actions include re-starting a first service associated with the service provider network without receiving first user input associated with the first service and configuring a second service associated with the service provider network without receiving second user input associated with the second service;
provide, for display within the GUI of the computing device:
a runbook user interface (UI) element that presents a graphical representation of the actions to perform to resolve the service interruption;
metric data that is identified as causing the alarm; and
a chat UI element that presents messages exchanged between the group of users during a time associated with resolving the service interruption;
in response to a selection of an action of the actions from the runbook UI element, perform the action;
update the GUI to indicate that the action has been performed; and
update the GUI to indicate a next action to be performed.
|
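The last three steps of claim 1 (perform a selected runbook action, mark it as performed, indicate the next action) can be illustrated with a minimal sketch. This is not the patented implementation. The names `Action`, `Runbook`, `perform`, `next_action`, and `render` are hypothetical, the actions are stand-in callables rather than real service calls, and the text markers stand in for the graphical runbook UI element.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One runbook step (hypothetical model, not taken from the claim)."""
    name: str
    run: Callable[[], None]   # stand-in for a manual or automated procedure
    done: bool = False


@dataclass
class Runbook:
    """Ordered list of actions intended to resolve one service interruption."""
    actions: List[Action] = field(default_factory=list)

    def perform(self, name: str) -> None:
        """Perform the selected action and record that it has been performed."""
        for action in self.actions:
            if action.name == name:
                action.run()
                action.done = True
                return
        raise KeyError(f"no such action: {name}")

    def next_action(self) -> Optional[Action]:
        """Return the first action not yet performed, or None if all are done."""
        for action in self.actions:
            if not action.done:
                return action
        return None

    def render(self) -> List[str]:
        """Text stand-in for the runbook UI: [x] done, [>] next, [ ] pending."""
        upcoming = self.next_action()
        lines = []
        for action in self.actions:
            if action.done:
                marker = "[x]"
            elif action is upcoming:
                marker = "[>]"
            else:
                marker = "[ ]"
            lines.append(f"{marker} {action.name}")
        return lines


if __name__ == "__main__":
    log: List[str] = []
    runbook = Runbook([
        Action("restart first service", lambda: log.append("restarted")),
        Action("configure second service", lambda: log.append("configured")),
    ])
    runbook.perform("restart first service")  # selection from the runbook UI
    print("\n".join(runbook.render()))
```

Keeping the "performed" flag on each action lets the display be recomputed from state after every selection, so the two GUI updates in the claim both come from a single re-render in this sketch.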