| CPC G06F 11/3452 (2013.01) [G06F 16/244 (2019.01); G06F 16/2462 (2019.01)] | 23 Claims |

|
1. A computerized method comprising:
extracting a format representation for a first data sample of an incoming data stream by at least accessing a data schema for each field of the first data sample to determine a data point type, wherein the first data sample comprises a plurality of data points, each data point of the plurality of data points is maintained within a field of the first data sample and corresponds to a performance measurement directed to (i) a computing resource associated with a source of the incoming data stream or (ii) an operating state of the source of the incoming data stream;
conducting transformations on format representations associated with data point types of the first data sample to produce a first plurality of count values, wherein the transformed format representations associated with each data point type within the first data sample operate as a count reference;
accessing a data schema for each field of a second data sample of the incoming data stream to determine a data point type for identifying changes in field format;
conducting transformations on format representations associated with data point types of a second data sample of the incoming data stream to produce a second plurality of count values, wherein the second plurality of count values identifies a number of occurrences of the transformed format representation associated with each data point type within the second data sample;
computing a first probability distribution based on the first plurality of count values;
computing a second probability distribution based on the second plurality of count values;
conducting analytics using the first probability distribution and the second probability distribution to produce a first metric; and
determining a format drift for the data stream in response to evaluating the first metric against a second metric operating as a threshold metric signifying a format drift condition.
|
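
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. Several details are assumptions made only for illustration: the character-class encoding in `format_representation` (digits become `9`, letters become `A`), the use of the Python type name as a stand-in for a schema lookup, the choice of Jensen-Shannon divergence as the first metric (the claim does not name a specific metric), and the threshold value of `0.1`. All function and field names are hypothetical.

```python
import math
from collections import Counter


def format_representation(value):
    """Return a coarse format string for one data point.

    Assumption: the Python type name stands in for the schema lookup,
    digits map to '9', letters map to 'A', other characters are kept.
    """
    type_name = type(value).__name__
    pattern = "".join(
        "9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value)
    )
    return f"{type_name}:{pattern}"


def count_values(sample):
    """Count occurrences of each format representation in a sample."""
    return Counter(format_representation(v) for v in sample.values())


def distribution(counts):
    """Normalize count values into a probability distribution."""
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def has_format_drift(first_sample, second_sample, threshold=0.1):
    """Compare the first metric against an assumed threshold metric."""
    p = distribution(count_values(first_sample))
    q = distribution(count_values(second_sample))
    return js_divergence(p, q) > threshold


# Hypothetical samples: field name -> data point.
baseline = {"cpu_pct": "42.5", "mem_mb": "2048", "state": "UP"}
same_format = {"cpu_pct": "17.3", "mem_mb": "1024", "state": "UP"}
drifted = {"cpu_pct": "42.5%", "mem_mb": "2GB", "state": 1}

print(has_format_drift(baseline, same_format))
print(has_format_drift(baseline, drifted))
```

In this sketch, two samples whose fields share the same formats yield a divergence of zero, while samples with no formats in common yield the maximum base-2 divergence of one. The sketch assumes non-empty samples; an empty sample would cause a division by zero in `distribution`.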