| CPC G06F 11/3409 (2013.01) [G06F 18/22 (2023.01); G06F 18/24137 (2023.01); G06F 18/24147 (2023.01); G06N 20/20 (2019.01)] | 20 Claims |

|
1. A method comprising:
collecting, from each computing device in a set of computing devices, configuration data for an operating system installed on the computing device, wherein:
the set of computing devices comprises a subset of sampled computing devices that have opted in to providing telemetry data indicative of usage of the operating system;
the set of computing devices comprises a subset of unsampled computing devices that have not opted in to providing the telemetry data indicative of the usage of the operating system; and
the configuration data includes attributes that indicate at least a geographic region in which the computing device is located, a version of the operating system installed on the computing device, and a default browser for the computing device;
collecting, from each sampled computing device in the subset of sampled computing devices, the telemetry data indicative of the usage of the operating system, wherein the telemetry data is useable to determine a metric of interest for the sampled computing device;
calculating, by at least one processor and based on a regression model, a propensity score for each computing device in the set of computing devices, wherein the propensity score represents a probability that the computing device is similar to other computing devices based on the configuration data collected from each computing device in the set of computing devices;
for each unsampled computing device in the subset of unsampled computing devices:
using k-Nearest Neighbors (k-NN) to identify, based on the propensity scores calculated for the set of computing devices, a sampled computing device that best represents the unsampled computing device; and
using the propensity score for the unsampled computing device as a multiplier to determine the metric of interest for the unsampled computing device, by applying the multiplier to the metric of interest previously determined for the sampled computing device that best represents the unsampled computing device.
|
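The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that do not appear in the claim: the regression model is taken to be a logistic regression fitted by plain gradient descent, the propensity score is modeled as the probability that a device is sampled given its configuration, k-NN is run with k = 1 on the scalar score, and the field names (`id`, `region`, `os_version`, `default_browser`, `sampled`, `metric`), the example values, and the function names are all hypothetical.

```python
import math

# Configuration attributes named in the claim; the key names are hypothetical.
CONFIG_KEYS = ("region", "os_version", "default_browser")


def one_hot(devices):
    """Encode the categorical configuration attributes as 0/1 features plus a bias term."""
    vocab = sorted({(k, d[k]) for d in devices for k in CONFIG_KEYS})
    index = {pair: i for i, pair in enumerate(vocab)}
    rows = []
    for d in devices:
        row = [0.0] * len(vocab) + [1.0]  # trailing 1.0 is the bias feature
        for k in CONFIG_KEYS:
            row[index[(k, d[k])]] = 1.0
        rows.append(row)
    return rows


def propensity_scores(devices, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent and score every device.

    Assumption: the score is modeled as P(device is sampled | configuration).
    """
    X = one_hot(devices)
    y = [1.0 if d["sampled"] else 0.0 for d in devices]
    w = [0.0] * len(X[0])

    def predict(row):
        z = sum(wj * xj for wj, xj in zip(w, row))
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = predict(row) - target
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return [predict(row) for row in X]


def estimate_unsampled(devices, scores):
    """Match each unsampled device to the sampled device with the closest score
    (k-NN with k = 1), then multiply that device's metric by the unsampled
    device's own propensity score."""
    sampled = [i for i, d in enumerate(devices) if d["sampled"]]
    estimates = {}
    for i, d in enumerate(devices):
        if d["sampled"]:
            continue
        nearest = min(sampled, key=lambda j: abs(scores[j] - scores[i]))
        match = devices[nearest]
        estimates[d["id"]] = (match["id"], scores[i] * match["metric"])
    return estimates


# Hypothetical example data: three sampled devices with a known metric,
# two unsampled devices whose metric is to be estimated.
devices = [
    {"id": "s1", "region": "US", "os_version": "10", "default_browser": "edge", "sampled": True, "metric": 4.0},
    {"id": "s2", "region": "US", "os_version": "11", "default_browser": "chrome", "sampled": True, "metric": 6.5},
    {"id": "s3", "region": "DE", "os_version": "11", "default_browser": "firefox", "sampled": True, "metric": 3.0},
    {"id": "u1", "region": "US", "os_version": "10", "default_browser": "edge", "sampled": False, "metric": None},
    {"id": "u2", "region": "DE", "os_version": "10", "default_browser": "chrome", "sampled": False, "metric": None},
]

scores = propensity_scores(devices)
estimates = estimate_unsampled(devices, scores)  # {unsampled id: (matched id, estimated metric)}
```

In this sketch, devices with identical configuration data receive identical propensity scores, so `u1` is matched to `s1`. A practical system would more likely use an established library for the regression and the nearest-neighbor search, and could use a larger k with some aggregation over the neighbors; the claim itself recites only the identification of a single best-representing sampled device.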