US 12,468,736 B2
Method, apparatus, and computer-readable medium for efficiently classifying a data object of unknown type
Igor Balabine, Menlo Park, CA (US)
Assigned to INFORMATICA LLC, Redwood City, CA (US)
Filed by INFORMATICA LLC, Redwood City, CA (US)
Filed on Dec. 18, 2023, as Appl. No. 18/543,550.
Application 18/543,550 is a continuation of application No. 17/518,582, filed on Nov. 3, 2021, granted, now 11,886,467.
Prior Publication US 2024/0232229 A1, Jul. 11, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/00 (2019.01); G06F 7/08 (2006.01); G06F 16/22 (2019.01); G06F 16/28 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 7/08 (2013.01); G06F 16/2237 (2019.01); G06F 16/2264 (2019.01)] 18 Claims
 
1. A method executed by one or more computing devices for efficiently classifying a data object of unknown type, the method comprising:
storing a plurality of data domain vectors corresponding to a plurality of data domain models, each data domain model corresponding to a data object class and each data domain vector comprising a multidimensional vector having a plurality of dimensions, the plurality of dimensions corresponding to a plurality of features of a corresponding data domain model;
generating a data object vector corresponding to the data object, the data object vector comprising a multidimensional vector, with each dimension of the data object vector corresponding to a feature of the data object;
clustering the plurality of data domain vectors into a plurality of data domain clusters;
determining a classification query order corresponding to the data object based at least in part on a distance between the data object vector and one or more data domain clusters in the plurality of data domain clusters, the classification query order specifying an optimal sequence for applying one or more data domain classifiers corresponding to one or more data domain models in the plurality of data domain models to the data object, the optimal sequence being configured to minimize a computational cost for classification of the data object.
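
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not name a clustering algorithm, a distance metric, or a rule for ordering classifiers within a cluster. The sketch assumes a small k-means pass, Euclidean distance to cluster centroids, and nearest-first ordering as a stand-in for the "optimal sequence". The feature layout (digit ratio, letter ratio, scaled length), the domain names, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_domains(domain_vectors: Dict[str, Vector], k: int,
                    iterations: int = 10) -> List[List[str]]:
    """Group data domain vectors into at most k clusters (assumed: k-means)."""
    names = list(domain_vectors)
    # Deterministic initialization: the first k domain vectors act as centroids.
    centroids = [list(domain_vectors[n]) for n in names[:k]]
    clusters: List[List[str]] = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for name in names:
            nearest = min(
                range(k),
                key=lambda i: euclidean(domain_vectors[name], centroids[i]),
            )
            clusters[nearest].append(name)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            centroid([domain_vectors[n] for n in members]) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return [members for members in clusters if members]


def classification_query_order(object_vector: Vector,
                               domain_vectors: Dict[str, Vector],
                               clusters: List[List[str]]) -> List[str]:
    """Order domain classifiers so the nearest cluster is queried first."""
    def cluster_distance(members: List[str]) -> float:
        return euclidean(object_vector,
                         centroid([domain_vectors[n] for n in members]))

    order: List[str] = []
    for members in sorted(clusters, key=cluster_distance):
        # Assumed tie-breaking rule: nearest domain vector first within a cluster.
        order.extend(sorted(
            members, key=lambda n: euclidean(object_vector, domain_vectors[n])))
    return order


# Hypothetical features: [digit ratio, letter ratio, mean length / 20].
domain_vectors = {
    "ssn": [1.0, 0.0, 0.45],
    "first_name": [0.0, 1.0, 0.30],
    "phone": [0.9, 0.0, 0.50],
    "last_name": [0.0, 1.0, 0.35],
}
clusters = cluster_domains(domain_vectors, k=2)
object_vector = [0.95, 0.0, 0.50]  # data object of unknown type
order = classification_query_order(object_vector, domain_vectors, clusters)
print(order)
```

In this sketch the numeric-looking object is compared against two cluster centroids rather than four domain vectors, and the classifiers for the digit-heavy domains are queued ahead of the name domains. A caller would apply the classifiers in that order and stop at the first match, which is where the cost saving described in the claim would come from.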