US 12,455,944 B2
Method, electronic device, and computer program product for analyzing samples
Zijia Wang, WeiFang (CN); Jiacheng Ni, Shanghai (CN); Zhen Jia, Shanghai (CN); and Wenbin Yang, Shanghai (CN)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Oct. 4, 2021, as Appl. No. 17/492,853.
Claims priority of application No. 202111011011.6 (CN), filed on Aug. 31, 2021.
Prior Publication US 2023/0064850 A1, Mar. 2, 2023
Int. Cl. G06F 18/2413 (2023.01); G06F 18/2132 (2023.01); G06F 18/22 (2023.01); G06N 20/00 (2019.01)

CPC G06F 18/2413 (2023.01) [G06F 18/21322 (2023.01); G06F 18/22 (2023.01); G06F 18/21328 (2023.01); G06N 20/00 (2019.01)]

19 Claims

1. A method for analyzing samples, including:

acquiring, in a processor-based machine learning system, a set of feature representations associated with a set of samples, wherein the set of samples have classification information for indicating classifications of the set of samples;

adjusting, in the processor-based machine learning system, the set of feature representations so that distances between feature representations of samples corresponding to the same classification are less than a first distance threshold, wherein the adjusting comprises obtaining an adjustment matrix based on a loss function, the loss function comprising a function that iteratively processes distances between respective pairs of feature representations of the samples, wherein the function that iteratively processes the distances between the respective pairs of feature representations comprises, for a given such iteration, at least a difference between an inverse-squared function of the distance and an inverse-cubed function of the distance, and wherein the adjusting is performed at least in part utilizing the obtained adjustment matrix; and

training, in the processor-based machine learning system, a classification model based on the adjusted set of feature representations and the classification information, wherein the classification model is configured to receive an input sample and determine a classification of the input sample from among a plurality of classifications each associated with a different calibrated feature distribution of a respective corresponding portion of the set of samples, the adjusted set of feature representations and classification information utilized to train the classification model providing increased concentrations of samples associated with respective same classifications, thereby making different ones of the feature representations corresponding to respective different classifications more easily distinguishable from one another, and reducing a number of samples needed to achieve a given level of accuracy of the trained classification model relative to a number of samples that would otherwise be needed absent the adjusting.
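
The adjusting step of claim 1 can be illustrated with a minimal sketch. The claim fixes only that the loss processes pairwise distances through a difference between an inverse-squared and an inverse-cubed function of the distance, and that an adjustment matrix is obtained from that loss. Everything else below is an assumption made for illustration: the order of the two terms (1/d^3 - 1/d^2, which has its minimum at d = 1.5), the Euclidean distance, applying the matrix as a linear map to each feature vector, restricting the sum to same-classification pairs, and the finite-difference descent with step halving. The names pair_term, transform, loss, and fit_adjustment_matrix are hypothetical and do not come from the patent.

```python
import math
from itertools import combinations


def pair_term(d, eps=1e-9):
    """Inverse-cubed minus inverse-squared term (term order is an assumption)."""
    d = max(d, eps)  # guard against division by zero for coincident points
    return 1.0 / d**3 - 1.0 / d**2


def transform(A, x):
    """Apply the adjustment matrix A to one feature vector x (assumed linear map)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def loss(A, X, labels):
    """Sum pair_term over all pairs of samples sharing a classification."""
    Z = [transform(A, x) for x in X]
    total = 0.0
    for i, j in combinations(range(len(Z)), 2):
        if labels[i] == labels[j]:
            total += pair_term(math.dist(Z[i], Z[j]))
    return total


def fit_adjustment_matrix(X, labels, steps=200, lr=0.1, h=1e-6):
    """Finite-difference gradient descent on A, starting from the identity.

    A candidate step is accepted only if it lowers the loss; otherwise the
    step size is halved. This optimizer is an illustrative choice, not the
    procedure described in the patent.
    """
    n = len(X[0])
    A = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    current = loss(A, X, labels)
    for _ in range(steps):
        grad = [[0.0] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                old = A[r][c]
                A[r][c] = old + h
                grad[r][c] = (loss(A, X, labels) - current) / h
                A[r][c] = old  # restore the entry exactly
        candidate = [
            [A[r][c] - lr * grad[r][c] for c in range(n)] for r in range(n)
        ]
        candidate_loss = loss(candidate, X, labels)
        if candidate_loss < current:
            A, current = candidate, candidate_loss
        else:
            lr *= 0.5
    return A


# Toy data: two classifications with two samples each.
X = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
labels = [0, 0, 1, 1]
identity = [[1.0, 0.0], [0.0, 1.0]]
A = fit_adjustment_matrix(X, labels)
adjusted = [transform(A, x) for x in X]  # features a classifier would train on
```

Under these assumptions the fitted matrix never yields a higher loss than the identity, because steps that fail to lower the loss are rejected. The adjusted feature vectors would then be passed, together with the classification information, to whatever classification model is being trained.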