US 11,861,882 B2
Systems and methods for automated product classification
Kshetrajna Raghavan, Fremont, CA (US); Kyle Bruce Tate, Ottawa (CA); Peng Yu, Montreal (CA); Niklas Itänen, Ottawa (CA); and Xiaoxiao Li, Oakville (CA)
Assigned to SHOPIFY INC., Ottawa (CA)
Filed by SHOPIFY INC., Ottawa (CA)
Filed on Dec. 17, 2021, as Appl. No. 17/554,474.
Prior Publication US 2023/0196741 A1, Jun. 22, 2023
Int. Cl. G06V 10/774 (2022.01); G06V 10/762 (2022.01); G06V 10/776 (2022.01)
CPC G06V 10/7747 (2022.01) [G06V 10/7635 (2022.01); G06V 10/776 (2022.01)]
20 Claims
 
1. A computer-implemented method for partitioning data used for generating supervised learning models, the method comprising:
receiving an input dataset for e-commerce products, each sample in the dataset containing a set of attributes and associated values for each product, the attributes containing at least an image for each product;
representing each said sample in the dataset as a node on a graph with the associated values for that sample and associated with a particular product from the e-commerce products to provide a graph of nodes for the dataset;
measuring a relative similarity distance between each set of two nodes on the graph of nodes based on comparing at least image values for the attributes;
determining for each set of two nodes whether they are related if the relative similarity distance between them is below a defined threshold, and if related, generating an edge between them to provide connected nodes on the graph;
assigning each node on the graph of nodes to a first group or a second group, a particular node assigned to the first group if connected to at least one other node in the first group and assigned to the second group if no connection to another node in the first group to generate two disjoint groups such that the nodes grouped together have a shortest relative similarity distance with each other; and
wherein the first group is used as a training dataset to train a supervised learning model and the second group is used as a testing set to test the model, the model for subsequent use in predicting a classification of a new e-commerce product based on at least an image input.
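
The partitioning steps recited in claim 1 can be illustrated with a minimal sketch. This is not the patented implementation, only one possible reading of the claim under stated assumptions: each product image has already been reduced to a numeric embedding vector, the relative similarity distance is Euclidean, and related nodes are grouped by connected components using a union-find structure. The function name `partition_by_similarity` and the `train_fraction` parameter are hypothetical and do not appear in the claim.

```python
from itertools import combinations
from math import dist


def partition_by_similarity(embeddings, threshold, train_fraction=0.8):
    """Split sample indices into (train, test) so that no related pair is split.

    embeddings: one numeric vector per product image (assumed precomputed).
    threshold: two nodes are related when their distance is below this value.
    train_fraction: hypothetical target share of samples for the training group.
    """
    n = len(embeddings)
    parent = list(range(n))  # union-find forest: one node per sample

    def find(i):
        # Follow parent links to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Generate an edge between every pair of nodes closer than the threshold.
    for i, j in combinations(range(n), 2):
        if dist(embeddings[i], embeddings[j]) < threshold:
            parent[find(i)] = find(j)

    # Collect the connected components of the resulting graph.
    components = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)

    # Assign whole components, largest first, to the training group while they
    # fit within the target size; everything else goes to the testing group.
    train, test = [], []
    for members in sorted(components.values(), key=len, reverse=True):
        if len(train) + len(members) <= train_fraction * n:
            train.extend(members)
        else:
            test.extend(members)
    return sorted(train), sorted(test)
```

Because components are assigned as a unit, no edge crosses between the two groups, which is the property that keeps near-duplicate product images out of the testing set. The all-pairs comparison is quadratic in the number of samples, so a larger dataset would likely need an approximate nearest-neighbor search instead.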