| CPC G06Q 10/087 (2013.01) | 14 Claims |

|
1. A system comprising:
a computing device comprising at least one processor, where the computing device is configured to:
obtain first textual data from an item description of a first item;
obtain second textual data from an item description of a second item;
generate a plurality of features based on the first textual data and the second textual data, wherein each feature corresponds to a vector representation generated based on the item description of the first item or the second item;
input the generated plurality of features to a trained machine learning model to generate output data comprising a first similarity score characterizing a textual similarity of item descriptions between the first textual data and the second textual data, wherein the trained machine learning model is trained based on features generated from catalog data of a plurality of items and trained until at least one metric threshold is satisfied;
determine whether the first item maps to the second item based on:
determining whether the first similarity score representing a description similarity between the first item and the second item satisfies a first threshold,
in accordance with a determination that the first similarity score representing a description similarity between the first item and the second item does not satisfy the first threshold, determining a second similarity score representing an attribute similarity between the first item and the second item and characterizing a string similarity between an attribute of each of the first item and the second item, wherein the attribute includes at least one of a brand, a color, a packaging or a category of each of the first item and the second item;
generate mapping data based on the determination of whether the first item maps to the second item; and
store the mapping data in a data repository.
|
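For illustration only, the two-stage determination recited in claim 1 could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the claimed implementation. The claim's trained machine learning model is replaced here by a plain cosine similarity over token-count vectors. The attribute score uses Python's `difflib.SequenceMatcher` as one possible string similarity. The function names, both threshold values, the existence of a second threshold, and the in-memory list standing in for the data repository are all hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Attribute names taken from the claim; the dict keys themselves are an assumption.
ATTRIBUTES = ("brand", "color", "packaging", "category")


def vectorize(text):
    """Turn a description into a token-count vector (a simple stand-in feature)."""
    return Counter(text.lower().split())


def description_score(desc_a, desc_b):
    """Stand-in for the trained model: cosine similarity of token counts."""
    va, vb = vectorize(desc_a), vectorize(desc_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute_score(item_a, item_b):
    """Mean string similarity over the attributes present on both items."""
    shared = [k for k in ATTRIBUTES if k in item_a and k in item_b]
    if not shared:
        return 0.0
    ratios = [
        SequenceMatcher(None, item_a[k].lower(), item_b[k].lower()).ratio()
        for k in shared
    ]
    return sum(ratios) / len(ratios)


def map_items(item_a, item_b, first_threshold=0.8, second_threshold=0.9):
    """Decide whether item_a maps to item_b (threshold values are hypothetical)."""
    first = description_score(item_a["description"], item_b["description"])
    if first >= first_threshold:
        return {"maps": True, "first_score": first, "second_score": None}
    # The second score is computed only when the first threshold is not satisfied.
    second = attribute_score(item_a, item_b)
    return {
        "maps": second >= second_threshold,
        "first_score": first,
        "second_score": second,
    }


def store_mapping(repository, id_a, id_b, result):
    """Append mapping data to an in-memory list standing in for a data repository."""
    record = {"first_item": id_a, "second_item": id_b, **result}
    repository.append(record)
    return record
```

In this sketch the attribute comparison runs only when the description score falls below the first threshold, mirroring the "in accordance with a determination" condition in the claim. How the second score feeds the final mapping decision is assumed here to be a second threshold.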