CPC G06F 40/247 (2020.01) [G06F 16/211 (2019.01); G06F 16/215 (2019.01); G06F 16/285 (2019.01)]
20 Claims

1. A method comprising, at a computer system comprising a processor and a non-transitory tangible computer-readable medium:
obtaining a plurality of attribute tuples stored in a database, each of the plurality of attribute tuples comprising an attribute type and an attribute value for a corresponding item of a plurality of items;
applying a clustering algorithm to the plurality of attribute tuples to group the plurality of attribute tuples into a first plurality of clusters, wherein applying the clustering algorithm comprises:
obtaining an embedding for each of the plurality of attribute tuples,
computing, using the embedding for each of the plurality of attribute tuples, a similarity score for each pair of the plurality of attribute tuples, and
applying a clustering model to group each pair of the plurality of attribute tuples having the similarity score above a threshold score to form the first plurality of clusters;
generating a plurality of prompts for input into a language model, wherein each prompt of the plurality of prompts is generated to include a respective subset of attribute tuples from the plurality of attribute tuples, and wherein attribute tuples from the respective subset of attribute tuples were grouped into a respective cluster of the first plurality of clusters;
requesting the language model to generate, based on each of the plurality of prompts input into the language model, one or more clusters of a second plurality of clusters, each cluster of the second plurality of clusters including one or more attribute tuples of the plurality of attribute tuples that have a common attribute type and a common attribute value;
generating, for each cluster of the second plurality of clusters, a respective normalized attribute tuple of a plurality of normalized attribute tuples, the respective normalized attribute tuple comprising a normalized attribute type and a normalized attribute value that are based on the common attribute type and the common attribute value;
mapping each of the one or more attribute tuples that belongs to each cluster of the second plurality of clusters to the respective normalized attribute tuple; and
rewriting each of the plurality of attribute tuples in the database to a corresponding normalized attribute tuple of the plurality of normalized attribute tuples to generate a respective rewritten attribute tuple of a plurality of rewritten attribute tuples.
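The two-stage pipeline recited in claim 1 can be illustrated with a minimal, self-contained sketch. It is not the patentee's implementation, and several pieces are assumptions chosen only so that the example runs on the Python standard library (3.9+). The learned embedding is replaced by a toy character-trigram count vector (`embed`). Similarity is cosine similarity. The "clustering model" is read as linking every pair above the threshold and taking connected components via union-find (`first_stage`). The language model is a hypothetical stub (`stub_language_model`) that parses the tuples back out of the prompt and groups them by a case- and punctuation-insensitive key. The normalized tuple is taken as the most frequent spelling in each cluster. The SQLite table `attributes(item_id, attr_type, attr_value)` is likewise hypothetical.

```python
import math
import sqlite3
from collections import Counter, defaultdict

AttrTuple = tuple[str, str]  # (attribute_type, attribute_value)


def embed(t: AttrTuple) -> Counter:
    # ASSUMPTION: toy stand-in for a learned embedding (character-trigram counts).
    text = f"##{t[0]}={t[1]}##".lower().replace(" ", "")
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def first_stage(tuples: list[AttrTuple], threshold: float) -> list[list[AttrTuple]]:
    # Link every pair whose similarity exceeds the threshold; the connected
    # components (found with union-find) form the first plurality of clusters.
    parent = list(range(len(tuples)))

    def find(i: int) -> int:
        while parent[i] != i:
            i = parent[i]
        return i

    embs = [embed(t) for t in tuples]
    for i in range(len(tuples)):
        for j in range(i + 1, len(tuples)):
            if cosine(embs[i], embs[j]) > threshold:
                parent[find(i)] = find(j)
    groups: dict[int, list[AttrTuple]] = defaultdict(list)
    for i, t in enumerate(tuples):
        groups[find(i)].append(t)
    return list(groups.values())


def make_prompt(cluster: list[AttrTuple]) -> str:
    # One prompt per first-stage cluster, listing that cluster's tuples.
    body = "\n".join(f"- {k}: {v}" for k, v in cluster)
    return ("Group the attribute tuples below so that each group shares one "
            "attribute type and one attribute value.\n" + body)


def key(s: str) -> str:
    return "".join(ch for ch in s.casefold() if ch.isalnum())


def stub_language_model(prompt: str) -> list[list[AttrTuple]]:
    # HYPOTHETICAL stand-in for a real LLM call: parses tuples out of the
    # prompt and groups them by a case/punctuation-insensitive key.
    groups: dict[tuple[str, str], list[AttrTuple]] = defaultdict(list)
    for line in prompt.splitlines():
        if line.startswith("- "):
            attr_type, _, attr_value = line[2:].partition(": ")
            groups[(key(attr_type), key(attr_value))].append((attr_type, attr_value))
    return list(groups.values())


def normalize(cluster: list[AttrTuple]) -> AttrTuple:
    # ASSUMPTION: the normalized type and value are the most frequent stripped
    # spellings in the cluster (ties go to the first spelling seen).
    norm_type = Counter(t.strip() for t, _ in cluster).most_common(1)[0][0]
    norm_value = Counter(v.strip() for _, v in cluster).most_common(1)[0][0]
    return norm_type, norm_value


def normalize_attributes(conn: sqlite3.Connection, threshold: float = 0.5) -> None:
    rows = conn.execute(
        "SELECT rowid, attr_type, attr_value FROM attributes ORDER BY rowid").fetchall()
    tuples = [(t, v) for _, t, v in rows]
    mapping: dict[AttrTuple, AttrTuple] = {}  # original tuple -> normalized tuple
    for cluster in first_stage(tuples, threshold):
        for sub in stub_language_model(make_prompt(cluster)):
            target = normalize(sub)
            for t in sub:
                mapping[t] = target
    # Rewrite by rowid so a row that was already rewritten is never matched again.
    conn.executemany(
        "UPDATE attributes SET attr_type = ?, attr_value = ? WHERE rowid = ?",
        [(*mapping[(t, v)], rid) for rid, t, v in rows])
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attributes (item_id INTEGER, attr_type TEXT, attr_value TEXT)")
    conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)", [
        (1, "Color", "Red"), (2, "color", "red "), (3, "COLOR", "Red"), (4, "Color", "Red"),
        (5, "Color", "Blue"), (6, "Size", "XL"), (7, "size", "X-L"), (8, "Size", "XL")])
    normalize_attributes(conn)
    print(conn.execute(
        "SELECT DISTINCT attr_type, attr_value FROM attributes ORDER BY 1, 2").fetchall())
    # [('Color', 'Blue'), ('Color', 'Red'), ('Size', 'XL')]
```

In this toy data the first stage deliberately over-merges: "Color=Red" and "Color=Blue" share enough trigrams (cosine of about 0.52) to land in one embedding cluster, and the language-model stage then splits them by value. This mirrors the division of labor in the claim, where a cheap similarity pass bounds the size of each prompt and the language model makes the finer same-type, same-value decision.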