CPC G06Q 30/0631 (2013.01) [G06F 16/9538 (2019.01); G06F 40/279 (2020.01); G06F 40/40 (2020.01); G06T 7/11 (2017.01); G06Q 30/0206 (2013.01); G06Q 30/0641 (2013.01); G06T 2200/24 (2013.01); G06T 2207/20081 (2013.01); G06T 2207/20132 (2013.01); G06T 2207/30176 (2013.01)] | 20 Claims |
1. A computer-implemented method for identifying resale good alternatives for retail goods, the method comprising:
identifying a retail good on a retail goods webpage;
extracting metadata of the retail good from the retail goods webpage by:
extracting the metadata of the retail good using metadata heuristics that analyze metadata structures of the retail goods webpage;
performing large language model (LLM)-based extraction to transform unstructured webpage content into structured retail good metadata; and
identifying and caching selectors that target HTML tags containing the metadata of the retail good for re-use during subsequent extractions;
classifying the retail good into a retail category using keyword heuristics and LLM-based classification;
cropping an image of the retail good to isolate the retail good using the retail category and a segmentation foundation model;
determining a descriptive color word for the retail good using clustering algorithms and color space mapping;
generating image vector embeddings for the retail good using a machine learning (ML) image embedding model;
generating text vector embeddings for the retail good using an ML text embedding model;
retrieving multiple ranked result sets from a vector database of resale goods including a first result set generated from the image vector embeddings and a second result set generated from the text vector embeddings;
merging the multiple ranked result sets into a unified result set;
re-ranking the unified result set by:
applying heuristics-based re-ranking;
applying ML language model-based re-ranking; and
applying preference-aware re-ranking; and
returning the re-ranked unified result set to the user.
|
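The merging step of claim 1 combines a ranked result set from the image vector embeddings with one from the text vector embeddings, but the claim does not say how. The sketch below illustrates one common way to do this, reciprocal rank fusion (RRF). It is an assumption chosen for illustration, not the method the claim recites. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the resale item identifiers are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(result_sets: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of item ids into one ranked list.

    Each item scores 1 / (k + rank) for every list it appears in, where
    rank starts at 1. Items ranked highly in several lists rise to the top.
    The smoothing constant k is an assumption, not taken from the claim.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in result_sets:
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by item id for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


if __name__ == "__main__":
    # Hypothetical resale item ids returned by two vector searches.
    image_hits = ["r1", "r2", "r3"]  # from the image vector embeddings
    text_hits = ["r2", "r4", "r1"]   # from the text vector embeddings
    print(reciprocal_rank_fusion([image_hits, text_hits]))
```

RRF uses only rank positions, so similarity scores from the image and text searches need not be on a comparable scale. The re-ranking stages recited in the claim (heuristics-based, ML language model-based, and preference-aware) would then operate on the unified list.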