US 11,989,221 B2
Systems and methods for removing non-conforming web text
Nitin Kishore Sai Samala, Milpitas, CA (US)
Assigned to WALMART APOLLO, LLC, Bentonville, AR (US)
Filed by Walmart Apollo, LLC, Bentonville, AR (US)
Filed on Sep. 20, 2021, as Appl. No. 17/479,993.
Prior Publication US 2023/0087738 A1, Mar. 23, 2023
Int. Cl. G06F 16/338 (2019.01); G06F 16/35 (2019.01)
CPC G06F 16/338 (2019.01) [G06F 16/353 (2019.01)]
20 Claims
 
1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing computing instructions that, when executed on the one or more processors, cause the one or more processors to perform operations comprising:
parsing one or more web texts;
determining a respective web text sentiment score for each respective web text of the one or more web texts, as parsed;
creating a ranked list of one or more match words in the one or more web texts;
scoring the one or more match words in the ranked list of the one or more match words;
creating a report covering a predetermined period of time using the one or more match words, as scored, in the ranked list;
extracting one or more topics from the report covering the predetermined period of time;
labeling, using a generative model, the one or more match words to create labeled training data, wherein the generative model is configured to determine a distribution of each label among respective data points prior to assigning respective labels to respective training data;
training a word-based classifier using the labeled training data to identify non-conforming web text submitted to a website for display, wherein the one or more web texts comprises the non-conforming web text;
determining a word-based classifier score using the word-based classifier;
determining an image-based classifier score using an image-based classifier;
combining the word-based classifier score with the image-based classifier score to create a hybrid score; and
automatically removing the non-conforming web text submitted to the website for display when at least one of the word-based classifier score, the image-based classifier score, or the hybrid score exceeds a predetermined threshold.
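
By way of non-limiting illustration, the final three operations (determining the two classifier scores, combining them into a hybrid score, and removing text when any score exceeds the threshold) might be sketched as follows. The claim does not specify how the scores are combined or what the threshold is. The noisy-OR combination, the 0.8 threshold, the assumption that scores lie in [0, 1], and the names `hybrid_score`, `should_remove`, and `filter_web_texts` are all hypothetical choices made for this sketch, and the classifier scores are taken as already computed.

```python
from typing import Iterable, List, Tuple

# Hypothetical threshold; the claim only says "a predetermined threshold".
DEFAULT_THRESHOLD = 0.8


def hybrid_score(word_score: float, image_score: float) -> float:
    """Combine two scores in [0, 1] with a noisy-OR rule.

    This is an assumed combination, not one stated in the claim. It treats
    each score as an independent probability that the text is non-conforming,
    so two moderate scores can yield a higher combined score.
    """
    return 1.0 - (1.0 - word_score) * (1.0 - image_score)


def should_remove(
    word_score: float,
    image_score: float,
    threshold: float = DEFAULT_THRESHOLD,
) -> bool:
    """Return True when at least one of the three scores exceeds the threshold."""
    hybrid = hybrid_score(word_score, image_score)
    return any(score > threshold for score in (word_score, image_score, hybrid))


def filter_web_texts(
    scored_texts: Iterable[Tuple[str, float, float]],
    threshold: float = DEFAULT_THRESHOLD,
) -> List[str]:
    """Keep only the web texts that are not flagged for removal.

    Each input item is (text, word_score, image_score).
    """
    return [
        text
        for text, word_score, image_score in scored_texts
        if not should_remove(word_score, image_score, threshold)
    ]
```

Under these assumptions, a web text with a word-based score of 0.6 and an image-based score of 0.6 would be removed even though neither score alone exceeds 0.8, because the combined score does. A plain weighted average would not behave this way, since it can never exceed the larger of its two inputs.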