US 12,259,934 B2
Machine-learning-aided automatic taxonomy for web data
Rohit Kewalramani, Pune (IN); and Justin Chien, Campbell, CA (US)
Assigned to 6SENSE INSIGHTS, INC., San Francisco, CA (US)
Filed by 6SENSE INSIGHTS, INC., San Francisco, CA (US)
Filed on Feb. 6, 2024, as Appl. No. 18/434,693.
Application 18/434,693 is a continuation of application No. 17/828,910, filed on May 31, 2022, granted, now 11,914,657.
Claims priority of provisional application 63/195,566, filed on Jun. 1, 2021.
Prior Publication US 2024/0176827 A1, May 30, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/00 (2019.01); G06F 16/951 (2019.01); G06F 40/205 (2020.01); G06N 3/04 (2023.01); G06N 3/082 (2023.01)
CPC G06F 16/951 (2019.01) [G06F 40/205 (2020.01); G06N 3/04 (2013.01); G06N 3/082 (2013.01)]
20 Claims
 
1. A method comprising using at least one hardware processor to:
during a training mode, train a model to predict a class, from a plurality of classes in a taxonomy of web-based activities, based on a training dataset that comprises a plurality of annotated features, wherein each of the plurality of annotated features comprises one or more features, which have been derived from one or both of a uniform resource locator (URL) of an online resource or metadata associated with that online resource, and a ground-truth class assigned to those one or more features; and,
during an operation mode,
acquire web data comprising one or more activity records, wherein each of the one or more activity records comprises a URL of an online resource that was accessed by a visitor, and metadata associated with that online resource, and,
for each of the one or more activity records,
extract a set of one or more features from one or both of the URL or the metadata in the activity record,
apply the trained model to the set of one or more features to predict a class, from the plurality of classes in the taxonomy, that is associated with the set of one or more features, and
store the predicted class in association with the URL in the activity record.
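
The two modes recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim says only "a model", so the multinomial naive Bayes classifier here is a stand-in chosen for brevity. The class labels, the example.com URLs, the "title" metadata key, and the names extract_features, NaiveBayesTaxonomyModel, and classify_records are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse


def extract_features(url, metadata=None):
    """Derive lowercase word tokens from the URL path/query and metadata values."""
    parsed = urlparse(url)
    parts = [parsed.path, parsed.query, *map(str, (metadata or {}).values())]
    return re.findall(r"[a-z0-9]+", " ".join(parts).lower())


class NaiveBayesTaxonomyModel:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in model)."""

    def fit(self, annotated_features):
        # annotated_features: iterable of (feature_list, ground_truth_class) pairs
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in annotated_features:
            self.class_counts[label] += 1
            self.token_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus smoothed log likelihood of each token
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for token in features:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Training mode: hypothetical annotated examples (URL, metadata, ground-truth class).
TRAINING = [
    ("https://example.com/pricing/enterprise", {"title": "Pricing plans"}, "pricing"),
    ("https://example.com/plans/pricing", {"title": "Compare plans and pricing"}, "pricing"),
    ("https://example.com/blog/2021/intent-data", {"title": "Blog: intent data"}, "blog"),
    ("https://example.com/blog/news", {"title": "Company blog"}, "blog"),
    ("https://example.com/careers/jobs", {"title": "Careers: open jobs"}, "careers"),
    ("https://example.com/careers/engineering", {"title": "Engineering careers"}, "careers"),
]

model = NaiveBayesTaxonomyModel().fit(
    (extract_features(url, meta), label) for url, meta, label in TRAINING
)


def classify_records(model, records):
    """Operation mode: extract features, predict a class, store it with the record."""
    classified = []
    for record in records:
        features = extract_features(record["url"], record.get("metadata"))
        classified.append({**record, "predicted_class": model.predict(features)})
    return classified
```

Calling classify_records(model, records) on a list of dictionaries with "url" and optional "metadata" keys returns copies of those records with a "predicted_class" field added, which mirrors the final storing step of the claim. A production system would presumably use a far larger training set and a stronger model.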