US 12,141,837 B2
Methods and apparatus to extract information from uniform resource locators
Muktamala Chakrabarti, Grantham (GB)
Assigned to The Nielsen Company (US), LLC, New York, NY (US)
Filed by The Nielsen Company (US), LLC, New York, NY (US)
Filed on Aug. 5, 2022, as Appl. No. 17/882,380.
Claims priority of provisional application 63/230,324, filed on Aug. 6, 2021.
Prior Publication US 2023/0045424 A1, Feb. 9, 2023
Int. Cl. G06Q 30/0251 (2023.01); G06F 16/955 (2019.01)
CPC G06Q 30/0256 (2013.01) [G06F 16/9566 (2019.01); G06Q 30/0271 (2013.01)]
18 Claims
1. A system including:
an audience measurement entity (AME) comprising a data store;
a cloud computing server implemented by the AME, wherein the cloud computing server comprises a cloud storage including first, second, and third cloud storage buckets each having data tables;
network interface circuitry to:
obtain first uniform resource locator (URL) information from client devices accessing webpages that participate in a script, wherein the script causes a respective web browser of the client devices to report monitoring information including the first URL information to the cloud computing server, the first URL information being unstructured data and corresponding to first media accessed by first users;
transmit the first URL information to the cloud computing server;
store the first URL information as the unstructured data in the second cloud storage bucket of the cloud storage;
at least one memory;
programmable circuitry; and
instructions, stored in the at least one memory, to cause the programmable circuitry to:
parse the first URL information from the second cloud storage bucket into metadata, the metadata to represent the first URL information as cleaned URL information;
store the metadata representing the cleaned URL information in the data store of the AME;
map, in the third cloud storage bucket of the cloud storage, the metadata in the data store of the AME to the first URL information in the second cloud storage bucket;
determine host URLs in the cleaned URL information;
determine feature-to-user assignment rules based on at least one of the metadata or the host URLs; and
store the feature-to-user assignment rules in the first cloud storage bucket of the cloud storage;
wherein the network interface circuitry is further to:
collect second URL information from second client devices, the second URL information corresponding to second media accessed by second users;
the instructions to cause the programmable circuitry further to:
update the feature-to-user assignment rules based on the second URL information comprising assigning features to the second users based on the second URL information and the feature-to-user assignment rules; and
second interface circuitry to:
transmit the updated feature-to-user assignment rules to the cloud computing server to store the updated feature-to-user assignment rules in the first cloud storage bucket of the cloud storage.
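
For illustration only, the data flow recited in claim 1 (parse raw URL information into metadata, determine host URLs, derive feature-to-user assignment rules, then assign features to a second set of users) might be sketched as follows. This is a minimal sketch and not the patented implementation: the function names, the `TRACKING_PARAMS` set, the metadata fields, and the host-to-feature lookup are all hypothetical, and plain in-memory dictionaries stand in for the cloud storage buckets and the AME data store.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of tracking parameters dropped during cleaning (assumption).
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def parse_url(raw_url):
    """Parse one unstructured URL string into a metadata record.

    The record plays the role of the "cleaned URL information"; its fields
    (host, path_tokens, query) are illustrative choices, not from the claim.
    """
    parts = urlsplit(raw_url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    query = {k: v for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS}
    path_tokens = [t for t in parts.path.lower().split("/") if t]
    return {"host": host, "path_tokens": path_tokens, "query": query}


def build_rules(metadata_records, host_features):
    """Derive feature-to-user assignment rules keyed by host URL.

    host_features is a hypothetical lookup from host to feature label;
    only hosts actually observed in the metadata produce a rule.
    """
    rules = {}
    for record in metadata_records:
        host = record["host"]
        if host in host_features:
            rules[host] = host_features[host]
    return rules


def assign_features(rules, user_urls):
    """Assign features to users from (user_id, raw_url) pairs using the rules."""
    assigned = defaultdict(set)
    for user_id, raw_url in user_urls:
        feature = rules.get(parse_url(raw_url)["host"])
        if feature is not None:
            assigned[user_id].add(feature)
    return dict(assigned)


# First URL information: parse, then build rules from the observed hosts.
first_metadata = [parse_url("https://WWW.Example.com/Sports/Football?id=7&utm_source=x")]
rules = build_rules(first_metadata, {"example.com": "sports_fan"})

# Second URL information: assign features to the second users.
second_users = [("u1", "http://example.com/a"), ("u2", "http://other.org/")]
assignments = assign_features(rules, second_users)
```

Keying the rules by host keeps the lookup for second-user URLs to a single dictionary access; a fuller system would presumably also use the path and query metadata, which the claim permits ("at least one of the metadata or the host URLs").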