CPC G06F 16/957 (2019.01) [H04L 67/535 (2022.05)]; 16 Claims
1. A system for reconstructing browser interaction data from session data having incomplete tracking data, the system comprising:
a data ingestion engine for ingesting data from a plurality of different data sources including an on-line user-interaction tracking source which provides the session data relating to different users' interaction with a website, some of the session data including tracking identifiers, and a non-interaction tracking source for providing non-session data relating to user activity other than session data;
a data store for storing the ingested data;
a data cleansing engine for cleansing the ingested data, the data cleansing engine comprising:
a data re-evaluation engine for evaluating the non-session data and recovering user identifiers within the non-session data; and
a path view building engine for linking together session data from different user interaction sessions to form linked session data using the tracking identifiers within the session data;
wherein the data re-evaluation engine is arranged to compare the recovered user identifiers from the non-session data with user identifiers associated with the session data, and to associate any unlinked session data, being session data not previously linked with the linked session data, with linked session data that has an association via the recovered user identifiers,
wherein the path view building engine is arranged to link together any unlinked session data with linked session data having an association via the recovered user identifiers,
wherein the data cleansing engine is arranged to consider the session data against a plurality of predetermined dimensions and to assign to the session data a plurality of those dimensions that match that session data,
the system further comprising a segmentation engine for grouping together the session data into segments based on the plurality of dimensions applied to that session data, wherein the segmentation engine is arranged to use combinations of dimensions which are associated with a particular type of data analysis,
wherein the system is arranged to define a segment as a particular combination of dimensions and to designate some of the plurality of dimensions as lossy, and the segmentation engine is arranged to combine the lossy dimensions to reduce the number of different segment combinations to be considered by the segmentation engine, and
wherein the segmentation engine is arranged to use historic session data to determine the historic probability of use of each segment, and the segmentation engine further comprises a table compression engine for creating a compressed table which combines together segments having a low historic probability of use as determined from the historic session data.
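The claim does not specify data formats or a linking algorithm. The following is a minimal Python sketch, not the claimed implementation, of how linking by tracking identifiers and re-linking through recovered user identifiers might fit together. The `Session` and `NonSessionRecord` records, their fields, and the use of a union-find (disjoint-set) structure are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional


@dataclass
class Session:
    session_id: str
    tracking_id: Optional[str] = None  # hypothetical field, e.g. a cookie value; may be missing
    user_id: Optional[str] = None      # hypothetical field, e.g. a hashed login; may be missing


@dataclass
class NonSessionRecord:
    """Hypothetical non-session record, e.g. an order or CRM row."""
    user_id: str
    tracking_id: Optional[str] = None


class DisjointSet:
    """Union-find with path halving; an assumed choice for grouping linked sessions."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_paths(sessions: List[Session], records: List[NonSessionRecord]) -> List[List[str]]:
    ds = DisjointSet()

    # Path view building: link sessions that share a tracking identifier.
    for s in sessions:
        ds.find(("session", s.session_id))
        if s.tracking_id:
            ds.union(("tracking", s.tracking_id), ("session", s.session_id))

    # Data re-evaluation: recover user identifiers from non-session data and
    # tie each one to any tracking identifier recorded alongside it.
    recovered = set()
    for r in records:
        recovered.add(r.user_id)
        if r.tracking_id:
            ds.union(("tracking", r.tracking_id), ("user", r.user_id))

    # Attach sessions (including unlinked ones) whose user identifier matches a recovered one.
    for s in sessions:
        if s.user_id in recovered:
            ds.union(("user", s.user_id), ("session", s.session_id))

    # Collect each connected group of sessions as one linked path.
    groups: Dict[Hashable, List[str]] = {}
    for s in sessions:
        groups.setdefault(ds.find(("session", s.session_id)), []).append(s.session_id)
    return sorted(sorted(g) for g in groups.values())
```

In this sketch, a session with no tracking identifier joins an existing path only when its user identifier also appears in the non-session data. Sessions without such a match remain as single-session paths.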
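The claim leaves open how lossy dimensions are combined and how the compressed table is built. The following minimal Python sketch is one possible reading, not the claimed method. It assumes a deliberately simple rule in which every lossy dimension is folded into a single wildcard entry, and it estimates each segment's historic probability of use from a list of past segment queries. Segments whose probability falls below a chosen threshold are merged into one catch-all row. The names `segment_key`, `compress_table`, `OTHER`, and the threshold parameter are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A segment is a sorted tuple of (dimension, value) pairs.
Segment = Tuple[Tuple[str, str], ...]

# Hypothetical catch-all row used by the compressed table.
OTHER: Segment = (("segment", "other"),)


def segment_key(dims: Dict[str, str], lossy: Set[str]) -> Segment:
    """Build a segment key from a session's assigned dimensions.

    Assumption: every lossy dimension is folded into one wildcard entry,
    so sessions differing only in lossy dimensions share a segment.
    """
    key = tuple(sorted((d, v) for d, v in dims.items() if d not in lossy))
    if any(d in lossy for d in dims):
        key += (("lossy", "*"),)
    return key


def build_segment_table(sessions: List[Dict[str, str]], lossy: Set[str]) -> Counter:
    """Count sessions per segment."""
    return Counter(segment_key(s, lossy) for s in sessions)


def historic_probabilities(history: List[Segment]) -> Dict[Segment, float]:
    """Estimate each segment's probability of use from past segment queries."""
    counts = Counter(history)
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()}


def compress_table(table: Counter, probs: Dict[Segment, float], threshold: float) -> Counter:
    """Merge segments whose historic probability of use is below the threshold.

    Segments never seen in the history are treated as probability 0.
    """
    compressed: Counter = Counter()
    for seg, n in table.items():
        target = seg if probs.get(seg, 0.0) >= threshold else OTHER
        compressed[target] += n
    return compressed
```

For example, suppose four sessions are tagged with device, country, and browser, and browser is marked lossy. Two of the sessions differ only by browser, so they collapse into three segments. With a threshold of 0.5, only the segment used in most past queries keeps its own row, and the rest are folded into `OTHER`.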