US 11,860,867 B2
Optimizing scans using query planning on batch data
Mridul Jain, Cupertino, CA (US); Saigopal Thota, Fremont, CA (US); Rewati Mahendra Ovalekar, Fremont, CA (US); Sébastien Jean-Maurice Olivier Péhu, San Rafael, CA (US); Saumya Agarwal, Milpitas, CA (US); Sai Kiran Reddy Malikireddy, Fremont, CA (US); Gajendra Alias Nishad Kamat, Los Altos, CA (US); and Mitesh Sinha, Fremont, CA (US)
Assigned to WALMART APOLLO, LLC, Bentonville, AR (US)
Filed by Walmart Apollo, LLC, Bentonville, AR (US)
Filed on Aug. 25, 2021, as Appl. No. 17/412,106.
Prior Publication US 2023/0068831 A1, Mar. 2, 2023
Int. Cl. G06F 15/16 (2006.01); G06F 16/2453 (2019.01); G06F 16/21 (2019.01)
CPC G06F 16/24532 (2019.01) [G06F 16/211 (2019.01)]
20 Claims
 
1. A system comprising:
one or more processors; and
one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform:
translating event records into a non-relational (NoSQL) schema, wherein translating the event records into the NoSQL schema comprises:
determining access patterns of data clients, wherein dataset layers are based on the access patterns of the data clients;
generating, based on the access patterns, a second layer of the dataset layers, wherein the second layer comprises intermediate states for a subset of queries of the access patterns that exceed a predetermined threshold, wherein the NoSQL schema comprises the dataset layers, wherein the dataset layers comprise a first layer and the second layer, and wherein the first layer comprises user profiles of users; and
periodically updating the second layer in the NoSQL schema as additional queries of the access patterns exceed the predetermined threshold;
defragmenting the event records by assigning user identifiers of the users to the event records received in the event streams from the one or more producers in a user domain object model;
bundling multiple registered queries of a dataset using a scheduling technique, wherein the dataset is homogenous in schema;
running a single table scan of the dataset to process the multiple registered queries of the dataset in parallel; and
generating a respective output responsive to each of the multiple registered queries.
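
The last three steps of the claim (bundling registered queries, running a single table scan, and generating a respective output per query) can be illustrated with a minimal sketch. It is not the patented implementation. The names `RegisteredQuery`, `shared_scan`, and the sample event fields are hypothetical, and the claim's scheduling technique is not modeled: the bundle is simply a list. "In parallel" is approximated here by evaluating every bundled query against each row during the same pass, with no threads or distributed workers.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class RegisteredQuery:
    """Hypothetical registered query: a filter plus an incremental aggregate."""
    name: str
    predicate: Callable[[Row], bool]   # which rows the query cares about
    fold: Callable[[Any, Row], Any]    # how a matching row updates the result
    initial: Any                       # starting value of the result


def shared_scan(rows: Iterable[Row], queries: List[RegisteredQuery]) -> Dict[str, Any]:
    """Read the dataset once and feed each row to every bundled query."""
    state = {q.name: q.initial for q in queries}
    for row in rows:                   # the only pass over the dataset
        for q in queries:
            if q.predicate(row):
                state[q.name] = q.fold(state[q.name], row)
    return state                       # one output per registered query


# Assumed sample data: every row has the same fields (homogeneous schema).
events = [
    {"user_id": "u1", "type": "click", "amount": 0},
    {"user_id": "u2", "type": "purchase", "amount": 30},
    {"user_id": "u1", "type": "purchase", "amount": 12},
    {"user_id": "u3", "type": "click", "amount": 0},
]

# The "bundle" of registered queries that will share one scan.
queries = [
    RegisteredQuery(
        name="purchase_total",
        predicate=lambda r: r["type"] == "purchase",
        fold=lambda acc, r: acc + r["amount"],
        initial=0,
    ),
    RegisteredQuery(
        name="click_count",
        predicate=lambda r: r["type"] == "click",
        fold=lambda acc, r: acc + 1,
        initial=0,
    ),
]

results = shared_scan(events, queries)
print(results)  # {'purchase_total': 42, 'click_count': 2}
```

The point of the design is that the read cost is that of one scan however many queries are registered. Only the per-row evaluation work grows with the size of the bundle.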