US 11,809,476 B1
Modular database recrawl system
Yash Rajkumar Kedia, Redmond, WA (US); Kristofer D. Hoffman, Sammamish, WA (US); Ana Monica Irimia, Adliswil (CH); and John Berkeley, Kenmore, WA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 10, 2022, as Appl. No. 17/837,446.
Int. Cl. G06F 16/24 (2019.01); G06F 16/35 (2019.01); G06F 16/335 (2019.01); G06F 16/2455 (2019.01)
CPC G06F 16/353 (2019.01) [G06F 16/2455 (2019.01); G06F 16/335 (2019.01)]
20 Claims
 
1. A data processing device comprising:
at least one processor; and
a machine-readable medium storing executable instructions that, when executed, cause the processor to perform operations comprising:
receiving job definitions including SQL queries for performing reprocessing operations on databases in a database system of a cloud-based service via a user input device of a modular selective recrawl system;
generating recrawl jobs based on the job definitions using a recrawl job generating module of the modular selective recrawl system;
flighting the recrawl jobs to the database system using a flighting system of the cloud-based service;
generating iterations of recrawl timer jobs for each of the databases in the database system based on a predefined recrawl timer job base class, each of the iterations being triggered based on a predefined schedule for the recrawl timer jobs, wherein, during each of the iterations, a recrawl timer job associated with a database of the database system is configured to perform functions comprising:
accessing a recrawl job list for the database, the recrawl job list including each of the recrawl jobs flighted to the database system;
accessing a property list of the database to identify recrawl job information stored in the property list during a previous iteration of the recrawl timer job;
based on the recrawl job information, selecting a respective batch of documents to be reprocessed in association with each of the recrawl jobs on the recrawl job list;
reprocessing each of the respective batches of documents using the reprocessing operation of the recrawl job associated with the batch of documents; and
once each of the batches of documents has been reprocessed, storing a last document identifier in the property list in association with each of the recrawl jobs.
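
The per-iteration functions recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `RecrawlJob`, `Database`, `run_timer_job_iteration`, and `batch_size` are hypothetical, a Python predicate stands in for the SQL query of a job definition, and documents are assumed to carry an integer `id` that orders them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RecrawlJob:
    """Hypothetical recrawl job generated from a job definition."""
    job_id: str
    selects: Callable[[dict], bool]    # stand-in for the job's SQL query
    reprocess: Callable[[dict], None]  # the job's reprocessing operation


@dataclass
class Database:
    """Hypothetical database with a recrawl job list and a property list."""
    documents: List[dict]
    recrawl_job_list: List[RecrawlJob] = field(default_factory=list)
    # Property list: job_id -> last document identifier already reprocessed.
    property_list: Dict[str, int] = field(default_factory=dict)


def run_timer_job_iteration(db: Database, batch_size: int) -> Dict[str, int]:
    """Run one iteration of a recrawl timer job for a single database.

    Returns the number of documents reprocessed per recrawl job.
    """
    # Select a batch per job, resuming after the identifier stored
    # in the property list during the previous iteration.
    batches: Dict[str, List[dict]] = {}
    for job in db.recrawl_job_list:
        last_id = db.property_list.get(job.job_id, -1)
        candidates = sorted(
            (d for d in db.documents if d["id"] > last_id and job.selects(d)),
            key=lambda d: d["id"],
        )
        batches[job.job_id] = candidates[:batch_size]

    # Reprocess each batch with the operation of its associated job.
    for job in db.recrawl_job_list:
        for doc in batches[job.job_id]:
            job.reprocess(doc)

    # Once every batch is done, store the last document identifier per job.
    for job in db.recrawl_job_list:
        batch = batches[job.job_id]
        if batch:
            db.property_list[job.job_id] = batch[-1]["id"]

    return {job_id: len(batch) for job_id, batch in batches.items()}
```

In this sketch, calling `run_timer_job_iteration` repeatedly on the same `Database` plays the role of the scheduled iterations: each call resumes from the identifier saved by the previous call, so a job works through its matching documents one batch at a time.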