US 11,983,226 B2
Real-time crawling
Keshav Bhashyam, Bangalore (IN); Pradeep Srinivas Krishna, Bangalore (IN); Vijaykumar Hiremath, Bangalore (IN); and Saikiran Sri Thunuguntla, Bangalore (IN)
Assigned to Intuit Inc., Mountain View, CA (US)
Filed by Intuit Inc., Mountain View, CA (US)
Filed on Dec. 17, 2021, as Appl. No. 17/555,117.
Prior Publication US 2023/0195806 A1, Jun. 22, 2023
Int. Cl. G06F 16/951 (2019.01); G06F 16/23 (2019.01)
CPC G06F 16/951 (2019.01) [G06F 16/2358 (2019.01)] 16 Claims
1. A method for updating a data catalog in real-time with changes to metadata, the method performed by one or more processors of a crawling system and comprising:
detecting, using a relational database management system (RDBMS)-based adaptor, changes to metadata in a metadata store, the RDBMS-based adaptor incorporating one or more aspects of a change data capture (CDC) source connector;
generating, based on the detected changes, an event queue indicating a plurality of change events associated with system objects, each respective change event of the plurality of change events indicating a system object associated with the respective change event and a number of event attributes of the respective change event, the event queue incorporating one or more aspects of an event bus topic associated with a stream processing application;
storing raw system data corresponding to the system objects in a data repository, wherein the data repository is a data lake;
identifying, using a consumer associated with the stream processing application, recent events among the plurality of change events based on a first event attribute of the number of event attributes, wherein the first event attribute is a timestamp indicative of a time that the respective change event occurred, wherein each of the recent events occurs during a specified time window, and wherein identifying the recent events includes identifying a most recent timestamp within the specified time window;
identifying relevant events among the recent events based on a second event attribute of the number of event attributes;
extracting unique identifiers from the relevant events based on a third event attribute of the number of event attributes;
identifying priority objects among the system objects based on the unique identifiers, each of the priority objects associated with at least one of the recent events and at least one of the relevant events;
selectively obtaining, from the metadata store, current metadata for ones of the system objects, the selectively obtaining including:
obtaining current metadata for each of the system objects identified as priority objects; and
refraining from obtaining current metadata for system objects that are not associated with at least one of the recent events and at least one of the relevant events; and
updating registry values associated with the priority objects in a metadata registry immediately upon obtaining the current metadata from the metadata store, the updating occurring in at least near real-time with the changes to the priority objects corresponding to the detected changes to the metadata in the metadata store.
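Claim 1 relies on a CDC source connector to detect metadata changes and publish them to an event bus topic. As a rough, self-contained illustration only, the sketch below swaps in a simpler technique: it polls two snapshots of the metadata store and diffs them, and it uses Python's `queue.Queue` as an in-process stand-in for the event bus topic. The names `ChangeEvent`, `diff_snapshots`, and the `CREATE`/`DELETE`/`UPDATE` event types are hypothetical. A log-based CDC connector reads the database's change log instead of comparing snapshots, and this is not the patented implementation.

```python
import queue
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict

Metadata = Dict[str, str]


@dataclass(frozen=True)
class ChangeEvent:
    object_id: str       # hypothetical: identifies the system object that changed
    event_type: str      # hypothetical: kind of change
    timestamp: datetime  # when the change was observed


def diff_snapshots(
    before: Dict[str, Metadata],
    after: Dict[str, Metadata],
    events: queue.Queue,
    now: datetime,
) -> None:
    """Compare two metadata snapshots and enqueue one change event per
    created, deleted, or modified object. Keys are sorted so the output
    order is deterministic (snapshot diffing, not log-based CDC)."""
    for object_id in sorted(after.keys() - before.keys()):
        events.put(ChangeEvent(object_id, "CREATE", now))
    for object_id in sorted(before.keys() - after.keys()):
        events.put(ChangeEvent(object_id, "DELETE", now))
    for object_id in sorted(before.keys() & after.keys()):
        if before[object_id] != after[object_id]:
            events.put(ChangeEvent(object_id, "UPDATE", now))


events: queue.Queue = queue.Queue()  # stand-in for the event bus topic
before = {"db.orders": {"columns": "id,total"}, "db.tmp": {"columns": "x"}}
after = {
    "db.orders": {"columns": "id,total,updated_at"},
    "db.users": {"columns": "id,email"},
}
diff_snapshots(before, after, events, datetime(2021, 12, 17, 12, 0, tzinfo=timezone.utc))
drained = [events.get_nowait() for _ in range(events.qsize())]
print([(e.object_id, e.event_type) for e in drained])
# [('db.users', 'CREATE'), ('db.tmp', 'DELETE'), ('db.orders', 'UPDATE')]
```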
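The claimed filtering chain (recent events by timestamp, relevant events by a second attribute, unique identifiers by a third) might look roughly like the minimal sketch below. It assumes a hypothetical `ChangeEvent` record whose `timestamp`, `event_type`, and `object_id` fields play the roles of the first, second, and third event attributes, a half-open time window, and an invented relevance rule (`RELEVANT_TYPES`). Claim 1 specifies none of these details.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    object_id: str       # hypothetical "third attribute": unique identifier of the object
    event_type: str      # hypothetical "second attribute": used to judge relevance
    timestamp: datetime  # "first attribute": when the change occurred


# Assumed relevance rule; claim 1 does not say what makes an event relevant.
RELEVANT_TYPES = {"CREATE", "UPDATE", "DELETE"}


def recent_events(
    events: Iterable[ChangeEvent], start: datetime, end: datetime
) -> Tuple[List[ChangeEvent], Optional[datetime]]:
    """Keep events whose timestamp falls in the half-open window [start, end)
    and return the most recent timestamp seen in that window (None if empty)."""
    recent = [e for e in events if start <= e.timestamp < end]
    latest = max((e.timestamp for e in recent), default=None)
    return recent, latest


def relevant_events(events: Iterable[ChangeEvent]) -> List[ChangeEvent]:
    return [e for e in events if e.event_type in RELEVANT_TYPES]


def priority_object_ids(
    events: Iterable[ChangeEvent], start: datetime, end: datetime
) -> Tuple[List[str], Optional[datetime]]:
    """Return identifiers of objects with at least one recent and relevant
    event, deduplicated in first-seen order, plus the latest recent timestamp."""
    recent, latest = recent_events(events, start, end)
    ids = list(dict.fromkeys(e.object_id for e in relevant_events(recent)))
    return ids, latest


t0 = datetime(2021, 12, 17, 12, 0, tzinfo=timezone.utc)
events = [
    ChangeEvent("db.orders", "UPDATE", t0 + timedelta(seconds=5)),
    ChangeEvent("db.orders", "UPDATE", t0 + timedelta(seconds=20)),
    ChangeEvent("db.users", "SNAPSHOT", t0 + timedelta(seconds=30)),  # in window, not relevant
    ChangeEvent("db.items", "CREATE", t0 - timedelta(minutes=5)),     # relevant, outside window
]
ids, latest = priority_object_ids(events, t0, t0 + timedelta(minutes=1))
print(ids, latest.isoformat())  # ['db.orders'] 2021-12-17T12:00:30+00:00
```

Deduplicating with `dict.fromkeys` reflects the intent of the claim: an object that changed several times inside the window is fetched from the metadata store only once.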
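The selective-obtaining and registry-update steps can be sketched as follows, assuming in-memory dictionaries for both the metadata store and the metadata registry. `CountingMetadataStore` and `refresh_registry` are hypothetical names, and the store records reads only so that the "refraining from obtaining" behaviour is visible. Each registry write happens right after its read, mirroring the claim's "immediately upon obtaining" language; a real crawler would talk to an RDBMS and a catalog service rather than dictionaries.

```python
from typing import Dict, Iterable, List, Optional

Metadata = Dict[str, str]


class CountingMetadataStore:
    """Hypothetical in-memory stand-in for the metadata store. It records every
    read so the selective-fetch behaviour can be checked."""

    def __init__(self, rows: Dict[str, Metadata]) -> None:
        self._rows = rows
        self.reads: List[str] = []

    def get(self, object_id: str) -> Optional[Metadata]:
        self.reads.append(object_id)
        return self._rows.get(object_id)


def refresh_registry(
    priority_ids: Iterable[str],
    store: CountingMetadataStore,
    registry: Dict[str, Metadata],
) -> List[str]:
    """Read current metadata only for priority objects and write each result to
    the registry as soon as it is read, rather than batching at the end.
    Objects not in priority_ids are never read."""
    touched: List[str] = []
    for object_id in priority_ids:
        current = store.get(object_id)
        if current is None:
            # Assumption: a missing row means the object was dropped, so its
            # stale registry entry is removed. Claim 1 does not cover this case.
            registry.pop(object_id, None)
        else:
            registry[object_id] = dict(current)
        touched.append(object_id)
    return touched


store = CountingMetadataStore({
    "db.orders": {"owner": "payments", "columns": "id,total,updated_at"},
    "db.users": {"owner": "identity", "columns": "id,email"},
})
registry = {
    "db.orders": {"owner": "payments", "columns": "id,total"},
    "db.users": {"owner": "identity", "columns": "id,email"},
    "db.tmp": {"owner": "etl", "columns": "x"},
}
refresh_registry(["db.orders", "db.tmp"], store, registry)
print(store.reads)                       # ['db.orders', 'db.tmp']
print(registry["db.orders"]["columns"])  # id,total,updated_at
print("db.tmp" in registry)              # False
```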