US 12,314,265 B1
Method and system for content-based indexing of data streams in streaming storage systems
Yi Liu, Shanghai (CN); Raúl Gracia-Tinedo, Barcelona (ES); Flavio Paiva Junqueira, Barcelona (ES); and Thomas Kaitchuck, Portland, OR (US)
Assigned to DELL PRODUCTS L.P., Round Rock, TX (US)
Filed by Dell Products L.P., Round Rock, TX (US)
Filed on Jan. 23, 2024, as Appl. No. 18/419,820.
Int. Cl. G06F 16/2455 (2019.01); G06F 9/54 (2006.01); G06F 16/22 (2019.01); G06F 16/248 (2019.01)
CPC G06F 16/24568 (2019.01) [G06F 9/547 (2013.01); G06F 16/22 (2019.01); G06F 16/248 (2019.01)]
20 Claims
 
1. A method for managing a data stream (DS), the method comprising:
receiving, by an analytics engine (AE), a first request from an administrator to analyze the DS,
wherein the first request specifies an identifier of the administrator, an identifier of the DS, and an analytics job,
wherein the analytics job specifies what type of indexing needs be performed on the DS and what features from the DS need be identified for indexing;
in response to the first request and by the AE, obtaining the DS from a streaming storage system (SSS);
performing a first analyzing, by the AE, of the DS to extract relevant data that is requested to be identified in the analytics job;
making, based on the first analyzing and by the AE, a first determination that the relevant data is worth indexing, wherein the relevant data is worth indexing because the relevant data specifies features requested to be identified in the analytics job;
in response to the first determination and by the AE, storing the relevant data in a database, wherein the AE, the SSS, and the database communicate over a network;
generating, based on the relevant data and by a database engine (DE) hosted by the database, a set of indexes corresponding to at least a portion of the DS;
receiving, by an application programming interface (API) service, a query about a data item in the DS from a user;
sending, by the API service, a second request to the DE comprising the query;
performing a second analyzing, by the DE, of the query;
making, based on the second analyzing and by the DE, a second determination that a re-indexing of the portion of the DS is not required;
in response to the second determination and by the DE, identifying an index from the set of indexes that is associated with the query;
sending, based on the index and by the DE, position information of the data item to the API service;
retrieving, based on the position information and by the API service, the data item from the SSS; and
initiating, by the API service, displaying of the data item to the user in response to the query.
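
For illustration only, the flow recited in claim 1 can be sketched in Python under simplifying assumptions. Everything here is hypothetical and does not come from the patent: the class and function names (StreamingStorage, DatabaseEngine, AnalyticsRequest, run_analytics_job, handle_query), the use of in-memory lists and dictionaries in place of a networked SSS and database, and the representation of events as dictionaries whose keys are the "features" named in the analytics job. The sketch treats an event as "worth indexing" when it contains a requested feature, and treats "re-indexing is not required" as meaning that an index for the queried feature already exists.

```python
from dataclasses import dataclass, field


@dataclass
class StreamingStorage:
    """Hypothetical stand-in for the SSS: each stream is an append-only list."""
    streams: dict = field(default_factory=dict)

    def append(self, stream_id, event):
        events = self.streams.setdefault(stream_id, [])
        events.append(event)
        return len(events) - 1  # position of the appended event

    def read_all(self, stream_id):
        return list(enumerate(self.streams.get(stream_id, [])))

    def read_at(self, stream_id, position):
        return self.streams[stream_id][position]


@dataclass
class DatabaseEngine:
    """Hypothetical stand-in for the database and its DE."""
    records: list = field(default_factory=list)
    indexes: dict = field(default_factory=dict)  # (stream, feature) -> value -> positions

    def store(self, stream_id, feature, value, position):
        self.records.append((stream_id, feature, value, position))

    def build_indexes(self):
        # Rebuild every index from the stored relevant data.
        self.indexes = {}
        for stream_id, feature, value, position in self.records:
            by_value = self.indexes.setdefault((stream_id, feature), {})
            by_value.setdefault(value, []).append(position)  # value must be hashable

    def lookup(self, stream_id, feature, value):
        index = self.indexes.get((stream_id, feature))
        if index is None:
            # Assumption: a missing index is the case where re-indexing is required.
            raise LookupError(f"no index for feature {feature!r}; re-indexing required")
        return list(index.get(value, []))


@dataclass
class AnalyticsRequest:
    """First request: administrator id, stream id, and the analytics job."""
    admin_id: str
    stream_id: str
    features: list  # features the job asks to have indexed


def run_analytics_job(sss, de, request):
    """AE role: read the stream, extract relevant data, store it, trigger indexing."""
    for position, event in sss.read_all(request.stream_id):
        for feature in request.features:
            if feature in event:  # assumption: this makes the data "worth indexing"
                de.store(request.stream_id, feature, event[feature], position)
    de.build_indexes()


def handle_query(sss, de, stream_id, feature, value):
    """API service role: ask the DE for positions, then fetch the items from the SSS."""
    positions = de.lookup(stream_id, feature, value)
    return [sss.read_at(stream_id, p) for p in positions]
```

In this sketch the database holds only extracted feature values and stream positions, while the events themselves stay in the streaming storage; the query path therefore makes two hops (index lookup, then retrieval by position), which mirrors the separation between the DE and the SSS in the claim.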