US 11,687,548 B2
Storage of backup data using a time-series data lake
Abdul Jabbar Abdul Rasheed, San Jose, CA (US); Woonho Jung, Cupertino, CA (US); Xia Hua, Mountain View, CA (US); Douglas Qian, Mountain View, CA (US); Rajeev Kumar, Sunnyvale, CA (US); Lawrence Chang, San Jose, CA (US); Karan Dhabalia, Mountain View, CA (US); John Stewart, San Jose, CA (US); and Rolland Miller, Olathe, KS (US)
Assigned to Clumio, Inc., Sunnyvale, CA (US)
Filed by Clumio, Inc., Sunnyvale, CA (US)
Filed on Feb. 26, 2021, as Appl. No. 17/187,286.
Claims priority of provisional application 62/982,970, filed on Feb. 28, 2020.
Prior Publication US 2021/0271567 A1, Sep. 2, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/00 (2019.01); G06F 16/25 (2019.01); G06F 16/245 (2019.01); G06F 16/2455 (2019.01); G06F 11/14 (2006.01); H04L 67/1097 (2022.01); G06F 16/23 (2019.01)
CPC G06F 16/254 (2019.01) [G06F 11/1435 (2013.01); G06F 11/1451 (2013.01); G06F 11/1461 (2013.01); G06F 11/1464 (2013.01); G06F 11/1469 (2013.01); G06F 16/2379 (2019.01); G06F 16/245 (2019.01); G06F 16/2455 (2019.01); H04L 67/1097 (2013.01); G06F 2201/80 (2013.01); G06F 2201/84 (2013.01)]
14 Claims
1. A method, comprising:
providing, by one or more servers, a cloud-based data lake service that maintains data for a plurality of different data sources in a time-series data lake that stores a time series representation of data from the plurality of different data sources;
receiving, by the one or more servers, physical backup data that includes multiple backup images from different data sources in different formats;
converting, by the one or more servers, the physical backup data to logical backup data, wherein the converting includes:
extracting backup data from a given backup image of the physical backup data;
analyzing the backup data to determine backup metadata included in the physical backup data;
determining enriched metadata that was not included in the physical backup data, including:
source metadata that identifies a data source from which the backup image originated; and
access control information that identifies users with access to one or more of a plurality of data records;
enhancing the extracted backup data with the backup metadata and the enriched metadata to create the logical backup data in a common backup format;
storing, by the one or more servers, the logical backup data corresponding to the multiple backup images in the time-series data lake using a column-oriented format;
receiving a request from a requesting computer system for a particular view of the logical backup data;
generating, by the one or more servers in response to the request, the particular view of the logical backup data using the backup metadata and the enriched metadata to query the logical backup data, the particular view including:
subsets of data from multiple different data sources of the plurality of different data sources with different original formats; and
subsets of data from multiple different backup images for a given data source; and
providing, by the one or more servers, the particular view of the logical backup data to the requesting computer system.
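The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: every name in it (`BackupImage`, `to_logical`, `ColumnStore`, the `acl` mapping, the `since` parameter) is hypothetical, the two input formats (JSON and CSV) are assumed for illustration, and an in-memory dictionary of lists stands in for a column-oriented time-series data lake.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class BackupImage:
    """Hypothetical physical backup image from one data source."""
    source_id: str   # where the image came from
    fmt: str         # assumed formats: "json" or "csv"
    taken_at: int    # backup timestamp (epoch seconds)
    payload: str     # raw backup contents


def extract_records(image):
    """Extract records from an image, whatever its original format."""
    if image.fmt == "json":
        return list(json.loads(image.payload))
    if image.fmt == "csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(image.payload))]
    raise ValueError(f"unsupported format: {image.fmt}")


def to_logical(image, acl):
    """Convert one physical image to rows in a common logical format.

    backup_time is backup metadata taken from the image; source and
    allowed_users are enriched metadata supplied from outside the image.
    """
    return [
        {
            "record": record,
            "backup_time": image.taken_at,
            "source": image.source_id,
            "allowed_users": list(acl.get(image.source_id, [])),
        }
        for record in extract_records(image)
    ]


class ColumnStore:
    """Toy column-oriented store: one list per column, aligned by index."""

    def __init__(self):
        self.columns = {
            "record": [], "backup_time": [], "source": [], "allowed_users": [],
        }

    def append(self, rows):
        for row in rows:
            for name, column in self.columns.items():
                column.append(row[name])

    def view(self, user, since=0):
        """Return rows the user may see, across sources and backup images."""
        cols = self.columns
        return [
            {
                "record": cols["record"][i],
                "source": cols["source"][i],
                "backup_time": cols["backup_time"][i],
            }
            for i, taken_at in enumerate(cols["backup_time"])
            if taken_at >= since and user in cols["allowed_users"][i]
        ]


# Example: two images from one source plus one image from another source.
acl = {"db-a": ["alice"], "fs-b": ["alice", "bob"]}
images = [
    BackupImage("db-a", "json", 100, '[{"id": 1, "name": "x"}]'),
    BackupImage("db-a", "json", 200, '[{"id": 1, "name": "y"}]'),
    BackupImage("fs-b", "csv", 150, "id,name\n2,z\n"),
]
lake = ColumnStore()
for image in images:
    lake.append(to_logical(image, acl))
alice_view = lake.view("alice")
```

In this sketch the view for `alice` combines records from both sources and from both `db-a` backup images, which mirrors the two kinds of subsets the claim requires of the particular view. Filtering on the metadata columns rather than on the record contents is the reason the metadata is stored alongside each row.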