US 11,893,038 B2
Data type based visual profiling of large-scale database tables
Dilyan Kovachev, Bridgeport, CT (US); Pradeep Kumar Reddy Savadi, Sunnyvale, CA (US); and Gurbaksh Sharma, Karnal (IN)
Assigned to Treasure Data, Inc., Mountain View, CA (US)
Filed by Treasure Data, Inc., Mountain View, CA (US)
Filed on Dec. 3, 2021, as Appl. No. 17/541,494.
Claims priority of application No. 202111047901 (IN), filed on Oct. 21, 2021.
Prior Publication US 2023/0129763 A1, Apr. 27, 2023
Int. Cl. G06F 16/26 (2019.01); G06F 16/242 (2019.01); G06F 16/2452 (2019.01); G06F 16/2453 (2019.01); G06F 16/2458 (2019.01)
CPC G06F 16/26 (2019.01) [G06F 16/2445 (2019.01); G06F 16/2458 (2019.01); G06F 16/24522 (2019.01); G06F 16/24545 (2019.01)] 20 Claims
1. A computer-implemented method, comprising:
using a first computer, establishing programmatic connections to a digitally stored first database comprising over one million records, each of the records comprising a plurality of columns, the first database being part of a HADOOP cluster that is programmatically coupled to a HIVE data warehouse manager and a PRESTO query engine;
using the first computer, reading a configuration file that specifies a plurality of tables in the first database;
using the first computer, for each particular table among the plurality of tables, forming and submitting a plurality of PRESTO queries to the first database, each of the PRESTO queries specifying one or more data aggregation operations, and in response thereto, receiving a plurality of result sets of records of the first database;
using the first computer, calculating a plurality of metadata metrics that characterize columns of the records in the result sets and storing the metadata metrics respectively in separate tables for VARCHAR column statistics, NUMERIC column statistics, DATE column statistics, based upon a particular data type among a plurality of different data types of the columns of the records in the result sets; and
using the first computer, generating presentation instructions which when rendered using a computer display device cause displaying one or more graphical visualizations in a graphical user interface of the computer display device.
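
The query-forming and type-based storage steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim recites only "one or more data aggregation operations" and separate tables for VARCHAR, NUMERIC, and DATE column statistics, so the specific aggregates, the statistics table names (varchar_column_stats, numeric_column_stats, date_column_stats), the configuration layout, and the function build_profile_queries are all assumptions made for illustration. The sketch only builds SQL text and groups it by the statistics table that would receive the results; connecting to a PRESTO engine and executing the queries is omitted.

```python
from collections import defaultdict

# Assumed aggregate expressions per data type. The claim does not list
# which aggregation operations are used; these are illustrative only.
AGGREGATES = {
    "VARCHAR": ["COUNT(DISTINCT {c})", "MIN(LENGTH({c}))", "MAX(LENGTH({c}))"],
    "NUMERIC": ["MIN({c})", "MAX({c})", "AVG({c})"],
    "DATE": ["MIN({c})", "MAX({c})"],
}

# Assumed names for the separate per-type statistics tables.
STATS_TABLES = {
    "VARCHAR": "varchar_column_stats",
    "NUMERIC": "numeric_column_stats",
    "DATE": "date_column_stats",
}


def build_profile_queries(config):
    """Build one aggregation query per column, grouped by target stats table.

    `config` is a hypothetical parsed configuration file of the form
    {"tables": {table_name: {column_name: data_type}}}.
    Returns {stats_table_name: [sql_text, ...]}.
    """
    queries = defaultdict(list)
    for table, columns in config["tables"].items():
        for column, dtype in columns.items():
            # Fill the column name into each aggregate template for this type.
            exprs = ", ".join(a.format(c=column) for a in AGGREGATES[dtype])
            # Row count and non-null count are added for every column.
            sql = f"SELECT COUNT(*), COUNT({column}), {exprs} FROM {table}"
            queries[STATS_TABLES[dtype]].append(sql)
    return dict(queries)


if __name__ == "__main__":
    example_config = {
        "tables": {
            "orders": {
                "customer_name": "VARCHAR",
                "amount": "NUMERIC",
                "order_date": "DATE",
            }
        }
    }
    for stats_table, sqls in build_profile_queries(example_config).items():
        for sql in sqls:
            print(stats_table, "<-", sql)
```

Keying the output by statistics table mirrors the claim's requirement that metrics be stored separately according to column data type; a real profiler would submit each query to the query engine and write the returned aggregates into the corresponding table.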