US 11,789,909 B2
System and method for automatically managing storage resources of a big data platform
Dan Grebenisan, Whitby (CA); Yue Ma, Mississauga (CA); Peter Sykora, Aurora (CA); Gordon Manway Lam, Richmond Hill (CA); Sarvjot Kaur Kang, Toronto (CA); and Sai Macherla, Toronto (CA)
Assigned to THE TORONTO-DOMINION BANK, Toronto (CA)
Filed by THE TORONTO-DOMINION BANK, Toronto (CA)
Filed on Oct. 19, 2022, as Appl. No. 17/969,453.
Application 17/969,453 is a continuation of application No. 16/829,713, filed on Mar. 25, 2020, granted, now 11,507,622.
Prior Publication US 2023/0046875 A1, Feb. 16, 2023
Int. Cl. G06F 7/00 (2006.01); G06F 16/182 (2019.01); G06F 16/901 (2019.01); G06F 16/906 (2019.01); G06N 20/00 (2019.01); G06F 18/214 (2023.01)
CPC G06F 16/182 (2019.01) [G06F 16/906 (2019.01); G06F 16/9017 (2019.01); G06F 16/9027 (2019.01); G06F 18/214 (2023.01); G06N 20/00 (2019.01)]
18 Claims
 
1. A computer implemented method for automatically managing storage resources of a distributed file system, the method comprising:
obtaining actual past storage usage data of a first directory from a plurality of directories of the distributed file system extending from a past time to a current time;
detecting a space quota limit for the first directory, the space quota limit defining a maximum limit on total storage for the first directory and associated with a pre-defined expected future time defining a maximum time for expecting use of resources of the first directory;
determining, in real-time via a machine learning model, projected storage usage data of the first directory representing a projected storage usage for the first directory over a future time period and as a function of at least one of: a first derivative of a curve representing the actual past storage usage data projected to at least the expected future time, the first derivative being a rate of change of the projected storage usage over time; and a first derivative of a moving average of the curve projected to at least the expected future time;
obtaining an aggregated correction coefficient providing an indication of aggregated projected storage usage needs of all other remaining distributed file system directories from the plurality of directories, relative to the projected storage usage data of the first directory; and
in response to determining an expected value of the projected storage usage data at the expected future time is inconsistent with the space quota limit, adjusting the space quota limit for the first directory to a new quota limit based on the expected value weighted by the aggregated correction coefficient;
wherein weighting by the aggregated correction coefficient is further based upon an obtained value for total disk storage availability of a cluster defined by the plurality of directories of the distributed file system, the total disk storage availability indicating total amount of disk storage currently available for use by the plurality of directories and indicative of degree of possible change between the space quota limit and the new quota limit.
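The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim recites a machine learning model, whereas this sketch substitutes a plain linear extrapolation from the first derivative of a moving average. The claim also does not give a formula for the aggregated correction coefficient or define when a projection is "inconsistent" with the quota, so the proportional scaling in `correction_coefficient` and the `tolerance` test in `adjust_quota` are assumptions. All function names, parameters, and sample figures are hypothetical.

```python
from statistics import mean


def moving_average(values, window):
    """Trailing moving average; yields len(values) - window + 1 points."""
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]


def first_derivative(values):
    """Average rate of change per time step across the series."""
    return (values[-1] - values[0]) / (len(values) - 1)


def project_usage(history, steps_ahead, window=3):
    """Extrapolate linearly from the last observation using the smoothed slope.

    Stand-in for the claimed machine learning model (assumption).
    """
    slope = first_derivative(moving_average(history, window))
    return history[-1] + slope * steps_ahead


def correction_coefficient(first_projection, other_projections, total_available):
    """Assumed formula: scale demand down when the cluster cannot satisfy
    the combined projected needs of all directories."""
    total_demand = first_projection + sum(other_projections)
    if total_demand <= 0:
        return 1.0
    return min(1.0, total_available / total_demand)


def adjust_quota(current_quota, expected, coefficient, tolerance=0.1):
    """Keep the quota if the projection is within tolerance of it (assumed
    meaning of 'consistent'); otherwise return the weighted expected value."""
    if abs(expected - current_quota) <= tolerance * current_quota:
        return current_quota
    return expected * coefficient


# Hypothetical figures, in terabytes, sampled once per time step.
history = [10, 20, 30, 40, 50]          # actual past usage of the first directory
expected = project_usage(history, steps_ahead=5)
coefficient = correction_coefficient(expected, [150, 250], total_available=400)
new_quota = adjust_quota(current_quota=60, expected=expected, coefficient=coefficient)
```

In this sketch the coefficient never exceeds 1, so the new quota is at most the projected need and shrinks proportionally when the cluster is oversubscribed. Other weightings would also fit the claim language.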