| CPC G06F 16/119 (2019.01) [G06F 16/182 (2019.01); G06N 3/084 (2013.01)] | 20 Claims |

|
1. A computer-implemented method comprising:
generating a cross-platform metadata database comprising a plurality of file identifiers and comprising metadata for a plurality of digital files stored across a plurality of digital repository platforms storing machine learning data;
generating a cross-platform file location database comprising the plurality of file identifiers and comprising a plurality of file locations for the plurality of digital files across the plurality of digital repository platforms storing machine learning data;
receiving, via one or more servers from a requestor device, a machine learning dataset request comprising one or more characteristics for a machine learning dataset;
determining, via the one or more servers, file identifiers for a set of digital files for the machine learning dataset by searching the metadata of the cross-platform metadata database utilizing the one or more characteristics from the machine learning dataset request;
searching, via the one or more servers, the plurality of file identifiers of the cross-platform file location database utilizing the file identifiers for the set of digital files for the machine learning dataset determined from the cross-platform metadata database to identify digital storage locations corresponding to the plurality of digital repository platforms for the set of digital files for the machine learning dataset by:
identifying a first storage location for a first digital file stored at a first digital repository platform utilizing a first file identifier; and
identifying a second storage location for a second digital file stored at a second digital repository platform utilizing a second file identifier;
generating a machine learning dataset response, for the requestor device, indicating the digital storage locations of the set of digital files for the machine learning dataset, the machine learning dataset response comprising the first storage location for the first digital file stored at the first digital repository platform and the second storage location for the second digital file stored at the second digital repository platform; and
training a machine learning model utilizing the set of digital files for the machine learning dataset.
|
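
The two-database lookup recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the in-memory dictionaries, the identifiers (`f1`, `f2`, `f3`), the platform names, the paths, and the function names are all hypothetical stand-ins, and the characteristic search is assumed here to be exact-match filtering on metadata fields. The final training step is left out.

```python
# Hypothetical cross-platform metadata database: file identifier -> metadata.
METADATA_DB = {
    "f1": {"label": "cat", "format": "jpeg"},
    "f2": {"label": "cat", "format": "png"},
    "f3": {"label": "dog", "format": "jpeg"},
}

# Hypothetical cross-platform file location database:
# file identifier -> (repository platform, storage location).
LOCATION_DB = {
    "f1": ("platform_a", "bucket-a/images/f1.jpg"),
    "f2": ("platform_b", "/mnt/share/images/f2.png"),
    "f3": ("platform_a", "bucket-a/images/f3.jpg"),
}


def find_file_ids(characteristics, metadata_db):
    """Search the metadata database and return matching file identifiers.

    A file matches when every requested characteristic equals the
    corresponding metadata field (an assumed matching rule).
    """
    return sorted(
        file_id
        for file_id, metadata in metadata_db.items()
        if all(metadata.get(key) == value for key, value in characteristics.items())
    )


def locate_files(file_ids, location_db):
    """Look up each file identifier in the location database."""
    return {fid: location_db[fid] for fid in file_ids if fid in location_db}


def build_dataset_response(characteristics, metadata_db=METADATA_DB,
                           location_db=LOCATION_DB):
    """Resolve a dataset request into storage locations across platforms."""
    file_ids = find_file_ids(characteristics, metadata_db)
    return {
        "characteristics": characteristics,
        "locations": locate_files(file_ids, location_db),
    }


# Example request: all files whose metadata carries the label "cat".
response = build_dataset_response({"label": "cat"})
for file_id, (platform, location) in response["locations"].items():
    print(file_id, platform, location)
```

In this sketch the response for the `cat` request points at two different platforms, mirroring the first and second storage locations of the claim. Because the two databases share only the file identifier, the location data can be updated without touching the metadata, and vice versa.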