| CPC G06F 21/6218 (2013.01) [G06F 21/602 (2013.01); G06F 21/6227 (2013.01); G06F 21/6245 (2013.01); H04L 63/20 (2013.01)] | 4 Claims |

|
1. An apparatus for reducing a storage of confidential documents within a centralized data repository and increasing security within the centralized data repository, wherein the centralized data repository is an enterprise content management system used to manage and store documents and files, the apparatus comprising:
a computing device;
a processor within the computing device;
a globally accessible site;
wherein the globally accessible site is received on the computing device from a Data at Rest (DAR) team;
said globally accessible site containing files within the centralized data repository;
executing, using the processor on the computing device, a scanning application on the files within the centralized data repository;
identifying, using the processor, whether the files contain a first comma separated values (CSV) file; and
when the files contain a first CSV file, the apparatus:
stores the first CSV file in a local memory address, the local memory address included in the centralized data repository where the first CSV file is being stored;
extracts metadata from the first CSV file;
sets script arguments comprising scan, env, start_date, and end_date;
automatically passes a Unified User Management (UUM) key, service ID, and password into a UUM Authentication Service;
automatically outputs a UUM Session Key from the UUM Authentication Service;
automatically creates a Simple Object Access Protocol (SOAP) Application Programming Interface (API) Client using the UUM key to establish a connection to a Content Management Interoperability Services (CMIS) discovery and object service of the centralized data repository;
creates a dictionary containing a sitelist of Uniform Resource Locators (URLs) as keys and an empty list as values;
develops a query comprising a document ID, document name, a last modified by, and a true file path to form a base query;
automatically creates a date range determined by a scan type of the scan script argument, the scan type comprising standard and ad hoc; and
automatically runs a query statement to the CMIS discovery and object service for each key in the dictionary, using the sitelist of URLs, the creation date, and the base query; and
when the files do not contain a first CSV file, the apparatus:
manually emails the DAR team for instruction;
and, following the query statement, the apparatus:
outputs a second CSV file using the processor, the second CSV file being an outcome of the scanning application;
identifies, using the processor, whether the URL contains documents; and
when the URL contains documents, the apparatus:
stores the documents in a new memory address, the new memory address included in the centralized data repository where the documents are being stored;
appends the CSV file metadata to the dictionary;
calls an object service operation for each key in the dictionary to retrieve a content stream;
converts the content stream into a file object;
transfers the file object into a multiprotocol fileshare;
converts the contents of each item in the dictionary into CSV format;
writes the converted contents of each item in the dictionary into the second CSV file; and
manually emails the second CSV file to the DAR team; and
when the URL does not include documents, the apparatus:
removes the key and URL from the dictionary.
|
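The query-preparation steps recited in claim 1 (the sitelist dictionary, the date range chosen by scan type, the base query, and the per-site query statement) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. The function names, the 7-day lookback for a standard scan, and the `true_file_path` and `site_url` column names are hypothetical. Only `cmis:objectId`, `cmis:name`, `cmis:lastModifiedBy`, and `cmis:creationDate` are standard CMIS document properties.

```python
from datetime import date, timedelta

# Column names are illustrative. The first three are standard CMIS document
# properties; "true_file_path" (and "site_url" below) are hypothetical names.
BASE_COLUMNS = ("cmis:objectId", "cmis:name", "cmis:lastModifiedBy", "true_file_path")


def build_site_dict(sitelist):
    """Map every site URL to its own empty list, as the claim describes."""
    return {url: [] for url in sitelist}


def date_range(scan, start_date=None, end_date=None, today=None, lookback_days=7):
    """Return (start, end) dates for the scan type.

    Assumption: a standard scan looks back a fixed number of days, while an
    ad hoc scan uses the start_date and end_date script arguments.
    """
    today = today or date.today()
    if scan == "standard":
        return today - timedelta(days=lookback_days), today
    if scan == "ad hoc":
        if start_date is None or end_date is None:
            raise ValueError("an ad hoc scan needs start_date and end_date")
        if start_date > end_date:
            raise ValueError("start_date must not be after end_date")
        return start_date, end_date
    raise ValueError(f"unknown scan type: {scan!r}")


def base_query(columns=BASE_COLUMNS):
    """Select document ID, name, last modifier, and file path."""
    return f"SELECT {', '.join(columns)} FROM cmis:document"


def query_for_site(url, start, end):
    """Combine the base query, one site URL, and the date range."""
    # Escape backslashes and single quotes inside the string literal.
    safe_url = url.replace("\\", "\\\\").replace("'", "\\'")
    upper = end + timedelta(days=1)  # half-open range keeps the end date inclusive
    return (
        f"{base_query()} WHERE site_url = '{safe_url}'"
        f" AND cmis:creationDate >= TIMESTAMP '{start.isoformat()}T00:00:00.000Z'"
        f" AND cmis:creationDate < TIMESTAMP '{upper.isoformat()}T00:00:00.000Z'"
    )
```

A caller would build the dictionary once, compute the date range from the script arguments, and then call `query_for_site` for each key before sending the statement to the discovery service.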
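The post-query steps of claim 1 (dropping URLs that returned no documents, appending metadata for those that did, turning a content stream into a file object, copying it to a fileshare, and writing the dictionary out as the second CSV file) might look like the sketch below. All function names, the field names in `FIELDS`, and the `results` mapping are hypothetical stand-ins. Retrieving the content stream from the CMIS object service is out of scope here, so the content is represented as plain bytes and the fileshare as an ordinary mounted directory.

```python
import csv
import io
import shutil
from pathlib import Path

# Hypothetical field names mirroring the columns of the base query.
FIELDS = ("document_id", "document_name", "last_modified_by", "true_file_path")


def keep_sites_with_documents(site_dict, results):
    """Append metadata for URLs with documents; remove URLs without any."""
    for url in list(site_dict):  # iterate over a copy because keys are deleted
        docs = results.get(url, [])
        if docs:
            site_dict[url].extend(docs)
        else:
            del site_dict[url]
    return site_dict


def to_file_object(content):
    """Wrap raw content-stream bytes in an in-memory file object."""
    return io.BytesIO(content)


def copy_to_share(file_obj, share_dir, name):
    """Copy a file object into a directory standing in for the fileshare."""
    target = Path(share_dir) / name
    with open(target, "wb") as out:
        shutil.copyfileobj(file_obj, out)
    return target


def dict_to_csv(site_dict, fields=FIELDS):
    """Flatten the dictionary into CSV text, one row per document."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=["site_url", *fields])
    writer.writeheader()
    for url, docs in site_dict.items():
        for doc in docs:
            writer.writerow({"site_url": url, **doc})
    return buf.getvalue()
```

Keeping one list per URL lets the empty-result case be handled by deleting a single key, which matches the final step of the claim.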