US 11,727,029 B2
Systems and methods for data indexing with user-side scripting
David Sitsky, Garran (AU); and Edward Sheehy, Willoughby (AU)
Assigned to Nuix Limited, Sydney (AU)
Filed by Nuix Limited, Sydney (AU)
Filed on Nov. 5, 2021, as Appl. No. 17/519,855.
Application 17/519,855 is a continuation of application No. 15/131,764, filed on Apr. 18, 2016, granted, now 11,200,249.
Claims priority of provisional application 62/148,586, filed on Apr. 16, 2015.
Prior Publication US 2022/0058203 A1, Feb. 24, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/25 (2019.01); G06F 8/30 (2018.01); G06F 9/455 (2018.01); H04L 9/40 (2022.01); G06F 9/445 (2018.01)
CPC G06F 16/254 (2019.01) [G06F 8/30 (2013.01); G06F 9/445 (2013.01); G06F 9/44505 (2013.01); G06F 9/45512 (2013.01); H04L 63/1408 (2013.01)]
19 Claims
 
1. A method for extraction and selectively indexing information, the method implemented on an indexing computer system having a first processor and a user-facing application program interface (API), the method comprising:
receiving, via the API, parameters from a user-script executed by a user computer system, the user-script runs at pre-determined times and calls a third party program that transforms original data saved on the user computer system into target data, wherein the parameters identify, to the indexing computer system, the target data in a container file;
running, on the first processor of the indexing computer system, a first process to:
(i) extract a first file from the container file,
(ii) determine that the first file includes an embedded second file, and
(iii) determine that the first process will not process the embedded second file;
iteratively initiating one or more instances of a secondary process, each respective secondary process executing on a corresponding secondary processor and in parallel with the first process, the each respective secondary process operable to:
search the embedded second file as the embedded second file is discovered embedded within a respective first file to identify, for each such embedded second file, a respective additional embedded file within the embedded second file;
determine that the secondary process will not process the additional embedded files; and
initiate subsequent instances of the secondary process to process the additional embedded files, until all the second embedded files and their respective additional embedded files have been extracted from the container file;
each of the embedded second file and their respective additional embedded file being an extracted second file, and for each of the extracted second file,
identifying the target data within the extracted second file; and
determining that the extracted second file satisfies a criterion; and
performing an operation on the target data.
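
The extraction and hand-off recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation and makes several assumptions: the container is modelled as an in-memory tree of hypothetical Item objects rather than a real container file, worker threads stand in for the secondary processors, a coordinating loop submits each subsequent secondary instance on the worker's behalf (in the claim, the secondary process initiates them itself), and the "operation" is assumed to be recording how often the target data occurs. The names Item, secondary_process, extract_embedded, and index_matching are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a file that may embed other files."""
    name: str
    text: str = ""
    embedded: list = field(default_factory=list)


def secondary_process(item):
    """Search one embedded file and report the additional embedded files
    it found but will not process itself."""
    return item, list(item.embedded)


def extract_embedded(container, max_workers=4):
    """Return every embedded second file and additional embedded file."""
    extracted = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = set()
        # First process: extract each first file and hand its embedded
        # second files to secondary instances instead of processing them.
        for first_file in container.embedded:
            for second_file in first_file.embedded:
                pending.add(pool.submit(secondary_process, second_file))
        # Keep initiating secondary instances until nothing is left.
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                item, children = future.result()
                extracted.append(item)
                for child in children:
                    pending.add(pool.submit(secondary_process, child))
    return extracted


def index_matching(extracted, target, criterion):
    """For each extracted file that contains the target data and satisfies
    the criterion, perform the assumed operation: count the occurrences."""
    index = {}
    for item in extracted:
        if target in item.text and criterion(item):
            index[item.name] = item.text.count(target)
    return index
```

In this sketch no worker waits on the instances spawned for its children, so a bounded pool cannot deadlock however deeply the files are nested. The order in which files are extracted depends on scheduling and should not be relied upon.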