US 12,436,920 B2
System and method for file type identification using machine learning
Joshua Cooper, Columbia, SC (US); Aliasghar Riahi, Orinda, CA (US); Mojgan Haddad, Orinda, CA (US); Ryan Kourosh Riahi, Orinda, CA (US); Razmin Riahi, Orinda, CA (US); and Charles Yeomans, Orinda, CA (US)
Assigned to ATOMBEAM TECHNOLOGIES INC., Moraga, CA (US)
Filed by AtomBeam Technologies Inc., Moraga, CA (US)
Filed on Aug. 14, 2023, as Appl. No. 18/449,617.
Application 18/449,617 is a continuation of application No. 17/994,359, filed on Nov. 27, 2022, granted, now 11,853,262.
Application 17/994,359 is a continuation of application No. 17/727,919, filed on Apr. 25, 2022, granted, now 11,609,881, issued on Mar. 21, 2023.
Application 17/727,919 is a continuation of application No. 17/501,872, filed on Oct. 14, 2021, granted, now 11,397,707, issued on Jul. 26, 2022.
Application 17/501,872 is a continuation in part of application No. 16/923,039, filed on Jul. 7, 2020, granted, now 11,232,076, issued on Jan. 25, 2022.
Application 16/923,039 is a continuation in part of application No. 16/716,098, filed on Dec. 16, 2019, granted, now 10,706,018, issued on Jul. 7, 2020.
Application 16/716,098 is a continuation of application No. 16/455,655, filed on Jun. 27, 2019, granted, now 10,509,771, issued on Dec. 17, 2019.
Application 16/455,655 is a continuation in part of application No. 16/200,466, filed on Nov. 26, 2018, granted, now 10,476,519, issued on Nov. 12, 2019.
Application 16/200,466 is a continuation in part of application No. 15/975,741, filed on May 9, 2018, granted, now 10,303,391, issued on May 28, 2019.
Claims priority of provisional application 63/232,030, filed on Aug. 11, 2021.
Claims priority of provisional application 63/027,166, filed on May 19, 2020.
Claims priority of provisional application 62/926,723, filed on Oct. 28, 2019.
Claims priority of provisional application 62/578,824, filed on Oct. 30, 2017.
Prior Publication US 2023/0385241 A1, Nov. 30, 2023
Int. Cl. G06F 16/174 (2019.01); G06F 3/06 (2006.01)
CPC G06F 16/1752 (2019.01) [G06F 3/0608 (2013.01); G06F 3/0641 (2013.01); G06F 3/067 (2013.01)]
8 Claims
 
1. A system for identifying a file type comprising:
a computing device comprising a processor, a memory, and a non-volatile data storage device;
a file-print extractor comprising a first plurality of programming instructions stored in the memory and operable on the processor, wherein the first plurality of programming instructions, when operating on the processor, causes the computing device to:
segment an entire file into groups of bytes;
generate a statistical file-print for file type identification of the entire file, the statistical file-print comprising a plurality of statistical characteristics of a distribution of the groups of bytes across the entire file, wherein the statistical file-print comprises at least a mean value and a variance value for the distribution of bytes in the file; and
a file classifier comprising a second plurality of programming instructions stored in the memory and operable on the processor, wherein the second plurality of programming instructions, when operating on the processor, causes the processor to:
process the statistical file-print through a trained machine learning classifier to identify a file type of the file;
wherein the trained machine learning classifier is specifically trained using a plurality of training datasets comprising statistical file-prints derived from files of known types; and
wherein the trained machine learning classifier determines a file type based on statistical patterns in the file-print that correspond to patterns previously identified in files of known type during training.
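
The two components recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `file_print` and `NearestCentroidClassifier`, the `group_size` parameter, and the fixed-size grouping are hypothetical, and a nearest-centroid rule stands in for the trained machine learning classifier, which the claim does not specify. The file-print here holds only the two statistics the claim names as a minimum, the mean and the variance.

```python
from __future__ import annotations

import math
import statistics


def file_print(data: bytes, group_size: int = 1) -> tuple[float, float]:
    """Return (mean, variance) of the byte-group values across the whole file.

    Assumption: groups are fixed-size and read as big-endian integers;
    a shorter trailing group is kept rather than dropped.
    """
    if not data:
        raise ValueError("cannot compute a file-print for an empty file")
    groups = [
        int.from_bytes(data[i:i + group_size], "big")
        for i in range(0, len(data), group_size)
    ]
    mean = statistics.fmean(groups)
    variance = statistics.pvariance(groups, mu=mean)
    return (mean, float(variance))


class NearestCentroidClassifier:
    """Stand-in classifier: one averaged file-print (centroid) per known type."""

    def fit(self, prints, labels):
        # Accumulate per-label sums of each file-print component.
        sums = {}
        for fp, label in zip(prints, labels):
            acc, count = sums.get(label, ([0.0] * len(fp), 0))
            sums[label] = ([a + x for a, x in zip(acc, fp)], count + 1)
        self.centroids = {
            label: tuple(a / count for a in acc)
            for label, (acc, count) in sums.items()
        }
        return self

    def predict(self, fp):
        # Pick the known type whose centroid is closest in Euclidean distance.
        return min(
            self.centroids,
            key=lambda label: math.dist(self.centroids[label], fp),
        )
```

Training would call `fit` with file-prints computed from files of known type, and identification would call `predict(file_print(data))` on an unknown file. Because the two features are unscaled, the variance dominates the distance in this sketch; a practical classifier would likely normalize the features or use a richer set of statistics.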