US 11,783,029 B2
Methods and apparatus to improve feature engineering efficiency with metadata unit operations
Chih-Yuan Yang, Portland, OR (US); and Yi Gai, Hillsboro, OR (US)
Assigned to Intel Corporation, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Jan. 4, 2021, as Appl. No. 17/140,797.
Application 17/140,797 is a continuation of application No. 16/805,159, filed on Feb. 28, 2020, granted, now 10,915,627.
Application 16/805,159 is a continuation of application No. 15/280,044, filed on Sep. 29, 2016, granted, now 10,607,004, issued on Mar. 31, 2020.
Prior Publication US 2021/0200863 A1, Jul. 1, 2021
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/55 (2013.01); G06F 21/52 (2013.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01)
CPC G06F 21/552 (2013.01) [G06F 21/52 (2013.01); G06N 20/10 (2019.01); G06N 20/00 (2019.01)] 21 Claims
1. An apparatus comprising:
interface circuitry to receive a plurality of files from a plurality of devices different than the apparatus;
machine readable instructions; and
one or more processor circuits to execute the machine readable instructions to:
determine respective first formats of the plurality of files, the plurality of files to be used to create a plurality of vector output files;
convert the plurality of files from the respective first formats to a second format, conversion of respective files based on the determination of the respective first formats of the plurality of files;
extract respective features from the respective files of the plurality of files, the respective files in the second format;
identify at least one respective group of contiguous characters in the respective features;
create the plurality of vector output files, respective vector output files including columns, respective columns including at least one number representative of an occurrence of the respective features; and
cause a machine learning algorithm to detect malware observed in at least one file of the plurality of files by outputting the plurality of vector output files to the machine learning algorithm, the plurality of vector output files formatted to be processed by the machine learning algorithm, the machine learning algorithm to analyze the respective features to detect the malware.
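The data flow recited in claim 1 can be illustrated with a minimal Python sketch. The claim does not name any concrete formats, feature extractor, n-gram length, or file layout, so everything in this sketch is an assumption for illustration only and is not the patentee's implementation. The assumed pieces are:

- The format rules in the hypothetical `detect_format` (gzip magic bytes, UTF-8 text, or other binary).
- The "second format" is plain text.
- Features are whitespace-separated tokens.
- The "groups of contiguous characters" are character trigrams (`n = 3`).
- Each vector column is keyed by one trigram and holds its occurrence count.
- The "vector output file" is CSV text.

```python
import csv
import gzip
import io
from collections import Counter

# Hypothetical detection rule: the claim does not specify any formats.
GZIP_MAGIC = b"\x1f\x8b"


def detect_format(data: bytes) -> str:
    """Determine a file's 'first format' using simple, assumed rules."""
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    try:
        data.decode("utf-8")
        return "utf8"
    except UnicodeDecodeError:
        return "binary"


def to_common_format(data: bytes, fmt: str) -> str:
    """Convert a file to the shared 'second format' (plain text here)."""
    if fmt == "gzip":
        return gzip.decompress(data).decode("utf-8", errors="replace")
    if fmt == "utf8":
        return data.decode("utf-8")
    # Other binary input is hex-encoded so every file becomes text.
    return data.hex()


def extract_features(text: str) -> list[str]:
    """Hypothetical feature extractor: whitespace-separated tokens."""
    return text.split()


def char_ngrams(feature: str, n: int = 3) -> list[str]:
    """Return every group of n contiguous characters in one feature."""
    return [feature[i:i + n] for i in range(len(feature) - n + 1)]


def build_vectors(files: list[bytes], n: int = 3) -> tuple[list[str], list[list[int]]]:
    """Build one count vector per file; columns are the n-grams seen in any file."""
    counts = []
    for data in files:
        text = to_common_format(data, detect_format(data))
        per_file = Counter()
        for feature in extract_features(text):
            per_file.update(char_ngrams(feature, n))
        counts.append(per_file)
    # Union of all n-gram keys gives a shared, sorted column layout.
    columns = sorted(set().union(*counts))
    rows = [[c[col] for col in columns] for c in counts]
    return columns, rows


def to_csv(columns: list[str], rows: list[list[int]]) -> str:
    """Serialize the vectors as CSV text (one assumed 'vector output' layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    files = [b"open read open", gzip.compress(b"openssl")]
    columns, rows = build_vectors(files)
    print(to_csv(columns, rows))
```

In this example, the plain-text file and the gzip-compressed file are both normalized to text before counting. As a result, they share columns such as `ope` and `pen`. The resulting rows would then be passed to whatever machine learning classifier performs the malware detection. The sketch deliberately leaves that classifier unspecified, as the claim does.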