US 11,790,252 B2
Apparatus and method for preprocessing security log
Jang-Ho Kim, Seoul (KR); Young-Min Cho, Seoul (KR); Jung-Bae Jun, Seoul (KR); Seong-Hyeok Seo, Seoul (KR); and Jang-Mi Shin, Seoul (KR)
Assigned to SAMSUNG SDS CO., LTD., Seoul (KR)
Filed by SAMSUNG SDS CO., LTD., Seoul (KR)
Filed on Oct. 28, 2019, as Appl. No. 16/665,663.
Claims priority of application No. 10-2018-0130743 (KR), filed on Oct. 30, 2018.
Prior Publication US 2020/0134487 A1, Apr. 30, 2020
Int. Cl. G06N 5/04 (2023.01); G06N 20/00 (2019.01); G06F 16/25 (2019.01)
CPC G06N 5/04 (2013.01) [G06F 16/258 (2019.01); G06N 20/00 (2019.01)] 4 Claims
 
1. An apparatus for preprocessing a security log, comprising at least one hardware processor configured to implement:
a field divider configured to divide a character string of a security log into a plurality of fields on the basis of a structure of the security log;
an ASCII code converter configured to convert a character string included in each of the plurality of divided fields into ASCII codes;
a vector data generator configured to generate vector data for each of the plurality of divided fields using the converted ASCII codes, the vector data comprising the converted ASCII codes and a length of the character string included in each of the plurality of divided fields; and
a learning server configured to train a machine learning-based prediction model to predict an intrusion using the vector data,
wherein the ASCII code converter is configured to convert a predetermined character among a plurality of characters included in the character string into a weighted ASCII code, wherein the predetermined character is a character used in an attack script included in the security log,
wherein a dimension of the vector data is determined based on a set maximum length of a character string for each of the plurality of divided fields,
wherein a value obtained by adding 1 to the set maximum length is determined to be the dimension of the vector data.
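The preprocessing recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation. The claim leaves the following choices open, so each one here is hypothetical:

- the pipe-delimited log structure;
- the field names and their set maximum lengths;
- the set of attack-script characters (`ATTACK_CHARS`);
- the multiplicative weight (`ATTACK_WEIGHT`);
- placing the length value in the last slot;
- mapping non-ASCII characters to `?`.

Each field yields a vector whose dimension is its set maximum length plus 1. That vector holds the field's ASCII codes, zero-padded or truncated to the maximum length, plus one slot for the field's character-string length. Training the prediction model is not shown.

```python
from typing import Dict, List

# Hypothetical attack-script characters; the claim does not enumerate them.
ATTACK_CHARS = set("<>'\";()=%")
# Hypothetical weight, large enough that a weighted code (>= 340) can never
# coincide with a plain ASCII code (< 128).
ATTACK_WEIGHT = 10


def divide_fields(log_line: str, field_names: List[str], sep: str = "|") -> Dict[str, str]:
    """Field divider: split a log line into named fields (assumes a delimited structure)."""
    parts = log_line.split(sep, len(field_names) - 1)
    if len(parts) != len(field_names):
        raise ValueError("log line does not match the expected field structure")
    return dict(zip(field_names, parts))


def to_ascii(text: str) -> List[int]:
    """ASCII code converter: attack-script characters get a weighted code."""
    codes = []
    # Non-ASCII characters are replaced by '?' (an assumption, not from the claim).
    for b in text.encode("ascii", errors="replace"):
        codes.append(b * ATTACK_WEIGHT if chr(b) in ATTACK_CHARS else b)
    return codes


def to_vector(text: str, max_len: int) -> List[int]:
    """Vector data generator: max_len code slots plus one slot for the string length."""
    codes = to_ascii(text[:max_len])             # truncate to the set maximum length
    padded = codes + [0] * (max_len - len(codes))  # zero-pad short strings
    return padded + [len(text)]                  # dimension is max_len + 1


def preprocess(log_line: str, field_max_lens: Dict[str, int], sep: str = "|") -> Dict[str, List[int]]:
    """Run the whole pipeline; field_max_lens maps field name -> set maximum length."""
    fields = divide_fields(log_line, list(field_max_lens), sep)
    return {name: to_vector(value, field_max_lens[name]) for name, value in fields.items()}
```

For example, `preprocess("10.0.0.1|GET|/a?q=<x>", {"src_ip": 15, "method": 8, "url": 32})` returns vectors of dimensions 16, 9 and 33. In the `url` vector, `=`, `<` and `>` appear as the weighted codes 610, 600 and 620, and the final slot holds the length 8. These per-field vectors could then be concatenated and passed to a classifier; the claim does not specify which model the learning server trains.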