| CPC G06F 40/295 (2020.01) [G06N 5/022 (2013.01)] | 20 Claims |

|
1. A method for preparing unstructured data for machine learning analysis, the method comprising:
receiving, by one or more processors, data representing a plurality of processes;
analyzing, by the one or more processors, the data to identify, for each process of the plurality of processes, a time-ordered sequence of events that occurred during the process;
generating, by the one or more processors, a plurality of emoji sequences by, for each process of the plurality of processes, generating an emoji sequence, each emoji in the emoji sequence representing an event of the events that occurred during the process, and the emoji sequence ordered in accordance with the time-ordered sequence;
generating, by the one or more processors, graphical representations of the plurality of emoji sequences, the graphical representations including information about the order in which emojis occur in each emoji sequence;
generating, by the one or more processors, a plurality of feature vectors corresponding to the respective plurality of emoji sequences, the plurality of feature vectors including information based on the graphical representations; and
applying, by the one or more processors, a machine learning technique to the plurality of feature vectors.
|
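The claimed steps can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions, not the claimed implementation. The event names and the `EVENT_EMOJI` mapping are hypothetical. The "graphical representation" is assumed to be a directed transition graph whose edge weights count how often one emoji immediately follows another, and the feature vector is assumed to be that graph's flattened adjacency matrix. Plain k-means clustering stands in for "a machine learning technique". The claim does not name one, so this choice is illustrative only.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical event-to-emoji mapping (assumption; the claim does not name one).
EVENT_EMOJI: Dict[str, str] = {
    "order_created": "📝",
    "payment_received": "💳",
    "item_shipped": "📦",
    "item_delivered": "🏠",
    "refund_issued": "💸",
}

# Fixed emoji vocabulary and edge ordering, so every feature vector has the same layout.
EMOJIS: List[str] = sorted(EVENT_EMOJI.values())
EDGES: List[Tuple[str, str]] = [(a, b) for a in EMOJIS for b in EMOJIS]


def to_emoji_sequence(events: Sequence[Tuple[float, str]]) -> List[str]:
    """Order (timestamp, event_name) pairs by time and map each event to an emoji."""
    return [EVENT_EMOJI[name] for _, name in sorted(events)]


def transition_graph(seq: List[str]) -> Counter:
    """Directed graph as edge counts: (a, b) -> times emoji b directly follows emoji a."""
    return Counter(zip(seq, seq[1:]))


def feature_vector(seq: List[str]) -> List[float]:
    """Flatten the transition graph into a fixed-length adjacency-count vector."""
    graph = transition_graph(seq)
    return [float(graph[edge]) for edge in EDGES]


def kmeans(vectors: List[List[float]], k: int, iters: int = 10) -> List[int]:
    """Plain k-means; initial centroids are the first k vectors (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to the nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    processes = [
        [(3.0, "item_shipped"), (1.0, "order_created"), (2.0, "payment_received"), (4.0, "item_delivered")],
        [(1.0, "order_created"), (2.0, "payment_received"), (3.0, "refund_issued")],
        [(1.0, "order_created"), (2.0, "payment_received"), (3.0, "item_shipped"), (4.0, "item_delivered")],
    ]
    sequences = [to_emoji_sequence(p) for p in processes]
    vectors = [feature_vector(s) for s in sequences]
    print(["".join(s) for s in sequences])
    print(kmeans(vectors, k=2))  # -> [0, 1, 0]
```

The transition-count graph keeps the information about the order in which emojis occur, because (📝, 💳) and (💳, 📝) are distinct edges. The ordering is only kept at the level of adjacent pairs, though. A richer graph, such as one built from longer n-grams, would keep more of the sequence at the cost of larger vectors.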