US 12,131,343 B2
Pre-processing financial market data prior to machine learning training
Ari L. Studnitzer, Northbrook, IL (US); David John Geddes, Antrim (GB); Inderdeep Singh, Palatine, IL (US); Steven Hutt, Sutton (GB); and Bernard Pieter Hosman, Amsterdam (NL)
Assigned to Chicago Mercantile Exchange Inc., Chicago, IL (US)
Filed by Chicago Mercantile Exchange Inc., Chicago, IL (US)
Filed on Jun. 1, 2023, as Appl. No. 18/204,526.
Application 18/204,526 is a continuation of application No. 15/642,038, filed on Jul. 5, 2017, granted, now 11,704,682.
Claims priority of provisional application 62/359,007, filed on Jul. 6, 2016.
Prior Publication US 2023/0306449 A1, Sep. 28, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06Q 30/0201 (2023.01); G06F 16/28 (2019.01); G06N 3/044 (2023.01); G06N 3/08 (2023.01); G06N 20/00 (2019.01); G06Q 40/04 (2012.01); G06Q 40/06 (2012.01)
CPC G06Q 30/0201 (2013.01) [G06F 16/285 (2019.01); G06N 3/044 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06Q 40/04 (2013.01); G06Q 40/06 (2013.01)]
23 Claims
 
1. A computer system comprising:
a processor;
a tangible computer-readable medium containing computer-executable instructions that when executed by the processor cause the processor to:
receive, from a client computer via an electronic communication network, a data set comprising a plurality of data records each including data indicative of a time stamp, a level, and a quantity, the data set characterized by a first size;
determine, for each time stamp of each data record of the data set, a difference in the quantity at each level when compared to the quantity of the data record comprising data indicative of the same level at a prior time stamp;
arrange the data set into a sequence of time period windows of a selected adjustable length sufficient to encompass one of a pattern or structure within the data set;
determine quantiles for changes in the quantities;
divide the determined differences into predefined portions, each of which is characterized by one of a plurality of categories, each category being assigned to the time period window in accordance with the predefined portions and the determined quantiles;
generate a new pre-processed data set comprising the sequence of time period windows, wherein each data record of the new pre-processed data set includes a vector encoding of the plurality of categories representative of each price level and time therein, the new pre-processed data set characterized by a second size less than the first size; and
transmit the new pre-processed data set as input to a computer system, wherein, upon receipt of the new processed data set, the computer system executes a machine learning algorithm, wherein the execution of the machine learning algorithm includes training a recurrent neural network to identify the structure in the new pre-processed data set and executing a lossy encoded compression to compress the sequence of time period windows to provide a feature mapping from the sequence of time period windows to a feature space, wherein the lossy encoded compression of the sequence removes noise from the sequence of time period windows while retaining unique features of the feature space.
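
The pre-processing steps recited in claim 1 (per-level quantity differences, quantile-based categories, time period windows, and a vector encoding) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function names, the (time stamp, level, quantity) tuple layout, the choice of four categories, one-hot encoding as the vector encoding, non-overlapping windows, and the sample records are all hypothetical. The recurrent neural network training and lossy encoded compression of the final step are not shown.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import quantiles


def quantity_deltas(records):
    """Change in quantity at each level versus that level's prior time stamp."""
    last_qty = {}
    deltas = []
    for ts, level, qty in sorted(records):
        if level in last_qty:
            deltas.append((ts, level, qty - last_qty[level]))
        # Assumption: the first observation of a level has no prior, so it is skipped.
        last_qty[level] = qty
    return deltas


def categorize(deltas, n_categories=4):
    """Assign each delta a category index from quantile cut points of all deltas."""
    cuts = quantiles([d for _, _, d in deltas], n=n_categories)
    return [(ts, level, bisect_right(cuts, d)) for ts, level, d in deltas]


def one_hot(index, size):
    """One-hot vector; an index of None (no data) yields all zeros."""
    return [1 if i == index else 0 for i in range(size)]


def build_windows(categorized, window_len, n_categories=4):
    """Encode each time stamp as concatenated per-level one-hot vectors,
    then group the rows into non-overlapping windows of window_len rows."""
    levels = sorted({level for _, level, _ in categorized})
    by_ts = defaultdict(dict)
    for ts, level, cat in categorized:
        by_ts[ts][level] = cat
    rows = []
    for ts in sorted(by_ts):
        row = []
        for level in levels:
            row.extend(one_hot(by_ts[ts].get(level), n_categories))
        rows.append(row)
    # Assumption: a trailing partial window is dropped.
    return [rows[i:i + window_len]
            for i in range(0, len(rows) - window_len + 1, window_len)]


# Hypothetical sample data: (time stamp, price level, quantity).
records = [
    (0, 100, 10), (0, 101, 5),
    (1, 100, 12), (1, 101, 5),
    (2, 100, 7), (2, 101, 9),
    (3, 100, 7), (3, 101, 4),
    (4, 100, 15), (4, 101, 4),
]

deltas = quantity_deltas(records)
categorized = categorize(deltas)
windows = build_windows(categorized, window_len=2)
```

In this sketch the window length is a plain parameter, standing in for the "selected adjustable length" of the claim; how that length is chosen to encompass a pattern or structure in the data is left open here.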