US 11,809,817 B2
Methods and systems for time-series prediction under missing data using joint impute and learn technique
Avinash Achar, Chennai (IN); and Soumen Pachal, Chennai (IN)
Assigned to TATA CONSULTANCY SERVICES LIMITED, Mumbai (IN)
Filed by Tata Consultancy Services Limited, Mumbai (IN)
Filed on Dec. 27, 2022, as Appl. No. 18/146,863.
Prior Publication US 2023/0297770 A1, Sep. 21, 2023
Int. Cl. G06F 17/00 (2019.01); G06F 40/177 (2020.01); G06N 3/0499 (2023.01); G06N 3/063 (2023.01); G06N 3/044 (2023.01); G06N 3/045 (2023.01)
CPC G06F 40/177 (2020.01) [G06N 3/044 (2023.01); G06N 3/045 (2023.01); G06N 3/0499 (2023.01); G06N 3/063 (2013.01)]
18 Claims
 
1. A processor implemented method, comprising:
receiving, by a missing data prediction system (MDPS) via one or more hardware processors, time-series data, the time-series data comprising one or more time dependent variables, each time dependent variable of the one or more time dependent variables comprising one or more data values, each data value of the one or more data values comprising one of: a single data value, and a missing entry;
arranging, by the MDPS via the one or more hardware processors, the one or more data values received with the time-series data in a plurality of cells of a table;
determining, by the MDPS via the one or more hardware processors, one or more cells in the plurality of cells that have the missing entry;
for each cell of the one or more cells that has the missing entry, performing:
determining, by the MDPS via the one or more hardware processors, a current position of each cell in the table;
selecting, by the MDPS via the one or more hardware processors, a left cell and a right cell for each cell, wherein the left cell is selected for each cell if it is present in closest left side of the current position of each cell and contains the data value, wherein the right cell is selected for each cell if it is present in closest right side of the current position of each cell and contains the data value;
accessing, by the MDPS via the one or more hardware processors, a left data value from the left cell of each cell and a right data value from the right cell of each cell;
calculating, by the MDPS via the one or more hardware processors, a left gap length for each cell based on current positions of the left cell of each cell and the respective cell, and a right gap length for each cell based on current positions of the right cell of each cell and the respective cell;
determining, by the MDPS via the one or more hardware processors, a mean value for each cell by computing a mean of at least one data value present in a row comprising the respective cell using a mean calculation formula;
providing, by the MDPS via the one or more hardware processors, the left gap length and the right gap length calculated for each cell to a first feed-forward neural network to obtain an importance of the left data value, the right data value and the mean value determined for each cell;
passing, by the MDPS via the one or more hardware processors, the importance obtained for each cell by the first feed-forward neural network to a SoftMax layer to obtain a probability distribution for the respective cell, wherein the probability distribution for each cell comprises three components, and wherein the three components comprise a left component, a right component, and a mean component; and
calculating, by the MDPS via the one or more hardware processors, a new data value for each cell based, at least in part, on the three components, the left data value, the right data value and the mean value obtained for the respective cell using a predefined formula;
substituting, by the MDPS via the one or more hardware processors, each cell of the one or more cells that has the missing entry with the new data value calculated for the respective cell to obtain an updated table; and
creating, by the MDPS via the one or more hardware processors, a new time-series data based on the updated table.
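
The imputation steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions the claim does not state: each table row holds one time dependent variable and each column one time step; the "predefined formula" is taken to be a convex combination of the left data value, the right data value and the mean value, weighted by the three SoftMax components; the first feed-forward neural network is a small one-hidden-layer network with untrained random weights (in a joint impute and learn setting these weights would presumably be trained together with the downstream predictor); and when no left or right cell exists, the row mean and the distance to the table edge are used as a fallback. All function names (`init_ffn`, `importance`, `impute_table`) are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable SoftMax over a 1-D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def init_ffn(rng, hidden=8):
    """Hypothetical first feed-forward network: 2 inputs -> hidden -> 3 outputs.
    Weights are random here; they are not trained in this sketch."""
    return {
        "W1": rng.normal(scale=0.1, size=(hidden, 2)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(3, hidden)),
        "b2": np.zeros(3),
    }

def importance(params, left_gap, right_gap):
    """Map (left gap length, right gap length) to three importance scores."""
    x = np.array([left_gap, right_gap], dtype=float)
    h = np.tanh(params["W1"] @ x + params["b1"])
    return params["W2"] @ h + params["b2"]

def impute_table(table, params):
    """Return a copy of `table` with every NaN cell replaced by a weighted
    mix of its nearest observed left value, nearest observed right value,
    and the mean of the observed values in its row."""
    table = np.asarray(table, dtype=float)
    updated = table.copy()
    n_cols = table.shape[1]
    for r, row in enumerate(table):
        observed = np.flatnonzero(~np.isnan(row))
        if observed.size == 0:
            continue  # assumption: a fully missing row is left untouched
        row_mean = row[observed].mean()
        for c in np.flatnonzero(np.isnan(row)):
            left = observed[observed < c]
            right = observed[observed > c]
            # Nearest observed neighbours; fallbacks are assumptions.
            if left.size:
                left_val, left_gap = row[left[-1]], c - left[-1]
            else:
                left_val, left_gap = row_mean, c + 1
            if right.size:
                right_val, right_gap = row[right[0]], right[0] - c
            else:
                right_val, right_gap = row_mean, n_cols - c
            # Importance -> SoftMax -> (left, right, mean) components.
            w = softmax(importance(params, left_gap, right_gap))
            # Assumed "predefined formula": convex combination.
            updated[r, c] = w[0] * left_val + w[1] * right_val + w[2] * row_mean
    return updated

rng = np.random.default_rng(0)
params = init_ffn(rng)
table = [[1.0, np.nan, np.nan, 4.0],
         [np.nan, 2.0, 2.0, np.nan]]
updated_table = impute_table(table, params)
```

Because the three SoftMax components are non-negative and sum to one, each new data value in this sketch lies between the smallest and largest of the left data value, the right data value and the mean value, and the cells that already held data are carried over unchanged into the updated table.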