| CPC G06F 17/18 (2013.01) [G06F 17/16 (2013.01); G06F 18/211 (2023.01); G16B 50/30 (2019.02); G16C 20/70 (2019.02)] | 15 Claims |

|
1. A computer-implemented method for analyzing data obtained with respect to a chemical and/or biological process, the method comprising:
obtaining a result of statistical data analysis on a data set including the data obtained with respect to the chemical and/or biological process, wherein:
the chemical and/or biological process is carried out in a plurality of batch processes having a finite duration,
values of process parameters relating to the chemical and/or biological process are obtained at a plurality of time points during each of the plurality of batch processes, each time point in one of the plurality of batch processes having corresponding time points in other ones of the plurality of batch processes, wherein the process parameters comprise temperature, pressure, pH, agitation, flow of a gas, flow of a liquid, or an amount of a particular substance,
at least one process output value is obtained at or after an end of each of the plurality of batch processes, wherein the output value comprises yield in a chemical process, titer in a biological process, or amount of a by-product,
the data set includes, for each of the plurality of batch processes, the values of the process parameters and the at least one process output value,
the data set serves as a training set for a model stored in one or more non-transitory computer-readable media and configured to receive the values of the process parameters and predict the at least one process output value comprising yield in a chemical process, titer in a biological process, or amount of a by-product; and
the result of the statistical data analysis includes, for the values of each process parameter obtained at each group of corresponding time points during the plurality of batch processes, a correlation value and a confidence value corresponding to a confidence interval of the correlation value, the correlation value indicating a correlation between:
the values of the process parameter obtained at the group of corresponding time points during the plurality of batch processes, and
the at least one process output value comprising yield in a chemical process, titer in a biological process, or amount of a by-product predicted using the model;
calculating, for the values of each process parameter obtained at each group of corresponding time points during the plurality of batch processes, a ratio of the correlation value to the confidence value;
calculating, for each of the process parameters, an average of absolute values of the ratios calculated for the values of the process parameter obtained at different groups of the corresponding time points during the plurality of batch processes;
excluding, from the data set serving as the training set for the model, the values of one of the process parameters having a smallest one of the averages calculated for the process parameters;
iterating, until at least one specified condition is met, the steps of obtaining the result of the statistical data analysis, calculating the ratio of the correlation value to the confidence value, calculating the average of the absolute values of the ratios and excluding the values of the one of the process parameters from the data set serving as the training set for the model, thereby reducing the data set serving as the training set for the model;
with one or more computing devices, training the model with the reduced data set serving as the training set for the model, wherein the excluded values are not included as part of the training, thereby reducing a size of the data set stored in the one or more non-transitory computer-readable media by the one or more computing devices and serving as the training set for the model; and
controlling a subsequent iteration of the chemical and/or biological process based on the model trained with the reduced data set.
|
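As an illustration only, the ratio, averaging, exclusion, and iteration steps recited in claim 1 can be sketched as follows. The claim does not fix how the correlation value or the confidence value is computed, so this sketch makes several assumptions: the correlation value is taken to be a Pearson correlation, the confidence value is taken to be the half-width of a Fisher z-transform confidence interval, the measured output values stand in for the output values predicted using the model, and the stopping condition is a minimum number of remaining process parameters. The names `ratio_scores`, `prune_parameters`, `z_crit`, and `min_params` are hypothetical. Model training and process control are not shown.

```python
import numpy as np


def ratio_scores(X, y, z_crit=1.96):
    """Average |correlation / confidence| per process parameter.

    X: array of shape (batches, parameters, time_points)
    y: array of shape (batches,), one output value per batch
    Assumes more than 3 batches and no constant columns.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Pearson correlation for every (parameter, time point) pair (assumption).
    cov = np.einsum("bpt,b->pt", Xc, yc)
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = cov / denom

    # Confidence value: half-width of a Fisher z interval (assumption).
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))
    se = 1.0 / np.sqrt(n - 3)
    half_width = (np.tanh(z + z_crit * se) - np.tanh(z - z_crit * se)) / 2.0

    ratio = r / half_width
    # Average of absolute ratios over the groups of corresponding time points.
    return np.abs(ratio).mean(axis=1)


def prune_parameters(X, y, names, min_params=2):
    """Repeatedly exclude the parameter with the smallest average ratio."""
    names = list(names)
    while len(names) > min_params:  # hypothetical stopping condition
        scores = ratio_scores(X, y)
        worst = int(np.argmin(scores))
        X = np.delete(X, worst, axis=1)  # reduce the training data set
        del names[worst]
    return X, names
```

Calling `prune_parameters(X, y, ["temperature", "pH", "agitation"], min_params=1)` on an array `X` of shape `(batches, 3, time_points)` returns the reduced array together with the name of the single parameter that was kept. The reduced array would then be the training set for whatever model is used.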