US 12,191,003 B2
Real-time prediction of chemical properties through combining calculated, structured and unstructured data at large scale
Richard L. Martin, Jamaica Plain, MA (US); and Sheng Hua Bao, San Jose, CA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Oct. 1, 2018, as Appl. No. 16/148,159.
Prior Publication US 2020/0104465 A1, Apr. 2, 2020
Int. Cl. G01N 33/48 (2006.01); G01N 33/50 (2006.01); G16C 20/30 (2019.01); G16C 20/70 (2019.01)
CPC G16C 20/30 (2019.02) [G16C 20/70 (2019.02)] 20 Claims
 
1. A method, in a data processing system comprising at least one processor and a memory comprising instructions which, when executed by the at least one processor, cause the at least one processor to implement a real-time prediction engine for real-time prediction of chemical properties through combining calculated, structured, and unstructured data at large scale, the method comprising:
inputting, for each of a plurality of chemical structures, into a chemical information processor, unstructured chemical features and properties extracted, by a natural language processing job server, from one or more unstructured chemical information sources and structured chemical features and properties extracted, by a structured data processor, from structured chemical information sources;
calculating, by the chemical information processor, for each of the plurality of chemical structures, calculated chemical structure features and properties based on the unstructured chemical features and properties and the structured chemical features and properties;
storing, by offline components executing within the real-time prediction engine, a computational representation for each of the plurality of chemical structures in a unified storage, wherein each computational representation maps a respective chemical structure to a vector of corresponding calculated chemical structure features and properties, corresponding unstructured chemical features and properties, and corresponding structured chemical features and properties;
training, by the offline components using a machine learning training operation, a computational real-time predictive model based on the computational representations as inputs to the computational real-time predictive model, wherein the computational real-time predictive model is trained to predict properties based on an input chemical compound;
receiving, by a user interface executing within the real-time prediction engine, a request specifying one or more chemical compounds;
predicting, by an analytics jobs manager executing within the real-time prediction engine, one or more properties of the one or more chemical compounds using the computational real-time predictive model; and
outputting, by the analytics jobs manager, the one or more properties of the one or more chemical compounds to the user interface, wherein the computational real-time predictive model comprises a machine learning model, a deep learning model, or a neural network.
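The following Python sketch shows one way the claimed steps could fit together: a unified store that maps each chemical structure to a single vector of calculated, unstructured, and structured features; an offline training step; and a prediction step for requested compounds. It is a minimal illustration under stated assumptions, not the patented implementation. Everything below is hypothetical and was chosen only for illustration: the feature names (`num_C`, `mentions_toxic`, `logp`), the naive SMILES character counts standing in for the chemical information processor, the helpers `UnifiedStore`, `train`, and `predict_properties`, the toy values, and the use of a k-nearest-neighbours regressor as the "machine learning model". The NLP job server and the structured data processor are represented by plain dictionaries.

```python
import math
from dataclasses import dataclass, field

# Hypothetical fixed feature order shared by every stored vector.
CALCULATED = ("num_C", "num_O", "num_N")
UNSTRUCTURED = ("mentions_toxic",)   # stand-in for NLP-extracted features
STRUCTURED = ("logp",)               # stand-in for database-extracted features
FEATURE_ORDER = CALCULATED + UNSTRUCTURED + STRUCTURED


def calculate_features(smiles: str) -> dict[str, float]:
    # Naive character counts on a SMILES string; NOT a real cheminformatics descriptor.
    return {"num_C": float(smiles.count("C")),
            "num_O": float(smiles.count("O")),
            "num_N": float(smiles.count("N"))}


@dataclass
class UnifiedStore:
    """Maps each chemical structure (SMILES) to one combined feature vector."""
    vectors: dict[str, list[float]] = field(default_factory=dict)
    targets: dict[str, float] = field(default_factory=dict)

    def put(self, smiles, unstructured, structured, target=None):
        merged = {**calculate_features(smiles), **unstructured, **structured}
        # Assumption: a feature missing from every source defaults to 0.0.
        self.vectors[smiles] = [float(merged.get(name, 0.0)) for name in FEATURE_ORDER]
        if target is not None:
            self.targets[smiles] = float(target)


class KNNRegressor:
    """k-nearest-neighbours regression using Euclidean distance."""

    def __init__(self, k: int = 1):
        self.k = k

    def fit(self, X, y):
        if not X:
            raise ValueError("no labeled structures to train on")
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, x):
        order = sorted(range(len(self.X)), key=lambda i: math.dist(self.X[i], x))
        nearest = order[: self.k]
        return sum(self.y[i] for i in nearest) / len(nearest)


def train(store: UnifiedStore, k: int = 1) -> KNNRegressor:
    # Offline step: train on every stored structure that has a known target value.
    labeled = [s for s in store.vectors if s in store.targets]
    return KNNRegressor(k).fit([store.vectors[s] for s in labeled],
                               [store.targets[s] for s in labeled])


def predict_properties(model, store, compounds):
    # Online step: reuse the stored vector if present, else featurize on the fly.
    results = {}
    for smiles in compounds:
        vector = store.vectors.get(smiles)
        if vector is None:
            calc = calculate_features(smiles)
            vector = [calc.get(name, 0.0) for name in FEATURE_ORDER]
        results[smiles] = model.predict(vector)
    return results


if __name__ == "__main__":
    store = UnifiedStore()
    store.put("CCO", {"mentions_toxic": 0.0}, {"logp": -0.3}, target=1.0)
    store.put("CCCCCC", {"mentions_toxic": 1.0}, {"logp": 3.9}, target=5.0)
    model = train(store, k=1)
    print(predict_properties(model, store, ["CCO", "CCCO"]))
```

A production system would normalize features so that no single column dominates the Euclidean distance, and could substitute a deep learning model or neural network for the k-nearest-neighbours regressor, as the claim permits.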