US 12,446,600 B2
Two-stage sampling for accelerated deformulation generation
Selami Emre Sevgen, Chicago, IL (US); Brendan David Folie, San Francisco, CA (US); and Julia Black Ling, Redwood City, CA (US)
Assigned to Citrine Informatics, Inc., Redwood City, CA (US)
Filed by Citrine Informatics, Inc., Redwood City, CA (US)
Filed on Jan. 19, 2022, as Appl. No. 17/578,759.
Claims priority of provisional application 63/141,723, filed on Jan. 26, 2021.
Prior Publication US 2022/0232863 A1, Jul. 28, 2022
Int. Cl. A23L 5/00 (2016.01); G06F 18/214 (2023.01); G06F 18/2415 (2023.01); G06N 3/08 (2023.01)
CPC A23L 5/00 (2016.08) [G06F 18/2155 (2023.01); G06F 18/2415 (2023.01); G06N 3/08 (2013.01)]
18 Claims
 
1. A method comprising:
receiving an ingredient list comprising a sequence of ingredients, the sequence establishing a relative amount of each ingredient based on where each ingredient falls within the sequence;
generating a plurality of formulation vectors, each formulation vector derived by performing a random sampling of the ingredients list that indicates a relative share of known candidate ingredients that may be present in the ingredient list;
inputting the plurality of formulation vectors into a machine-learned model, the machine-learned model generating an encoded version of each of the plurality of formulation vectors using an encoder, and then outputting a plurality of reconstructed formulation vectors as derived using a decoder;
identifying a subset of the plurality of reconstructed formulation vectors that have an order that matches the sequence;
defining a latent space using the encoded version of the subset of reconstructed formulation vectors;
iteratively sampling the latent space until a threshold number of samples are derived that match an ordering constraint that corresponds to the sequence, wherein the threshold number of samples is a first threshold number of samples smaller than a second threshold number of samples;
responsive to determining that the first threshold is satisfied:
defining a constrained latent space using encoded versions of respective ones of the plurality of formulation vectors that matched the ordering constraint corresponding to the first threshold number of samples, the constrained latent space being a narrow subset of the latent space; and
iteratively sampling the constrained latent space until a second number of samples that satisfy the second threshold are derived that match a second ordering constraint that corresponds to the sequence;
performing a statistical aggregation on the samples of the second number of samples; and
outputting an indication of an absolute amount of each ingredient in the ingredients list based on the statistical aggregation.
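
The two-stage sampling recited in claim 1 can be illustrated with a minimal sketch under several stated assumptions; it is not the patented implementation. The functions `encode` and `decode` are simple stand-ins (log and softmax) for the encoder and decoder of the machine-learned model, the latent space is approximated by a diagonal Gaussian fitted to the encoded vectors, the candidate ingredients are taken to be exactly the listed ingredients, and the mean is used as one possible statistical aggregation. All function names, threshold values, and batch sizes are hypothetical.

```python
import numpy as np

def encode(x, eps=1e-9):
    # Stand-in encoder (assumption): log-shares serve as the latent code.
    return np.log(x + eps)

def decode(z):
    # Stand-in decoder (assumption): softmax maps a latent code to shares summing to 1.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def matches_order(x):
    # Ordering constraint: shares are non-increasing along the ingredient sequence.
    return np.all(np.diff(x, axis=-1) <= 0, axis=-1)

def sample_until(mean, std, target, rng, batch=256, max_batches=1000):
    # Sample a diagonal Gaussian in latent space until `target` decoded
    # samples satisfy the ordering constraint.
    kept, count = [], 0
    for _ in range(max_batches):
        z = rng.normal(mean, std, size=(batch, mean.size))
        ok = matches_order(decode(z))
        kept.append(z[ok])
        count += int(ok.sum())
        if count >= target:
            return np.concatenate(kept)[:target]
    raise RuntimeError("too few samples satisfied the ordering constraint")

def deformulate(n_ingredients, first_threshold=50, second_threshold=500,
                n_initial=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Random formulation vectors: relative shares of each ingredient.
    initial = rng.dirichlet(np.ones(n_ingredients), size=n_initial)
    latent = encode(initial)
    reconstructed = decode(latent)
    # Keep encodings whose reconstruction follows the ingredient sequence.
    subset = latent[matches_order(reconstructed)]
    if len(subset) < 2:
        raise RuntimeError("no initial samples matched the ingredient order")
    # Stage 1: sample the latent space up to the smaller first threshold.
    stage1 = sample_until(subset.mean(axis=0), subset.std(axis=0),
                          first_threshold, rng)
    # Stage 2: refit on the stage-1 matches (the constrained latent space)
    # and sample up to the larger second threshold.
    stage2 = sample_until(stage1.mean(axis=0), stage1.std(axis=0),
                          second_threshold, rng)
    # Statistical aggregation (assumption: the mean) gives estimated amounts.
    return decode(stage2).mean(axis=0)

if __name__ == "__main__":
    print(deformulate(4))
```

In this sketch, the ingredient at position 0 is the one listed first, so the returned array gives estimated shares in label order. Refitting the Gaussian on the stage-1 matches is one way to concentrate sampling where the ordering constraint is more likely to hold; a trained model could define the constrained region differently.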