US 12,468,898 B2
Mathematical reasoning using large language models
Shima Imani, Sammamish, WA (US); Harsh Shrivastava, Redmond, WA (US); and Liang Du, Redmond, WA (US)
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed by MICROSOFT TECHNOLOGY LICENSING, LLC, Redmond, WA (US)
Filed on May 8, 2023, as Appl. No. 18/144,802.
Claims priority of provisional application 63/488,159, filed on Mar. 2, 2023.
Prior Publication US 2024/0296294 A1, Sep. 5, 2024
Int. Cl. G06F 40/40 (2020.01); G06F 16/332 (2019.01); G06F 16/3329 (2025.01)

CPC G06F 40/40 (2020.01) [G06F 16/3325 (2019.01); G06F 16/3329 (2019.01)]

15 Claims

1. A method for an artificial intelligence (AI) system with a large language model (LLM) to solve a mathematical problem, the method comprising:

receiving an initial query that presents a problem with original input values;

creating key-value mappings between the original input values and variables;

transforming the initial query into a template query by replacing the original input values with the variables;

sending multiple prompts to the LLM, wherein each of the multiple prompts is different and contextually related to the template query;

responsive to the multiple prompts, receiving multiple results from the LLM, wherein each of the multiple results includes an analytical expression to solve the mathematical problem;

evaluating outputs of the analytical expressions included in the multiple results with the variables being assigned to a common set of randomly sampled values, wherein evaluating the outputs of the analytical expressions included in the multiple results comprises:

looping through a process over a number of trials, the process comprising:

assigning random values to the variables;

evaluating each of the analytical expressions with the variables having the random values assigned;

calculating a consensus rating based on the evaluating each of the analytical expressions with the variables having the random values assigned;

determining if additional trials are required based on the consensus rating and a test condition; and

terminating the looping when the consensus rating and the test condition indicate that the additional trials are not required; and

outputting final results based on the consensus rating and the test condition indicating that the additional trials are not required.
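
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`make_template`, `evaluate`, `consensus_loop`), the variable naming scheme `v0`, `v1`, ..., the sampling range 1 to 20, and the particular consensus rating and test condition are assumptions chosen for illustration. The LLM call is omitted: the analytical expressions are supplied as plain strings standing in for the results the LLM would return.

```python
import random
import re
from collections import Counter


def make_template(query):
    """Replace each number in the query with a variable name.

    Returns the template query and the key-value mapping
    (variable name -> original input value).
    """
    mapping = {}

    def substitute(match):
        name = f"v{len(mapping)}"  # hypothetical naming scheme
        mapping[name] = float(match.group())
        return name

    template = re.sub(r"\d+(?:\.\d+)?", substitute, query)
    return template, mapping


def evaluate(expr, values):
    """Evaluate one analytical expression; return None if it fails.

    eval() is used only to keep the sketch short. It is not safe for
    untrusted input, even with builtins disabled.
    """
    try:
        return round(eval(expr, {"__builtins__": {}}, dict(values)), 6)
    except Exception:
        return None


def consensus_loop(expressions, variables, required_streak=3,
                   max_trials=10, seed=0):
    """Loop over trials with a common set of randomly sampled values.

    Assumed consensus rating: the fraction of expressions that share the
    most common output in a trial. Assumed test condition: the rating
    equals 1.0 for `required_streak` consecutive trials, or `max_trials`
    is reached. Returns (consensus_reached, last_rating, trials_run).
    """
    rng = random.Random(seed)
    streak = 0
    rating = 0.0
    for trial in range(1, max_trials + 1):
        # Assign the same random values to the variables for all expressions.
        values = {name: rng.randint(1, 20) for name in variables}
        outputs = [evaluate(expr, values) for expr in expressions]
        valid = [out for out in outputs if out is not None]
        top_count = Counter(valid).most_common(1)[0][1] if valid else 0
        rating = top_count / len(expressions)
        streak = streak + 1 if rating == 1.0 else 0
        if streak >= required_streak:
            # No additional trials are required: terminate the loop.
            return True, rating, trial
    return False, rating, max_trials


# Usage: two stand-in "LLM results" for the same template query.
template, mapping = make_template(
    "Tom has 3 apples and buys 4 more. How many apples does he have?"
)
candidates = ["v0 + v1", "v1 + v0"]
reached, rating, trials = consensus_loop(candidates, mapping)
if reached:
    # Final result: evaluate an agreed expression on the original values.
    print(template)
    print(evaluate(candidates[0], mapping))
```

Requiring several consecutive agreeing trials is one possible test condition. A single trial could let two different expressions coincide by chance for one particular assignment of values, and repeated sampling makes such a coincidence less likely.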