US 12,086,716 B1
Method for constructing multimodality-based medical large model, and related device thereof
Weihua Liu, Changsha (CN); Jianhua Qiu, Changsha (CN); and Jinmin Ma, Changsha (CN)
Assigned to AthenaEyes CO., LTD., Changsha (CN)
Filed by AthenaEyes CO., LTD., Changsha (CN)
Filed on Nov. 13, 2023, as Appl. No. 18/388,859.
Claims priority of application No. 202310596917.1 (CN), filed on May 25, 2023.
Int. Cl. G06N 3/08 (2023.01); G16H 50/70 (2018.01)
CPC G06N 3/08 (2013.01) [G16H 50/70 (2018.01)] 20 Claims
OG exemplary drawing
 
1. A method for constructing a multimodality-based medical large model, wherein the multimodality-based medical large model comprises a multimodal transformer T, a prompt manager M, a dialogue engine L, a task controller H, and a multimodal foundation (MMF); the MMF comprises at least one medical foundation model (MFM); the MFM comprises basic models for a downstream task, and a medical language module (MLM); and the method for constructing the multimodality-based medical large model comprises:
in a first stage, modal analysis: performing, by the multimodal transformer T, modal analysis on input query information to determine a task type;
in a second stage, model allocation: selecting, by the task controller H and the prompt manager M, a model Pp and a corresponding parameter hPp for task processing, and allocating a resource corresponding to the task type to the model Pp, wherein the model Pp is one of the basic models for the downstream task, P denotes the number of models, p denotes a task model set to which the selected model belongs, and the parameter hPp denotes a task family selected by the task controller H;
in a third stage, downstream task result feedback: executing, by the MFM, the model Pp to generate a task output result oPp, and feeding back the task output result oPp to the MMF;
in a fourth stage, modal transformation normalization: extracting, by the MLM, an entity-related text span from the task output result oPp sent by the MMF to acquire a structured entity; annotating the structured entity to generate a feedback text; and feeding back the feedback text to the MMF; and
in a fifth stage, response generation: receiving, by the MLM, a query result transmitted by the MMF, generating a professional response corresponding to the query result based on a medical knowledge base, and feeding back the professional response to a user;
wherein the step of performing, by the multimodal transformer T, the modal analysis on the input query information to determine the task type comprises:
transforming, by the multimodal transformer T, the input query information into a query description qn(d) and a set of query-related resources {qn(s1), qn(s2), . . . , qn(sk)} of size k, wherein the query description qn(d) is in vectorized form, and the input query information is in the form of at least one of text, audio, and image; and s1 to sk denote serial numbers of the query-related resources; and
performing a modal check on the query description qn(d) to determine the task type.
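The modal-analysis step of the first stage can be illustrated with a minimal Python sketch. The function names, the stand-in hash embedding, and the file-extension-based modal check are assumptions for illustration only; the claim does not specify how qn(d) is vectorized or how the modal check is carried out.

# Hypothetical sketch of the modal-analysis step: the query is split into a
# vectorized description q_n(d) and k query-related resources, and a modal
# check yields the task type. Names and the embedding scheme are assumptions
# for illustration only, not the patented implementation.
import hashlib


def vectorize_description(text: str, dim: int = 8) -> list[float]:
    """Stand-in embedding: hash the query text into a fixed-length vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def modal_analysis(text: str, resources: list[str]) -> tuple[list[float], list[str], str]:
    q_d = vectorize_description(text)          # q_n(d), vectorized query description
    q_s = list(resources)                      # {q_n(s_1), ..., q_n(s_k)}
    # Modal check: route on the modality of the attached resources.
    if any(r.endswith((".png", ".jpg", ".dcm")) for r in q_s):
        task_type = "medical_image_analysis"
    elif any(r.endswith((".wav", ".mp3")) for r in q_s):
        task_type = "medical_audio_analysis"
    else:
        task_type = "medical_text_qa"
    return q_d, q_s, task_type


q_d, q_s, task = modal_analysis("Is this lesion malignant?", ["scan_001.dcm"])
print(task)   # medical_image_analysis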
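For broader context, the following sketch wires the five stages of claim 1 together end to end. All class names, method bodies, and returned strings are hypothetical placeholders standing in for the multimodal transformer T, the task controller H and prompt manager M, the MFM, and the MLM; this is an illustrative sketch under those assumptions, not the patented implementation.

# Minimal, hypothetical sketch of the five-stage pipeline described in claim 1.
# All class and method names and data structures are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Query:
    """Input query information (text, audio, and/or image)."""
    description: list[float]                              # vectorized query description q_n(d)
    resources: list[str] = field(default_factory=list)    # {q_n(s_1), ..., q_n(s_k)}


class MultimodalTransformer:
    """Stage 1: modal analysis determines the task type from the query."""
    def analyze(self, query: Query) -> str:
        # A real system would run a modal check on q_n(d); here we branch on resources.
        return "image_diagnosis" if query.resources else "text_qa"


class TaskController:
    """Stage 2: with the prompt manager, select model P_p and parameter h_Pp."""
    def allocate(self, task_type: str) -> tuple[str, dict]:
        model_p = f"basic_model::{task_type}"                 # selected downstream model P_p
        h_pp = {"task_family": task_type, "gpu_mem_gb": 8}    # parameter h_Pp and allocated resource
        return model_p, h_pp


class MedicalFoundationModel:
    """Stage 3: execute the selected model and feed the result o_Pp back to the MMF."""
    def run(self, model_p: str, h_pp: dict, query: Query) -> str:
        return f"raw output o_Pp of {model_p} under {h_pp['task_family']}"


class MedicalLanguageModule:
    """Stages 4 and 5: normalize the output and generate the final response."""
    def normalize(self, o_pp: str) -> str:
        entity_span = o_pp.split(" of ")[-1]       # extract an entity-related text span
        return f"[ENTITY] {entity_span}"           # annotated structured entity as feedback text

    def respond(self, feedback_text: str) -> str:
        # A production system would ground this response in a medical knowledge base.
        return f"Professional response based on {feedback_text}"


def handle_query(query: Query) -> str:
    t, h, mfm, mlm = MultimodalTransformer(), TaskController(), MedicalFoundationModel(), MedicalLanguageModule()
    task_type = t.analyze(query)              # stage 1: modal analysis
    model_p, h_pp = h.allocate(task_type)     # stage 2: model allocation
    o_pp = mfm.run(model_p, h_pp, query)      # stage 3: downstream task result feedback
    feedback = mlm.normalize(o_pp)            # stage 4: modal transformation normalization
    return mlm.respond(feedback)              # stage 5: response generation


if __name__ == "__main__":
    print(handle_query(Query(description=[0.1, 0.2], resources=["chest_xray.png"])))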