US 12,073,562 B2
Medical device for transcription of appearances in an image to text with machine learning
Marc Jean Baptist Van Oldenborgh, Amsterdam (NL); Henricus Meinardus Gerardus Stokman, Amsterdam (NL); and Ran Tao, Amsterdam (NL)
Assigned to Kepler Vision Technologies B.V., Amsterdam (NL)
Filed by Kepler Vision Technologies B.V., Amsterdam (NL)
Filed on May 11, 2023, as Appl. No. 18/315,895.
Application 18/315,895 is a continuation of application No. 16/975,015, granted, now 11,688,062, previously published as PCT/EP2020/058034, filed on Mar. 23, 2020.
Claims priority of application No. 19164480 (EP), filed on Mar. 21, 2019.
Prior Publication US 2023/0281813 A1, Sep. 7, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06T 7/00 (2017.01); G16H 10/60 (2018.01); G16H 30/20 (2018.01); G16H 30/40 (2018.01)

CPC G06T 7/0012 (2013.01) [G16H 10/60 (2018.01); G16H 30/20 (2018.01); G16H 30/40 (2018.01); G06T 2207/20081 (2013.01); G06T 2207/20084 (2013.01); G06T 2207/30004 (2013.01); G06T 2207/30196 (2013.01)]

12 Claims

1. A medical device configured to transcribe an appearance of a human being, said device comprising:

a computing device comprising a data processor, and

a computer program product for running on the computing device, and when running on said data processor:

receives at least one image from said image capturing sensor;

analyzes said at least one image, the analyzing comprises:

subjecting said at least one image to said first machine learning model;

detecting presence of a living being in said at least one image;

labeling the detected living being in said at least one image using a label;

subjecting at least a part of said at least one image, said part of said at least one image comprising the labeled living being, to said second machine learning model;

retrieving said appearance of said labeled living being from said second machine learning model;

applying said transcription module to transcribe the retrieved appearance of said labeled living being to text, and

outputting said text;

said medical device comprising a common housing holding:

an image capturing sensor;

the computing device comprising a data processor, and

the computer program product;

the computer program product comprising:

a first machine learning model trained for detecting and labeling human beings in at least one image;

a second machine learning model trained for detecting appearances of human beings in at least one image;

a transcription module to transcribe the detected appearances of human beings to text,

wherein said computer program product when running on said data processor causes said computing device to:

retrieve at least one image from said image capturing sensor;

analyze said at least one image, the analyzing comprises:

input said at least one image to said first machine learning model;

said first machine learning model detecting presence of a human being in said at least one image;

said first machine learning model labeling the detected human being in said at least one image using a label;

input at least a part of said at least one image to said second machine learning model, said part of said at least one image comprising the labeled human being, and

said second machine learning model providing said appearance of said labeled human being as an output;

apply said transcription module to transcribe the retrieved appearance of said labeled human being to text and output said text, wherein the transcription to text in said transcription module involves creating a medical record and outputting said text into said medical record;

wherein said second machine learning model comprises:

a first deep neural network which captures the skeleton data of said human being in said at least a part of said at least one image, said first deep neural network using said at least a part of said at least one image as an input and outputting said skeleton data;

a second deep neural network which captures a first appearance of said human being, said second deep neural network using said skeleton data from said first deep neural network as an input and outputting said first appearance in first appearance data;

a third deep neural network which captures a second appearance of said human being in said at least a part of said at least one image, said third deep neural network using said at least a part of said at least one image as an input and outputting said second appearance in second appearance data, and

a fourth deep neural network which captures a third appearance of said human being using said first and second appearance data as an input and outputting third appearance data, said third appearance data comprising a prediction of probabilities of said appearance.
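Claim 1 fixes the data flow between the models but not their internals: a first model detects and labels a human being, the labeled part of the image goes to a second model built from four deep neural networks (skeleton, skeleton-based appearance, pixel-based appearance, fusion into probabilities), and a transcription module writes text into a medical record. The following is a minimal sketch of that wiring, assuming plain-Python stand-ins for every trained network. All names (`first_model`, `dnn1_skeleton` through `dnn4_third_appearance`, `MedicalRecord`, `transcribe_image`), the bounding-box format and the three-class appearance vocabulary are hypothetical. The stub "networks" are toy heuristics, not Kepler Vision's models.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical appearance vocabulary: the claim does not enumerate appearances.
APPEARANCES = ["lying", "sitting", "standing"]

Image = List[List[float]]  # grayscale image, rows of intensities in [0, 1]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), hypothetical format


@dataclass
class Detection:
    label: str  # label assigned to the detected human being
    box: Box


@dataclass
class MedicalRecord:
    patient_id: str
    entries: List[str] = field(default_factory=list)


def first_model(image: Image) -> List[Detection]:
    """Stand-in for the first model: detects and labels one person filling the frame."""
    return [Detection(label="person_1", box=(0, 0, len(image), len(image[0])))]


def crop(image: Image, box: Box) -> Image:
    """Return the part of the image that contains the labeled human being."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def dnn1_skeleton(part: Image) -> List[Tuple[float, float]]:
    """Stand-in for the first DNN: (row, col) keypoints of a vertical skeleton."""
    h, w = len(part), len(part[0])
    return [(0.1 * h, 0.5 * w), (0.5 * h, 0.5 * w), (0.9 * h, 0.5 * w)]


def dnn2_first_appearance(skeleton: List[Tuple[float, float]]) -> List[float]:
    """Stand-in for the second DNN: logits computed from skeleton data only."""
    rows = [r for r, _ in skeleton]
    cols = [c for _, c in skeleton]
    ratio = (max(rows) - min(rows)) / (max(cols) - min(cols) + 1.0)
    # A tall, narrow skeleton favours "standing"; a wide, flat one favours "lying".
    return [1.0 - ratio, 0.0, ratio - 1.0]


def dnn3_second_appearance(part: Image) -> List[float]:
    """Stand-in for the third DNN: logits computed from raw pixels only."""
    mean = sum(sum(row) for row in part) / (len(part) * len(part[0]))
    return [mean, 0.5, 1.0 - mean]  # arbitrary placeholder mapping


def dnn4_third_appearance(first: List[float], second: List[float]) -> Dict[str, float]:
    """Stand-in for the fourth DNN: fuse both logit vectors, return probabilities."""
    fused = [a + b for a, b in zip(first, second)]
    peak = max(fused)
    exps = [math.exp(v - peak) for v in fused]  # numerically stable softmax
    total = sum(exps)
    return {name: e / total for name, e in zip(APPEARANCES, exps)}


def transcribe(label: str, probs: Dict[str, float]) -> str:
    """Transcription module: turn the most probable appearance into text."""
    best = max(probs, key=probs.get)
    return f"{label}: {best} (p={probs[best]:.2f})"


def transcribe_image(image: Image, record: MedicalRecord) -> List[str]:
    """Run the claimed data flow and write each text line into the record."""
    for det in first_model(image):
        part = crop(image, det.box)
        skeleton = dnn1_skeleton(part)
        first = dnn2_first_appearance(skeleton)
        second = dnn3_second_appearance(part)
        probs = dnn4_third_appearance(first, second)
        record.entries.append(transcribe(det.label, probs))
    return record.entries


if __name__ == "__main__":
    frame = [[0.2] * 4 for _ in range(4)]
    print(transcribe_image(frame, MedicalRecord(patient_id="patient-001")))
```

In a real device each stub would be a trained deep neural network. The summed-logit softmax in `dnn4_third_appearance` is only one simple late-fusion choice, used here for illustration; the claim requires only that the fourth network take both appearance outputs and return probabilities.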