US 11,715,564 B2
Machine learning-based diagnostic classifier
Monika Sharma Mellem, Falls Church, VA (US); Yuelu Liu, South San Francisco, CA (US); Parvez Ahammad, San Jose, CA (US); Humberto Andres Gonzalez Cabezas, Santa Clara, CA (US); William J. Martin, San Francisco, CA (US); and Pablo Christian Gersberg, San Francisco, CA (US)
Assigned to NEUMORA THERAPEUTICS, INC., Brisbane, CA (US)
Filed by BlackThorn Therapeutics, Inc., San Francisco, CA (US)
Filed on May 1, 2019, as Appl. No. 16/400,312.
Claims priority of provisional application 62/665,243, filed on May 1, 2018.
Prior Publication US 2019/0341152 A1, Nov. 7, 2019
Int. Cl. G16H 50/30 (2018.01); G16H 50/20 (2018.01); G16H 10/60 (2018.01)
CPC G16H 50/30 (2018.01) [G16H 10/60 (2018.01); G16H 50/20 (2018.01)] 28 Claims
OG exemplary drawing
 
1. A system for screening the mental health of patients, the system comprising:
a display;
a microphone;
a camera positioned to capture an image in front of the display and configured to output video data;
a user interface;
a memory containing a machine readable medium comprising machine executable code having stored thereon instructions for performing a method of evaluating the mental health of a user; and
a control system coupled to the memory comprising one or more processors, the control system configured to execute the machine executable code to cause the control system to:
execute a test application, by the control system, upon receiving, from the user interface, an indication to initiate a test; and
terminate the test application upon receiving, by the control system, an indication to stop the test;
wherein the test application comprises:
displaying, on the display, a set of text for the user to read aloud;
displaying, on the display, live video data recorded by the camera capturing the user while reading the set of text aloud;
recording, by the camera, a set of test video data;
recording, by the microphone, a set of test audio data;
processing the video data to assign a plurality of pixels of the video data to the face of the user while reading the set of text aloud;
processing the plurality of pixels to output a set of video features comprising facial expressions of the user while reading the set of text aloud; and
processing the audio data to identify sounds representing the voice of the user while reading the set of text aloud and to output a set of audio features comprising a tone of voice of the user while reading the set of text aloud;
processing, using a machine learning model, the set of video features and the set of audio features to output a mental health indication of the user, wherein the machine learning model was generated by:
receiving labeled training data for a plurality of individuals indicating whether each of the plurality of individuals has one or more mental health disorders, the labeled training data comprising audio and video data recorded for each of the plurality of individuals while reading the set of text aloud; and
determining a plurality of features from the labeled training data;
training an initial machine learning model using a combination of the audio data and the video data of the labeled training data;
using the trained initial machine learning model to determine an importance value for each of the plurality of features;
generating a plurality of subset machine learning models based on the importance values for the plurality of features;
evaluating a classification performance of the generated plurality of subset machine learning models; and
selecting at least one of the subset machine learning models as the machine learning model.
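The model-generation steps recited above (train an initial model on the combined audio/video features, rank features by importance, build subset models from the top-ranked features, evaluate each subset's classification performance, and select one) can be illustrated as a minimal sketch. This is not the patented implementation; it assumes scikit-learn, a random-forest classifier as the model family, and synthetic features standing in for the extracted audio/video features:

```python
# Illustrative sketch of the claimed model-generation steps, assuming
# scikit-learn; synthetic data stands in for the labeled training data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Labeled training data: one row per individual, binary disorder label.
X = rng.normal(size=(200, 20))          # combined audio + video features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Train an initial machine learning model on the combined feature set.
initial = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Use the trained initial model to determine an importance value per feature.
order = np.argsort(initial.feature_importances_)[::-1]

# Generate subset models over the top-k most important features and
# evaluate each subset model's classification performance.
results = {}
for k in (2, 5, 10, 20):
    cols = order[:k]
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    results[k] = cross_val_score(model, X[:, cols], y, cv=5).mean()

# Select the subset model with the best cross-validated accuracy.
best_k = max(results, key=results.get)
print(f"selected subset size: {best_k}, accuracy: {results[best_k]:.3f}")
```

Restricting the final classifier to a high-importance feature subset is a standard way to reduce overfitting and input cost; any model exposing per-feature importances (or a permutation-importance wrapper) could play the role of the initial model here.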