US 11,699,508 B2
Method and apparatus for selecting radiology reports for image labeling by modality and anatomical region of interest
Marina Bendersky, Cupertino, CA (US); Tanveer Fathima Syeda-Mahmood, Cupertino, CA (US); and Joy Tzung-yu Wu, San Jose, CA (US)
Assigned to MERATIVE US L.P., Ann Arbor, MI (US)
Filed by MERATIVE US L.P., Ann Arbor, MI (US)
Filed on Dec. 2, 2019, as Appl. No. 16/700,137.
Prior Publication US 2021/0166822 A1, Jun. 3, 2021
Int. Cl. G16H 15/00 (2018.01); G16H 50/70 (2018.01); G06N 20/00 (2019.01); G06N 5/04 (2023.01); G16H 70/00 (2018.01); G16H 30/40 (2018.01); G06F 18/2431 (2023.01); G06V 10/75 (2022.01)
CPC G16H 15/00 (2018.01) [G06F 18/2431 (2023.01); G06N 5/04 (2013.01); G06N 20/00 (2019.01); G06V 10/75 (2022.01); G16H 30/40 (2018.01); G16H 50/70 (2018.01); G16H 70/00 (2018.01)] 16 Claims
 
1. A method for developing a classification model, the method comprising:
selecting, from a corpus of reports, a subset of the reports from which to form a training set and a testing set;
assigning labels of a modality and an anatomical focus to the reports in both the training set and the testing set, wherein assigning labels of the modality and the anatomical focus to the reports includes assigning a binary classification distinguishing chest x-ray reports from non-chest x-ray reports and, for each report labeled as a non-chest x-ray report, assigning a multiclass classification defining a modality associated with the report;
extracting a sparse representation matrix for each of the training set and the testing set based on features in the training set;
learning, with one or more electronic processors, a correlation between the features of the training set and the corresponding labels using a machine learning classifier, thereby building a classification model;
testing the classification model on the reports in the testing set for accuracy using the sparse representation matrix of the testing set; and
predicting, with the classification model, labels of an anatomical focus and a modality for remaining reports in the corpus not included in the subset.
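The steps of claim 1 can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The claim does not name the features, the sparse format, or the classifier, so the sketch assumes lowercase word-count features, stores each sparse representation matrix as one `{feature_index: count}` dictionary per report, and swaps in a multinomial Naive Bayes classifier. The function and class names (`select_subset`, `develop_model`, `HierarchicalClassifier`), the example labels `"chest x-ray"`, `"CT"`, and `"MRI"`, and the every-third-report train/test split are all hypothetical. Hand-assigned labels are passed in as `annotations`; the binary chest x-ray label is derived from them, and only non-chest-x-ray reports train the second-stage modality classifier. The sketch does not predict a separate anatomical-focus label for non-chest reports.

```python
import math
import random
import re
from collections import Counter, defaultdict

CXR = "chest x-ray"  # hypothetical label for the positive class of the binary stage


def tokenize(text):
    # Lowercased alphabetic tokens (assumed feature definition).
    return re.findall(r"[a-z]+", text.lower())


def select_subset(n_reports, size, seed=0):
    # Randomly pick the report indices that will be hand-labeled.
    return sorted(random.Random(seed).sample(range(n_reports), size))


def build_vocabulary(texts):
    # Feature index built from the training set only.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def sparse_rows(texts, vocab):
    # Sparse matrix stored as one {feature_index: count} dict per report;
    # tokens never seen in training are dropped.
    return [dict(Counter(vocab[t] for t in tokenize(x) if t in vocab)) for x in texts]


class NaiveBayes:
    # Multinomial Naive Bayes with add-one smoothing (stand-in classifier).
    def fit(self, rows, labels, n_features):
        self.n_features = n_features
        self.classes = sorted(set(labels))
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        self.counts = {c: defaultdict(int) for c in self.classes}
        self.totals = Counter()
        for row, y in zip(rows, labels):
            for j, v in row.items():
                self.counts[y][j] += v
                self.totals[y] += v
        return self

    def predict_one(self, row):
        def log_posterior(c):
            denom = self.totals[c] + self.n_features
            return self.log_prior[c] + sum(
                v * math.log((self.counts[c].get(j, 0) + 1) / denom)
                for j, v in row.items())
        return max(self.classes, key=log_posterior)


class HierarchicalClassifier:
    # Stage 1: chest x-ray vs. not. Stage 2: modality of non-chest-x-ray reports.
    def fit(self, rows, labels, n_features):
        self.stage1 = NaiveBayes().fit(rows, [y == CXR for y in labels], n_features)
        other = [(r, y) for r, y in zip(rows, labels) if y != CXR]
        self.stage2 = NaiveBayes().fit(
            [r for r, _ in other], [y for _, y in other], n_features)
        return self

    def predict(self, rows):
        return [CXR if self.stage1.predict_one(r) else self.stage2.predict_one(r)
                for r in rows]


def develop_model(corpus, annotations, test_every=3):
    # annotations: {report_index: label} for the selected, hand-labeled subset.
    subset = sorted(annotations)
    test_idx = subset[::test_every]
    held_out = set(test_idx)
    train_idx = [i for i in subset if i not in held_out]
    vocab = build_vocabulary(corpus[i] for i in train_idx)
    X_train = sparse_rows([corpus[i] for i in train_idx], vocab)
    X_test = sparse_rows([corpus[i] for i in test_idx], vocab)
    model = HierarchicalClassifier().fit(
        X_train, [annotations[i] for i in train_idx], len(vocab))
    preds = model.predict(X_test)
    accuracy = sum(p == annotations[i] for p, i in zip(preds, test_idx)) / len(test_idx)
    # Reports outside the labeled subset receive predicted labels.
    remaining = [i for i in range(len(corpus)) if i not in annotations]
    predicted = model.predict(sparse_rows([corpus[i] for i in remaining], vocab))
    return accuracy, dict(zip(remaining, predicted))
```

Building the vocabulary from the training reports alone mirrors the requirement that both matrices be extracted based on features in the training set; words that appear only in testing or unlabeled reports contribute nothing to a prediction. Routing every report through the binary stage first means the modality classifier never has to separate chest x-rays from other studies. In this sketch the second stage assumes the training set contains at least one non-chest-x-ray report.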