US 12,437,028 B2
Systems, software and methods for generating training datasets for machine learning applications
Daniel Jaffe Butler, La Jolla, CA (US); Eiman Azim, La Jolla, CA (US); and Alexander Keim, La Jolla, CA (US)
Assigned to THE SALK INSTITUTE FOR BIOLOGICAL STUDIES, La Jolla, CA (US)
Filed by The Salk Institute for Biological Studies, La Jolla, CA (US)
Filed on Oct. 15, 2021, as Appl. No. 17/503,116.
Claims priority of provisional application 63/092,841, filed on Oct. 16, 2020.
Prior Publication US 2022/0121878 A1, Apr. 21, 2022
Int. Cl. G06K 9/00 (2022.01); G06F 18/214 (2023.01); G06F 18/24 (2023.01); G06V 10/20 (2022.01)
CPC G06F 18/214 (2023.01) [G06F 18/24 (2023.01); G06V 10/255 (2022.01)]
15 Claims
 
1. A computer-implemented method comprising:
obtaining a first plurality of images of at least one subject using a first imaging modality, the first plurality of images comprising at least one parameter, the subject having at least one imaging label associated therewith, wherein the at least one imaging label comprises at least one fluorescent imaging label;
computing a centroid for each of the first plurality of images by thresholding the at least one fluorescent imaging label and computing a binary mask of each of the first plurality of images;
obtaining a second plurality of images of the at least one subject using a second imaging modality different from the first imaging modality;
generating a plurality of labeled images using the at least one parameter and the computed centroids by mapping the computed centroids onto the second plurality of images;
training a supervised machine learning (ML) operation using the plurality of labeled images; and
applying the supervised ML operation to a third plurality of images to identify a region of interest within the third plurality of images.
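
The labeling steps of claim 1 (thresholding the fluorescent label, forming a binary mask, computing a centroid, and mapping that centroid onto the second-modality image) might be sketched as follows. This is an illustrative sketch only, not the patented implementation. It assumes that the two modalities are already pixel-registered and paired frame by frame, that images are plain row-major lists of intensities, and that a single fixed threshold is adequate. All function names are hypothetical, and the claim's "at least one parameter" is not modeled.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a grayscale image as row-major lists of intensities.
Image = List[List[float]]
Mask = List[List[bool]]
Point = Tuple[float, float]  # (row, col)


def binary_mask(fluorescence: Image, threshold: float) -> Mask:
    """Threshold the fluorescent-label image into a binary mask."""
    return [[pixel > threshold for pixel in row] for row in fluorescence]


def centroid(mask: Mask) -> Optional[Point]:
    """Return the (row, col) centroid of the True pixels, or None for an empty mask."""
    count = 0
    row_sum = 0
    col_sum = 0
    for r, row in enumerate(mask):
        for c, on in enumerate(row):
            if on:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


def label_second_modality(
    fluor_frames: List[Image],
    second_frames: List[Image],
    threshold: float,
) -> List[Tuple[Image, Point]]:
    """Pair each second-modality frame with the centroid found in its fluorescence frame.

    Assumes both modalities share one pixel coordinate system, so the centroid
    can be carried over unchanged. Frames with no above-threshold label are skipped.
    """
    labeled = []
    for fluor, second in zip(fluor_frames, second_frames):
        point = centroid(binary_mask(fluor, threshold))
        if point is not None:
            labeled.append((second, point))
    return labeled
```

The resulting (image, centroid) pairs could then serve as training examples for a supervised model. If the two cameras were not registered, a coordinate transform would be needed before the centroid is attached to the second image.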