US 12,190,292 B1
Systems and methods to train and/or use a machine learning model to generate correspondences between portions of recorded audio content and work unit records of a collaboration environment
Steve B Morin, San Francisco, CA (US)
Assigned to Asana, Inc., San Francisco, CA (US)
Filed by Asana, Inc., San Francisco, CA (US)
Filed on Feb. 17, 2022, as Appl. No. 17/674,534.
Int. Cl. G06Q 10/101 (2023.01); H04N 7/15 (2006.01)
CPC G06Q 10/101 (2013.01) [H04N 7/155 (2013.01)]
18 Claims
OG exemplary drawing
 
1. A system configured to train a machine learning model to generate correspondences between portions of recorded audio content and work unit records of a collaboration environment, the system comprising:
non-transitory electronic storage storing environment state information maintaining a collaboration environment, the environment state information including work unit records, the work unit records including work information characterizing units of work, and resource information including digital assets defining temporal content of recorded audio content for which correspondences have been established, the work unit records being created within the collaboration environment and assigned within the collaboration environment to users who are expected to accomplish one or more actions to complete the units of work, wherein one of the work unit records corresponds to one of the units of work, such that the work unit records comprise:
a first work unit record including first work information characterizing a first unit of work, and first resource information including a first digital asset defining an instance of first temporal content of first recorded audio content for which a first correspondence has been established; and
a second work unit record including second work information characterizing a second unit of work, and second resource information including a second digital asset defining an instance of second temporal content of the first recorded audio content for which a second correspondence has been established; and
one or more physical processors configured by machine-readable instructions to:
manage the environment state information to facilitate interaction by the users with the collaboration environment;
obtain correspondence information conveying user-provided correspondences between the temporal content of the recorded audio content and one or more of the work unit records, the recorded audio content including utterances by one or more of the users, the temporal content corresponding to points in time and/or periods of time the users have identified within the recorded audio content, such that first correspondence information conveys the first correspondence between the first temporal content of the first recorded audio content and the first work unit record, and second correspondence information conveys the second correspondence between the second temporal content of the first recorded audio content and the second work unit record;
compile the correspondence information and the work information of the one or more of the work unit records into input/output pairs, the input/output pairs including training input information and training output information, the training input information for an individual input/output pair including the correspondence information for an individual one of the recorded audio content, the training output information for the individual input/output pair including the work information for an individual one of the work unit records, such that the first correspondence information and the first work information for the first work unit record are compiled into a first input/output pair, and the second correspondence information and the second work information for the second work unit record are compiled into a second input/output pair;
train a machine learning model based on the input/output pairs to generate a trained machine learning model, the trained machine learning model being configured to generate the correspondences between the temporal content of the recorded audio content and the work unit records, such that the machine learning model is trained using the first input/output pair and the second input/output pair to generate the trained machine learning model;
store the trained machine learning model;
obtain user input information conveying user input into instances of a graphical user interface including instances of a playback window through which the users play back the recorded audio content and effectuate adjustments to the points in time and/or the periods of time within the recorded audio content that identify the temporal content, the adjustments being effectuated through interaction with instances of a time slider element displayed in the instances of the playback window; and
refine the trained machine learning model based on the adjustments to the points in time and/or the periods of time within the recorded audio content that identify the temporal content, such that the trained machine learning model is refined in response to the first temporal content of the first recorded audio content being adjusted to first adjusted temporal content.