| CPC G09B 5/065 (2013.01) | 16 Claims |

|
1. A method comprising:
capturing, via a camera of a device, image data of a current state of a task;
storing the captured image data on a memory of the device;
detecting a user utterance;
determining a target state of the task based on the user utterance, wherein determining the target state of the task comprises:
detecting that the user utterance matches metadata associated with a particular state of the task; and
determining that the particular state of the task is the target state of the task;
determining, using a trained neural network, based on the image data, whether the current state of the task matches the target state of the task that was determined based on the user utterance; and
in response to determining that the current state of the task does not match a target state of the task, causing to be output a recommendation to bring the current state of the task to the target state of the task.
|
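The flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it (`TaskState`, `determine_target_state`, `recommend`, the `classify` callable) is hypothetical. Three simplifications are assumed: camera capture and on-device storage are omitted and the image data is passed in as bytes; the utterance is assumed to be already transcribed to text; and the trained neural network is replaced by any callable that maps image data to a state name.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class TaskState:
    """A state of the task plus the metadata keywords associated with it (hypothetical)."""
    name: str
    metadata: frozenset


def determine_target_state(utterance: str, states: Sequence[TaskState]) -> Optional[TaskState]:
    """Return the first state whose metadata matches a word of the utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for state in states:
        if words & state.metadata:
            return state
    return None


def recommend(
    image_data: bytes,
    utterance: str,
    states: Sequence[TaskState],
    classify: Callable[[bytes], str],
) -> Optional[str]:
    """Return a recommendation when the current state differs from the target state.

    `classify` stands in for the trained neural network (an assumption):
    it takes the captured image data and returns the name of the current state.
    """
    target = determine_target_state(utterance, states)
    if target is None:
        return None  # the utterance matched no state metadata
    current = classify(image_data)
    if current == target.name:
        return None  # current state already matches the target state
    return f"Current state is '{current}'; continue until the task reaches '{target.name}'."
```

With states such as `TaskState("dough_mixed", frozenset({"dough", "mixed"}))` and `TaskState("bread_baked", frozenset({"bread", "baked"}))`, the utterance "How do I get this baked?" selects `bread_baked` as the target state, and a recommendation is returned only when the classifier reports a different current state.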