CPC G06T 13/40 (2013.01) [G06T 13/205 (2013.01); G06V 40/176 (2022.01); G06V 40/193 (2022.01); G10L 15/1815 (2013.01); G10L 15/22 (2013.01); G10L 25/63 (2013.01)]
20 Claims
1. A method comprising:
receiving, by a processor via an audio-visual input device, audio-visual input data of user communications during a period of time;
utilizing, by the processor, at least one speech recognition model to recognize speech data of the audio-visual input data;
inputting, by the processor, the speech data into at least one natural language understanding model to produce speech recognition data indicative of meaning, intent and sentiment;
determining, by the processor, at least one current emotional complex signature associated with user reactions during a current emotional state of the user during the period of time based at least in part on:
the speech recognition data and at least one of:
at least one time-varying speech emotion metric or
at least one time-varying facial emotion metric;
wherein the at least one time-varying speech emotion metric is determined by:
determining, by the processor, the at least one time-varying speech emotion metric throughout the period of time based at least in part on the speech recognition data; and
wherein the at least one time-varying facial emotion metric is determined by:
utilizing, by the processor, at least one facial emotion recognition model to produce facial action units representative of recognized facial features represented in the audio-visual input data;
determining, by the processor, the at least one time-varying facial emotion metric throughout the period of time based at least in part on the speech recognition data, the facial action units and a facial action coding system;
logging, by the processor, the at least one current emotional complex signature in a memory;
tagging, by the processor, a high-amplitude, high-confidence interaction to indicate at least one changed emotional state where a magnitude of the at least one current emotional complex signature exceeds a predetermined threshold; and
presenting, via at least one output device, by the processor, a virtual representation of a responder to the user in response to the at least one changed emotional state.
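
The claim's speech branch determines a time-varying speech emotion metric from the speech recognition data throughout the period of time. Below is a minimal, non-limiting sketch of one way such a metric could be sampled, by scoring time-stamped transcript tokens against a valence lexicon over a sliding window; the SENTIMENT_LEXICON, window length, step size, and score range are illustrative assumptions and are not recited in the claim.

```python
# Hypothetical sketch: a time-varying speech emotion metric computed by
# scoring windowed transcript tokens against a toy valence lexicon.
# The lexicon, window size, and score range are illustrative assumptions;
# a real system would use a trained sentiment model.
from dataclasses import dataclass

SENTIMENT_LEXICON = {"great": 1.0, "happy": 0.8, "fine": 0.2,
                     "slow": -0.4, "angry": -0.9, "terrible": -1.0}

@dataclass
class TimedWord:
    word: str    # recognized token from the speech recognition data
    t: float     # utterance time in seconds within the period of time

def speech_emotion_metric(words: list[TimedWord],
                          period_end: float,
                          window_s: float = 5.0,
                          step_s: float = 1.0) -> list[tuple[float, float]]:
    """Return (timestamp, valence) samples spanning the period of time."""
    samples = []
    t = 0.0
    while t <= period_end:
        in_window = [w for w in words if t - window_s < w.t <= t]
        scores = [SENTIMENT_LEXICON.get(w.word.lower(), 0.0) for w in in_window]
        valence = sum(scores) / len(scores) if scores else 0.0
        samples.append((t, valence))
        t += step_s
    return samples

if __name__ == "__main__":
    transcript = [TimedWord("great", 1.2), TimedWord("slow", 6.8),
                  TimedWord("angry", 7.5)]
    for ts, v in speech_emotion_metric(transcript, period_end=10.0):
        print(f"t={ts:4.1f}s  valence={v:+.2f}")
```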
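The facial branch maps facial action units (AUs) onto emotion scores using a facial action coding system. A minimal sketch follows, assuming a small AU-to-emotion table drawn from commonly cited EMFACS-style prototype pairings (e.g., AU6 + AU12 for happiness); the prototype table, the [0, 1] intensity scale, and the mean-intensity scoring rule are simplified illustrations, not the claimed method itself.

```python
# Hypothetical sketch: mapping per-frame facial action unit (AU) activations
# to a time-varying facial emotion metric via FACS-style AU combinations.
# The prototype table follows commonly cited EMFACS pairings; intensities
# in [0, 1] and the scoring rule are simplified, illustrative assumptions.

# Emotion -> required AUs (EMFACS-style prototypes).
FACS_PROTOTYPES = {
    "happiness": {6, 12},
    "sadness": {1, 4, 15},
    "surprise": {1, 2, 5, 26},
    "anger": {4, 5, 7, 23},
    "disgust": {9, 15, 16},
}

def facial_emotion_metric(frames: list[dict[int, float]]
                          ) -> list[dict[str, float]]:
    """For each frame of AU activations {au_number: intensity}, score each
    emotion as the mean intensity of its prototype AUs (0 if any is absent)."""
    series = []
    for aus in frames:
        scores = {}
        for emotion, proto in FACS_PROTOTYPES.items():
            if proto <= aus.keys():  # all prototype AUs were detected
                scores[emotion] = sum(aus[a] for a in proto) / len(proto)
            else:
                scores[emotion] = 0.0
        series.append(scores)
    return series

if __name__ == "__main__":
    # Two frames: a smile (AU6 + AU12), then a sadness-like pattern.
    frames = [{6: 0.9, 12: 0.8}, {1: 0.6, 4: 0.7, 15: 0.5}]
    for i, scores in enumerate(facial_emotion_metric(frames)):
        top = max(scores, key=scores.get)
        print(f"frame {i}: {top} ({scores[top]:.2f})")
```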
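Finally, the claim logs the current emotional complex signature and tags a high-amplitude, high-confidence interaction when the signature's magnitude exceeds a predetermined threshold. The sketch below assumes the signature is a simple vector of metric samples, its magnitude is the Euclidean norm, and the threshold and confidence cutoff take arbitrary example values; none of these choices is fixed by the claim.

```python
# Hypothetical sketch: logging an emotional complex signature and tagging a
# changed emotional state when its magnitude exceeds a threshold. Treating
# the signature as a vector and using the Euclidean norm are illustrative
# assumptions; the threshold values below are likewise assumed.
import math
from dataclasses import dataclass, field

@dataclass
class EmotionalComplexSignature:
    t: float                 # timestamp within the period of time
    speech_valence: float    # sample of the speech emotion metric
    facial_score: float      # sample of the facial emotion metric
    confidence: float        # model confidence in [0, 1]

    def magnitude(self) -> float:
        return math.hypot(self.speech_valence, self.facial_score)

@dataclass
class SignatureLog:
    threshold: float = 0.75          # predetermined threshold (assumed value)
    min_confidence: float = 0.8      # "high confidence" cutoff (assumed value)
    entries: list = field(default_factory=list)
    tags: list = field(default_factory=list)

    def log(self, sig: EmotionalComplexSignature) -> None:
        self.entries.append(sig)     # logging step
        if (sig.magnitude() > self.threshold
                and sig.confidence >= self.min_confidence):
            # tag a high-amplitude, high-confidence interaction
            self.tags.append(sig.t)

if __name__ == "__main__":
    log = SignatureLog()
    log.log(EmotionalComplexSignature(3.0, -0.8, 0.6, confidence=0.9))
    log.log(EmotionalComplexSignature(4.0, 0.1, 0.1, confidence=0.95))
    print("changed-emotional-state tags at t =", log.tags)
```

In this sketch, a tag at time t would be the trigger for presenting the virtual representation of a responder via the output device.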