US 12,035,075 B1
Embodied interface for face-to-face communication between humans and artificial intelligence agents
Ilia Sedoshkin, Dubai (AE); Yury Rumyantsev, Moscow (RU); Anton Stepanov, Moscow (RU); and Egor Aleksandrov, Moscow (RU)
Assigned to Dragon Tree Partners LLC, Dover, DE (US)
Filed by Dragon Tree Partners LLC, Dover, DE (US)
Filed on Dec. 19, 2023, as Appl. No. 18/545,238.
Claims priority of provisional application 63/433,500, filed on Dec. 19, 2022.
Int. Cl. H04N 7/15 (2006.01); G06F 3/14 (2006.01); G06T 13/40 (2011.01); G06V 10/40 (2022.01); G06V 40/16 (2022.01); G10L 15/02 (2006.01); G10L 17/00 (2013.01); G10L 25/63 (2013.01); H04N 7/14 (2006.01)

CPC H04N 7/157 (2013.01) [G06F 3/1446 (2013.01); G06T 13/40 (2013.01); G06V 10/40 (2022.01); G06V 40/172 (2022.01); G06V 40/174 (2022.01); G10L 15/02 (2013.01); G10L 17/00 (2013.01); G10L 25/63 (2013.01); H04N 7/141 (2013.01)]

20 Claims

1. A method comprising:

generating, by at least one processor and based on an avatar model, a data stream including at least one image of a face associated with the avatar model, audio data associated with a speech of the avatar model, and a rotation instruction;

transmitting, by the at least one processor, the data stream to a three-dimensional video call system, the three-dimensional video call system including:

a stand;

an axle extended from the stand;

a controller;

at least one acoustic sensor coupled with the controller and configured to sense an ambient acoustic signal in an ambient environment;

a video camera coupled with the controller and configured to capture an ambient video signal in the ambient environment;

at least one actuator coupled with the controller and configured to rotate the axle; and

a plurality of display devices attached to the axle and communicatively coupled with the controller, wherein the controller is configured to:

cause a display device of the plurality of display devices to:

display a portion of the at least one image of the face, thereby causing the plurality of display devices to display a three-dimensional image of the face; and

play back the audio data;

cause the at least one actuator to rotate the axle according to the rotation instruction;

analyze the ambient video signal and the ambient acoustic signal to obtain at least one environmental feature; and

transmit the at least one environmental feature to the at least one processor.
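
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and every name in it (DataStream, Controller, split_into_portions, handle, analyze) is hypothetical. It assumes that the rotation instruction is a relative angle in degrees, that each display device receives one vertical strip of the face image, and that the environmental features are simple placeholder statistics; the claim does not specify any of these.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class DataStream:
    """One unit of the data stream in claim 1 (field names are hypothetical)."""
    face_image: List[List[int]]  # rows of pixel values standing in for the face image
    audio: bytes                 # audio data for the avatar's speech
    rotation_deg: float          # rotation instruction; assumed to be a relative angle


def split_into_portions(image: List[List[int]], n_displays: int) -> List[List[List[int]]]:
    """Split an image into n_displays contiguous vertical strips (an assumption)."""
    width = len(image[0])
    bounds = [round(i * width / n_displays) for i in range(n_displays + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in image] for i in range(n_displays)]


@dataclass
class Controller:
    """Stand-in for the controller of the three-dimensional video call system."""
    n_displays: int
    axle_angle: float = 0.0
    shown: list = field(default_factory=list)   # portion currently on each display
    played: list = field(default_factory=list)  # audio chunks played back so far

    def handle(self, stream: DataStream,
               ambient_video: Sequence[float],
               ambient_audio: Sequence[float]) -> Dict[str, float]:
        # Each display device shows its own portion of the face image.
        self.shown = split_into_portions(stream.face_image, self.n_displays)
        # Play back the audio data (recorded here instead of driving a speaker).
        self.played.append(stream.audio)
        # Rotate the axle according to the rotation instruction.
        self.axle_angle = (self.axle_angle + stream.rotation_deg) % 360.0
        # Analyze the ambient signals; the result is sent back to the processor.
        return self.analyze(ambient_video, ambient_audio)

    @staticmethod
    def analyze(ambient_video: Sequence[float],
                ambient_audio: Sequence[float]) -> Dict[str, float]:
        # Placeholder features only; the claim does not say which features are extracted.
        peak = max((abs(s) for s in ambient_audio), default=0)
        mean = sum(ambient_video) / len(ambient_video) if ambient_video else 0.0
        return {"peak_loudness": peak, "mean_brightness": mean}


# Processor side: generate one stream unit, "transmit" it, and receive the features.
stream = DataStream(face_image=[[1, 2, 3, 4], [5, 6, 7, 8]], audio=b"hi", rotation_deg=370.0)
controller = Controller(n_displays=2)
features = controller.handle(stream, ambient_video=[10, 20], ambient_audio=[-3, 2])
```

In this sketch the call to handle stands in for the transmitting step, and its return value stands in for the environmental feature sent back to the at least one processor. A real system would replace both with a network transport and with actual audio and video analysis.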