CPC G10L 15/083 (2013.01) [G06F 3/167 (2013.01); G06V 20/64 (2022.01); G10L 15/063 (2013.01); G10L 15/1822 (2013.01); G10L 15/22 (2013.01); G10L 15/26 (2013.01); G10L 2015/088 (2013.01)]; 24 Claims

1. A system to transition between different modalities, comprising:
a data processing system comprising one or more processors to:
receive, via a network, data packets comprising an input audio signal detected by a microphone of a computing device remote from the data processing system;
parse the input audio signal to identify a request;
select, based on the request, a digital component object having a visual output format, the digital component object associated with metadata;
determine, based on a type of the computing device, to convert the digital component object into an audio output format;
generate, responsive to the determination to convert the digital component object into the audio output format, text for the digital component object;
select, based on context of the digital component object, a digital voice to render the text;
construct a baseline audio track of the digital component object with the text rendered by the digital voice;
generate, based on the digital component object, non-spoken audio cues;
combine the non-spoken audio cues with the baseline audio track of the digital component object to generate an audio track of the digital component object; and
provide, responsive to the request from the computing device, the audio track of the digital component object to the computing device for output via a speaker of the computing device.
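The claimed steps can be illustrated with a minimal Python sketch. It is not the patentee's implementation. It assumes a hypothetical set of display-less device types, a hypothetical mapping from a metadata "context" label to a voice name, and a caller-supplied `synthesize(text, voice)` callable that stands in for a real text-to-speech engine and returns audio samples as floats in [-1, 1]. Every name in the code (`AUDIO_ONLY_DEVICE_TYPES`, `VOICE_BY_CONTEXT`, `build_audio_track`, and so on) is hypothetical. The non-spoken cues are mixed into the baseline track by plain additive summation with clipping, which is one simple choice among many.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical: device types assumed to have no display.
AUDIO_ONLY_DEVICE_TYPES = {"smart_speaker", "headless_assistant"}

# Hypothetical: maps a "context" label in the object's metadata to a voice name.
VOICE_BY_CONTEXT = {"sports": "energetic", "finance": "calm"}

# Stand-in for a TTS engine: (text, voice) -> samples in [-1, 1].
Synthesize = Callable[[str, str], List[float]]


@dataclass
class DigitalComponentObject:
    title: str
    description: str
    output_format: str = "visual"
    metadata: Dict[str, str] = field(default_factory=dict)


def should_convert(obj: DigitalComponentObject, device_type: str) -> bool:
    """Convert only visual objects that are bound for a display-less device."""
    return obj.output_format == "visual" and device_type in AUDIO_ONLY_DEVICE_TYPES


def generate_text(obj: DigitalComponentObject) -> str:
    """Derive speakable text from the object's visual fields."""
    return f"{obj.title}. {obj.description}"


def select_voice(obj: DigitalComponentObject) -> str:
    """Pick a voice from the object's context, falling back to a neutral voice."""
    return VOICE_BY_CONTEXT.get(obj.metadata.get("context", ""), "neutral")


def generate_cues(obj: DigitalComponentObject, length: int) -> List[float]:
    """Hypothetical cue: a quiet alternating tick, only for objects with a context."""
    if "context" not in obj.metadata:
        return []
    return [0.2 if i % 2 == 0 else -0.2 for i in range(length)]


def mix(baseline: List[float], cues: List[float], cue_gain: float = 0.5) -> List[float]:
    """Add scaled cue samples to the baseline, padding the shorter list with
    silence (0.0) and clipping each output sample to [-1, 1]."""
    n = max(len(baseline), len(cues))
    out = []
    for i in range(n):
        b = baseline[i] if i < len(baseline) else 0.0
        c = cues[i] if i < len(cues) else 0.0
        out.append(max(-1.0, min(1.0, b + cue_gain * c)))
    return out


def build_audio_track(
    obj: DigitalComponentObject, device_type: str, synthesize: Synthesize
) -> Optional[List[float]]:
    """Return the mixed audio track, or None when no conversion is needed."""
    if not should_convert(obj, device_type):
        return None
    baseline = synthesize(generate_text(obj), select_voice(obj))
    cues = generate_cues(obj, len(baseline))
    return mix(baseline, cues)
```

Injecting `synthesize` keeps the sketch independent of any particular speech engine; a real system would pass in its own TTS call and likely mix at a fixed sample rate rather than over bare float lists.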