CPC H04N 21/8106 (2013.01) [G06V 20/42 (2022.01); G06V 20/46 (2022.01); H04N 21/2187 (2013.01); H04N 21/84 (2013.01)] | 20 Claims |
1. A method, comprising:
obtaining, during a live sporting event, video content of the live sporting event;
identifying a plurality of visual elements within a sequence of video frames of the video stream using one or more machine learning models;
determining a set of semantic representations based on the plurality of visual elements, wherein the set of semantic representations describes a player throwing a ball;
generating an audio description based on the semantic representations;
encoding the video content in parallel with generating the audio description;
encoding audio content of the live sporting event;
packaging the encoded video content, encoded audio content, and audio descriptions;
providing a manifest having references to the encoded video content, encoded audio content, and audio descriptions to a client device;
receiving a request for the video content and the audio description from the client device; and
providing the video content and the audio description to the client device.
|
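By way of non-limiting illustration, the steps recited in claim 1 could be sketched as follows in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: every function name, the label-based stand-in for the machine learning models, the segment naming, and the dictionary-shaped manifest are hypothetical and do not appear in the claim. The sketch shows only the ordering of steps, in particular that the video content is encoded in parallel with generating the audio description and that the manifest carries references to all three outputs.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_visual_elements(frames):
    # Hypothetical stand-in for the machine learning models:
    # each frame is assumed to already be a list of detected labels.
    return [label for frame in frames for label in frame]


def to_semantic_representations(elements):
    # Hypothetical rule: a player, a ball, and a throwing pose seen in the
    # frame sequence yield one "player throws ball" representation.
    if {"player", "ball", "throwing_pose"} <= set(elements):
        return [{"subject": "player", "action": "throws", "object": "ball"}]
    return []


def generate_audio_description(frames):
    # Visual elements -> semantic representations -> description text.
    reps = to_semantic_representations(detect_visual_elements(frames))
    return [f"A {r['subject']} {r['action']} the {r['object']}." for r in reps]


def encode_video(frames):
    # Placeholder encoder: returns a reference name for the encoded segment.
    return f"video_{len(frames)}f.seg"


def encode_audio(samples):
    # Placeholder encoder for the event's own audio content.
    return f"audio_{len(samples)}s.seg"


def build_manifest(video_ref, audio_ref, description_ref):
    # The manifest holds references to the outputs, not the outputs themselves.
    return {
        "video": video_ref,
        "audio": audio_ref,
        "audio_description": description_ref,
    }


def process_segment(frames, samples):
    # Encode the video in parallel with generating the audio description.
    with ThreadPoolExecutor(max_workers=2) as pool:
        video_future = pool.submit(encode_video, frames)
        description_future = pool.submit(generate_audio_description, frames)
        video_ref = video_future.result()
        descriptions = description_future.result()
    audio_ref = encode_audio(samples)
    # "descriptions_0.txt" is an assumed reference name for the packaged descriptions.
    manifest = build_manifest(video_ref, audio_ref, "descriptions_0.txt")
    return manifest, descriptions
```

In this sketch a client device would first receive the manifest returned by `process_segment`, then request the video content and the audio description through the references it contains. A thread pool is used only to make the parallel step concrete; a production system would more likely run the encoder and the description generator as separate services.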