| CPC G10L 15/063 (2013.01) [G09B 21/00 (2013.01); G10L 15/01 (2013.01); G10L 15/1822 (2013.01); G10L 15/22 (2013.01); G10L 15/30 (2013.01); G10L 25/18 (2013.01); G10L 25/84 (2013.01); G10L 2025/783 (2013.01)] | 12 Claims |

1. A method for creating accessibility of any website or application for people with sight, hearing or speech disabilities, the method comprising:
receiving, by a server, input of the website or the application to be accessed and an indicator as to specific disabilities of a user of a device sending the input;
parsing, by the server, HTML elements from the input;
scoring, by the server, the website or the application for its accessibility based on the specific disabilities of the user;
for a score that is below a threshold, determining, by the server, an alternative form for the input of the website or the application based on the specific disabilities of the user and a corresponding machine learning model of a plurality of machine learning models;
adding, by the server, functionality to the input of the website or the application that enables the alternative form for the input on the website or the application;
outputting, by the server, to the device of the user the input of the website or the application with the added functionality; and
determining, by the server, a revised score for the website or the application for its accessibility based on the specific disabilities of the user and the alternative form for the input;
and if the revised score is below the threshold, continuing to train the corresponding machine learning model,
wherein determining the alternative form for the input further comprises converting audio to text, video to text, sign language to text, image to text, or any combination thereof; and
wherein converting audio to text comprises:
i) collecting, by the processor, a predetermined duration of audio data having noise below a threshold and a transcript of the audio data;
ii) processing, by the processor, the audio data such that it is in a labeled format;
iii) training, by the processor, one model of the plurality of models with the audio data, wherein the one model is a baseline model;
iv) evaluating, by the processor, the trained one model using a word error rate metric;
v) if the word error rate metric is below a threshold, obtaining a new predetermined duration of audio data having noise below a threshold and a transcript of the audio data and proceeding back to step ii);
vi) if the word error rate metric is at or above the threshold, setting the model as the model to use for audio to text; and
vii) for any input that is audio, translating the input from audio to text using the model.
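The claim's parse-score-threshold flow can be sketched in code. The claim does not specify a scoring metric, so everything here is a hypothetical illustration: the `ElementCollector` and `accessibility_score` names, and the alt-text heuristic used as the score, are assumptions, not the claimed method.

```python
# Hypothetical sketch of the server-side flow: parse HTML elements,
# score the page for the indicated disability, and decide whether an
# alternative input form must be determined and added.
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Collects (tag, attributes) pairs -- the claim's HTML-parsing step."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def accessibility_score(html_doc: str, disability: str) -> float:
    """Toy scoring rule (an assumption, not the claim's unspecified metric):
    for sight disabilities, the fraction of <img> elements carrying alt text."""
    parser = ElementCollector()
    parser.feed(html_doc)
    imgs = [attrs for tag, attrs in parser.tags if tag == "img"]
    if disability != "sight" or not imgs:
        return 1.0
    return sum(1 for attrs in imgs if attrs.get("alt")) / len(imgs)


def needs_alternative_form(html_doc: str, disability: str,
                           threshold: float = 0.8) -> bool:
    """Claim step: for a score below a threshold, an alternative form
    for the input is determined and functionality is added."""
    return accessibility_score(html_doc, disability) < threshold
```

A page with one unlabeled and one labeled image would score 0.5 under this toy rule and fall below the 0.8 threshold, triggering the augmentation branch of the claim.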
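The audio-to-text training loop (steps i through vii) evaluates the model with a word error rate metric. The claim leaves the model and metric details unspecified; the sketch below assumes the standard edit-distance definition of WER, which is the usual reading of that term.

```python
# Word error rate under the standard definition: the word-level Levenshtein
# distance between reference and hypothesis, divided by the reference length.
# This is the evaluation metric of step iv; the training and data-collection
# steps depend on a model the claim does not specify.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / max(len(ref), 1)
```

In the claim's loop, this value is compared against a threshold at each round (steps v and vi) to decide whether to collect another batch of low-noise audio and retrain, or to fix the current model as the audio-to-text converter.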