CPC G06V 30/413 (2022.01) [G06F 16/9027 (2019.01); G06V 10/20 (2022.01)] | 7 Claims |
1. A method comprising using at least one hardware processor to:
until a determination to stop processing is made, for each of a plurality of image frames in a video stream,
receive the image frame,
generate a text-recognition result from the image frame, wherein the text-recognition result comprises a vector of class estimations for each of one or more characters,
combine the text-recognition result with an accumulated text-recognition result,
estimate a distance between the accumulated text-recognition result and a next accumulated text-recognition result based on an approximate model of the next accumulated text-recognition result, wherein the distance between the accumulated text-recognition result and the next accumulated text-recognition result is estimated as
wherein Δ_n is the estimated distance,
wherein n is a current number of image frames for which text-recognition results have been combined with the accumulated text-recognition result,
wherein δ is an external parameter,
wherein S_n is a number of vectors of class estimations in the accumulated text-recognition result,
wherein K is a number of classes represented in each vector of class estimations in the accumulated text-recognition result, and
wherein Δ_ijk is a contribution to the estimated distance by a class estimation for a k-th class to a j-th component of the accumulated text-recognition result from the vector of class estimations in the text-recognition result generated from an i-th image frame, and
determine whether or not to stop the processing based on the estimated distance; and,
after stopping the processing, output a character string based on the accumulated text-recognition result.
|
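The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several parts are assumptions made only for illustration: the names `combine`, `decode`, `recognize_stream`, and `estimate_distance` are hypothetical; per-frame results are assumed to arrive already recognized as lists of class-estimation vectors of equal length; combination is assumed to be a running mean; and stopping is assumed to happen when the estimated distance drops to or below a threshold. The distance estimator is passed in as a callable because the expression for the estimated distance is not reproduced in the text above.

```python
from typing import Callable, Iterable, List, Optional, Sequence

Vector = List[float]   # class estimations for one character (K values)
Result = List[Vector]  # one vector per character


def combine(acc: Result, n: int, new: Result) -> Result:
    """Running mean of n earlier results and one new result (assumed rule)."""
    return [
        [(a * n + b) / (n + 1) for a, b in zip(va, vb)]
        for va, vb in zip(acc, new)
    ]


def decode(acc: Result, alphabet: Sequence[str]) -> str:
    """Pick the highest-scoring class for each character position."""
    return "".join(
        alphabet[max(range(len(v)), key=v.__getitem__)] for v in acc
    )


def recognize_stream(
    results: Iterable[Result],
    estimate_distance: Callable[[Result, List[Result]], float],
    threshold: float,
    alphabet: Sequence[str],
) -> str:
    """Accumulate per-frame results until the estimated distance is small.

    `estimate_distance` is a hypothetical stand-in for the estimate of the
    distance to the next accumulated result; its formula is an assumption
    supplied by the caller, not taken from the claim text.
    """
    acc: Optional[Result] = None
    history: List[Result] = []
    n = 0  # number of frames combined so far
    for result in results:
        history.append(result)
        acc = result if acc is None else combine(acc, n, result)
        n += 1
        # Assumed stopping rule: stop once the estimate is at or below threshold.
        if estimate_distance(acc, history) <= threshold:
            break
    return decode(acc, alphabet) if acc is not None else ""
```

A caller would supply its own estimator; for instance, a placeholder such as `lambda acc, history: 1.0 / len(history)` makes the loop stop after a fixed number of frames, which is useful only for exercising the control flow.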