US 12,462,366 B2
Machine-learning models for image processing
Ashutosh K. Sureka, Irving, TX (US); Venkata Sesha Kiran Kumar Adimatyam, Irving, TX (US); Miriam Silver, Tel Aviv (IL); and Daniel Funken, Irving, TX (US)
Assigned to CITIBANK, N.A., New York, NY (US)
Filed by Citibank, N.A., New York, NY (US)
Filed on May 23, 2025, as Appl. No. 19/217,967.
Application 19/217,967 is a continuation of application No. 18/629,228, filed on Apr. 8, 2024, granted, now 12,315,126.
Prior Publication US 2025/0315931 A1, Oct. 9, 2025
Int. Cl. G06K 19/10 (2006.01); G06T 7/00 (2017.01); G06V 10/74 (2022.01); G06V 20/70 (2022.01)
CPC G06T 7/0002 (2013.01) [G06V 10/761 (2022.01); G06V 20/70 (2022.01); G06T 2207/30168 (2013.01); G06V 2201/07 (2022.01)] 20 Claims
OG exemplary drawing
 
1. A method for capturing document imagery from video feed data, the method comprising:
obtaining, by one or more processors, a video feed for a document generated at a camera, the video feed including a plurality of frames containing image data;
extracting, by the one or more processors, a first feature vector representing a plurality of attribute features of the document in the image data of a first frame and a plurality of content features of the document in the image data of the first frame, and a second feature vector representing the plurality of attribute features of the document in the image data of a second frame and the plurality of content features of the document in the image data of the second frame;
generating, by the one or more processors, a first similarity score for the first feature vector of the first frame and the second feature vector of the second frame based upon a first distance between the attribute features of the first feature vector and the attribute features of the second feature vector;
generating, by the one or more processors, a second similarity score for the first feature vector of the first frame and the second feature vector of the second frame based upon a second distance between the content features of the first feature vector and the content features of the second feature vector;
determining, by the one or more processors, a liveness score to validate the document in the video feed, wherein the liveness score is based upon a plurality of similarity scores generated using each feature vector of the plurality of frames; and
generating, by the one or more processors, an output image representing the document in the video feed in response to determining that the liveness score satisfies a liveness threshold.
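
The following Python sketch illustrates one way the method of claim 1 could be realized; it is not the patented implementation. The feature extractor, the Euclidean-distance similarity, the mean aggregation of similarity scores, and the names extract_features, similarity, process_video_feed, ATTR_DIM, CONTENT_DIM, and LIVENESS_THRESHOLD are all illustrative assumptions, since the claim does not fix a particular model, distance metric, aggregation rule, or threshold.

# Minimal sketch of the claimed pipeline under the assumptions stated above.
import numpy as np

ATTR_DIM = 128             # assumed length of the attribute-feature sub-vector
CONTENT_DIM = 128          # assumed length of the content-feature sub-vector
LIVENESS_THRESHOLD = 0.8   # assumed liveness threshold

def extract_features(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the learned extractor in the claim. The 'attribute' half
    here is a coarse grid of mean intensities (rough document layout) and the
    'content' half is an intensity histogram; a real system would use a
    trained model that produces both attribute and content features."""
    gray = frame.mean(axis=-1) if frame.ndim == 3 else frame.astype(float)
    h, w = gray.shape
    gh, gw = 8, 16  # 8 x 16 = 128 grid cells -> ATTR_DIM values
    attrs = np.array([
        gray[i * h // gh:(i + 1) * h // gh, j * w // gw:(j + 1) * w // gw].mean()
        for i in range(gh) for j in range(gw)
    ])
    content, _ = np.histogram(gray, bins=CONTENT_DIM, range=(0, 255))
    content = content.astype(float) / max(content.sum(), 1)
    return np.concatenate([attrs / 255.0, content])

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Distance-based similarity in (0, 1]; Euclidean distance is an
    assumption, any metric could serve as the claimed 'distance'."""
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))

def process_video_feed(frames):
    """Return an output image for the document if the liveness check passes."""
    vectors = [extract_features(f) for f in frames]
    scores = []
    for prev, curr in zip(vectors, vectors[1:]):
        # First similarity score: distance between attribute features.
        attr_score = similarity(prev[:ATTR_DIM], curr[:ATTR_DIM])
        # Second similarity score: distance between content features.
        content_score = similarity(prev[ATTR_DIM:], curr[ATTR_DIM:])
        scores.extend([attr_score, content_score])

    # Liveness score aggregated over similarity scores from all frame pairs;
    # the mean is an assumption, the claim only requires it be "based upon" them.
    liveness_score = float(np.mean(scores)) if scores else 0.0

    if liveness_score >= LIVENESS_THRESHOLD:
        # Output image: here simply the last frame; the specification may
        # describe a different selection or enhancement step.
        return frames[-1]
    return None

if __name__ == "__main__":
    # Usage example with synthetic frames standing in for a real video feed.
    rng = np.random.default_rng(0)
    base = rng.integers(0, 255, size=(480, 640, 3)).astype(float)
    frames = [base + rng.normal(0, 2, size=base.shape) for _ in range(5)]
    print(process_video_feed(frames) is not None)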