CPC G06Q 20/40 (2013.01) [G06F 18/24 (2023.01); G06F 40/284 (2020.01); G06N 3/02 (2013.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06Q 20/045 (2013.01); G06Q 20/389 (2013.01); G06Q 20/4016 (2013.01); G06Q 40/12 (2013.12); G06T 7/0002 (2013.01); G06T 7/74 (2017.01); G06V 30/224 (2022.01); G06V 30/413 (2022.01); G06V 30/414 (2022.01); G06V 30/418 (2022.01); G06F 16/24564 (2019.01); G06T 2207/20061 (2013.01); G06T 2207/30176 (2013.01)]
20 Claims
1. A computer-implemented method for auditing financial documents for potential fraud using pixel intensity testing, comprising:
identifying a first corpus of valid reference images, wherein each of the valid reference images comprises an authentic image of an authentic financial document, wherein the authentic image was captured by a camera;
identifying a second corpus of fraudulent reference images, wherein each of the fraudulent reference images comprises a fraudulent image of an inauthentic financial document, wherein the fraudulent image comprises a programmatically generated document;
analyzing first pixel values of first pixels in the valid reference images in the first corpus to determine that at least one valid pixel-based pattern is included in pixels of at least a first threshold percentage of the valid reference images, wherein the valid pixel-based pattern comprises a first white space slope metric, wherein a white space slope metric for any given image measures how values of pixel intensities corresponding to white space portions of the given image transition from top to bottom of the given image, a white space slope metric being generated by:
dividing the given image into a predefined number of horizontal strips from top to bottom, each strip comprising a group made up of a predefined number of rows of pixels;
determining, for each row of pixels in each group, a sum of pixel values for that row;
determining, as a list of maximum pixel intensity values for the given image, a maximum pixel intensity sum for each group, wherein the maximum pixel intensity sum for a given group comprises the sum of pixel values for the row that has the maximum sum of pixel intensities among the rows in the given group;
generating a regression line by plotting, on a graph of groups versus maximum pixel intensity sums, the list of maximum pixel intensity values for the given image and fitting a line to the plotted values; and
determining, as the white space slope metric for the given image, a slope of the regression line, wherein the slope of the regression line indicates a variance in the white space slope metric, with a smaller slope indicating a greater likelihood that the given image relates to an inauthentic financial document;
analyzing second pixel values of second pixels in the fraudulent reference images in the second corpus to determine that at least one fraudulent pixel-based pattern is included in pixels of at least a second threshold percentage of the fraudulent reference images, wherein the fraudulent pixel-based pattern comprises a second white space slope metric;
receiving a request to classify a first image;
analyzing third pixel values of third pixels included in the first image to determine at least one pixel-based pattern in the third pixels, wherein the pixel-based pattern comprises a third white space slope metric;
determining whether the third white space slope metric matches either the first white space slope metric or the second white space slope metric;
in response to determining that the third white space slope metric matches the first white space slope metric, increasing a first likelihood of classifying the first image as a valid image;
in response to determining that the third white space slope metric matches the second white space slope metric, increasing a second likelihood of classifying the first image as a fraudulent image; and
classifying the first image in response to the request as either a valid image or a fraudulent image based on the first likelihood and the second likelihood.
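The white space slope metric recited in claim 1 can be illustrated with a short NumPy sketch. It assumes the image is a 2-D grayscale array in which whiter pixels have larger values (for example 0 to 255), that the "predefined number of rows" per strip is a hypothetical parameter `rows_per_strip`, and that trailing rows which do not fill a whole strip are dropped. The function name `white_space_slope` is illustrative only, and an ordinary least-squares fit via `numpy.polyfit` stands in for the unspecified regression method.

```python
import numpy as np


def white_space_slope(image: np.ndarray, rows_per_strip: int = 10) -> float:
    """Hypothetical sketch of the white space slope metric.

    Assumes `image` is 2-D grayscale (rows x columns) with brighter
    (whiter) pixels stored as larger values.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    if rows_per_strip < 1:
        raise ValueError("rows_per_strip must be positive")
    n_strips = image.shape[0] // rows_per_strip
    if n_strips < 2:
        raise ValueError("need at least two strips to fit a line")

    # Assumption: drop trailing rows that do not fill a complete strip.
    usable = image[: n_strips * rows_per_strip].astype(np.float64)

    # Sum the pixel values of every row.
    row_sums = usable.sum(axis=1)

    # Group consecutive rows into strips and keep the largest row sum per strip.
    max_sums = row_sums.reshape(n_strips, rows_per_strip).max(axis=1)

    # Least-squares line of max row sum versus strip index; keep the slope.
    strip_index = np.arange(n_strips)
    slope, _intercept = np.polyfit(strip_index, max_sums, deg=1)
    return float(slope)
```

Under these assumptions, a perfectly uniform background (as a programmatically rendered page might have) yields a slope near zero, while uneven lighting across a photographed page yields a nonzero slope. This is consistent with the claim's statement that a smaller slope points toward an inauthentic document.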
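The corpus-analysis steps of claim 1 require that a pixel-based pattern appear in at least a threshold percentage of the reference images. The claim does not say how a slope "pattern" is represented, so the following sketch assumes, purely for illustration, that a pattern is an inclusive slope band centred on the corpus median with a hypothetical `tolerance`. The band is accepted only if it covers at least `threshold` (a fraction between 0 and 1) of the reference slopes. `SlopeBand`, `coverage`, and `learn_slope_band` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass(frozen=True)
class SlopeBand:
    """Hypothetical representation of a slope pattern: an inclusive range."""
    low: float
    high: float

    def contains(self, slope: float) -> bool:
        return self.low <= slope <= self.high


def coverage(slopes: Sequence[float], band: SlopeBand) -> float:
    """Fraction of reference slopes that fall inside the band."""
    if not slopes:
        raise ValueError("no reference slopes")
    hits = sum(1 for s in slopes if band.contains(s))
    return hits / len(slopes)


def learn_slope_band(
    slopes: Sequence[float], tolerance: float, threshold: float
) -> Optional[SlopeBand]:
    """Return a median-centred band if it covers at least `threshold` of the corpus."""
    if tolerance < 0 or not 0.0 <= threshold <= 1.0:
        raise ValueError("invalid tolerance or threshold")
    if not slopes:
        raise ValueError("no reference slopes")
    center = median(slopes)
    band = SlopeBand(center - tolerance, center + tolerance)
    # Assumption: a band that is not common enough is not treated as a pattern.
    return band if coverage(slopes, band) >= threshold else None
```

The same function would be run once on slopes from the valid corpus and once on slopes from the fraudulent corpus, each with its own threshold, to obtain the first and second white space slope patterns.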
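The matching and classification steps at the end of claim 1 can be sketched as two likelihood scores that are incremented when the query image's slope matches the valid or the fraudulent pattern. This sketch assumes each pattern is an inclusive `(low, high)` slope band and that a match adds a hypothetical `weight` to the corresponding score. The claim does not say how ties are resolved (neither band or both bands matching), so the fallback to the nearer band centre is an assumption. `classify_by_slope` is an illustrative name.

```python
from typing import Literal, Tuple

Band = Tuple[float, float]  # (low, high), inclusive; hypothetical pattern form


def _matches(slope: float, band: Band) -> bool:
    low, high = band
    return low <= slope <= high


def _distance_to_center(slope: float, band: Band) -> float:
    low, high = band
    return abs(slope - (low + high) / 2.0)


def classify_by_slope(
    slope: float, valid_band: Band, fraud_band: Band, weight: float = 1.0
) -> Literal["valid", "fraudulent"]:
    """Classify one image from its white space slope (a sketch, not the claimed implementation)."""
    valid_likelihood = 0.0
    fraud_likelihood = 0.0
    if _matches(slope, valid_band):
        valid_likelihood += weight  # matches the first (valid) pattern
    if _matches(slope, fraud_band):
        fraud_likelihood += weight  # matches the second (fraudulent) pattern
    if valid_likelihood != fraud_likelihood:
        return "valid" if valid_likelihood > fraud_likelihood else "fraudulent"
    # Tie: hypothetical fallback to whichever band centre is closer.
    if _distance_to_center(slope, valid_band) <= _distance_to_center(slope, fraud_band):
        return "valid"
    return "fraudulent"
```

For example, with illustrative (not corpus-derived) bands `(200.0, 800.0)` for valid images and `(-50.0, 50.0)` for fraudulent ones, a slope of `500.0` is classified as valid and a slope of `10.0` as fraudulent. In a fuller system, the two scores would likely be combined with other pixel-based patterns rather than decided from a single metric.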