CPC G06Q 40/12 (2013.12) [G06F 16/24542 (2019.01); G06F 16/24564 (2019.01); G06F 40/30 (2020.01); G06N 5/047 (2013.01); G06N 7/01 (2023.01); G06N 20/00 (2019.01); G06N 20/10 (2019.01)] | 9 Claims |
1. A method of detecting anomalies in expense reports, comprising:
applying a semantic analysis mechanism to an expense report submitted by an employee, wherein the semantic analysis mechanism includes instructions executed by a processor to:
build a set of data structures from expense report data,
provide the set of data structures from the expense report data to a probabilistic information engine that uses an expense ontology, wherein the expense ontology defines a set of categories of expenses and expense attributes, and
determine with the probabilistic information engine at least one transaction expense type in the expense report based on the expense ontology to form a first enhancement of the expense report data;
supplementing the first enhancement of the expense report data with data obtained from a World Wide Web source to form a second enhancement of the expense report data; and
executing machine learning models on the second enhancement of the expense report data, wherein the machine learning models include:
a first trained machine learning model to detect employee interaction with a corrupt entity, the first trained machine learning model is trained with Politically Exposed Person (PEP) data, fictional person data and foreign location risk data,
a second trained machine learning model to detect employee activity indicative of bribery, the second trained machine learning model is trained with PEP data, fictional person data and foreign location risk data,
a third trained machine learning model to detect violation of company travel guidelines, the third trained machine learning model is trained with travel class policies and travel booking provider data,
a fourth trained machine learning model to detect an expense request inconsistent with historic expense data, the fourth trained machine learning model is trained using historical risk and behavioral data for employees, and
a fifth trained machine learning model to detect employee interaction with a vendor associated with fraudulent activities, the fifth trained machine learning model is trained using vendor profile data, fraudulent vendor data, vendor address data and fictional vendor data;
wherein one or more of the machine learning models utilize a Support Vector Machine (SVM) linear kernel model, wherein the SVM linear kernel model is trained using a specified number of training samples of employees in a company to be memory efficient.
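
For illustration only, the final limitation (a linear-kernel SVM trained on a specified number of training samples) can be sketched as follows. The claim does not state the training algorithm, the feature set, or the sample count, so everything here is an assumption: the two features (`amount_ratio`, `vendor_risk`), the toy data, the helper names `cap_training_set`, `train_linear_svm`, and `predict`, and the use of plain subgradient descent on the hinge loss. The point of the sketch is that a linear model stores only one weight per feature plus a bias, and that capping the training set bounds the memory needed during training.

```python
import random

def cap_training_set(samples, labels, max_samples, seed=0):
    """Keep at most `max_samples` randomly chosen (sample, label) pairs."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    if len(indices) > max_samples:
        indices = rng.sample(indices, max_samples)
    return [samples[i] for i in indices], [labels[i] for i in indices]

def train_linear_svm(samples, labels, max_samples, epochs=200, lr=0.1, lam=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    Labels are +1 (anomalous) or -1 (normal). Only a capped subset of the
    data is used, so training memory does not grow with the full data set.
    """
    xs, ys = cap_training_set(samples, labels, max_samples)
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Hinge-loss step: move toward samples inside the margin.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 (flag as anomalous) or -1 (treat as normal)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical features per expense line: [amount_ratio, vendor_risk].
samples = [
    [0.10, 0.10], [0.20, 0.00], [0.00, 0.20], [0.15, 0.10],  # normal
    [0.90, 0.80], [1.00, 0.90], [0.80, 1.00], [0.95, 0.85],  # anomalous
]
labels = [-1, -1, -1, -1, 1, 1, 1, 1]

# max_samples plays the role of the "specified number of training samples".
w, b = train_linear_svm(samples, labels, max_samples=8)
```

Calling `predict(w, b, [0.9, 0.9])` on the trained model then classifies a new feature vector. A production system would more likely rely on an established SVM library; the hand-written loop is used here only to keep the sketch self-contained.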