| CPC G06F 18/214 (2023.01) [G06F 18/2411 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06V 30/153 (2022.01)] | 20 Claims |

|
1. A non-transitory computer-readable medium embodying programming code that, when executed by a processor, causes the processor to:
receive a dataset with a plurality of variable length character strings;
compute a plurality of features of alphanumeric characters in the plurality of variable length character strings, wherein combinations of features in the plurality of variable length character strings are captured attributes;
populate a data vector with the captured attributes for each respective variable length character string of the plurality of variable length character strings;
assign categories to the data vectors based on the captured attributes in the data vectors;
determine whether the dataset satisfies a quality metric by determining a number or a percentage of the data vectors assigned to one or more of the categories; and
transmit, to a client device, an alarm, a report, and a link to the dataset, wherein the report indicates a failure of the dataset to satisfy the quality metric.
|
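The steps recited in claim 1 can be illustrated with a minimal Python sketch. Everything specific in it is an assumption for illustration and is not taken from the claim: the character features (length, digit, letter, and non-alphanumeric counts), the five hypothetical attribute names in `ATTRIBUTES`, the rule-based `assign_category` (a stand-in for whatever classifier an implementation might use), the 10% threshold, and the placeholder `example.com` link. Transmission to a client device is represented only by building the alarm/report/link payload.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical attribute names; each is a combination of per-string features.
ATTRIBUTES = ("all_digits", "all_letters", "mixed_alnum", "has_other", "empty")


def compute_features(s: str) -> dict[str, int]:
    """Count simple character-level features of one variable-length string."""
    return {
        "length": len(s),
        "digits": sum(c.isdigit() for c in s),
        "letters": sum(c.isalpha() for c in s),
        "other": sum(not c.isalnum() for c in s),  # punctuation, spaces, etc.
    }


def to_vector(f: dict[str, int]) -> list[int]:
    """Populate a fixed-length 0/1 data vector, one slot per attribute."""
    n = f["length"]
    return [
        int(n > 0 and f["digits"] == n),
        int(n > 0 and f["letters"] == n),
        int(f["digits"] > 0 and f["letters"] > 0 and f["other"] == 0),
        int(f["other"] > 0),
        int(n == 0),
    ]


def assign_category(v: list[int]) -> str:
    """Hypothetical rule-based stand-in for a classifier: first set slot wins."""
    for name, bit in zip(ATTRIBUTES, v):
        if bit:
            return name
    return "unclassified"


@dataclass
class QualityResult:
    passed: bool
    bad_count: int
    bad_pct: float


def check_quality(strings: list[str], bad_categories: set[str],
                  max_bad_pct: float) -> QualityResult:
    """Fail the dataset if too large a percentage lands in 'bad' categories."""
    cats = [assign_category(to_vector(compute_features(s))) for s in strings]
    bad = sum(c in bad_categories for c in cats)
    pct = 100.0 * bad / len(cats) if cats else 0.0
    return QualityResult(passed=pct <= max_bad_pct, bad_count=bad, bad_pct=pct)


def build_notification(result: QualityResult, dataset_link: str) -> Optional[dict]:
    """Alarm + report + link payload; None when the metric is satisfied."""
    if result.passed:
        return None
    return {
        "alarm": True,
        "report": (f"Dataset failed quality metric: {result.bad_count} strings "
                   f"({result.bad_pct:.1f}%) in disallowed categories."),
        "link": dataset_link,
    }


# Usage: a column expected to hold purely numeric IDs.
ids = ["10023", "10024", "A1O25", "", "10026"]
result = check_quality(ids, {"mixed_alnum", "has_other", "empty"}, max_bad_pct=10.0)
print(result)  # QualityResult(passed=False, bad_count=2, bad_pct=40.0)
print(build_notification(result, "https://example.com/datasets/ids"))
```

Encoding the captured attributes as a fixed-length vector is what would let a trained model (the CPC codes above include machine-learning classes such as G06N 20/00) replace the rule-based `assign_category` without changing the feature, quality-metric, or notification steps.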