US 12,339,927 B2
Systems and techniques to monitor text data quality
Robin Astrid Epp Neufeld, Toronto (CA)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Jun. 16, 2023, as Appl. No. 18/210,935.
Application 17/962,719 is a division of application No. 16/406,848, filed on May 8, 2019, granted, now 11,048,984, issued on Jun. 29, 2021.
Application 18/210,935 is a continuation of application No. 17/962,719, filed on Oct. 10, 2022, granted, now 11,748,448.
Application 18/210,935 is a continuation of application No. 16/601,660, filed on Oct. 15, 2019, granted, now 11,475,252, issued on Oct. 18, 2022.
Prior Publication US 2023/0334119 A1, Oct. 19, 2023
Int. Cl. G06F 11/30 (2006.01); G06F 18/214 (2023.01); G06F 18/2411 (2023.01); G06N 3/08 (2023.01); G06N 20/00 (2019.01); G06V 30/148 (2022.01)
CPC G06F 18/214 (2023.01) [G06F 18/2411 (2023.01); G06N 3/08 (2013.01); G06N 20/00 (2019.01); G06V 30/153 (2022.01)] 20 Claims
 
1. A non-transitory computer readable medium embodying programming code that when executed by a processor causes the processor to:
receive a dataset with a plurality of variable length character strings;
compute a plurality of features of alphanumeric characters in the plurality of variable length character strings, wherein combinations of features in the plurality of variable length character strings are captured attributes;
populate a data vector with the captured attributes for each respective variable length character string of the plurality of variable length character strings;
assign categories to the data vectors based on the captured attributes in the data vectors;
determine whether the dataset satisfies a quality metric by determination of a number or a percentage of data vectors assigned to one or more of the categories; and
transmit, to a client device, an alarm, a report, and link to the dataset, wherein the report indicates a failure of the dataset to satisfy the quality metric.
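The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The feature set (counts of letters, digits, whitespace, and other characters), the rule-based categorize function, the category names, the 25% threshold, and the build_notification helper are all hypothetical choices made for illustration. The claim does not specify how categories are assigned, and the classification codes suggest a trained model may be used instead of fixed rules. Actual transmission to a client device is left out; the sketch only builds the payload.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


def char_features(text: str) -> tuple[int, int, int, int]:
    """Data vector for one string: (letters, digits, whitespace, other)."""
    letters = sum(ch.isalpha() for ch in text)
    digits = sum(ch.isdigit() for ch in text)
    spaces = sum(ch.isspace() for ch in text)
    other = len(text) - letters - digits - spaces
    return (letters, digits, spaces, other)


def categorize(vector: tuple[int, int, int, int]) -> str:
    """Hypothetical rule-based stand-in for the category assignment step."""
    letters, digits, spaces, other = vector
    if letters + digits + spaces + other == 0:
        return "empty"
    if other > letters + digits:
        return "garbled"
    return "ok"


@dataclass
class QualityReport:
    passed: bool
    counts: dict[str, int]
    bad_fraction: float


def check_dataset(
    strings: list[str],
    bad_categories: frozenset[str] = frozenset({"empty", "garbled"}),
    max_bad_fraction: float = 0.25,  # assumed threshold, not from the claim
) -> QualityReport:
    """Apply the quality metric: fraction of vectors in the bad categories."""
    vectors = [char_features(s) for s in strings]
    counts = Counter(categorize(v) for v in vectors)
    bad = sum(counts[c] for c in bad_categories)
    fraction = bad / len(strings) if strings else 0.0
    return QualityReport(fraction <= max_bad_fraction, dict(counts), fraction)


def build_notification(report: QualityReport, dataset_url: str) -> dict | None:
    """Build the alarm/report/link payload on failure; sending is out of scope."""
    if report.passed:
        return None
    return {
        "alarm": "dataset failed text quality check",
        "report": report,
        "link": dataset_url,
    }


sample = ["Hello world 42", "", "#$%^&*!!", "invoice 2023"]
result = check_dataset(sample)
notice = build_notification(result, "https://example.com/datasets/sample")
```

In this sample, two of the four strings fall into the "empty" and "garbled" categories, so the assumed 25% threshold is exceeded and a notification payload is produced. Counting vectors per category, rather than inspecting raw strings, mirrors the claim's wording that the quality metric is determined from "a number or a percentage of data vectors assigned to one or more of the categories".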