US 12,282,829 B2
Techniques for data type detection with learned metadata
Anh Truong, Champaign, IL (US); Austin Grant Walters, Savoy, IL (US); and Jeremy Edward Goodsitt, Champaign, IL (US)
Assigned to Capital One Services, LLC, McLean, VA (US)
Filed by Capital One Services, LLC, McLean, VA (US)
Filed on Aug. 25, 2021, as Appl. No. 17/412,034.
Prior Publication US 2023/0064886 A1, Mar. 2, 2023
Int. Cl. G06F 16/00 (2019.01); G06N 20/00 (2019.01)
CPC G06N 20/00 (2019.01)
20 Claims
 
1. An apparatus, the apparatus comprising:
a processor; and
memory comprising instructions that when executed by the processor cause the processor to:
identify a set of data objects, the set of data objects comprising an array of values, wherein each data object in the set of data objects comprises a column value, a row value, and a data value;
determine a first group of data objects in the set of data objects, the first group of data objects corresponding to a first column in the array of values;
determine a second group of data objects in the set of data objects, the second group of data objects corresponding to a second column in the array of values;
concatenate data values from the first group of data objects with the data values from the second group of data objects in a row-wise manner to produce a concatenated group of data values;
determine at least one of a set of a plurality of embedding space parameters based on the concatenated group of data values, wherein the set of embedding space parameters define an embedding space comprising a plurality of dimensions; and
generate a set of object vectors, the set of object vectors comprising an object vector for each data object in the set of data objects, each object vector in the set of object vectors to include a set of dimension values and each dimension value in the set of dimension values to correspond to one of the plurality of dimensions in the embedding space, wherein a respective object vector for a respective data object is generated based on a respective column value of the respective data object, a respective row value of the respective data object, and the set of embedding space parameters.
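The steps of claim 1 can be illustrated with a minimal Python sketch under stated assumptions. The claim does not say how the "row-wise" concatenation is performed, how the embedding space parameters are determined, or how a row value and a column value are combined into an object vector. Here, as assumptions: the two columns' data values that share a row value are joined with a space; the parameters are a hypothetical stand-in rather than a trained model (a unit-length vector per row built by character-bigram feature hashing of the concatenated value, plus a fixed one-hot vector per column); and each object vector is the sum of its row vector and column vector. All names (`DataObject`, `column_group`, `concat_row_wise`, `fit_embedding_params`, `object_vectors`) and the dimension count of 8 are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataObject:
    column: int  # column value: index of the column in the array of values
    row: int     # row value: index of the row in the array of values
    value: str   # data value stored in that cell


def column_group(objects: List[DataObject], column: int) -> List[DataObject]:
    """Return the data objects belonging to one column, ordered by row value."""
    return sorted((o for o in objects if o.column == column), key=lambda o: o.row)


def concat_row_wise(first: List[DataObject], second: List[DataObject],
                    sep: str = " ") -> Dict[int, str]:
    """ASSUMPTION: 'row-wise' means joining the two values that share a row value."""
    second_by_row = {o.row: o.value for o in second}
    return {o.row: o.value + sep + second_by_row[o.row]
            for o in first if o.row in second_by_row}


def _bucket(token: str, dim: int) -> int:
    # Stable hash; Python's built-in hash() of str is salted per process.
    return int.from_bytes(hashlib.md5(token.encode()).digest()[:4], "big") % dim


def fit_embedding_params(concatenated: Dict[int, str], columns: List[int],
                         dim: int = 8) -> Dict[str, Dict[int, List[float]]]:
    """HYPOTHETICAL stand-in for learned parameters (no training happens here):
    a unit-length row vector from character-bigram feature hashing of each
    concatenated value, plus a fixed one-hot vector per column."""
    rows: Dict[int, List[float]] = {}
    for row, text in concatenated.items():
        vec = [0.0] * dim
        for i in range(len(text) - 1):
            vec[_bucket(text[i:i + 2], dim)] += 1.0
        norm = sum(x * x for x in vec) ** 0.5 or 1.0  # avoid dividing by zero
        rows[row] = [x / norm for x in vec]
    cols = {c: [1.0 if d == c % dim else 0.0 for d in range(dim)] for c in columns}
    return {"row": rows, "column": cols}


def object_vectors(objects: List[DataObject],
                   params: Dict[str, Dict[int, List[float]]],
                   dim: int = 8) -> Dict[DataObject, List[float]]:
    """One vector per data object, computed only from its column value, its row
    value, and the parameters (row vector + column vector, zeros if unknown)."""
    zeros = [0.0] * dim
    vectors = {}
    for o in objects:
        r = params["row"].get(o.row, zeros)
        c = params["column"].get(o.column, zeros)
        vectors[o] = [a + b for a, b in zip(r, c)]
    return vectors


# Usage: a 2 x 2 array of values (first name, last name).
cells = [DataObject(0, 0, "Ada"), DataObject(1, 0, "Lovelace"),
         DataObject(0, 1, "Alan"), DataObject(1, 1, "Turing")]
first, second = column_group(cells, 0), column_group(cells, 1)
joined = concat_row_wise(first, second)  # {0: 'Ada Lovelace', 1: 'Alan Turing'}
params = fit_embedding_params(joined, columns=[0, 1])
vectors = object_vectors(cells, params)
```

Summing a row vector and a column vector is only one way to make each object vector depend on the row value, the column value, and the parameters; a trained embedding model could replace `fit_embedding_params` without changing the rest of the sketch.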