US 11,886,457 B2
Automatic transformation of data by patterns
Yeye He, Redmond, WA (US); Surajit Chaudhuri, Kirkland, WA (US); and Zhongjun Jin, Ann Arbor, MI (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on May 29, 2020, as Appl. No. 16/887,564.
Prior Publication US 2021/0374134 A1, Dec. 2, 2021
Int. Cl. G06F 16/25 (2019.01); G06F 16/2453 (2019.01); G06F 16/21 (2019.01); G06F 16/248 (2019.01); G06F 18/21 (2023.01)
CPC G06F 16/254 (2019.01) [G06F 16/213 (2019.01); G06F 16/248 (2019.01); G06F 16/24539 (2019.01); G06F 18/2178 (2023.01)]
21 Claims
 
1. A computing system comprising:
at least one processor; and
at least one hardware storage device that stores instructions that are executable by the at least one processor to cause the computing system to perform the following in response to receiving a source dataset and a target dataset:
identify a subset of the source dataset and a subset of the target dataset as related data;
access a plurality of Transform-by-Pattern (TBP) programs, wherein each TBP program in the plurality of TBP programs comprises a combination of (i) a source pattern, (ii) a target pattern, and (iii) a transformation program, wherein the transformation program is configured to transform data that fits into the target pattern into data that fits into the source pattern;
identify one or more of the TBP programs as being applicable to the related data, wherein:
for each of the one or more TBP programs, (i) at least one data unit of the subset of the source dataset fits into the source pattern of the TBP program and (ii) at least one data unit of the subset of the target dataset fits into the target pattern of the TBP program,
the one or more TBP programs are identified based on a determination that corresponding coverage scores for the one or more TBP programs are greater than a predetermined threshold, and the coverage scores indicate applicability rates for the one or more TBP programs, and
a first TBP program included among the one or more TBP programs is learned or generated based on telemetry data obtained in response to leverage of a query log, the telemetry data being associated with a particular type of task and being usable to learn or generate the first TBP program; and
apply at least one of the one or more applicable TBP programs to the target dataset, wherein applying the at least one of the one or more applicable TBP programs to the target dataset includes:
selecting one applicable TBP program; and
using a transformation program of the selected one applicable TBP program to automatically transform the subset of the target dataset to transformed data, the transformed data comprising an integrated dataset comprising the source dataset and the transformed target dataset.
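
For illustration only, the flow recited in claim 1 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names TBPProgram, fits, coverage, applicable_programs and integrate are hypothetical, patterns are assumed to be regular expressions, and the coverage score is assumed to be the smaller of the fractions of source and target values that fit the respective patterns (the claim says only that the score indicates an applicability rate). Learning a TBP program from telemetry or a query log is not shown; the program list is simply given.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TBPProgram:
    """Hypothetical container for one Transform-by-Pattern program."""
    source_pattern: str                 # regex that source-side values fit (assumption)
    target_pattern: str                 # regex that target-side values fit (assumption)
    transform: Callable[[str], str]     # maps target-pattern data to source-pattern data


def fits(pattern: str, value: str) -> bool:
    """A data unit 'fits' a pattern when the whole string matches the regex."""
    return re.fullmatch(pattern, value) is not None


def coverage(program: TBPProgram, source: List[str], target: List[str]) -> float:
    """Assumed coverage score: the lower of the two per-side fit fractions."""
    if not source or not target:
        return 0.0
    src = sum(fits(program.source_pattern, v) for v in source) / len(source)
    tgt = sum(fits(program.target_pattern, v) for v in target) / len(target)
    return min(src, tgt)


def applicable_programs(
    programs: List[TBPProgram], source: List[str], target: List[str], threshold: float
) -> List[Tuple[float, TBPProgram]]:
    """Keep the programs whose coverage score exceeds the threshold."""
    scored = [(coverage(p, source, target), p) for p in programs]
    return [(score, p) for score, p in scored if score > threshold]


def integrate(
    programs: List[TBPProgram], source: List[str], target: List[str], threshold: float = 0.5
) -> Optional[List[str]]:
    """Select one applicable program, transform the target, and merge with the source."""
    candidates = applicable_programs(programs, source, target, threshold)
    if not candidates:
        return None
    best = max(candidates, key=lambda pair: pair[0])[1]  # highest coverage wins
    transformed = [
        best.transform(v) if fits(best.target_pattern, v) else v for v in target
    ]
    return source + transformed


def us_to_iso(value: str) -> str:
    """Rewrite MM/DD/YYYY as YYYY-MM-DD."""
    month, day, year = value.split("/")
    return f"{year}-{month}-{day}"


def strip_phone(value: str) -> str:
    """Rewrite (ddd) ddd-dddd as ddd-ddd-dddd."""
    return value.replace("(", "").replace(") ", "-")


date_program = TBPProgram(r"\d{4}-\d{2}-\d{2}", r"\d{2}/\d{2}/\d{4}", us_to_iso)
phone_program = TBPProgram(r"\d{3}-\d{3}-\d{4}", r"\(\d{3}\) \d{3}-\d{4}", strip_phone)

source_dates = ["2020-05-29", "2021-12-02"]
target_dates = ["01/30/2024", "12/02/2021"]
merged = integrate([phone_program, date_program], source_dates, target_dates)
print(merged)
```

In this sketch the phone-number program scores 0.0 on the date columns and is filtered out by the threshold, while the date program scores 1.0 and is selected. Choosing the highest-scoring candidate is one possible selection rule; the claim requires only that one applicable TBP program be selected.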