| CPC G06F 16/211 (2019.01) [G06F 16/221 (2019.01); G06F 16/2237 (2019.01); G06F 16/24573 (2019.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06N 3/044 (2023.01); G06N 3/08 (2013.01)] | 18 Claims |

|
1. A computer-implemented method for data or schema mapping, the computer-implemented method comprising:
obtaining, by a server computer, source schema metadata, wherein the source schema metadata is associated with fields of a source schema;
obtaining, by the server computer, target schema metadata with a target schema, wherein the target schema metadata is associated with fields of a target schema;
determining, by the server computer, for each field of the source schema and each field of the target schema, a representation for each field based, at least in part, on the source schema metadata or the target schema metadata associated with each field, and generating schema field representations for a dynamic object schema, wherein the dynamic object schema enables extending schemas of an existing object, and wherein the determining of the representation for each field of the source schema and each field of the target schema comprises:
training a machine learning model, wherein training the machine learning model comprises:
obtaining a first training dataset;
generating a first stage machine learning model based on the first training dataset, wherein the first stage machine learning model is trained to encode a sentence included in a dataset into a vector representation;
obtaining a second training dataset, wherein the second training dataset includes schema metadata from various schema objects;
generating a trained machine learning model based on the second training dataset and the first stage machine learning model, wherein the trained machine learning model is trained to encode sentences associated with metadata columns of a schema field into a single vector representation, wherein an input layer of the machine learning model utilizes a same encoder to encode a premise and a hypothesis; and
utilizing triplets of related data in an unsupervised fashion to learn sentence representations directly from the metadata columns;
generating, through a sequence neural network, sequence aware embeddings based on the representation for each field by combining the representations from the source schema metadata and the target schema metadata; and
providing, by the server computer, the representation for each field of the source schema and the representation for each field of the target schema for use in generating data mappings between the source schema and the target schema.
|
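The encoding and triplet steps recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the hashed bag-of-words `encode` function is a stand-in assumption for the trained sentence encoder, averaging is an assumed way to reduce several metadata columns to a single vector, and `DIM`, the margin value, and the sample field metadata are all hypothetical. The sequence neural network step is not shown.

```python
import hashlib
import math

DIM = 256  # hypothetical embedding size, not taken from the claim


def encode(sentence):
    """Stand-in encoder (assumption): hashed bag-of-words, L2-normalized.

    The claim recites a trained model; this only mimics its interface.
    """
    vec = [0.0] * DIM
    for token in sentence.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def field_representation(metadata_columns):
    """Encode every metadata column with the same encoder, then average
    the results into a single vector for the field (assumed pooling)."""
    vectors = [encode(text) for text in metadata_columns]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet margin loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical metadata columns (name, description, type) for three fields.
source_field = field_representation(["cust_email", "customer email address", "string"])
target_match = field_representation(["emailAddress", "email address of the customer", "string"])
target_other = field_representation(["order_total", "total order amount", "decimal"])

sim_match = cosine(source_field, target_match)
sim_other = cosine(source_field, target_other)
loss = triplet_loss(source_field, target_match, target_other)
print(sim_match, sim_other, loss)
```

In this sketch the same `encode` function is applied to every input, which mirrors the shared-encoder arrangement recited for the premise and the hypothesis. A real system would train the encoder by minimizing the triplet loss rather than only evaluating it.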