US 12,141,107 B2
Techniques for discovering and updating semantic meaning of data fields
John Joyce, Boston, MA (US); David Huang, Sugar Land, TX (US); Andrew Chang, Westford, MA (US); and Niel Morrison, Western Cape (ZA)
Assigned to Ab Initio Technology LLC, Lexington, MA (US)
Filed by Ab Initio Technology LLC, Lexington, MA (US)
Filed on Sep. 19, 2023, as Appl. No. 18/470,405.
Claims priority of provisional application 63/408,400, filed on Sep. 20, 2022.
Prior Publication US 2024/0095219 A1, Mar. 21, 2024
Int. Cl. G06F 16/00 (2019.01); G06F 16/21 (2019.01); G06F 40/30 (2020.01)
CPC G06F 16/21 (2019.01) [G06F 40/30 (2020.01)]
20 Claims
 
1. A method implemented by a data processing system for discovering semantic meaning of data in fields included in one or more data sets, the method comprising:
using the data processing system to perform:
identifying a first field having a previously-assigned label that indicates a semantic meaning of the first field, the previously-assigned label having a corresponding previously-determined label score;
identifying a set of one or more candidate labels, for potential assignment to the first field instead of the previously-assigned label, and a corresponding set of candidate label scores, the set of candidate labels including a first candidate label corresponding to a first candidate label score in the set of candidate label scores; and
evaluating, using the previously-determined label score and the first candidate label score, whether to assign the first candidate label to the first field, the evaluating comprising:
when the first candidate label score is at least a first threshold amount greater than the previously-determined label score, presenting the first candidate label to a user by generating an interface through which the user can provide input indicating whether to assign the first candidate label to the first field instead of the previously-determined label; and
when the first candidate label score is not at least the threshold amount greater than the previously determined label score, bypassing presentation of the first candidate label to the user.
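The evaluating step of claim 1 can be illustrated with a minimal sketch. This is not the patented implementation. The names `ScoredLabel`, `should_present`, and `labels_to_present` are hypothetical, and the numeric scores and threshold are made-up illustration values. The sketch assumes that scores are plain numbers and that "at least a first threshold amount greater" means the score difference is greater than or equal to the threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScoredLabel:
    """A semantic label together with its score (hypothetical structure)."""
    label: str
    score: float


def should_present(previous: ScoredLabel, candidate: ScoredLabel,
                   threshold: float) -> bool:
    """Return True when the candidate score exceeds the previously-determined
    score by at least `threshold`; otherwise presentation is bypassed."""
    return candidate.score - previous.score >= threshold


def labels_to_present(previous: ScoredLabel, candidates: List[ScoredLabel],
                      threshold: float) -> List[ScoredLabel]:
    """Keep only the candidates that would be shown to the user for review."""
    return [c for c in candidates if should_present(previous, c, threshold)]


# Illustration values only (assumed, not taken from the patent).
previous = ScoredLabel("phone_number", 0.5)
candidates = [
    ScoredLabel("social_security_number", 0.75),  # difference 0.25: presented
    ScoredLabel("account_id", 0.625),             # difference 0.125: bypassed
]
presented = labels_to_present(previous, candidates, threshold=0.25)
print([c.label for c in presented])
```

In this sketch the filter only decides which candidates reach the user. The decision to actually reassign the label remains with the user, through the interface described in the claim.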