US 12,147,464 B2
Extracting user-defined attributes from documents
Prashanth Pillai, Pune (IN); and Purnaprajna Raghavendra Mangsuli, Pune (IN)
Assigned to Schlumberger Technology Corporation, Sugar Land, TX (US)
Filed by Schlumberger Technology Corporation, Sugar Land, TX (US)
Filed on Nov. 27, 2023, as Appl. No. 18/519,322.
Application 18/519,322 is a continuation of application No. 17/814,383, filed on Jul. 22, 2022, granted, now Pat. No. 11,829,399.
Prior Publication US 2024/0086440 A1, Mar. 14, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 7/00 (2006.01); G01V 11/00 (2006.01); G06F 16/33 (2019.01); G06F 40/279 (2020.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01)

CPC G06F 16/3347 (2019.01) [G01V 11/002 (2013.01); G06F 40/279 (2020.01); G06V 30/412 (2022.01); G06V 30/413 (2022.01); G06V 2201/10 (2022.01)]

14 Claims

1. A method for extracting text associated with user-defined attributes from a plurality of documents, the method comprising:

identifying relevant documents related to a specific entity from storage, wherein the specific entity is either a specific oil well or is associated with one from a group consisting of a wellbore, an oilfield, and a prospect;

extracting text and spatial coordinates of the text;

identifying significant document entities and associated spatial locations of the significant document entities through page layout analysis;

ranking pages of the relevant documents based on the extracted text and the spatial coordinates using term frequency-inverse document frequency (TFIDF) or Okapi Best Match 25 (Okapi BM25);

extracting user-defined attributes from the pages of the relevant documents using a deep learning language model;

aggregating first attribute values associated with the user-defined attributes from one of the relevant documents into a single record;

aggregating second attribute values associated with the user-defined attributes across the relevant documents;

aggregating an attribute value across multiple sources based on at least one of a majority vote from among the multiple sources, a confidence probability of the attribute value from among the multiple sources, source metadata, and source priority, wherein the majority vote involves determining which attribute value was extracted a majority of the time; and

writing aggregated records to a database.
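
For illustration only, the page-ranking and multi-source aggregation steps recited in claim 1 might be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `bm25_rank` and `aggregate_attribute`, the whitespace tokenizer, the parameters `k1` and `b`, and the sample pages and values are all hypothetical. The ranking uses only the extracted text (spatial coordinates are omitted), and the vote is a plurality vote with summed confidence as a tie-breaker, which is one possible reading of combining a majority vote with a confidence probability.

```python
import math
from collections import Counter, defaultdict


def bm25_rank(pages, query, k1=1.5, b=0.75):
    """Return page indices ordered by Okapi BM25 score, best first.

    Assumption: pages are plain strings and whitespace tokenization suffices.
    """
    tokenized = [page.lower().split() for page in pages]
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n if n else 0.0
    # Number of pages in which each term appears at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log((n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def aggregate_attribute(candidates):
    """Pick one value from (value, confidence) pairs extracted from several sources.

    The most frequently extracted value wins; ties are broken by the
    highest summed confidence (a hypothetical tie-breaking rule).
    """
    if not candidates:
        return None
    votes = Counter()
    confidence = defaultdict(float)
    for value, conf in candidates:
        votes[value] += 1
        confidence[value] += conf
    return max(votes, key=lambda v: (votes[v], confidence[v]))


# Hypothetical sample data for demonstration.
pages = [
    "well total depth 3500 m casing",
    "weather report for monday",
    "total depth depth log of the well",
]
ranking = bm25_rank(pages, "total depth")
best = aggregate_attribute([("3500 m", 0.9), ("3500 m", 0.6), ("3200 m", 0.95)])
```

In this sketch, source metadata and source priority, which the claim also lists as possible aggregation criteria, are not modeled; they could be folded into the sort key in the same way as the confidence sum.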