US 12,437,570 B2
Exploration and production document content and metadata scanner
Rishabh Gupta, Pune (IN); Swapnil Patel, Pune (IN); and Udit Sinha, Brisbane (AU)
Assigned to Schlumberger Technology Corporation, Sugar Land, TX (US)
Appl. No. 18/260,526
Filed by Schlumberger Technology Corporation, Sugar Land, TX (US)
PCT Filed Jan. 7, 2022, PCT No. PCT/US2022/070091 § 371(c)(1), (2) Date Jul. 6, 2023, PCT Pub. No. WO2022/150838, PCT Pub. Date Jul. 14, 2022.
Claims priority of application No. 202121000983 (IN), filed on Jan. 8, 2021.
Prior Publication US 2024/0304016 A1, Sep. 12, 2024
Int. Cl. G06V 30/413 (2022.01); G06F 16/353 (2025.01); G06V 30/19 (2022.01); G06V 30/412 (2022.01); G06V 30/414 (2022.01); G06V 30/42 (2022.01)

CPC G06V 30/413 (2022.01) [G06F 16/353 (2019.01); G06V 30/19147 (2022.01); G06V 30/412 (2022.01); G06V 30/414 (2022.01); G06V 30/42 (2022.01); G06V 2201/10 (2022.01)]

17 Claims

1. A method comprising:
obtaining, for a plurality of oilfield document content classes, a training set comprising a plurality of documents;
calculating an inverse document frequency from the plurality of documents in the training set;
calculating term frequency inverse document frequency (TF-IDF) of terms in the training data set to generate a plurality of TF-IDF vector results related to a plurality of document content classes;
training the document content type classification model using the plurality of TF-IDF vector results;
extracting, from a file comprising an unstructured oilfield document, a plurality of terms;
calculating TF-IDF of the plurality of terms to generate an input vector;
executing a document content classification model on the input vector to generate a document content classification of the unstructured oilfield document;
extracting table information from a table in the unstructured oilfield document; and
storing, with the file in storage, the document content classification and the table information.
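
The training and classification steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not fix an IDF formula, a tokenizer, or a model type, so the smoothed IDF, the regex tokenizer, the nearest-centroid classifier, the class labels, and the sample documents below are all assumptions chosen for illustration. Table extraction and storage are not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic terms only.
    return re.findall(r"[a-z]+", text.lower())


def inverse_document_frequency(docs):
    # Smoothed IDF, ln((1 + n) / (1 + df)) + 1. This variant is an assumption.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    # Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()}


def train(training_set):
    # Assumed model: one centroid (mean TF-IDF vector) per content class.
    idf = inverse_document_frequency([text for text, _ in training_set])
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for text, label in training_set:
        sizes[label] += 1
        for t, v in tfidf_vector(text, idf).items():
            sums[label][t] += v
    centroids = {
        label: {t: v / sizes[label] for t, v in vec.items()}
        for label, vec in sums.items()
    }
    return idf, centroids


def classify(text, idf, centroids):
    # Pick the class whose centroid has the largest dot product with the input.
    vec = tfidf_vector(text, idf)

    def score(label):
        centroid = centroids[label]
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(centroids, key=score)


# Hypothetical training set: (document text, content class) pairs.
TRAINING_SET = [
    ("daily drilling report bit depth mud weight rop", "drilling_report"),
    ("drilling report casing depth mud losses", "drilling_report"),
    ("well log gamma ray resistivity porosity curve", "well_log"),
    ("log header gamma ray neutron density curve", "well_log"),
]

idf, centroids = train(TRAINING_SET)
print(classify("mud weight and bit depth at midnight", idf, centroids))
```

In this sketch the input text stands in for the terms extracted from the unstructured file. A production system would replace the centroid model with whatever classifier is actually trained on the TF-IDF vectors.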