| CPC G06V 30/413 (2022.01) [G06F 16/353 (2019.01); G06V 30/19147 (2022.01); G06V 30/412 (2022.01); G06V 30/414 (2022.01); G06V 30/42 (2022.01); G06V 2201/10 (2022.01)] | 17 Claims |

|
1. A method comprising: obtaining, for a plurality of oilfield document content classes, a training set comprising a plurality of documents; calculating an inverse document frequency from the plurality of documents in the training set; calculating term frequency inverse document frequency (TF-IDF) of terms in the training data set to generate a plurality of TF-IDF vector results related to a plurality of document content classes; training the document content type classification model using the plurality of TF-IDF vector results; extracting, from a file comprising an unstructured oilfield document, a plurality of terms; calculating TF-IDF of the plurality of terms to generate an input vector; executing a document content classification model on the input vector to generate a document content classification of the unstructured oilfield document; extracting table information from a table in the unstructured oilfield document; and storing, with the file in storage, the document content classification and the table information.
|
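The TF-IDF steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not name a classifier, so a nearest-centroid model with cosine similarity is assumed here, along with a smoothed IDF formula. The class labels, sample texts, and function names (`tokenize`, `compute_idf`, `tfidf_vector`, `train_centroids`, `classify`) are all hypothetical. Table extraction and storage are not covered.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Extract lowercase alphanumeric terms from raw text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def compute_idf(docs_tokens):
    """Inverse document frequency over the training documents.

    Assumption: smoothed form log((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """L2-normalized sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()}


def train_centroids(training_set, idf):
    """Assumed model: one mean TF-IDF vector (centroid) per content class."""
    sums = defaultdict(lambda: defaultdict(float))
    n_per_class = Counter()
    for label, text in training_set:
        n_per_class[label] += 1
        for term, value in tfidf_vector(tokenize(text), idf).items():
            sums[label][term] += value
    return {
        label: {t: v / n_per_class[label] for t, v in terms.items()}
        for label, terms in sums.items()
    }


def classify(text, idf, centroids):
    """Return the class whose centroid has the largest dot product with the input vector."""
    vec = tfidf_vector(tokenize(text), idf)

    def score(centroid):
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(centroids, key=lambda label: score(centroids[label]))


# Hypothetical training set: (content class, document text) pairs.
TRAINING_SET = [
    ("drilling_report", "daily drilling report bit depth mud weight rate of penetration"),
    ("drilling_report", "drilling summary mud weight casing depth bit run"),
    ("well_log", "gamma ray log resistivity porosity curve depth interval"),
    ("well_log", "wireline log gamma ray neutron porosity density curve"),
]

idf = compute_idf([tokenize(text) for _, text in TRAINING_SET])
centroids = train_centroids(TRAINING_SET, idf)
label = classify("mud weight and bit depth recorded during drilling", idf, centroids)
print(label)
```

The IDF table is computed once from the training set and reused for the input document, so that training vectors and the input vector share one term weighting, mirroring the order of steps in the claim.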