CPC G06V 30/416 (2022.01) [G06F 17/16 (2013.01); G06F 18/22 (2023.01); G06F 40/30 (2020.01); G06V 30/422 (2022.01)] | 9 Claims |
1. A processor-implemented method for sustainability measurement, comprising:
collecting a plurality of documents in one or more formats from at least one data source as input, via one or more hardware processors, wherein the plurality of documents contain organizational data;
extracting the organizational data using at least one data-crawling mechanism, via the one or more hardware processors; and
processing the extracted organizational data, via the one or more hardware processors, comprising:
generating a first matrix, wherein each row of a plurality of rows in the first matrix is a vector representation of each of a plurality of words in at least one sentence in the extracted organizational data;
replacing the vector representation of each of the plurality of rows with a corresponding zero vector;
generating a second matrix, wherein
a plurality of rows of the second matrix correspond to embeddings of a plurality of words in at least one indicator, and
at least one other row in the second matrix corresponds to a tuple-wise average of the embeddings;
generating a similarity matrix, wherein each entry in the similarity matrix is computed as a function of cosine similarity between a sentence word and an indicator word;
calculating a similarity score based on data in the similarity matrix, wherein the similarity score calculated for each sentence and an indicator represents an extent of match of the sentence with the indicator;
extracting a plurality of features from each of a plurality of sentences of the organizational data;
identifying at least one indicator matching each of the plurality of sentences, based on the calculated similarity score and the extracted plurality of features; and
measuring sustainability of the organization based on a determined indicator category of a majority of the plurality of documents collected from the organization, wherein the indicator category of the majority of the plurality of documents is determined based on the identified at least one indicator matching each of the plurality of sentences in each of the plurality of documents.
|
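The matrix steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions that the claim leaves open: the toy embedding table `EMBEDDINGS` and the indicator list `INDICATORS` are hypothetical; the zero-vector step is read here as applying only to words that have no embedding; "tuple-wise average" is read as the average of each pair of adjacent indicator words; and the similarity score is taken as the mean, over indicator rows, of the best-matching sentence word. The feature-extraction step is omitted, so indicators are matched on the similarity score alone.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional embeddings, for illustration only.
EMBEDDINGS = {
    "reduce": [1.0, 0.0, 0.0],
    "carbon": [0.0, 1.0, 0.0],
    "emissions": [0.0, 0.8, 0.6],
    "water": [0.0, 0.0, 1.0],
    "usage": [0.6, 0.0, 0.8],
}
DIM = 3

# Hypothetical indicators, each given as a list of indicator words.
INDICATORS = {
    "emissions": ["carbon", "emissions"],
    "water": ["water", "usage"],
}


def embed(words):
    """First matrix: one row per word. Assumption: a word with no
    embedding is represented by a zero vector."""
    return [EMBEDDINGS.get(w, [0.0] * DIM) for w in words]


def indicator_matrix(words):
    """Second matrix: word embeddings, plus one extra row per adjacent
    word pair holding their average (assumed reading of 'tuple-wise')."""
    rows = embed(words)
    pairs = [[(a + b) / 2 for a, b in zip(r1, r2)]
             for r1, r2 in zip(rows, rows[1:])]
    return rows + pairs


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def similarity_matrix(sentence_rows, indicator_rows):
    """Entry [i][j] compares sentence word i with indicator row j."""
    return [[cosine(s, r) for r in indicator_rows] for s in sentence_rows]


def similarity_score(sim):
    """Assumed aggregation: best sentence word per indicator row, averaged."""
    if not sim or not sim[0]:
        return 0.0
    best = [max(column) for column in zip(*sim)]
    return sum(best) / len(best)


def best_indicator(sentence, indicators):
    """Return the highest-scoring indicator name and all scores."""
    sentence_rows = embed(sentence.lower().split())
    scores = {
        name: similarity_score(
            similarity_matrix(sentence_rows, indicator_matrix(words)))
        for name, words in indicators.items()
    }
    return max(scores, key=scores.get), scores


def majority_category(labels):
    """Most frequent label, e.g. over sentences of a document or over
    documents of an organization."""
    return Counter(labels).most_common(1)[0][0]
```

Under these assumptions, `best_indicator("we reduce carbon emissions", INDICATORS)` scores the sentence against both indicators, and `majority_category` can then be applied first to the per-sentence labels of a document and again to the per-document labels to obtain the category used for the organization-level measurement.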