CPC G06F 40/295 (2020.01) [G06V 30/147 (2022.01); G06V 30/19187 (2022.01)]
12 Claims

               1. A recognition method, comprising: 
                 analyzing a text to generate an entity feature, a relation feature and an overall feature by a text recognition network;
                analyzing an input image to generate a plurality of candidate regions by an object detection network; 
                generating a plurality of node features, a plurality of aggregated edge features and a plurality of compound features according to the entity feature, the relation feature, the candidate regions and the overall feature by an enhanced cross-modal graph attention network; 
                matching the entity feature and the relation feature to the node features and the aggregated edge features to generate a plurality of first scores; 
                matching the overall feature to the compound features to generate a plurality of second scores; and 
                generating a plurality of final scores corresponding to the candidate regions according to the first scores and the second scores, 
                wherein the recognition method further comprises: 
                generating an initial graph attention network according to the candidate regions by the enhanced cross-modal graph attention network; 
                  classifying a plurality of nodes corresponding to the candidate regions into a plurality of strong nodes and a plurality of weak nodes according to areas of the candidate regions; and 
                  updating the initial graph attention network according to the strong nodes and the weak nodes to generate an initial updated graph attention network, 
                wherein the recognition method further comprises: 
                updating the initial updated graph attention network according to the entity feature and the relation feature to generate a first graph attention network, 
                wherein the recognition method further comprises: 
                performing a multi-step reasoning operation on the first graph attention network according to the overall feature to generate a last aggregated graph attention network; and 
                  generating the compound features by the last aggregated graph attention network, 
                 wherein the step of performing the multi-step reasoning operation on the first graph attention network comprises a plurality of reasoning steps, and in each of the reasoning steps the recognition method comprises:
                   receiving a previous aggregated graph attention network;
                   removing, from the previous aggregated graph attention network, a portion of its nodes having lower scores to generate a current graph attention network; and
                  performing an aggregation process on the current graph attention network according to the overall feature to generate a current aggregated graph attention network. 
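The following is a minimal Python sketch of several operations recited in claim 1. It makes stated assumptions and is not the claimed network. The claim does not specify the matching function, the area criterion that separates strong nodes from weak nodes, the fraction of nodes removed in each reasoning step, the aggregation rule, or how the first and second scores are combined. Every one of the following is therefore a hypothetical choice, not the patented method: cosine similarity as the matcher, a mean-area threshold, the `keep_ratio` parameter, softmax attention over a fully connected graph, weighted-sum fusion with `alpha`, and all function names.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, used here as a stand-in matching function."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify_nodes(areas: Dict[int, float]) -> Dict[int, str]:
    """Hypothetical rule: a node is 'strong' if its candidate region's area
    is at least the mean area of all candidate regions, else 'weak'."""
    mean_area = sum(areas.values()) / len(areas)
    return {n: ("strong" if a >= mean_area else "weak") for n, a in areas.items()}


def first_score(entity: Vector, relation: Vector,
                node_feat: Vector, edge_feat: Vector) -> float:
    """Hypothetical first score: mean of the entity-to-node and
    relation-to-aggregated-edge similarities."""
    return 0.5 * (cosine(entity, node_feat) + cosine(relation, edge_feat))


def aggregate(nodes: Dict[int, Vector], query: Vector) -> Dict[int, Vector]:
    """One attention-style aggregation over a fully connected graph: each node
    becomes a softmax-weighted mean of all nodes, with logits combining
    neighbor-to-query and node-to-neighbor similarity (an assumption)."""
    ids = list(nodes)
    out: Dict[int, Vector] = {}
    for i in ids:
        logits = [cosine(nodes[j], query) + cosine(nodes[i], nodes[j]) for j in ids]
        peak = max(logits)
        weights = [math.exp(v - peak) for v in logits]  # numerically stable softmax
        total = sum(weights)
        dim = len(nodes[i])
        out[i] = [sum(w * nodes[j][d] for w, j in zip(weights, ids)) / total
                  for d in range(dim)]
    return out


def multi_step_reasoning(nodes: Dict[int, Vector], overall: Vector,
                         steps: int = 2, keep_ratio: float = 0.5) -> Dict[int, Vector]:
    """Each step scores the previous graph's nodes against the overall feature,
    drops the lower-scoring portion, and aggregates the survivors. The input
    graph is treated as the 'previous' graph for the first step (assumption)."""
    current = dict(nodes)
    for _ in range(steps):
        scores = {n: cosine(f, overall) for n, f in current.items()}
        keep = max(1, math.ceil(len(scores) * keep_ratio))
        survivors = sorted(scores, key=scores.get, reverse=True)[:keep]
        current = aggregate({n: current[n] for n in survivors}, overall)
    return current  # remaining nodes' features serve as compound features


def final_scores(first: Dict[int, float], second: Dict[int, float],
                 alpha: float = 0.5) -> Dict[int, float]:
    """Hypothetical fusion: weighted sum per candidate region. A region pruned
    during reasoning has no compound feature and gets a second score of 0."""
    return {n: alpha * first[n] + (1 - alpha) * second.get(n, 0.0) for n in first}
```

Take four 2-D node features, `overall = [1.0, 0.0]`, and two steps with `keep_ratio=0.5`. The first step keeps two nodes and the second keeps one. Second scores can then be computed as `{n: cosine(f, overall) for n, f in compound.items()}` and passed to `final_scores`. The pruning step mirrors the claim's "removing a portion of the nodes ... with lower scores." The top-k rule is one plausible reading, not a detail the claim fixes.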