US 12,423,524 B2
Recognition method and electronic device
Jia Wang, New Taipei (TW); Jing-Cheng Ke, New Taipei (TW); Wen-Huang Cheng, New Taipei (TW); Hong-Han Shuai, New Taipei (TW); and Yung-Hui Li, New Taipei (TW)
Assigned to HON HAI PRECISION INDUSTRY CO., LTD., New Taipei (TW); and Foxconn Technology Group Co., Ltd., Guangdong Province (CN)
Filed by HON HAI PRECISION INDUSTRY CO., LTD., New Taipei (TW); and Foxconn Technology Group Co., Ltd., Guangdong Province (CN)
Filed on Jul. 10, 2023, as Appl. No. 18/349,183.
Claims priority of provisional application 63/367,915, filed on Jul. 8, 2022.
Prior Publication US 2024/0013001 A1, Jan. 11, 2024
Int. Cl. G06F 40/295 (2020.01); G06V 30/146 (2022.01); G06V 30/19 (2022.01)

CPC G06F 40/295 (2020.01) [G06V 30/147 (2022.01); G06V 30/19187 (2022.01)]

12 Claims

1. A recognition method, comprising:

analyzing a text to generate an entity feature, a relation feature and an overall feature by a text recognition network;

analyzing an input image to generate a plurality of candidate regions by an object detection network;

generating a plurality of node features, a plurality of aggregated edge features and a plurality of compound features according to the entity feature, the relation feature, the candidate regions and the overall feature by an enhanced cross-modal graph attention network;

matching the entity feature and the relation feature to the node features and the aggregated edge features to generate a plurality of first scores;

matching the overall feature to the compound features to generate a plurality of second scores; and

generating a plurality of final scores corresponding to the candidate regions according to the first scores and the second scores,

wherein the recognition method further comprises:

generating an initial graph attention network according to the candidate regions by the enhanced cross-modal graph attention network;

classifying a plurality of nodes corresponding to the candidate regions into a plurality of strong nodes and a plurality of weak nodes according to areas of the candidate regions; and

updating the initial graph attention network according to the strong nodes and the weak nodes to generate an initial updated graph attention network,

wherein the recognition method further comprises:

updating the initial updated graph attention network according to the entity feature and the relation feature to generate a first graph attention network,

wherein the recognition method further comprises:

performing a multi-step reasoning operation on the first graph attention network according to the overall feature to generate a last aggregated graph attention network; and

generating the compound features by the last aggregated graph attention network,

wherein the step of performing the multi-step reasoning operation on the first graph attention network comprises a plurality of reasoning steps, wherein the recognition method in each of the reasoning steps comprises:

receiving a previous aggregated graph attention network;

removing a portion of the nodes included in the previous aggregated graph attention network with lower scores to generate a current graph attention network; and

performing an aggregation process on the current graph attention network according to the overall feature to generate a current aggregated graph attention network.
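
The strong/weak node classification, the per-step pruning and aggregation, and the combination of first and second scores recited above can be illustrated with a minimal Python sketch. The claim does not specify any of the concrete rules used here; the area threshold, the dot-product node score, the keep ratio, the softmax-weighted neighbor average, and the weighted sum of scores are all assumptions chosen for illustration, and every function and parameter name is hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]
Graph = Dict[int, List[int]]  # node id -> neighbor ids


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def split_strong_weak(areas: Dict[int, float],
                      threshold: float) -> Tuple[Set[int], Set[int]]:
    """Classify nodes by candidate-region area (the threshold rule is an assumption)."""
    strong = {n for n, a in areas.items() if a >= threshold}
    return strong, set(areas) - strong


def reasoning_step(features: Dict[int, Vector], edges: Graph, overall: Vector,
                   keep_ratio: float = 0.5) -> Tuple[Dict[int, Vector], Graph]:
    """One reasoning step: receive the previous graph, prune, then aggregate."""
    # Assumed scoring rule: similarity between node feature and overall feature.
    scores = {n: dot(f, overall) for n, f in features.items()}
    keep = max(1, math.ceil(len(features) * keep_ratio))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:keep])

    # Current graph: lower-scoring nodes and their edges are removed.
    cur_edges = {n: [m for m in edges.get(n, []) if m in kept] for n in kept}

    # Assumed aggregation: softmax-weighted average over a node and its neighbors.
    new_features: Dict[int, Vector] = {}
    for n in kept:
        group = [n] + cur_edges[n]
        weights = [math.exp(scores[m]) for m in group]
        total = sum(weights)
        new_features[n] = [
            sum(w * features[m][i] for w, m in zip(weights, group)) / total
            for i in range(len(features[n]))
        ]
    return new_features, cur_edges


def multi_step_reasoning(features: Dict[int, Vector], edges: Graph,
                         overall: Vector, steps: int = 2,
                         keep_ratio: float = 0.5) -> Dict[int, Vector]:
    """Repeat the reasoning step; the result stands in for the compound features."""
    for _ in range(steps):
        features, edges = reasoning_step(features, edges, overall, keep_ratio)
    return features


def final_scores(first: Dict[int, float], second: Dict[int, float],
                 weight: float = 0.5) -> Dict[int, float]:
    """Combine first and second scores per region (the weighted sum is an assumption)."""
    return {n: weight * first[n] + (1.0 - weight) * second[n]
            for n in first if n in second}
```

In this sketch each reasoning step shrinks the graph by the keep ratio, so the number of steps and the ratio together determine how many candidate regions survive to the last aggregated graph. A trained attention layer would normally replace the fixed dot-product scoring used here.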