US 12,175,718 B2
Method for locating image region, model training method, and related apparatus
Lin Ma, Shenzhen (CN)
Assigned to TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED, Shenzhen (CN)
Filed by Tencent Technology (Shenzhen) Company Limited, Shenzhen (CN)
Filed on May 12, 2021, as Appl. No. 17/319,028.
Application 17/319,028 is a continuation of application No. PCT/CN2020/078532, filed on Mar. 10, 2020.
Claims priority of application No. 201910190207.2 (CN), filed on Mar. 13, 2019.
Prior Publication US 2021/0264227 A1, Aug. 26, 2021
Int. Cl. G06V 10/25 (2022.01); G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06N 3/04 (2023.01); G06V 10/44 (2022.01); G06V 10/82 (2022.01); G06V 10/84 (2022.01); G06V 30/262 (2022.01); G06V 30/418 (2022.01)
CPC G06V 10/25 (2022.01) [G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06N 3/04 (2013.01); G06V 10/454 (2022.01); G06V 10/82 (2022.01); G06V 10/84 (2022.01); G06V 30/274 (2022.01); G06V 30/418 (2022.01)]
14 Claims
1. A method for locating an image region, comprising:
determining one or more regions in an image, each of the regions corresponding to a respective candidate object in the image;
generating a set of semantic information for the one or more regions, each semantic information having a one-to-one correspondence with a corresponding candidate object in one of the one or more regions;
obtaining a strength of a connecting edge between two pieces of semantic information within the set of semantic information;
normalizing the strength of the connecting edge between the two pieces of semantic information within the set of semantic information to obtain a normalized strength;
determining a target connection matrix according to normalized strengths between pieces of semantic information within the set of semantic information;
determining a set of enhanced semantic information corresponding to the target connection matrix using a graph convolutional network (GCN), each enhanced semantic information having a one-to-one correspondence with a corresponding one of the set of semantic information, the GCN being configured to build an association relationship between various pieces of semantic information;
obtaining a matching degree between a text feature set corresponding to a to-be-located text and each of the respective enhanced semantic information using an image region locating network model, the image region locating network model being configured to determine a matching relationship between the image candidate region and the to-be-located text, each word in the to-be-located text corresponding to one word feature in the text feature set; and
determining a target image candidate region from the one or more regions according to the matching degree between the text feature set and each of the respective enhanced semantic information.
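
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and several choices are assumptions made only for illustration: the edge strength is taken as a dot product, normalization is a row-wise softmax, the normalized strengths are used directly as the target connection matrix, the GCN is a single layer with a hypothetical weight matrix `weight`, and the trained image region locating network model is replaced by a plain cosine similarity against the mean word feature. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def edge_strengths(semantic):
    """Strength of the connecting edge between every pair of semantic vectors.
    Assumption: a dot product; the claim does not fix the formula."""
    return [[sum(x * y for x, y in zip(u, v)) for v in semantic] for u in semantic]


def normalize_rows(strengths):
    """Normalize edge strengths. Assumption: a row-wise softmax."""
    normalized = []
    for row in strengths:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        normalized.append([e / total for e in exps])
    return normalized


def gcn_layer(connection, semantic, weight):
    """One graph-convolution step: ReLU(connection @ semantic @ weight).
    Produces one enhanced vector per input semantic vector."""
    projected = matmul(matmul(connection, semantic), weight)
    return [[max(0.0, v) for v in row] for row in projected]


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def locate_region(semantic, word_features, weight):
    """Return (index of the best-matching region, matching degree per region)."""
    connection = normalize_rows(edge_strengths(semantic))  # target connection matrix
    enhanced = gcn_layer(connection, semantic, weight)     # enhanced semantic information
    # Stand-in for the locating network model: pool the word features by
    # averaging, then score each region by cosine similarity.
    n_words = len(word_features)
    text_vec = [sum(col) / n_words for col in zip(*word_features)]
    scores = [cosine(text_vec, e) for e in enhanced]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Three candidate regions with 2-dimensional semantic vectors (made-up values).
    semantic = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    # Two words in the to-be-located text, one feature vector per word.
    word_features = [[0.0, 1.0], [0.1, 0.9]]
    identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical GCN weight
    best, scores = locate_region(semantic, word_features, identity)
    print(best, scores)
```

In this toy input the second region (index 1) receives the highest matching degree, because its enhanced vector stays closest in direction to the pooled text feature even after the graph convolution mixes in its neighbors.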