CPC G06V 10/25 (2022.01) [G06F 18/214 (2023.01); G06F 18/22 (2023.01); G06N 3/04 (2013.01); G06V 10/454 (2022.01); G06V 10/82 (2022.01); G06V 10/84 (2022.01); G06V 30/274 (2022.01); G06V 30/418 (2022.01)]; 14 Claims
1. A method for locating an image region, comprising:
determining one or more regions in an image, each of the regions corresponding to a respective candidate object in the image;
generating a set of semantic information for the one or more regions, each piece of semantic information having a one-to-one correspondence with the candidate object in a respective one of the one or more regions;
obtaining a strength of a connecting edge between two pieces of semantic information within the set of semantic information;
normalizing the strength of the connecting edge between the two pieces of semantic information within the set of semantic information to obtain a normalized strength;
determining a target connection matrix according to normalized strengths between pieces of semantic information within the set of semantic information;
determining a set of enhanced semantic information corresponding to the target connection matrix using a graph convolutional network (GCN), each piece of enhanced semantic information having a one-to-one correspondence with a respective piece of semantic information in the set of semantic information, the GCN being configured to build association relationships between the pieces of semantic information;
obtaining, using an image region locating network model, a matching degree between a text feature set corresponding to a to-be-located text and each piece of the enhanced semantic information, the image region locating network model being configured to determine a matching relationship between each of the one or more regions and the to-be-located text, and each word in the to-be-located text corresponding to one word feature in the text feature set; and
determining a target image candidate region from the one or more regions according to the matching degree between the text feature set and each piece of the enhanced semantic information.
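Claim 1 does not say how the edge strength between two pieces of semantic information is computed or normalized before the target connection matrix is formed. The sketch below assumes, purely for illustration, that the strength is the dot product of the two feature vectors and that each row of strengths is normalized with a softmax. The helper names `edge_strength`, `softmax`, and `target_connection_matrix` are hypothetical and do not come from the patent.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def edge_strength(a: Vector, b: Vector) -> float:
    """Assumed edge strength: dot product of two semantic feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax, used here as the (assumed) normalization."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def target_connection_matrix(semantic: List[Vector]) -> Matrix:
    """Build an n x n matrix whose row i holds the normalized strengths of
    the edges from semantic vector i to every semantic vector j (j == i
    included, which acts like a self-loop)."""
    n = len(semantic)
    raw = [[edge_strength(semantic[i], semantic[j]) for j in range(n)]
           for i in range(n)]
    return [softmax(row) for row in raw]


# Usage: three regions with 2-dimensional semantic features (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = target_connection_matrix(features)  # every row sums to 1
```

Keeping the diagonal lets each region retain a share of its own features in the later graph-convolution step; whether the claimed method includes self-edges is not stated.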
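The GCN step in claim 1 turns the target connection matrix and the per-region semantic information into enhanced semantic information, but the claim does not fix the layer form. A minimal sketch, assuming the widely used single-layer propagation rule H' = ReLU(A H W), with the target connection matrix used directly as A and a fixed weight matrix W standing in for learned parameters. The functions `matmul` and `gcn_layer` are hypothetical illustrations.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step, H' = ReLU(A H W).

    adj:      n x n target connection matrix (normalized edge strengths)
    features: n x d matrix, row i is the semantic information of region i
    weight:   d x d_out weight matrix (learned in practice, fixed here)
    Returns an n x d_out matrix whose row i is the enhanced semantic
    information of region i, so the one-to-one correspondence is kept.
    """
    propagated = matmul(matmul(adj, features), weight)
    return [[max(0.0, v) for v in row] for row in propagated]


# Usage: two regions that share information equally, identity weights.
adj = [[0.5, 0.5], [0.5, 0.5]]
H = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
enhanced = gcn_layer(adj, H, W)
```

Each output row mixes the features of all regions in proportion to the normalized edge strengths, which is one concrete way to "build an association relationship" between pieces of semantic information.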
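The final two steps of claim 1 score each region against the to-be-located text and pick the best-matching one. The claim uses a trained image region locating network model for the scoring; the sketch below swaps in a much simpler stand-in, the mean cosine similarity between each word feature and a region's enhanced semantic vector, followed by an argmax. The names `cosine`, `matching_degree`, and `locate_region` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def matching_degree(word_features: List[Vector], region_feature: Vector) -> float:
    """Stand-in for the learned model: mean cosine similarity between every
    word feature of the text and one region's enhanced semantic vector."""
    if not word_features:
        raise ValueError("the to-be-located text must contain at least one word")
    return sum(cosine(w, region_feature) for w in word_features) / len(word_features)


def locate_region(word_features: List[Vector],
                  enhanced: List[Vector]) -> Tuple[int, List[float]]:
    """Score every region and return (index of best region, all scores)."""
    scores = [matching_degree(word_features, r) for r in enhanced]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Usage: two word features and two candidate regions (made-up values).
words = [[1.0, 0.0], [1.0, 0.1]]
regions = [[0.0, 1.0], [1.0, 0.0]]
best, scores = locate_region(words, regions)  # region 1 aligns with the words
```

A learned model can weight words differently and capture word order, which a plain average of cosine similarities cannot; the sketch only shows where the matching degree feeds into the selection of the target image candidate region.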