| CPC G06V 20/68 (2022.01) [G06F 16/583 (2019.01); G06T 7/11 (2017.01); G06V 10/82 (2022.01); G06V 20/70 (2022.01); G06T 2207/30128 (2013.01)] | 10 Claims |

|
1. A method for analyzing food executed by an apparatus for analyzing food, the method comprising:
training an image encoder, a second text encoder, and a third text encoder by contrastive learning with a contrastive language-image pre-training structure;
generating image captioning data using food image features extracted from a food image,
wherein generating the image captioning data comprises extracting a first embedding having food image features through the image encoder, to which the food image is input, and generating image captioning data including food ingredients for the food image by inputting the extracted first embedding to a first text decoder, and
wherein generating the image captioning data further comprises extracting a second embedding having food ingredient features through the second text encoder, to which the inferred food ingredients are input, and generating image captioning data including a food recipe for the food image by combining the first embedding and the second embedding and inputting the combination to a second text decoder; and
generating a food name for the food image using the generated image captioning data,
wherein generating the food name comprises extracting a third embedding having food recipe features through the third text encoder, to which the generated food recipe is input, and generating the food name for the food image by combining the extracted first, second, and third embeddings and inputting the combination to a third text decoder.
|
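
The cascade recited in claim 1 can be read as a data flow in which each stage feeds its output, re-encoded, into the next. The following is a minimal sketch of that flow only, not the patented implementation. All names (`toy_encoder`, `toy_decoder`, `combine`, `analyze_food`) are hypothetical, the encoders and decoders are untrained stand-ins, the food image is represented by a placeholder string, and concatenation is assumed as the combining operation because the claim does not specify one.

```python
from typing import Callable, Dict, List

Vector = List[float]
Encoder = Callable[[str], Vector]
Decoder = Callable[[Vector], str]


def toy_encoder(dim: int = 4) -> Encoder:
    """Hypothetical stand-in for a trained encoder: maps input to a dim-sized vector."""
    def encode(data: str) -> Vector:
        vec = [0.0] * dim
        for i, ch in enumerate(data):
            vec[i % dim] += ord(ch) / 1000.0
        return vec
    return encode


def toy_decoder(label: str) -> Decoder:
    """Hypothetical stand-in for a trained text decoder: reports what it received."""
    def decode(embedding: Vector) -> str:
        return f"<{label} from {len(embedding)}-dim input>"
    return decode


def combine(*embeddings: Vector) -> Vector:
    """Assumption: embeddings are combined by concatenation."""
    combined: Vector = []
    for embedding in embeddings:
        combined.extend(embedding)
    return combined


def analyze_food(
    food_image: str,
    image_encoder: Encoder,
    second_text_encoder: Encoder,
    third_text_encoder: Encoder,
    first_text_decoder: Decoder,
    second_text_decoder: Decoder,
    third_text_decoder: Decoder,
) -> Dict[str, str]:
    # Stage 1: image -> first embedding -> ingredients caption.
    first = image_encoder(food_image)
    ingredients = first_text_decoder(first)

    # Stage 2: ingredients -> second embedding; (first + second) -> recipe caption.
    second = second_text_encoder(ingredients)
    recipe = second_text_decoder(combine(first, second))

    # Stage 3: recipe -> third embedding; (first + second + third) -> food name.
    third = third_text_encoder(recipe)
    food_name = third_text_decoder(combine(first, second, third))

    return {"ingredients": ingredients, "recipe": recipe, "food_name": food_name}


result = analyze_food(
    "placeholder-for-image-pixels",
    image_encoder=toy_encoder(),
    second_text_encoder=toy_encoder(),
    third_text_encoder=toy_encoder(),
    first_text_decoder=toy_decoder("ingredients"),
    second_text_decoder=toy_decoder("recipe"),
    third_text_decoder=toy_decoder("food name"),
)
print(result)
```

In this sketch the input to each successive decoder grows (one, two, then three embeddings), which mirrors how the claim conditions the recipe on the image and ingredients, and the food name on all three.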