US 11,670,295 B2
Device, method, and program for enhancing output content through iterative generation
Seohyun Back, Suwon-si (KR); Yonghyun Ryu, Suwon-si (KR); Wonho Ryu, Suwon-si (KR); Haejun Lee, Suwon-si (KR); Cheolseung Jung, Suwon-si (KR); Sai Chetan, Suwon-si (KR); and Jiyeon Hong, Suwon-si (KR)
Assigned to Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed by Samsung Electronics Co., Ltd., Suwon-si (KR)
Filed on Dec. 4, 2020, as Appl. No. 17/111,734.
Claims priority of application No. 10-2019-0160008 (KR), filed on Dec. 4, 2019.
Prior Publication US 2021/0174801 A1, Jun. 10, 2021
Int. Cl. G10L 15/22 (2006.01); G06T 13/00 (2011.01); G10L 15/16 (2006.01); G06N 3/08 (2023.01); G10L 15/18 (2013.01)
CPC G10L 15/22 (2013.01) [G06N 3/08 (2013.01); G06T 13/00 (2013.01); G10L 15/16 (2013.01); G10L 15/1815 (2013.01); G10L 2015/223 (2013.01)]
20 Claims
 
1. A device for improving output content through iterative generation, the device comprising:
at least one processor; and
a memory storing instructions which, when executed by the at least one processor, cause the at least one processor to:
receive a natural language input while displaying base content,
based on the natural language input, obtain user intention information by using a natural language understanding (NLU) model,
based on a first user input, set a target area in the base content,
based on the user intention information or a second user input, determine input content,
based on the input content, the target area, and the user intention information, generate output content related to the base content by using a neural network (NN) model, the NN model being related to a generative adversarial network (GAN) model,
generate a caption for the output content by using an image captioning model,
calculate a similarity between text of the natural language input and the output content, and
based on the similarity, iterate the generation of the output content,
wherein the output content comprises first output content, and
wherein the instructions, when executed by the at least one processor, further cause the at least one processor to:
in response to the similarity not satisfying a predetermined condition, generate second output content different from the first output content based on the input content, the target area, and the user intention information by using the NN model.
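
For illustration only, the control flow recited in claim 1 (generate, caption, score, regenerate when the score fails the condition) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Request`, `token_overlap`, and `iterate_generation` are hypothetical, the NN generator and the image captioning model are passed in as plain callables, the similarity is assumed to be computed between the input text and the generated caption using a simple word-set (Jaccard) overlap, and the "predetermined condition" is assumed to be a numeric threshold with a cap on the number of attempts.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Request:
    """Hypothetical container for the inputs named in the claim."""
    text: str                             # natural language input
    intent: str                           # user intention information (NLU output)
    target_area: Tuple[int, int, int, int]  # region set in the base content
    input_content: str                    # content to be composed into the base


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (a stand-in metric)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def iterate_generation(
    request: Request,
    generate: Callable[[Request, int], str],  # stand-in for the NN (GAN-related) model
    caption: Callable[[str], str],            # stand-in for the image captioning model
    similarity: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.5,                   # assumed form of the predetermined condition
    max_iters: int = 5,                       # assumed safety cap on regeneration
) -> Tuple[str, float, int]:
    """Regenerate output until its caption is similar enough to the input text.

    Returns (best output seen, its similarity score, number of attempts made).
    """
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    best_output, best_score, attempts = "", -1.0, 0
    for attempt in range(max_iters):
        attempts = attempt + 1
        output = generate(request, attempt)
        score = similarity(request.text, caption(output))
        if score > best_score:
            best_output, best_score = output, score
        if score >= threshold:
            break  # condition satisfied, stop iterating
    return best_output, best_score, attempts


# Usage with toy stand-ins: each "output" is just a text description,
# and the captioner returns that description unchanged.
candidates = ["a dog on the grass", "a cat sitting on the sofa", "a cat on the sofa"]
request = Request(
    text="put a cat on the sofa",
    intent="insert_object",
    target_area=(0, 0, 64, 64),
    input_content="cat",
)
result = iterate_generation(
    request,
    generate=lambda req, i: candidates[i % len(candidates)],
    caption=lambda out: out,
    threshold=0.7,
)
print(result)
```

In this toy run the first candidate fails the threshold, so a second, different output is generated, which mirrors the "second output content different from the first output content" limitation. Keeping the best-scoring output is a design choice of the sketch so that a result is still available if no attempt meets the threshold.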