| CPC G06F 16/3329 (2019.01) [G06F 16/3347 (2019.01); G06F 16/583 (2019.01)] | 20 Claims | 

| 
               1. A method performed by an electronic device acting as a server that is communicatively connected to a terminal, wherein the electronic device includes an input/output (I/O) system, a processor, a memory, and a system bus connecting the I/O system, the processor, and the memory together, the method comprising: 
                receiving, via the I/O system, an input image submitted by the terminal; 
                acquiring, from the memory, an image feature of the input image and state vectors corresponding to first n rounds of historical question answering dialog, n being a positive integer; 
                acquiring, via the I/O system, a question feature of a current round of questioning related to the input image submitted by the terminal; 
                performing, via the processor and using a visual dialog model stored in the memory, multimodal encoding on the image feature of the input image, the state vectors corresponding to the first n rounds of historical question answering dialog, and the question feature of the current round of questioning, to obtain a state vector corresponding to the current round of questioning, the performing further including acquiring a character string feature of an outputted character string corresponding to the current round of questioning by invoking a multimodal incremental transformer decoder in the visual dialog model; 
                performing, via the processor and using the visual dialog model stored in the memory, multimodal decoding on the state vector corresponding to the current round of questioning, the image feature of the input image, and the character string feature, by invoking the multimodal incremental transformer decoder in the visual dialog model, to obtain a decoded feature vector; 
                determining, via the processor, an actual output answer corresponding to the current round of questioning according to the decoded feature vector, the actual output answer comprising the outputted character string; and 
                returning, via the I/O system, the actual output answer corresponding to the current round of questioning to the terminal. 
               |
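
The data flow recited in claim 1 (encode the image feature, the historical state vectors, and the question feature into a current-round state vector; decode that state vector together with the image feature and the character string feature; then determine an answer) can be illustrated with a minimal Python sketch. This is an illustration only and not the claimed visual dialog model: the averaging used for "encoding" and "decoding", the dot-product answer selection, and every name below (`DialogSession`, `multimodal_encode`, `multimodal_decode`, `pick_answer`, `vocabulary`) are hypothetical stand-ins assumed for the example. Storing the new state vector for later rounds is likewise an assumption of the sketch rather than a step recited in the claim.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Vector = List[float]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def multimodal_encode(image_feature: Vector,
                      history_states: List[Vector],
                      question_feature: Vector) -> Vector:
    # Toy stand-in (assumption): fuse the three inputs by averaging to get
    # the state vector for the current round of questioning.
    return mean_vectors([image_feature, *history_states, question_feature])


def multimodal_decode(state: Vector,
                      image_feature: Vector,
                      string_feature: Vector) -> Vector:
    # Toy stand-in (assumption): a real system would invoke a transformer
    # decoder here; this sketch only averages the three inputs.
    return mean_vectors([state, image_feature, string_feature])


def pick_answer(decoded: Vector, vocabulary: Dict[str, Vector]) -> str:
    # Choose the candidate whose (hypothetical) embedding has the largest
    # dot product with the decoded feature vector.
    def score(word: str) -> float:
        return sum(a * b for a, b in zip(decoded, vocabulary[word]))
    return max(vocabulary, key=score)


@dataclass
class DialogSession:
    image_feature: Vector
    # State vectors of the first n rounds of historical dialog (n >= 1).
    history_states: List[Vector] = field(default_factory=list)

    def answer(self, question_feature: Vector, string_feature: Vector,
               vocabulary: Dict[str, Vector]) -> str:
        state = multimodal_encode(self.image_feature, self.history_states,
                                  question_feature)
        decoded = multimodal_decode(state, self.image_feature, string_feature)
        # Keep the new state for later rounds (assumption of this sketch).
        self.history_states.append(state)
        return pick_answer(decoded, vocabulary)


session = DialogSession(image_feature=[1.0, 0.0],
                        history_states=[[1.0, 0.0]])
vocabulary = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
result = session.answer(question_feature=[1.0, 0.0],
                        string_feature=[1.0, 0.0],
                        vocabulary=vocabulary)
```

In this sketch the session is seeded with one historical state vector, matching the claim's requirement that n be a positive integer, and each call to `answer` plays the role of one round of questioning.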