US 12,462,610 B2
Video understanding neural network systems and methods using the same
Zibo Meng, Palo Alto, CA (US); Ming Chen, Palo Alto, CA (US); and Chiuman Ho, Palo Alto, CA (US)
Assigned to GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP., LTD., Guangdong (CN)
Filed by GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP., LTD., Guangdong (CN)
Filed on Nov. 30, 2021, as Appl. No. 17/538,512.
Application 17/538,512 is a continuation of application No. PCT/CN2019/122358, filed on Dec. 2, 2019.
Claims priority of provisional application 62/855,489, filed on May 31, 2019.
Prior Publication US 2022/0157059 A1, May 19, 2022
Int. Cl. G06V 40/20 (2022.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01); G06T 3/4046 (2024.01); G06V 10/44 (2022.01); G06V 10/77 (2022.01); G06V 10/82 (2022.01); G06V 20/40 (2022.01)
CPC G06V 40/23 (2022.01) [G06T 3/4046 (2013.01); G06V 10/454 (2022.01); G06V 10/7715 (2022.01); G06V 10/82 (2022.01); G06V 20/41 (2022.01); G06V 20/46 (2022.01); G06N 3/045 (2023.01); G06N 3/048 (2023.01)]
20 Claims
1. A neural network system implemented by one or more electronic devices, comprising a target neural network block configured to be inserted between a first baseline neural network block and a second baseline neural network block of a baseline neural network, the target neural network block including:
at least one pooling unit to:
receive an input feature map outputted by the first baseline neural network block processing an image, and
temporally pool the input feature map into at least one intermediate feature map;
at least one other processing unit to:
temporally process the at least one intermediate feature map to a residual feature map; and
generate an output feature map configured for inputting into the second baseline neural network block by combining the residual feature map with the input feature map;
wherein temporal information extracted by the target neural network block is different from temporal information extracted by the baseline neural network.
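
The data flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim fixes neither the pooling type nor the "other processing unit", so the sketch assumes average pooling over a centered window of frames and a per-channel temporal filter (cross-correlation along time). The names `temporal_avg_pool`, `temporal_filter`, `target_block`, `insert_block`, the `[T][C]` feature-map layout, and the default kernel are all hypothetical.

```python
from typing import Callable, List, Sequence

# Assumed layout: T frames, each holding C feature values.
FeatureMap = List[List[float]]


def temporal_avg_pool(x: FeatureMap, window: int = 3) -> FeatureMap:
    """Pooling unit (assumed average pooling): mean over a centered
    window of frames; the window is clipped at the sequence edges."""
    half = window // 2
    pooled = []
    for t in range(len(x)):
        frames = x[max(0, t - half):min(len(x), t + half + 1)]
        pooled.append([sum(col) / len(frames) for col in zip(*frames)])
    return pooled


def temporal_filter(x: FeatureMap, kernel: Sequence[float]) -> FeatureMap:
    """Other processing unit (assumed): per-channel cross-correlation
    along the time axis with zero padding."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        row = [0.0] * len(x[0])
        for k, w in enumerate(kernel):
            src = t + k - half
            if 0 <= src < len(x):
                row = [r + w * v for r, v in zip(row, x[src])]
        out.append(row)
    return out


def target_block(x: FeatureMap, window: int = 3,
                 kernel: Sequence[float] = (-0.5, 0.0, 0.5)) -> FeatureMap:
    """Pool, process into a residual, then add the residual to the input."""
    intermediate = temporal_avg_pool(x, window)
    residual = temporal_filter(intermediate, kernel)
    return [[a + b for a, b in zip(xr, rr)] for xr, rr in zip(x, residual)]


def insert_block(first: Callable[[FeatureMap], FeatureMap],
                 second: Callable[[FeatureMap], FeatureMap]
                 ) -> Callable[[FeatureMap], FeatureMap]:
    """Place the target block between two baseline blocks (hypothetical)."""
    return lambda frames: second(target_block(first(frames)))
```

Because the output is the input plus a residual of the same shape, the block can be placed between two baseline blocks without changing what the second block expects to receive; with an all-zero kernel the block reduces to the identity.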