US 12,223,695 B2
Target-oriented reinforcement learning method and apparatus for performing the same
Byoung-Tak Zhang, Seoul (KR); Kibeom Kim, Seoul (KR); Minsu Lee, Seongnam-si (KR); Min Whoo Lee, Yongin-si (KR); and Yoonsung Kim, Seoul (KR)
Assigned to SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION, Seoul (KR)
Appl. No. 17/427,957
Filed by SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION, Seoul (KR)
PCT Filed Dec. 8, 2020, PCT No. PCT/KR2020/017859
§ 371(c)(1), (2) Date Aug. 3, 2021,
PCT Pub. No. WO2022/080582, PCT Pub. Date Apr. 21, 2022.
Claims priority of application No. 10-2020-0131334 (KR), filed on Oct. 12, 2020.
Prior Publication US 2022/0398830 A1, Dec. 15, 2022
Int. Cl. G06V 10/776 (2022.01)
CPC G06V 10/776 (2022.01) [G06V 2201/07 (2022.01)]
4 Claims
 
1. A reinforcement learning method performed by a target-oriented reinforcement learning model, the reinforcement learning method comprising:
collecting data related to a target of the reinforcement learning as target data when an event occurs in which an agent achieves the target and receives a reward, wherein the target data comprises a predetermined number of frame images before an occurrence of the event and a label indicating that the target data corresponds to the target;
learning the target data as auxiliary learning for the reinforcement learning; and
incorporating results of the learning of the target data into performance of the reinforcement learning,
wherein the target-oriented reinforcement learning model comprises:
a feature extraction unit implemented by at least one processor and configured to extract features from state data and the target data;
an action module implemented by the at least one processor and configured to output an action and a value according to a policy based on the feature extracted from the state data; and
a classification module implemented by the at least one processor and configured to classify the target data based on the feature extracted from the target data, and
wherein learning the collected target data comprises:
extracting, by the feature extraction unit, a feature from batch data of the target data;
extracting, by the classification module, a predicted value according to the feature extracted from the batch data of the target data;
calculating, by the target-oriented reinforcement learning model, a loss for the auxiliary learning by using the predicted value and the label of the target data; and
learning, by the target-oriented reinforcement learning model, the visual representation of the target data by using the loss for the auxiliary learning.
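
The collecting step and the auxiliary loss recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. The names TargetDataCollector, num_frames, observe and auxiliary_loss are hypothetical, a positive reward is assumed to mark the event, only frames strictly before the event are kept, and softmax cross-entropy is assumed as the loss because the claim does not name a specific loss function. The feature extraction unit and classification module are not modeled; the loss function takes their output (per-class scores) as plain lists.

```python
import math
from collections import deque


class TargetDataCollector:
    """Keeps the most recent frames and snapshots them when a reward event occurs."""

    def __init__(self, num_frames):
        # num_frames stands in for the claim's "predetermined number" (hypothetical name).
        self.recent = deque(maxlen=num_frames)
        self.target_data = []  # list of (frames, label) pairs

    def observe(self, frame, reward, target_label):
        # Assumption: a positive reward signals that the agent achieved the target.
        # The snapshot is taken before the current frame is stored, so it holds
        # only frames that precede the event.
        if reward > 0 and len(self.recent) == self.recent.maxlen:
            self.target_data.append((list(self.recent), target_label))
            self.recent.clear()
        self.recent.append(frame)


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def auxiliary_loss(batch_logits, batch_labels):
    """Mean cross-entropy between predicted class scores and target labels."""
    losses = []
    for logits, label in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Usage: five steps, with the reward arriving on the last one.
collector = TargetDataCollector(num_frames=3)
for step, reward in enumerate([0, 0, 0, 0, 1]):
    collector.observe(frame=f"frame{step}", reward=reward, target_label=2)

# Stand-in classifier scores for a batch of two samples with three classes.
loss = auxiliary_loss([[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]], [0, 2])
```

In a full model, the loss would be backpropagated through the classification module into the shared feature extraction unit, which is how the auxiliary learning could influence the features that the action module consumes.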