US 12,340,311 B2
Ranking user comments on media using reinforcement learning optimizing for session dwell time
Kapil Thadani, New York, NY (US); Akshay Soni, New York, NY (US); Parikshit Shah, New York, NY (US); Troy Chevalier, New York, NY (US); Sreekanth Ramakrishnan, New York, NY (US); Aaron Nagao, New York, NY (US); and Zhi Qu, New York, NY (US)
Assigned to Yahoo Assets LLC, New York, NY (US)
Filed by Yahoo Assets LLC, New York, NY (US)
Filed on Apr. 10, 2023, as Appl. No. 18/132,499.
Application 18/132,499 is a continuation of application No. 16/446,480, filed on Jun. 19, 2019, granted, now 11,625,599.
Prior Publication US 2023/0244937 A1, Aug. 3, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06N 3/08 (2023.01); G06F 16/2457 (2019.01); G06F 16/338 (2019.01); G06F 40/216 (2020.01); G06N 3/006 (2023.01); G06N 3/02 (2006.01); G06N 3/084 (2023.01); G06N 7/01 (2023.01); G06N 7/02 (2006.01)

CPC G06N 3/08 (2013.01) [G06F 16/24578 (2019.01); G06F 40/216 (2020.01); G06N 7/01 (2023.01); G06F 16/338 (2019.01); G06N 3/006 (2013.01); G06N 3/02 (2013.01); G06N 3/084 (2013.01); G06N 7/023 (2013.01); G06N 7/026 (2013.01)]

20 Claims

1. A method, comprising:

using a scoring model to score each comment, of comments, based on corresponding features of the comment;

receiving, from a client device, a request to serve a subset of the comments comprising some but not all of the comments scored using the scoring model;

determining a plurality of possible rankings of the comments associated with a plurality of possible permutations, wherein the plurality of possible rankings of the comments comprises a first possible ranking of the comments associated with a first possible permutation and a second possible ranking of the comments associated with a second possible permutation;

responsive to the request to serve the subset of the comments comprising some but not all of the comments scored using the scoring model, selecting a ranking of the comments that is one permutation from the plurality of possible rankings of the comments, wherein selecting the ranking is in accordance with a probability distribution of the plurality of possible rankings that is based on scores of the comments, wherein the ranking of the comments is associated with increasing a reward at a given time by representing the comments in an order of the ranking, receiving one or more measurable reactions as a scalar reward for the ranking, and updating a ranking mechanism to increase the reward; and

serving one or more comments identified by the selected ranking over a network to the client device.
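Claim 1 does not fix the scoring model, the probability distribution over rankings, the reward signal, or the update rule. The sketch below illustrates the claimed loop under stated assumptions, and it is not the patented implementation. The assumptions are: a linear scoring model; a Plackett-Luce distribution, where each position is filled by a softmax draw over the comments not yet placed, as one way to turn scores into a distribution over permutations; a simulated dwell time standing in for the measured scalar reward; and a REINFORCE policy-gradient update with a running baseline standing in for "updating a ranking mechanism to increase the reward". All function names (`score`, `sample_ranking`, `log_prob_grad`, `reinforce_step`, `train`, `dwell`) and the toy data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Hypothetical linear scoring model: s(c) = w . x(c)."""
    return sum(w * x for w, x in zip(weights, features))


def sample_ranking(scores: Sequence[float], rng: random.Random) -> List[int]:
    """Sample one permutation from a Plackett-Luce distribution.

    Each position is filled by drawing, among comments not yet placed, with
    probability proportional to exp(score). Higher-scored comments tend to come
    first, yet every permutation keeps non-zero probability (exploration).
    """
    remaining = list(range(len(scores)))
    ranking: List[int] = []
    while remaining:
        top = max(scores[i] for i in remaining)  # shift for numerical stability
        mass = [math.exp(scores[i] - top) for i in remaining]
        pick = rng.choices(remaining, weights=mass, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


def log_prob_grad(weights: Sequence[float], features: Sequence[Sequence[float]],
                  ranking: Sequence[int], k: int) -> Vector:
    """Gradient w.r.t. the weights of log P(first k positions of ranking).

    For a linear Plackett-Luce model, each position adds x(chosen) minus the
    softmax-weighted mean of x over the comments still available there.
    """
    scores = [score(weights, f) for f in features]
    grad = [0.0] * len(weights)
    remaining = list(ranking)  # comments not yet placed
    for pos in range(k):
        chosen = ranking[pos]
        top = max(scores[i] for i in remaining)
        exps = {i: math.exp(scores[i] - top) for i in remaining}
        z = sum(exps.values())
        for d in range(len(weights)):
            expected = sum(exps[i] / z * features[i][d] for i in remaining)
            grad[d] += features[chosen][d] - expected
        remaining.remove(chosen)
    return grad


def reinforce_step(weights: Vector, features: Sequence[Sequence[float]], k: int,
                   reward_fn: Callable[[List[int]], float], baseline: float,
                   lr: float, rng: random.Random):
    """One serve / observe / update round (REINFORCE with a running baseline)."""
    scores = [score(weights, f) for f in features]
    ranking = sample_ranking(scores, rng)
    served = ranking[:k]          # only a subset of the scored comments is served
    reward = reward_fn(served)    # scalar reward, e.g. a measured dwell time
    grad = log_prob_grad(weights, features, ranking, k)
    advantage = reward - baseline
    new_weights = [w + lr * advantage * g for w, g in zip(weights, grad)]
    new_baseline = 0.9 * baseline + 0.1 * reward  # exponential moving average
    return new_weights, new_baseline, served


def train(features: Sequence[Sequence[float]], k: int,
          reward_fn: Callable[[List[int]], float],
          steps: int = 300, lr: float = 0.1, seed: int = 0) -> Vector:
    """Repeat serve / observe / update from zero weights; return final weights."""
    rng = random.Random(seed)
    weights, baseline = [0.0] * len(features[0]), 0.0
    for _ in range(steps):
        weights, baseline, _ = reinforce_step(weights, features, k, reward_fn,
                                              baseline, lr, rng)
    return weights


if __name__ == "__main__":
    # Toy data: feature 0 tracks a hidden comment quality, feature 1 is noise.
    quality = [0.0, 1.0, 2.0, 3.0]
    feats = [[q, n] for q, n in zip(quality, [1.0, -1.0, -1.0, 1.0])]

    def dwell(served: List[int]) -> float:
        # Simulated dwell time: good comments near the top hold readers longer.
        return sum(quality[c] / (pos + 1) for pos, c in enumerate(served))

    print("learned weights:", train(feats, k=2, reward_fn=dwell))
```

Sampling a permutation, rather than always sorting by score, keeps some probability on lower-scored orderings so the reward signal can reveal when the scoring model is wrong. Because only the first k comments are served and the reward depends only on them, the sketch differentiates the log-probability of that served prefix alone.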