US 12,451,141 B2
Generating multi-turn dialog datasets
Zilu Tang, Cambridge, MA (US); Zhongshen Zeng, Shenzhen (CN); and Yara Rizk, Cambridge, MA (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)
Filed on Jun. 8, 2022, as Appl. No. 17/805,946.
Prior Publication US 2023/0402040 A1, Dec. 14, 2023
Int. Cl. G06F 16/31 (2019.01); G06F 7/58 (2006.01); G06N 3/08 (2023.01); G10L 17/18 (2013.01); G06N 5/022 (2023.01); G10L 15/22 (2006.01)
CPC G10L 17/18 (2013.01) [G06F 7/588 (2013.01); G06N 3/08 (2013.01); G06F 16/322 (2019.01); G06N 5/022 (2013.01); G10L 15/22 (2013.01)]
20 Claims
 
1. A computer-based method of generating multi-turn dialog datasets for training of a dialog or a conversational system having one or more agents, the method comprising:
automatically selecting an agent from a set of agents;
automatically calculating numbers corresponding to maximum depths of one or more nodes within a dialog tree found in the selected agent, the maximum depth corresponding to a number of nodes along a path from a root of the dialog tree down to the maximum depth of the one or more nodes;
automatically selecting a random dialog node from the one or more dialog nodes and generating a random number between a first number corresponding to a dialog node at the root of the tree and a second number corresponding to the maximum depth of the selected random dialog node along the path;
automatically identifying sentences from training data of the selected agent that are related to and satisfy a first sequential node condition of the selected random dialog node;
automatically determining an approach for responding to the first sequential node condition of the selected random dialog node that either satisfies the first sequential dialog node condition, or inserts a multi-turn conversational property, and generating a corresponding response;
automatically determining additional approaches for responding to each condition contained within subsequent sequential child nodes of the selected random dialog node that either satisfy each subsequent sequential child node condition or insert a multi-turn conversational property, and generating corresponding responses for each subsequent sequential child node until a response is generated for the node corresponding to the generated random number; and
automatically collecting and storing data relating to the selected agent and the generated responses, the data related to the selected agent and the generated responses to be used as training data for training of a dialog or a conversational system.
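The steps of claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. The names `DialogNode`, `Agent`, `max_depth`, `generate_dialog`, `property_prob`, and `PLACEHOLDER_PROPERTY_TEXT` are hypothetical. The sketch assumes that depth is counted in nodes starting from the selected node (so the random number ranges from 1 to that node's maximum depth), that training sentences are keyed by node condition, and that the choice between satisfying a condition and inserting a multi-turn conversational property is made at random.

```python
import random
from dataclasses import dataclass, field

# Assumption: a stand-in utterance for an inserted multi-turn property.
PLACEHOLDER_PROPERTY_TEXT = "Sorry, could you say that again?"


@dataclass
class DialogNode:
    """Hypothetical dialog node: a condition plus sequential child nodes."""
    condition: str
    children: list["DialogNode"] = field(default_factory=list)


@dataclass
class Agent:
    """Hypothetical agent: candidate dialog nodes and sentences per condition."""
    name: str
    top_level_nodes: list[DialogNode]
    training_data: dict[str, list[str]]


def max_depth(node: DialogNode) -> int:
    """Count the nodes on the longest path from `node` down to a leaf."""
    return 1 + max((max_depth(child) for child in node.children), default=0)


def generate_dialog(agents, rng, property_prob=0.3):
    """Generate one multi-turn record following the steps of claim 1."""
    agent = rng.choice(agents)                              # select an agent
    depths = [max_depth(n) for n in agent.top_level_nodes]  # maximum depths
    index = rng.randrange(len(agent.top_level_nodes))       # random dialog node
    node = agent.top_level_nodes[index]
    target_depth = rng.randint(1, depths[index])            # random stop depth

    turns = []
    remaining = target_depth
    while True:
        if rng.random() < property_prob:
            # Approach 2: insert a multi-turn conversational property.
            approach, text = "multi_turn_property", PLACEHOLDER_PROPERTY_TEXT
        else:
            # Approach 1: pick a training sentence that satisfies the condition.
            approach = "satisfy"
            text = rng.choice(agent.training_data[node.condition])
        turns.append({"condition": node.condition, "approach": approach, "text": text})
        remaining -= 1
        if remaining == 0:
            break
        # Descend only into children deep enough to reach the target depth.
        node = rng.choice([c for c in node.children if max_depth(c) >= remaining])

    # The returned record is what would be collected and stored as training data.
    return {"agent": agent.name, "target_depth": target_depth, "turns": turns}


# Usage with a small made-up agent.
book = DialogNode("book_flight", [
    DialogNode("give_destination", [DialogNode("confirm")]),
    DialogNode("cancel"),
])
agent = Agent(
    name="travel",
    top_level_nodes=[book, DialogNode("greeting")],
    training_data={
        "book_flight": ["I want to book a flight"],
        "give_destination": ["To Boston, please"],
        "confirm": ["Yes, that is right"],
        "cancel": ["Never mind"],
        "greeting": ["Hello"],
    },
)
record = generate_dialog([agent], random.Random(0))
print(record)
```

Filtering children by remaining depth guarantees that the walk always reaches the node corresponding to the generated random number, so each record holds exactly `target_depth` responses.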