US 12,032,721 B2
Synthesizing user transactional data for de-identifying sensitive information
Gaurav Singhal, Bangalore (IN); Deepak Patil, Pune (IN); Rahul Mitra, Kalyani (IN); and Atif Adib, Bangalore (IN)
Assigned to YODLEE, INC., San Mateo, CA (US)
Filed by Yodlee, Inc., San Mateo, CA (US)
Filed on Oct. 20, 2021, as Appl. No. 17/506,508.
Prior Publication US 2023/0121356 A1, Apr. 20, 2023
Int. Cl. G06F 21/62 (2013.01); G06F 18/2323 (2023.01); G06N 5/022 (2023.01); G06Q 20/38 (2012.01); G06Q 20/40 (2012.01)
CPC G06F 21/6254 (2013.01) [G06F 18/2323 (2023.01); G06N 5/022 (2013.01); G06Q 20/389 (2013.01); G06Q 20/401 (2013.01)]
18 Claims
 
1. A non-transitory computer-readable media storing computer instructions which when executed by one or more processors of a device cause the device to:
identify transactional data of a plurality of users;
cluster the plurality of users based on the transactional data, to form groups of users having transactional data representing similar transactional behavior;
generate synthesized transactional data for the users in each group by:
identifying a subset of the transactional data that corresponds to the users in each group,
shuffling the transactional data in the subset across the users in each group, wherein the shuffling includes:
constructing a pool of transactions from the subset of the transactional data,
for each user in each group, sampling transactions from the pool based on:
a number of transactions associated with the user in the subset of the transactional data, and
a category of each of the transactions associated with the user in the subset of the transactional data,
wherein at least one of:
 a number of transactions sampled from the pool for each user in each group matches the number of transactions in the subset of the transactional data that are associated with the user, or
 a number of transactions of a particular category sampled from the pool for each user in each group matches a number of transactions of the particular category in the subset of the transactional data that are associated with the user, and
perturbing portions of the shuffled transactional data.
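
Illustrative sketch (not part of the claim): the shuffling and perturbing steps recited above can be pictured for a single group of users as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The record keys "user", "category" and "amount", the function names, sampling without replacement, and perturbing only the amount by up to 5% are all hypothetical choices made for illustration; the claim does not specify them. Clustering is assumed to have been done already, so the input is the transaction subset of one group.

```python
import random
from collections import Counter, defaultdict


def shuffle_group(transactions, rng):
    """Redistribute one group's transactions across that group's users."""
    # Construct the pool, bucketed by category (assumed record key).
    pool = defaultdict(list)
    for tx in transactions:
        pool[tx["category"]].append(tx)
    for bucket in pool.values():
        rng.shuffle(bucket)

    # How many transactions of each category each user originally had.
    counts = Counter((tx["user"], tx["category"]) for tx in transactions)

    # Sample without replacement (an assumption), so every user ends up
    # with the same per-category count as in the original subset.
    shuffled = []
    for (user, category), n in sorted(counts.items()):
        for _ in range(n):
            drawn = pool[category].pop()
            shuffled.append({**drawn, "user": user})
    return shuffled


def perturb(transactions, rng, scale=0.05):
    """Scale each amount by a random factor in [1 - scale, 1 + scale]."""
    # Perturbing only the amount is an illustrative assumption.
    return [
        {**tx, "amount": round(tx["amount"] * (1 + rng.uniform(-scale, scale)), 2)}
        for tx in transactions
    ]


def synthesize(transactions, seed=0):
    """Shuffle, then perturb, the transactions of one group."""
    rng = random.Random(seed)
    return perturb(shuffle_group(transactions, rng), rng)


# Toy group of three users assumed to show similar transactional behavior.
group = [
    {"user": "u1", "category": "grocery", "amount": 40.0},
    {"user": "u1", "category": "fuel", "amount": 55.0},
    {"user": "u2", "category": "grocery", "amount": 38.5},
    {"user": "u2", "category": "grocery", "amount": 42.0},
    {"user": "u3", "category": "fuel", "amount": 60.0},
    {"user": "u3", "category": "grocery", "amount": 35.0},
]
synthetic = synthesize(group)
```

In this sketch each user keeps the same number of transactions per category as before, which satisfies both alternatives of the "at least one of" condition at once; an implementation that matched only the per-user total would also fall within the claim language.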