US 12,380,071 B2
Tracking data lineage and applying data removal to enforce data removal policies
Arpan Jugalkishor Asawa, San Francisco, CA (US); Brian Douglas Remick, Morgan Hill, CA (US); Marcus Vinicius Silva Gois, Bothell, WA (US); Ritesh Kumar Sinha, Fremont, CA (US); Sidi Lin, San Jose, CA (US); Sejal Kiran Shinde, Santa Clara, CA (US); Ryan Lee Jobse, Hillsboro, VA (US); Rong Guo, Sunnyvale, CA (US); Sharanya Chinnusamy, Fremont, CA (US); Marc Andrew Power, San Jose, CA (US); and Marcus Jon Jager, Boulder Creek, CA (US)
Assigned to Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed by Microsoft Technology Licensing, LLC, Redmond, WA (US)
Filed on Jun. 23, 2021, as Appl. No. 17/355,873.
Prior Publication US 2022/0414070 A1, Dec. 29, 2022
Int. Cl. G06F 16/215 (2019.01); G06F 16/21 (2019.01); G06F 16/901 (2019.01)
CPC G06F 16/215 (2019.01) [G06F 16/219 (2019.01); G06F 16/9024 (2019.01)]
20 Claims
 
1. A data removal computer system, comprising:
at least one processor; and
memory storing instructions executable by the at least one processor, wherein the instructions, when executed, cause the data removal computer system to:
identify an extraction time at which a portion of user data is extracted from a computing system;
identify a path to a storage location in a source data store where the portion of user data, extracted from the computing system, is stored;
generate, in a graph, a node that represents the portion of user data stored in the storage location in the source data store, wherein the node stores a path identifier that identifies the path to the storage location that stores the portion of user data, and an original extraction date (OED) value identifying an earliest source data timestamp corresponding to the node, the earliest source data timestamp comprising an earliest extraction time that indicates when user data that was used to derive the node was extracted from a corresponding substrate;
intermittently traverse the graph;
compare the earliest extraction time corresponding to each node in the graph to an expiration time;
identify the node as an expired node in the graph based on the comparing and a determination that the earliest extraction time, identified by the OED value, is older than the expiration time; and
based on the identification of the node as an expired node,
obtain the path identifier from the expired node,
access the storage location in the source data store using the path identifier obtained from the expired node,
delete the portion of user data stored at the storage location in the source data store,
delete the expired node from the graph to obtain a modified graph, and
store the modified graph in persistent memory.
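
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes an in-memory dictionary stands in for the source data store, that the graph is a dictionary keyed by node id, and that the OED of a derived node is the minimum of its own extraction time and its parents' OED values. The names `LineageNode`, `add_node`, `sweep`, `serialize_graph`, and `save_graph` are hypothetical and do not appear in the patent.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LineageNode:
    path_id: str   # path identifier for the storage location in the source data store
    oed: datetime  # original extraction date: earliest extraction time behind this node
    children: list = field(default_factory=list)  # ids of derived nodes (assumed layout)


def add_node(graph, node_id, path_id, extraction_time, parent_ids=()):
    """Generate a node; its OED is the earliest time among itself and its parents."""
    oed = min([extraction_time] + [graph[p].oed for p in parent_ids])
    graph[node_id] = LineageNode(path_id, oed)
    for p in parent_ids:
        graph[p].children.append(node_id)


def sweep(graph, store, expiration_time):
    """Traverse the graph once, deleting expired data and expired nodes.

    A node is treated as expired when its OED is older than expiration_time.
    Returns the ids of the removed nodes.
    """
    expired = [nid for nid, node in graph.items() if node.oed < expiration_time]
    for nid in expired:
        path = graph[nid].path_id  # obtain the path identifier from the expired node
        store.pop(path, None)      # delete the user data at that storage location
        del graph[nid]             # delete the expired node from the graph
    # Drop edges that pointed at removed nodes so the modified graph stays consistent.
    for node in graph.values():
        node.children = [c for c in node.children if c in graph]
    return expired


def serialize_graph(graph):
    """Encode the graph as JSON text, with OED values in ISO 8601 form."""
    return json.dumps({
        nid: {"path_id": n.path_id, "oed": n.oed.isoformat(), "children": n.children}
        for nid, n in graph.items()
    })


def save_graph(graph, filename):
    """Store the modified graph in persistent storage (here, a JSON file)."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(serialize_graph(graph))
```

In this sketch, the intermittent traversal would correspond to calling `sweep` on a schedule with `expiration_time` set to the current time minus the retention period, followed by `save_graph` on the modified graph. Because a derived node inherits the earliest OED of its sources, it expires no later than the oldest data it was derived from.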