US 12,321,690 B2
Devices, systems, and methods for transcript sanitization
Chris Vanciu, Isle, MN (US); Kyle Smaagard, Forest Lake, MN (US); Boris Chaplin, Medina, MN (US); Dylan Morgan, Minneapolis, MN (US); Paul Gordon, Minneapolis, MN (US); Matt Matsui, Minneapolis, MN (US); Laura Cattaneo, Rochester, MN (US); and Catherine Bullock, Minneapolis, MN (US)
Assigned to Calabrio, Inc., Minneapolis, MN (US)
Filed by Calabrio, Inc., Minneapolis, MN (US)
Filed on May 20, 2023, as Appl. No. 18/320,977.
Claims priority of provisional application 63/344,290, filed on May 20, 2022.
Prior Publication US 2023/0409811 A1, Dec. 21, 2023
Int. Cl. G06F 40/166 (2020.01); G06F 40/205 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01)
CPC G06F 40/166 (2020.01) [G06F 40/205 (2020.01); G06F 40/284 (2020.01); G06F 40/30 (2020.01)]
15 Claims
 
1. A method of sanitizing a transcript, the method comprising:
selecting a transcript to be sanitized;
extracting a plurality of text items from the transcript;
condensing the transcript into a single string by concatenating the plurality of text items;
identifying potential redactions to be made in the transcript, the potential redactions being identifiable via a multi-pass process that includes:
generating initial redactions to be made based on surrounding context within the transcript, wherein the surrounding context is determined using a transformer-based machine learning model operating on the single string, and wherein the transformer-based machine learning model is trained using one or more natural language processing techniques to identify certain text based upon context and language structure; and
generating matching redactions to be made based on the initial redactions, wherein the matching redactions are determined by identifying single characters within a predetermined token distance from the initial redactions;
redacting the transcript at the potential redactions to sanitize the transcript, wherein redacting the transcript at the potential redactions to sanitize the transcript includes replacing characters in the potential redactions with at least one tag that is selected from a group consisting of:
unique tags, the unique tags being such that the tags are replaceable with fictionally consistent data in the potential redactions, wherein a unique tag comprises a category plus a unique identifier; and
generic tags, the generic tags being such that the tags are not replaceable with fictionally consistent data in the potential redactions; and
converting the redacted transcript to a specified vendor format.
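
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names, the tag formats `[NAME_1]` and `[REDACTED]`, and the default token distance are hypothetical choices made for illustration. In particular, `initial_redactions` is a simple rule-based stand-in (it flags the token following "name is") used in place of the transformer-based machine learning model the claim recites, and the final vendor-format conversion step is omitted.

```python
def condense(text_items):
    """Concatenate the extracted text items into a single string."""
    return " ".join(item.strip() for item in text_items if item.strip())


def initial_redactions(tokens):
    """First pass. ASSUMPTION: a toy context rule replaces the claimed
    transformer model; it flags the token that follows 'name is'.
    Returns {token_index: category}."""
    found = {}
    for i in range(2, len(tokens)):
        if tokens[i - 2].lower() == "name" and tokens[i - 1].lower() == "is":
            found[i] = "NAME"
    return found


def matching_redactions(tokens, initial, max_distance):
    """Second pass: collect single-character tokens that lie within
    max_distance tokens of any initial redaction (e.g. a spelled-out name)."""
    matches = set()
    for i in initial:
        lo = max(0, i - max_distance)
        hi = min(len(tokens), i + max_distance + 1)
        for j in range(lo, hi):
            if j not in initial and len(tokens[j]) == 1 and tokens[j].isalnum():
                matches.add(j)
    return matches


def redact(tokens, initial, matches):
    """Replace initial redactions with unique tags (category plus identifier)
    and matching redactions with a generic tag."""
    ids = {}  # (category, lowercased text) -> identifier, so repeats share a tag
    out = []
    for i, tok in enumerate(tokens):
        if i in initial:
            key = (initial[i], tok.lower())
            ids.setdefault(key, len(ids) + 1)
            out.append(f"[{initial[i]}_{ids[key]}]")
        elif i in matches:
            out.append("[REDACTED]")
        else:
            out.append(tok)
    return " ".join(out)


def sanitize(text_items, max_distance=7):
    """Run the condensing, two-pass identification, and redaction steps."""
    tokens = condense(text_items).split()
    initial = initial_redactions(tokens)
    matches = matching_redactions(tokens, initial, max_distance)
    return redact(tokens, initial, matches)


items = ["hello my name is", "Smith that is", "S M I T H", "thanks"]
print(sanitize(items))
```

In this sketch the unique tag keeps the same identifier for repeated occurrences of the same text, which is what would let a later step substitute fictionally consistent data, while the generic tag carries no identifier. A single-character rule this simple would also catch ordinary words such as "I" or "a" near a redaction; how the claimed method handles such cases is not stated in the claim text.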