US 11,894,106 B2
Systems and methods for data communication, storage, and analysis using reference motifs
Jarl A. Nilsson, Mountain View, CA (US); and William Knox Carey, Mountain View, CA (US)
Assigned to Intertrust Technologies Corporation, Berkeley, CA (US)
Filed by Intertrust Technologies Corporation, Milpitas, CA (US)
Filed on Aug. 7, 2018, as Appl. No. 16/057,357.
Claims priority of provisional application 62/542,203, filed on Aug. 7, 2017.
Prior Publication US 2019/0042694 A1, Feb. 7, 2019
Int. Cl. G16B 30/00 (2019.01); G16B 20/20 (2019.01); G16B 30/10 (2019.01); G16B 30/20 (2019.01)
CPC G16B 30/10 (2019.02) [G16B 20/20 (2019.02); G16B 30/00 (2019.02); G16B 30/20 (2019.02)]
10 Claims
 
1. A method for efficiently communicating genomic information, comprising:
providing, by a second computing system to a first computing system, a request to transfer unaligned genomic sequence read data and associated quality score information to the second computing system;
receiving, from the first computer system, a response to the request comprising a variant list, one or more indications of first and second reference motifs, and quality score curve parameter information:
the variant list indicating differences between at least a first portion of the unaligned genomic sequence read data and at least a portion of a first reference sequence and differences between at least a second portion of the unaligned genomic sequence read data and at least a portion of a second reference sequence, the variant list being generated based on a comparison between the first portion of the unaligned genomic sequence read data and the at least a portion of the first reference sequence, and a comparison between the second portion of the unaligned genomic sequence read data and the at least a portion of the second reference sequence:
the second reference sequence being selected from a reference table, the second reference sequence being associated with a second reference motif, the second reference motif identified as being included in the unaligned genomic sequence read data and being different than a first reference motif;
the first reference sequence being selected from the reference table, the first reference sequence being associated with the first reference motif, the first reference motif identified as being included in the unaligned genomic sequence read data; and
the reference table comprising a plurality of reference motifs and a plurality of reference sequences, wherein each reference motif of the plurality of reference motifs is associated with a reference sequence of the plurality of reference sequences; and
the quality score curve parameter information being generated based on a quality score curve, the quality score curve parameter information characterizing the quality score curve:
the quality score curve being generated based on the quality score information, the quality score information being associated with the unaligned genomic sequence read data;
reconstructing, by the second computing system, the unaligned genomic sequence read data based on the variant list and the one or more indications of first and second reference motifs without transferring the unaligned genomic sequence read data to the second computing system; and
reconstructing, by the second computing system, at least an approximation of the quality score curve based on the quality score curve parameter information without transferring the quality score information associated with the unaligned genomic sequence read data to the second computing system.
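
The two reconstructing steps recited above can be illustrated with a minimal sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: the contents of REFERENCE_TABLE, the Variant record layout, the restriction to single-base substitutions, and the exponential-decay form of the quality score curve (the claim does not specify how the curve is parameterized).

```python
from dataclasses import dataclass
import math

# Hypothetical reference table shared by both computing systems:
# each reference motif maps to one reference sequence.
REFERENCE_TABLE = {
    "ACGT": "ACGTTTGACCA",
    "GGCA": "GGCATTACGGA",
}


@dataclass(frozen=True)
class Variant:
    """One difference between a read portion and its reference sequence.

    Assumed layout; only single-base substitutions are modeled here.
    """
    portion: int   # index of the read portion (and of its motif indication)
    position: int  # 0-based offset into the reference sequence
    base: str      # base observed in the read at that offset


def reconstruct_reads(motifs, variants):
    """Rebuild each read portion from its motif's reference sequence
    and the variant list, without receiving the read data itself."""
    portions = []
    for index, motif in enumerate(motifs):
        bases = list(REFERENCE_TABLE[motif])
        for variant in variants:
            if variant.portion == index:
                bases[variant.position] = variant.base
        portions.append("".join(bases))
    return portions


def reconstruct_quality_curve(params, length):
    """Approximate per-position quality scores from curve parameters.

    Assumed model: q(i) = a * exp(-b * i) + c, rounded to integers.
    """
    a, b, c = params
    return [round(a * math.exp(-b * i) + c) for i in range(length)]
```

In this sketch the second computing system would call reconstruct_reads(["ACGT", "GGCA"], variant_list) with the received motif indications and variant list, and reconstruct_quality_curve((a, b, c), read_length) with the received curve parameters. Only the motif indications, the variants, and three numbers cross the link, which is the bandwidth saving the claim is directed to.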