US 11,869,632 B2
Method and system for analyzing sequences
Kijong Yi, Daejeon (KR); and Young Seok Ju, Daejeon (KR)
Assigned to Genome Insight Technology, Inc., Daejeon (KR)
Filed by Genome Insight Technology, Inc., Daejeon (KR)
Filed on Dec. 7, 2022, as Appl. No. 18/077,016.
Claims priority of application No. 10-2021-0180438 (KR), filed on Dec. 16, 2021; and application No. 10-2022-0055514 (KR), filed on May 4, 2022.
Prior Publication US 2023/0197199 A1, Jun. 22, 2023
Int. Cl. G01N 33/48 (2006.01); G16B 30/10 (2019.01); G16B 40/00 (2019.01)
CPC G16B 30/10 (2019.02) [G16B 40/00 (2019.02)]
20 Claims
 
1. A method performed by one or more processors, the method comprising:
determining, for a partial genome sequencing process of an organism, at least one target length of at least one partial genome sequence output;
causing a sequencer to generate, based on a sample of the organism, a plurality of marking data associated with a genome of the organism, wherein generating of the plurality of marking data comprises:
imaging first marking data of the plurality of marking data;
determining, based on a plurality of cycles associated with the at least one target length, the first marking data of the plurality of marking data, wherein the first marking data comprises a plurality of image data portions, and wherein the plurality of image data portions comprises:
a first image data portion associated with a first length of a first polynucleotide chain of the genome; and
a second image data portion associated with a second length of a second polynucleotide chain of the genome, wherein the second length is shorter than the first length; and
before imaging second marking data of the plurality of marking data:
converting the first marking data to first sequence data;
detecting, based on the first sequence data, completion of a first read of paired-end reads and a partial completion of a second read of the paired-end reads, wherein the partial completion of the second read is associated with the second length of the second polynucleotide chain and satisfies a target length associated with the second read;
aligning, based on reference sequence data associated with the organism, the first sequence data, wherein the aligned first sequence data comprises:
a first aligned portion corresponding to the first read; and
a second aligned portion corresponding to a partial portion of the second read;
identifying, based on the aligned first sequence data, a structural variant; and
generating a first report comprising information on the identified structural variant.
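
The detecting step of claim 1 can be illustrated with a minimal sketch, offered only as one possible reading and not as the patented implementation. It assumes that each imaging cycle yields one base, that the second read starts after the first read and any index cycles finish, and that the target length is expressed in cycles. All names (ReadProgress, read_progress, ready_for_early_analysis) and the numeric values are hypothetical and do not appear in the claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadProgress:
    """Progress of a paired-end run after some number of imaging cycles."""
    read1_complete: bool      # first read fully sequenced
    read2_cycles_done: int    # bases of the second read available so far
    read2_target_met: bool    # partial second read satisfies its target length


def read_progress(cycles_done: int, read1_length: int,
                  read2_target_length: int, index_cycles: int = 0) -> ReadProgress:
    """Classify run progress (assumption: one base per imaging cycle).

    Hypothetical cycle layout: read 1, then index cycles, then read 2.
    """
    if min(cycles_done, read1_length, read2_target_length, index_cycles) < 0:
        raise ValueError("cycle counts and lengths must be non-negative")
    # Cycles left over once read 1 and the index cycles are accounted for.
    read2_done = max(0, cycles_done - read1_length - index_cycles)
    return ReadProgress(
        read1_complete=cycles_done >= read1_length,
        read2_cycles_done=read2_done,
        read2_target_met=read2_done >= read2_target_length,
    )


def ready_for_early_analysis(progress: ReadProgress) -> bool:
    """True when read 1 is complete and the partial read 2 meets its target."""
    return progress.read1_complete and progress.read2_target_met


if __name__ == "__main__":
    # Hypothetical run: 151-cycle first read, 50-cycle target for the second read.
    for cycles in (100, 180, 201):
        p = read_progress(cycles, read1_length=151, read2_target_length=50)
        print(cycles, p, ready_for_early_analysis(p))
```

Under these assumptions, conversion to sequence data, alignment, and structural variant identification would be triggered as soon as ready_for_early_analysis returns True, that is, before the remaining cycles of the second read are imaged.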