US 11,984,198 B2
Hash-based efficient comparison of sequencing results
Geert Trooskens, Meise (BE); and Wim Maria R. Van Criekinge, Waarloos (BE)
Assigned to SHARECARE AI, INC., Palo Alto, CA (US)
Filed by SHARECARE AI, INC., Palo Alto, CA (US)
Filed on Jan. 9, 2023, as Appl. No. 18/152,118.
Application 18/152,118 is a continuation of application No. 16/575,278, filed on Sep. 18, 2019, granted, now 11,551,784.
Claims priority of provisional application 62/734,872, filed on Sep. 21, 2018.
Claims priority of provisional application 62/734,895, filed on Sep. 21, 2018.
Claims priority of provisional application 62/734,840, filed on Sep. 21, 2018.
Prior Publication US 2023/0162817 A1, May 25, 2023
Int. Cl. G16B 30/00 (2019.01); C12Q 1/6827 (2018.01); C12Q 1/6869 (2018.01); G06F 16/22 (2019.01); G06F 17/18 (2006.01); G16B 5/00 (2019.01); G16B 10/00 (2019.01); G16B 20/00 (2019.01); G16B 20/20 (2019.01); G16B 20/40 (2019.01); G16B 40/00 (2019.01); G16B 40/10 (2019.01); G16B 40/30 (2019.01); G16B 45/00 (2019.01); G16B 50/00 (2019.01); H04L 9/32 (2006.01)
CPC G16B 30/00 (2019.02) [C12Q 1/6827 (2013.01); C12Q 1/6869 (2013.01); G06F 16/2255 (2019.01); G06F 17/18 (2013.01); G16B 5/00 (2019.02); G16B 10/00 (2019.02); G16B 20/20 (2019.02); G16B 20/40 (2019.02); G16B 40/00 (2019.02); G16B 40/10 (2019.02); G16B 45/00 (2019.02); G16B 50/00 (2019.02)] 20 Claims
 
1. A computer-implemented method comprising:
accessing a first sequenced output and a second sequenced output, wherein the first sequenced output and the second sequenced output contain variants occurring at different carriers and at different carrier positions;
generating hashes over a selected pattern length of positions for those carrier positions that are shared between the first sequenced output and the second sequenced output to produce window hashes for base patterns in a first sequence and a second sequence, and wherein the first sequence is based on the shared carrier positions and the first sequenced output, the second sequence is based on the shared carrier positions and the second sequenced output, and the window hashes are non-unique;
selecting those of the window hashes that occur less than a ceiling number of times;
comparing the selected window hashes between the first sequence and the second sequence on a starting position basis such that selected window hashes for base patterns having same start positions in the first sequenced output and the second sequenced output are compared;
identifying common window hashes between the first sequence and the second sequence based on the comparing; and
determining a similarity measure between the first sequence and the second sequence based on the common window hashes.
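
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything in it is an assumption made for illustration: each sequenced output is modeled as a dict mapping a hypothetical (carrier, position) pair to a base call, window hashes are SHA-1 digests truncated to four hex characters (so distinct base patterns may collide, i.e. the hashes are non-unique), occurrences are counted per sequence, and the similarity measure is the fraction of comparable start positions whose selected hashes match. The names window_hashes, select_rare, and similarity are illustrative only.

```python
import hashlib
from collections import Counter


def window_hashes(seq, k):
    """Hash every length-k window of seq; list index = start position.

    Truncating the digest is an illustrative way to get short,
    non-unique hashes (different windows can share a hash value).
    """
    return [
        hashlib.sha1(seq[i:i + k].encode()).hexdigest()[:4]
        for i in range(len(seq) - k + 1)
    ]


def select_rare(hashes, ceiling):
    """Keep start positions whose hash occurs fewer than `ceiling` times."""
    counts = Counter(hashes)
    return {i: h for i, h in enumerate(hashes) if counts[h] < ceiling}


def similarity(out_a, out_b, k=3, ceiling=2):
    """Assumed similarity: matching selected hashes / comparable positions."""
    # Restrict both outputs to the (carrier, position) keys they share.
    shared = sorted(out_a.keys() & out_b.keys())
    seq_a = "".join(out_a[p] for p in shared)
    seq_b = "".join(out_b[p] for p in shared)

    sel_a = select_rare(window_hashes(seq_a, k), ceiling)
    sel_b = select_rare(window_hashes(seq_b, k), ceiling)

    # Compare only hashes that start at the same position in both sequences.
    comparable = sel_a.keys() & sel_b.keys()
    common = [i for i in comparable if sel_a[i] == sel_b[i]]
    return len(common) / len(comparable) if comparable else 0.0
```

Under these assumptions, two outputs that agree at every shared position score 1.0 as long as at least one window hash survives the ceiling filter, and positions present in only one output never influence the result.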