CPC G16B 30/00 (2019.02) [C12Q 1/6827 (2013.01); C12Q 1/6869 (2013.01); G06F 16/2255 (2019.01); G06F 17/18 (2013.01); G16B 5/00 (2019.02); G16B 10/00 (2019.02); G16B 20/20 (2019.02); G16B 20/40 (2019.02); G16B 40/00 (2019.02); G16B 40/10 (2019.02); G16B 45/00 (2019.02); G16B 50/00 (2019.02)] | 20 Claims |
1. A computer-implemented method comprising:
accessing a first sequenced output and a second sequenced output, wherein the first sequenced output and the second sequenced output contain variants occurring at different carriers and at different carrier positions;
generating hashes over a selected pattern length of positions for those carrier positions that are shared between the first sequenced output and the second sequenced output to produce window hashes for base patterns in a first sequence and a second sequence, and wherein the first sequence is based on the shared carrier positions and the first sequenced output, the second sequence is based on the shared carrier positions and the second sequenced output, and the window hashes are non-unique;
selecting those of the window hashes that occur less than a ceiling number of times;
comparing the selected window hashes between the first sequence and the second sequence on a starting position basis such that selected window hashes for base patterns having same start positions in the first sequenced output and the second sequenced output are compared;
identifying common window hashes between the first sequence and the second sequence based on the comparing; and
determining a similarity measure between the first sequence and the second sequence based on the common window hashes.
|
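The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything below is an assumption made for illustration: the representation of a sequenced output as a dict keyed by `(carrier, position)`, the function names, the use of `zlib.crc32` reduced modulo a bucket count as the non-unique hash, counting hash occurrences separately within each sequence, and defining the similarity measure as the fraction of compared start positions whose hashes match.

```python
from collections import Counter
import zlib


def window_hashes(seq, k, buckets):
    """Hash every length-k window of seq; a limited bucket count allows collisions."""
    return [zlib.crc32(seq[i:i + k].encode()) % buckets
            for i in range(len(seq) - k + 1)]


def select_rare(hashes, ceiling):
    """Map start position -> hash, keeping hashes seen fewer than `ceiling` times."""
    counts = Counter(hashes)
    return {i: h for i, h in enumerate(hashes) if counts[h] < ceiling}


def similarity(out_a, out_b, k=3, ceiling=2, buckets=1 << 16):
    """Hypothetical similarity measure between two sequenced outputs.

    out_a, out_b: dicts mapping (carrier, position) -> base (assumed format).
    """
    # Keep only carrier positions present in both outputs, in a fixed order.
    shared = sorted(out_a.keys() & out_b.keys())
    seq_a = "".join(out_a[p] for p in shared)
    seq_b = "".join(out_b[p] for p in shared)

    # Window hashes per sequence, filtered by the occurrence ceiling.
    sel_a = select_rare(window_hashes(seq_a, k, buckets), ceiling)
    sel_b = select_rare(window_hashes(seq_b, k, buckets), ceiling)

    # Compare only windows that start at the same position in both sequences.
    starts = sel_a.keys() & sel_b.keys()
    common = [i for i in starts if sel_a[i] == sel_b[i]]

    # Assumed measure: share of compared start positions with matching hashes.
    return len(common) / len(starts) if starts else 0.0


def make_output(carrier, bases):
    """Helper for the example: place bases at positions 1..n on one carrier."""
    return {(carrier, pos): base for pos, base in enumerate(bases, start=1)}
```

As a usage example, `similarity(make_output("chr1", "ACGTTGCA"), make_output("chr1", "ACGTAGCA"))` compares two outputs that differ at a single shared position. Only the windows covering that position are expected to produce different hashes, so the score falls between 0 and 1. Lowering `buckets` makes hash collisions more likely, which is why the ceiling filter and the same-start-position comparison matter in this sketch.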