US 11,914,653 B1
Systems and methods for removing human genetic data from genetic sequences
Oystein Friestad Saebo, Santa Cruz, CA (US)
Assigned to Pathogenomix, Inc., Santa Cruz, CA (US)
Filed by Pathogenomix, Inc., Santa Cruz, CA (US)
Filed on Jan. 23, 2023, as Appl. No. 18/100,074.
Int. Cl. G06F 16/906 (2019.01); G06F 16/901 (2019.01); G06F 16/903 (2019.01); G16B 30/00 (2019.01); G16B 40/00 (2019.01); G16B 50/30 (2019.01)
CPC G06F 16/906 (2019.01) [G06F 16/9014 (2019.01); G06F 16/90344 (2019.01); G16B 30/00 (2019.02); G16B 40/00 (2019.02); G16B 50/30 (2019.02)]
20 Claims
 
1. A system, comprising:
one or more processors coupled to memory, the one or more processors configured to:
access a hash table that stores a plurality of first k-mers of a human genome, each first k-mer of the plurality of first k-mers corresponding to a first number of characters (k);
generate a plurality of second k-mers of a read of a cluster of a plurality of clusters, the plurality of clusters obtained from a sample;
determine a second number of the plurality of second k-mers that match at least one of the plurality of first k-mers in the hash table;
generate a subset of the plurality of clusters by removing the cluster from the plurality of clusters responsive to the second number satisfying a threshold; and
transmit the subset of the plurality of clusters to a computing system.
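
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: a Python set stands in for the hash table, each cluster is represented by a single read, "satisfying a threshold" is read as "greater than or equal to", and the transmit step is omitted. The function names (kmers, build_human_index, count_matches, filter_clusters) and the toy sequences are hypothetical.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> List[str]:
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_human_index(reference: str, k: int) -> Set[str]:
    """Store the reference k-mers in a hash-based set (stand-in for the hash table)."""
    return set(kmers(reference, k))


def count_matches(read: str, human_index: Set[str], k: int) -> int:
    """Count the k-mers of the read that also occur in the human index."""
    return sum(1 for km in kmers(read, k) if km in human_index)


def filter_clusters(
    clusters: Dict[str, str],
    human_index: Set[str],
    k: int,
    threshold: int,
) -> Dict[str, str]:
    """Drop each cluster whose read has at least `threshold` matching k-mers.

    Assumption: one representative read per cluster, and the threshold is
    satisfied when the match count is greater than or equal to it.
    """
    return {
        cluster_id: read
        for cluster_id, read in clusters.items()
        if count_matches(read, human_index, k) < threshold
    }


if __name__ == "__main__":
    # Toy data only; a real human reference is billions of bases long.
    index = build_human_index("ACGTACGTGG", k=4)
    sample = {"c1": "ACGTACG", "c2": "TTTTCCCC"}
    kept = filter_clusters(sample, index, k=4, threshold=3)
    print(kept)  # only the cluster with fewer than 3 matching k-mers remains
```

In this sketch the membership test against the set takes constant expected time per k-mer, so the cost of screening a read grows with its length rather than with the size of the reference.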