US 12,423,271 B2
System and methods for adaptive bandwidth-efficient encoding of genomic data
Joshua Cooper, Columbia, SC (US); Brian Galvin, Silverdale, WA (US); and Erin Johnston, Carlsbad, CA (US)
Assigned to ATOMBEAM TECHNOLOGIES INC., Moraga, CA (US)
Filed by AtomBeam Technologies Inc., Moraga, CA (US)
Filed on Jan. 1, 2025, as Appl. No. 19/007,543.
Application 19/007,543 is a continuation in part of application No. 18/449,706, filed on Aug. 15, 2023, granted, now 12,189,580.
Application 18/449,706 is a continuation of application No. 17/569,500, filed on Jan. 5, 2022, granted, now 11,734,231, issued on Aug. 23, 2023.
Application 17/569,500 is a continuation in part of application No. 17/234,007, filed on Apr. 19, 2021, granted, now 11,782,879, issued on Oct. 10, 2023.
Application 18/449,706 is a continuation in part of application No. 17/234,007, filed on Apr. 19, 2021, granted, now 11,782,879, issued on Oct. 10, 2023.
Application 17/234,007 is a continuation in part of application No. 17/180,439, filed on Feb. 19, 2021, granted, now 11,366,790, issued on Jun. 21, 2022.
Application 17/180,439 is a continuation in part of application No. 16/923,039, filed on Jul. 7, 2020, granted, now 11,232,076, issued on Jan. 25, 2022.
Application 16/923,039 is a continuation in part of application No. 16/716,098, filed on Dec. 16, 2019, granted, now 10,706,018, issued on Jul. 7, 2020.
Application 16/716,098 is a continuation of application No. 16/455,655, filed on Jun. 27, 2019, granted, now 10,509,771, issued on Dec. 17, 2019.
Application 16/455,655 is a continuation in part of application No. 16/200,466, filed on Nov. 26, 2018, granted, now 10,476,519, issued on Nov. 12, 2019.
Application 16/200,466 is a continuation in part of application No. 15/975,741, filed on May 9, 2018, granted, now 10,303,391, issued on May 28, 2019.
Claims priority of provisional application 63/140,111, filed on Jan. 21, 2021.
Claims priority of provisional application 63/027,166, filed on May 19, 2020.
Claims priority of provisional application 62/926,723, filed on Oct. 28, 2019.
Claims priority of provisional application 62/578,824, filed on Oct. 30, 2017.
Prior Publication US 2025/0139059 A1, May 1, 2025
Int. Cl. G06F 16/174 (2019.01); G06F 3/06 (2006.01)
CPC G06F 16/1752 (2019.01) [G06F 3/0608 (2013.01); G06F 3/0641 (2013.01); G06F 3/067 (2013.01)]
14 Claims
 
1. A system for adaptive bandwidth-efficient data encoding, comprising:
a computing device comprising a processor and a memory;
a sequence analyzer comprising a first plurality of programming instructions stored in the memory and operable on the processor, wherein the first plurality of programming instructions, when operating on the processor, cause the processor to:
receive a sequence dataset;
scan the sequence dataset and maintain a count of the number of unique characters contained within the sequence dataset;
for each occurrence of a unique character which causes the count of the number of unique characters to reach a value equal to a power of two, indicate a position in the sequence dataset corresponding to the unique character;
calculate, for each of the indicated positions, a compaction ratio that would be obtained by dividing the sequence dataset into one of a plurality of segments at one of the indicated positions;
deconstruct the sequence dataset into a plurality of deconstructed sourceblocks at the positions that yield the best compaction ratio; and
pass the plurality of deconstructed sourceblocks to a data deconstruction engine;
an adaptive sourceblock optimizer configured to determine and dynamically adjust an optimal sourceblock size based on sequence complexity, alphabet size, and frequency distribution of characters;
a data deconstruction engine comprising a second plurality of programming instructions stored in the memory and operable on the processor, wherein the second plurality of programming instructions, when operating on the processor, cause the processor to:
receive the plurality of deconstructed sourceblocks from the sequence analyzer;
deconstruct the sequence dataset into sourceblocks using the optimal sourceblock size from the adaptive sourceblock optimizer; and
create a plurality of codewords for storage or transmission of the sequence dataset.
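
The sequence analyzer steps recited in claim 1 (counting unique characters, marking the positions where that count reaches a power of two, and comparing compaction ratios at those positions) can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the claim does not specify: the function names are hypothetical, the input is taken to be 8 bits per character, each segment is encoded with a fixed-width code of ceil(log2(alphabet size)) bits per character, codebook overhead is ignored, the split is placed immediately after the indicated character, and only a single split into two segments is evaluated.

```python
def power_of_two_positions(seq):
    """Indices where the running count of distinct characters reaches 1, 2, 4, 8, ..."""
    seen = set()
    positions = []
    for i, ch in enumerate(seq):
        if ch not in seen:
            seen.add(ch)
            n = len(seen)
            if n & (n - 1) == 0:  # true only when n is a power of two
                positions.append(i)
    return positions


def bits_per_symbol(alphabet_size):
    """Assumed fixed-width code: ceil(log2(alphabet_size)), with a 1-bit minimum."""
    return max(1, (alphabet_size - 1).bit_length())


def compaction_ratio(seq, split):
    """Assumed ratio: 8-bit input size divided by the fixed-width size of both segments."""
    segments = [s for s in (seq[:split], seq[split:]) if s]
    encoded_bits = sum(len(s) * bits_per_symbol(len(set(s))) for s in segments)
    return 8 * len(seq) / encoded_bits


def best_split(seq):
    """Evaluate a split just after each indicated position and keep the best one."""
    positions = power_of_two_positions(seq)
    if not positions:
        return None  # empty input: nothing to split
    candidates = [(p + 1, compaction_ratio(seq, p + 1)) for p in positions]
    return max(candidates, key=lambda c: c[1])


if __name__ == "__main__":
    example = "AAAACCCCGGTTNNRY"  # hypothetical nucleotide-like input
    print(power_of_two_positions(example))  # [0, 4, 10]
    print(best_split(example))              # (11, 4.0)
```

In this example the distinct-character count reaches 1, 2, and 4 at indices 0, 4, and 10. Splitting just after index 10 leaves four distinct characters in each segment, so both fit a 2-bit code, and under the stated assumptions that candidate gives the highest ratio of the three.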