US 12,468,662 B2
System and method for providing randomly-accessible compacted data
Aliasghar Riahi, Orinda, CA (US); Joshua Cooper, Columbia, SC (US); Mojgan Haddad, Orinda, CA (US); and Charles Yeomans, Orinda, CA (US)
Assigned to ATOMBEAM TECHNOLOGIES INC., Moraga, CA (US)
Filed by AtomBeam Technologies Inc., Moraga, CA (US)
Filed on Oct. 22, 2023, as Appl. No. 18/491,798.
Application 18/491,798 is a continuation of application No. 18/078,909, filed on Dec. 9, 2022, granted, now 11,899,624.
Application 18/078,909 is a continuation of application No. 17/734,052, filed on Apr. 30, 2022, granted, now 11,609,882, issued on Mar. 21, 2023.
Application 17/734,052 is a continuation of application No. 17/180,439, filed on Feb. 19, 2021, granted, now 11,366,790, issued on Jun. 21, 2022.
Application 17/180,439 is a continuation in part of application No. 16/923,039, filed on Jul. 7, 2020, granted, now 11,232,076, issued on Jan. 25, 2022.
Application 16/923,039 is a continuation in part of application No. 16/716,098, filed on Dec. 16, 2019, granted, now 10,706,018, issued on Jul. 7, 2020.
Application 16/716,098 is a continuation of application No. 16/455,655, filed on Jun. 27, 2019, granted, now 10,509,771, issued on Dec. 17, 2019.
Application 16/455,655 is a continuation in part of application No. 16/200,466, filed on Nov. 26, 2018, granted, now 10,476,519, issued on Nov. 12, 2019.
Application 16/200,466 is a continuation in part of application No. 15/975,741, filed on May 9, 2018, granted, now 10,303,391, issued on May 28, 2019.
Claims priority of provisional application 63/140,111, filed on Jan. 21, 2021.
Claims priority of provisional application 63/027,166, filed on May 19, 2020.
Claims priority of provisional application 62/926,723, filed on Oct. 28, 2019.
Claims priority of provisional application 62/578,824, filed on Oct. 30, 2017.
Prior Publication US 2024/0160609 A1, May 16, 2024
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/174 (2019.01); G06F 3/06 (2006.01)
CPC G06F 16/1752 (2019.01) [G06F 3/0608 (2013.01); G06F 3/0641 (2013.01); G06F 3/067 (2013.01)]
2 Claims
 
1. A system for random-access manipulation of compacted data files, comprising:
a computing system comprising a memory, a processor, and a non-volatile data storage device;
a deconstruction subsystem comprising a first plurality of programming instructions stored in the memory and operable on the processor, wherein the first plurality of programming instructions, when operating on the processor, cause the computing system to:
deconstruct a data stream into a plurality of sourceblocks;
encode the data stream using a reference codebook by:
retrieving a codeword for each sourceblock from the reference codebook;
where there is no codeword for a first sourceblock, generating a hash code as a new codeword and storing the first sourceblock and its newly-created codeword in the reference codebook; and
storing the codewords corresponding to the data stream in a compacted data file;
a reconstruction subsystem comprising a third plurality of programming instructions stored in the memory and operable on the processor, wherein the third plurality of programming instructions, when operating on the processor, cause the computing system to:
retrieve a plurality of codewords from the compacted data file received from a requesting process;
decode each of the plurality of retrieved codewords by, for each retrieved codeword, retrieving the sourceblock associated with the respective codeword from the reference codebook; and
provide the retrieved sourceblocks as a data stream to the requesting process; and
a random-access subsystem comprising a second plurality of programming instructions stored in the memory and operating on the processor, wherein the second plurality of programming instructions, when operating on the processor, cause the computing subsystem to:
receive a data search query;
estimate, using an estimator module, a first starting bit location in the compacted data file;
refine the first starting bit location by:
determining whether a bit sequence starting at the first starting bit location corresponds to a codeword boundary and, if not, traversing the reference codebook until a codeword boundary is located at a new starting bit;
traversing from the new starting bit until a start codeword corresponding to the beginning of the data search query is identified; and
sending the first start codeword and a plurality of immediately following codewords from the compacted data file to the reconstruction engine for decoding.
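
The three subsystems recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation and rests on several assumptions that do not come from the claim: sourceblocks are fixed 4-byte chunks, codewords are fixed 16-bit values taken from a SHA-256 digest (so a codeword boundary is simply a bit offset divisible by 16), the estimator is a plain size-ratio calculation, and refinement snaps backward to the previous boundary. The names `encode`, `decode`, `estimate_start_bit`, `random_read`, and `hint_offset` are hypothetical, and the query is assumed to begin on a sourceblock boundary.

```python
import hashlib

BLOCK_SIZE = 4    # bytes per sourceblock (assumed for illustration)
CODE_BITS = 16    # fixed codeword width in bits (assumed; keeps boundaries simple)
CODE_BYTES = CODE_BITS // 8


def new_codeword(block, used):
    """Derive a codeword from a hash of the sourceblock; re-salt on collision."""
    salt = 0
    while True:
        digest = hashlib.sha256(block + salt.to_bytes(4, "big")).digest()
        code = int.from_bytes(digest[:CODE_BYTES], "big")
        if code not in used:
            return code
        salt += 1


def encode(data, codebook):
    """Deconstruction: split data into sourceblocks and emit their codewords.

    codebook maps sourceblock -> codeword and is extended in place
    whenever a sourceblock has no codeword yet.
    """
    used = set(codebook.values())
    out = bytearray()
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        if block not in codebook:
            codebook[block] = new_codeword(block, used)
            used.add(codebook[block])
        out += codebook[block].to_bytes(CODE_BYTES, "big")
    return bytes(out)


def decode(compacted, codebook):
    """Reconstruction: look up the sourceblock for each codeword."""
    inverse = {code: block for block, code in codebook.items()}
    return b"".join(
        inverse[int.from_bytes(compacted[i:i + CODE_BYTES], "big")]
        for i in range(0, len(compacted), CODE_BYTES)
    )


def estimate_start_bit(byte_offset):
    """Estimator (assumed): scale an offset in the original data by the size ratio."""
    return byte_offset * 8 * CODE_BITS // (BLOCK_SIZE * 8)


def random_read(compacted, codebook, query, hint_offset, count):
    """Random access: estimate, refine to a boundary, scan for the start codeword."""
    target = codebook.get(query[:BLOCK_SIZE])
    if target is None:
        return None                      # query start is not in the codebook
    bit = estimate_start_bit(hint_offset)
    bit -= bit % CODE_BITS               # snap back to a codeword boundary
    pos = bit // 8
    while pos + CODE_BYTES <= len(compacted):
        if int.from_bytes(compacted[pos:pos + CODE_BYTES], "big") == target:
            # Hand the start codeword and the following ones to the decoder.
            return decode(compacted[pos:pos + count * CODE_BYTES], codebook)
        pos += CODE_BYTES
    return None


codebook = {}
compacted = encode(b"AAAABBBBCCCCDDDDEEEE", codebook)
result = random_read(compacted, codebook, b"CCCC", hint_offset=6, count=2)
```

In this sketch the scan only moves forward, so the hint must point at or before the target; a real estimator would need some tolerance for overshoot. With variable-length codewords, which the claim language about testing for a codeword boundary suggests, the boundary check would require matching bit sequences against the codebook instead of a simple modulo.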