US 12,008,254 B2
Deduplication of storage device encoded data
Ariel Navon, Revava (IL); and Shay Benisty, Beer Sheva (IL)
Assigned to Western Digital Technologies, Inc., San Jose, CA (US)
Filed by Western Digital Technologies, Inc., San Jose, CA (US)
Filed on Feb. 23, 2021, as Appl. No. 17/182,725.
Claims priority of provisional application 63/135,044, filed on Jan. 8, 2021.
Prior Publication US 2022/0221999 A1, Jul. 14, 2022
Int. Cl. G06F 3/06 (2006.01); G06F 11/10 (2006.01)
CPC G06F 3/0641 (2013.01) [G06F 3/0608 (2013.01); G06F 3/0659 (2013.01); G06F 3/0673 (2013.01); G06F 11/1076 (2013.01)]
20 Claims
 
1. A system comprising:
a storage device comprising:
a storage medium configured to store host data blocks; and
a storage device controller, comprising at least one processor and at least one memory, and configured to:
store, to the storage medium and through an error correction code engine in a write path of the storage device, the host data blocks, wherein the stored host data blocks comprise a first encoded comparison data unit encoded by the error correction code engine in the write path;
return, to a host system, previously-stored host data blocks responsive to decoding, through an error correction code engine in a read path of the storage device, the stored host data blocks from the storage medium;
encode, using the error correction code engine in the write path, a target data unit to generate an encoded target data unit comprised of:
an input host data block divided into a plurality of symbols; and
at least one parity symbol generated using the plurality of symbols of the input host data block and an error correction code coding algorithm for the error correction code engine in the write path;
retrieve, for a deduplication operation, the first encoded comparison data unit from the storage medium without decoding the first encoded comparison data unit using the error correction code engine in the read path of the storage device, wherein the first encoded comparison data unit comprises:
a previously-stored host data block divided into a plurality of symbols; and
at least one parity symbol generated prior to storage in the storage medium using the plurality of symbols of the previously-stored host data block and the error correction code coding algorithm for the error correction code engine in the write path;
compare, responsive to encoding the target data unit and retrieving the first encoded comparison data unit and using a bit-by-bit exclusive-or comparison, each bit of the encoded target data unit to each corresponding bit of the first encoded comparison data unit;
determine, based on a number of bits that are not equal between the encoded target data unit and the first encoded comparison data unit, a first similarity value; and
eliminate, responsive to the first similarity value, at least one duplicate data unit selected from:
the target data unit; and
the first encoded comparison data unit in the storage medium.
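
The comparison recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy systematic code in which the encoded unit is the host data block followed by a single XOR parity symbol, whereas a real storage device would typically use a much stronger error correction code. The names `SYMBOL_SIZE`, `encode`, `differing_bits`, `similarity`, and `is_duplicate`, the normalization of the similarity value to the range 0 to 1, and the threshold are all hypothetical and chosen only for illustration.

```python
from functools import reduce

SYMBOL_SIZE = 4  # bytes per symbol (hypothetical value for this sketch)


def encode(block: bytes) -> bytes:
    """Toy systematic encoder: data symbols followed by one XOR parity symbol.

    Stands in for the write-path error correction code engine; this is an
    assumption of the sketch, not the coding algorithm of the patent.
    """
    if len(block) == 0 or len(block) % SYMBOL_SIZE:
        raise ValueError("block length must be a positive multiple of SYMBOL_SIZE")
    # Divide the host data block into fixed-size symbols.
    symbols = [block[i:i + SYMBOL_SIZE] for i in range(0, len(block), SYMBOL_SIZE)]
    # Generate one parity symbol as the byte-wise XOR of all data symbols.
    parity = reduce(
        lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
        symbols,
        bytes(SYMBOL_SIZE),
    )
    return block + parity


def differing_bits(encoded_a: bytes, encoded_b: bytes) -> int:
    """Bit-by-bit XOR comparison: count the bit positions that are not equal."""
    if len(encoded_a) != len(encoded_b):
        raise ValueError("encoded units must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(encoded_a, encoded_b))


def similarity(encoded_a: bytes, encoded_b: bytes) -> float:
    """Similarity value as the fraction of equal bits (hypothetical definition)."""
    total_bits = 8 * len(encoded_a)
    return 1.0 - differing_bits(encoded_a, encoded_b) / total_bits


def is_duplicate(encoded_target: bytes, encoded_stored: bytes,
                 threshold: float = 1.0) -> bool:
    """Treat the target as a duplicate when the similarity meets the threshold."""
    return similarity(encoded_target, encoded_stored) >= threshold


# The stored unit is compared as read from the medium, without decoding.
stored = encode(b"ABCDEFGH")
target = encode(b"ABCDEFGH")
print(differing_bits(target, stored))  # 0
print(is_duplicate(target, stored))    # True
```

Because the stored unit is read without passing through the read-path decoder, it may contain raw bit errors. A threshold slightly below 1.0 is one way a design might tolerate them, which is presumably why the claim recites a similarity value rather than strict equality. The appropriate threshold would depend on the code and the medium and is not specified here.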