US 12,118,059 B2
Projection-based techniques for updating singular value decomposition in evolving data sets
Vasileios Kalantzis, White Plains, NY (US); Georgios Kollias, White Plains, NY (US); Shashanka Ubaru, Ossining, NY (US); Lior Horesh, North Salem, NY (US); and Kenneth Lee Clarkson, Madison, NJ (US)
Assigned to International Business Machines Corporation, Armonk, NY (US)
Filed by International Business Machines Corporation, Armonk, NY (US)
Filed on Jun. 1, 2021, as Appl. No. 17/335,928.
Prior Publication US 2022/0382831 A1, Dec. 1, 2022
Int. Cl. G06F 17/16 (2006.01); G06F 16/22 (2019.01); G06F 17/11 (2006.01)
CPC G06F 17/16 (2013.01) [G06F 16/2237 (2019.01); G06F 17/11 (2013.01)] 20 Claims
1. A system, comprising:
at least one processing component;
at least one memory component; and
a data processing component configured to carry out vector space modeling of an evolving dataset for a latent semantic analysis component of the system, wherein the evolving dataset is stored as a matrix in the at least one memory component, and wherein the evolving dataset comprises a first set of data loaded as an initial matrix B with a truncated singular value decomposition (SVD) $B_k$ written as $B_k = U_k \Sigma_k V_k^H$, wherein $U_k$ represents the k leading left singular vectors of the initial matrix B, $\Sigma_k$ represents the k leading singular values of the initial matrix B, and $V_k$ represents the k leading right singular vectors of the initial matrix B, and wherein the data processing component comprises:
an update module configured to, in response to receiving new data from at least one data source as an update to the evolving dataset, generate an updated truncated SVD of the evolving dataset, wherein the generating comprises:
loading a second set of data from the data source into the evolving dataset as a new matrix;
combining the initial matrix B and the new matrix to form an updated matrix A, wherein a rank-k truncated SVD $A_k$ of the updated matrix A is written as $A_k = \hat{U}_k \hat{\Sigma}_k \hat{V}_k^H$, wherein $\hat{U}_k$ represents the k leading left singular vectors of the updated matrix A, $\hat{\Sigma}_k$ represents the k leading singular values of the updated matrix A, and $\hat{V}_k$ represents the k leading right singular vectors of the updated matrix A;
generating a first projection matrix Z, wherein $\mathrm{range}(Z)$ approximates $\mathrm{range}(\hat{U}_k)$;
generating a second projection matrix W, wherein $\mathrm{range}(W^H)$ approximates $\mathrm{range}(\hat{V}_k^H)$; and
determining an approximate truncated SVD of the updated matrix A written as $A_k \approx (Z F_k)\,\Theta_k\,(W G_k)^H$, wherein $\Theta_k$ represents the k leading singular values of the matrix $Z^H A W$, $F_k$ represents the k leading left singular vectors of the matrix $Z^H A W$, and $G_k$ represents the k leading right singular vectors of the matrix $Z^H A W$, and wherein the determining comprises:
setting the second projection matrix W equal to an identity matrix so that $Z^H A W = Z^H A$; and
determining the k leading left singular vectors $F_k$ and the k leading singular values $\Theta_k$ by carrying out 2k steps of a Lanczos algorithm on the Hermitian matrix $Z^H A A^H Z$, whose k leading eigenvectors are $F_k$ and whose k leading eigenvalues are the squares of the diagonal entries of $\Theta_k$; and
an indexing module configured to:
in response to receiving a query at the latent semantic analysis component, identify relationships between the query and terms in the updated evolving dataset, wherein the identifying comprises finding latent factor associations in the updated evolving dataset based on the updated truncated SVD; and
return a result of the query based on the identified relationships.
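The claim leaves the construction of Z and W open. The following is a minimal sketch of the projection-based update under assumptions made here for illustration only: real-valued data (so $^H$ becomes a plain transpose), new data arriving as extra rows E so that $A = [B; E]$, and the block-diagonal choice $Z = \mathrm{blkdiag}(U_k, I)$ with $W = I$. Under these assumptions $Z^H A = [U_k^H B; E] = [\Sigma_k V_k^H; E]$, which holds exactly when $U_k$, $\Sigma_k$, $V_k$ are the leading singular triplets of B, so the update reduces to a small dense SVD. The function names `truncated_svd` and `projection_update_rows` are hypothetical.

```python
import numpy as np

def truncated_svd(M, k):
    """k leading singular triplets of M, returned as (U_k, sigma_k, V_k)."""
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], s[:k], Vh[:k, :].T

def projection_update_rows(Uk, sk, Vk, E, k):
    """Approximate rank-k SVD of A = [B; E] from B_k = U_k diag(sk) V_k^T.

    Assumed choices (not fixed by the claim): Z = blkdiag(U_k, I), W = I.
    Real-valued data, so conjugate transposes reduce to transposes.
    """
    # Z^T A = [U_k^T B; E] = [diag(sk) V_k^T; E]
    ZtA = np.vstack([sk[:, None] * Vk.T, E])
    # Small dense SVD of the projected matrix gives F_k, Theta_k, G_k
    F, theta, Gh = np.linalg.svd(ZtA, full_matrices=False)
    Fk, theta_k, Gk = F[:, :k], theta[:k], Gh[:k, :].T
    # Lift the left factor back: Z F_k = [U_k F_k(top k rows); F_k(remaining rows)]
    ZFk = np.vstack([Uk @ Fk[:k, :], Fk[k:, :]])
    return ZFk, theta_k, Gk  # with W = I, W G_k is just G_k

# Hypothetical demo: B has exact rank k, so range(Z) contains range(A)
rng = np.random.default_rng(0)
m, n, k, s = 60, 40, 5, 8
B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
E = rng.standard_normal((s, n))  # s newly arrived rows
Uk, sk, Vk = truncated_svd(B, k)
ZFk, theta_k, Gk = projection_update_rows(Uk, sk, Vk, E, k)
A = np.vstack([B, E])
Ak_approx = (ZFk * theta_k) @ Gk.T
```

Because B has exact rank k in this demo, the result coincides with a direct rank-k SVD of A; for a general B the same code returns an approximation whose quality depends on how much of B lies outside the span of $U_k$.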
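The Lanczos step of the claim can be sketched as follows. With $W = I$, the matrix $Z^H A A^H Z$ is Hermitian positive semidefinite; its leading eigenvectors give $F_k$, its leading eigenvalues are the squares of the entries of $\Theta_k$, and $G_k$ then follows as $(Z^H A)^H F_k \Theta_k^{-1}$. This is a minimal sketch assuming real data, a random start vector, full reorthogonalisation (the claim does not say whether reorthogonalisation is used) and nonzero leading singular values. The names `lanczos_topk` and `svd_from_lanczos`, and the demo matrix, are hypothetical.

```python
import numpy as np

def lanczos_topk(matvec, dim, k, steps, seed=0):
    """k largest Ritz pairs of a symmetric PSD operator after at most `steps`
    Lanczos steps with full reorthogonalisation (real arithmetic assumed)."""
    steps = min(steps, dim)  # a Krylov basis cannot exceed dim vectors
    rng = np.random.default_rng(seed)
    Q = np.zeros((dim, steps))
    alpha, beta = [], []
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    for j in range(steps):
        Q[:, j] = q
        w = matvec(q)
        alpha.append(q @ w)
        # Remove components along all Lanczos vectors so far (two passes)
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        b = np.linalg.norm(w)
        if j + 1 == steps or b <= 1e-10 * max(abs(a) for a in alpha):
            break  # step budget used, or the Krylov space is invariant
        beta.append(b)
        q = w / b
    m = len(alpha)
    # Tridiagonal projection T = Q^T M Q built from the recurrence coefficients
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    top = np.argsort(evals)[::-1][:k]
    return evals[top], Q[:, :m] @ evecs[:, top]

def svd_from_lanczos(C, k):
    """F_k, Theta_k, G_k of C = Z^H A (W = I) from 2k Lanczos steps on C C^H;
    C C^H is applied as two matrix-vector products and never formed."""
    lam, Fk = lanczos_topk(lambda x: C @ (C.T @ x), C.shape[0], k, 2 * k)
    theta_k = np.sqrt(np.maximum(lam, 0.0))  # eigenvalues of C C^H are Theta^2
    Gk = (C.T @ Fk) / theta_k  # G_k = C^H F_k Theta_k^{-1}
    return Fk, theta_k, Gk

# Hypothetical projected matrix C = [Sigma_k V_k^T; E] with s <= k new rows
rng = np.random.default_rng(1)
k, s, n = 5, 3, 40
Vk, _ = np.linalg.qr(rng.standard_normal((n, k)))
sk = np.array([10.0, 8.0, 6.0, 4.0, 2.0])
E = rng.standard_normal((s, n))
C = np.vstack([sk[:, None] * Vk.T, E])
Fk, theta_k, Gk = svd_from_lanczos(C, k)
```

In this demo only s = 3 rows are appended, so $Z^H A A^H Z$ has dimension k + s ≤ 2k and the 2k Lanczos steps span the whole space, making the result exact up to rounding. With more new rows, 2k steps yield approximations whose accuracy depends on the gaps between singular values.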
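The claim does not specify how the indexing module scores a query against the updated factorisation. One common latent semantic analysis choice, assumed here, folds a term-space query q into the latent space as $\Theta_k^{-1} G_k^H q$ and ranks documents by cosine similarity with the rows of $Z F_k$. The sketch assumes rows of A are documents and columns are terms, and reuses the assumed choices $Z = \mathrm{blkdiag}(U_k, I)$ and $W = I$; it ranks documents, whereas the claim speaks of relationships with terms, for which an analogous latent-space comparison could be used. All names and data are hypothetical.

```python
import numpy as np

def update_rows(Uk, sk, Vk, E, k):
    """Rank-k SVD of A = [B; E] with the assumed choices Z = blkdiag(U_k, I), W = I."""
    C = np.vstack([sk[:, None] * Vk.T, E])  # C = Z^T A
    F, theta, Gh = np.linalg.svd(C, full_matrices=False)
    Fk = F[:, :k]
    ZFk = np.vstack([Uk @ Fk[:k], Fk[k:]])  # Z F_k
    return ZFk, theta[:k], Gh[:k].T

def rank_documents(q, ZFk, theta_k, Gk):
    """Fold a term-space query into the latent space and rank the rows of A
    (documents) by cosine similarity. The scoring rule is an assumption."""
    q_hat = (Gk.T @ q) / theta_k  # Theta_k^{-1} G_k^T q
    norms = np.linalg.norm(ZFk, axis=1) * np.linalg.norm(q_hat)
    sims = (ZFk @ q_hat) / np.maximum(norms, 1e-12)
    return np.argsort(-sims), sims

# Hypothetical synthetic data: rows are documents, columns are terms
rng = np.random.default_rng(2)
m, n, k, s = 30, 50, 4, 5
B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
E = rng.standard_normal((s, n))  # newly arrived documents
U, sv, Vh = np.linalg.svd(B, full_matrices=False)
ZFk, theta_k, Gk = update_rows(U[:, :k], sv[:k], Vh[:k].T, E, k)
A = np.vstack([B, E])
order, sims = rank_documents(A[m + 1], ZFk, theta_k, Gk)
```

Querying with a newly added row of A returns that row first: when the rank-k factorisation is exact, the folded query coincides with that document's own row of $Z F_k$, so its cosine similarity is 1.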